[Script] Network routing/failover topology change detection

A while ago I wrote a simple but useful script, which I’m sharing here, that detects upstream provider HSRP failover events via traceroute. It can be used for all kinds of virtual IP failover mechanisms like VRRP or Check Point ClusterXL, for actual routing protocols like BGP/OSPF, and for similar technologies where the path IP packets take across multiple hops can change.
The script executes traceroutes to a given destination and checks whether the path is being routed over a certain hop, with the ability to send mail notifications if this is not the case.
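The core logic can be sketched roughly like this (a minimal illustration only, not the actual script; the destination, the expected hop IP and the mail address are made-up examples):

DEST=192.0.2.1             # destination to trace towards
EXPECTED_HOP=198.51.100.1  # hop the path normally traverses (e.g. the primary HSRP gateway)
MAILTO=noc@example.com

# -n: numeric output, -q 1: one probe per hop, -w 2: per-probe timeout in seconds
if ! traceroute -n -q 1 -w 2 "$DEST" | awk '{print $2}' | grep -qx "$EXPECTED_HOP"; then
    echo "Path to $DEST no longer passes through $EXPECTED_HOP - possible failover" \
        | mail -s "Routing topology change detected" "$MAILTO"
fi

The above is only meant to show the idea; see the GitHub version linked below for the real thing.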

You can get the most recent version of this script on my GitHub here. If you have any suggestions or improvements (and I’m sure there is plenty of room for them), feel free to drop a comment or open an issue or pull request on GitHub.

Continue reading

Analyzing and coping with an SSDP amplification DDoS attack

A while ago we were hit by an amplification/reflection DDoS attack against our public-facing network. I was familiar with NTP- and DNS-based reflection DDoS attacks, but this one employed the Simple Service Discovery Protocol (SSDP) to flood our tubes, a name I had heard before and occasionally seen in packet traces, but honestly knew hardly anything about.
SSDP is a UDP-based protocol for service discovery and UPnP functionality with an HTTP-like syntax. It’s deployed by modern operating systems and embedded systems like home routers, where it is sometimes enabled even on their external interfaces, which makes this kind of attack possible.
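To illustrate the HTTP-like syntax, a typical SSDP discovery probe (an M-SEARCH request sent over UDP to port 1900) looks like this; a hand-written example, not a capture from the attack:

M-SEARCH * HTTP/1.1
HOST: 239.255.255.250:1900
MAN: "ssdp:discover"
MX: 1
ST: ssdp:all

An attacker spoofs the victim's address as the source of such small probes; devices listening on UDP/1900 then send their much larger device/service descriptions back to the victim, which provides both the reflection and the amplification.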

The Shadowserver Foundation has a nice website with lots of information and statistics on publicly reachable SSDP-enabled devices: while the number of open or vulnerable DNS and NTP servers is steadily going down, there are currently around 14 million IPs around the world that respond to SSDP requests, and that number is declining only very slowly:

Because of this, we can expect SSDP to be abused for DDoS attacks more often in the future.

Continue reading

[Script] Poor man’s vSphere network health check for standard vSwitches

Among many other new features, the vSphere 5.1 distributed vSwitch brought us the Network Health Check feature. Its main purpose is to ensure that all ESXi hosts attached to a particular distributed vSwitch can reach all VLANs of the configured port groups, and with a consistent MTU. This is really useful in situations where you’re dealing with many VLANs and the roles of virtualization admin and network admin are strictly separated.

Unfortunately, like pretty much all newer networking features, the health check is only included in the distributed vSwitch which requires vSphere Enterprise+ licenses.

There are a couple of other (more cumbersome) options though:
If you have ESXi 5.5, you can use the pktcap-uw utility on the ESXi shell to check whether your host receives frames for a specific VLAN on an uplink port:
The following example command will capture receive-side frames tagged with VLAN 100 on uplink vmnic3:
# pktcap-uw --uplink vmnic3 --dir 0 --capture UplinkRcv --vlan 100
If systems are active in this VLAN, you should already see a few broadcasts or multicasts, meaning the host is able to receive frames for this VLAN on this NIC. Repeat that for every physical vmnic uplink and VLAN, for example with a small loop like the one sketched below.
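A quick-and-dirty loop on the ESXi shell could look like this (the NIC and VLAN lists are placeholders for your environment; -c 5 stops each capture after five packets, so in a quiet VLAN the command will simply sit and wait):

for NIC in vmnic0 vmnic1 vmnic2 vmnic3; do
  for VLAN in 100 200 300; do
    echo "=== $NIC / VLAN $VLAN ==="
    pktcap-uw --uplink $NIC --dir 0 --capture UplinkRcv --vlan $VLAN -c 5
  done
done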

Another way to check connectivity is to create a vmkernel interface on the VLAN for testing and use vmkping. Since manually configuring a vmkernel interface for every VLAN seems like a huge PITA, I came up with a short ESXi shell script to automate that task.
Check out the script below. It uses a CSV-style list to configure vmkernel interfaces with certain VLAN and IP settings, pinging a specified IP on each network with a given payload size to account for the MTU configuration. This should at least catch initial network-side configuration errors when building a new infrastructure or adding new hosts.
This script was tested successfully on ESXi 5.5 and should work on ESXi 5.1 as well. I’m not entirely sure about 5.0 but that should be ok too. (Please leave a comment in case you can confirm/refute that).

Introducing the ghetto-vSwitchHealthCheck.sh:

Update: I have moved my scripts to GitHub and updated some of them a bit. You can find the current version of this particular script here.
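To give a rough idea of the approach, a heavily stripped-down sketch could look something like the following. This is not the actual script: the vmkernel interface name vmk99, the port group names, vSwitch0 and the CSV file name are all assumptions for the example.

# reads lines of the form "vlan,ip,netmask,pingtarget,mtu"
while IFS=',' read VLAN IP MASK TARGET MTU; do
  # temporary port group and vmkernel interface for this VLAN
  esxcli network vswitch standard portgroup add -v vSwitch0 -p "hc-vlan$VLAN"
  esxcli network vswitch standard portgroup set -p "hc-vlan$VLAN" --vlan-id "$VLAN"
  esxcli network ip interface add -i vmk99 -p "hc-vlan$VLAN" -m "$MTU"
  esxcli network ip interface ipv4 set -i vmk99 -t static -I "$IP" -N "$MASK"
  # ping with DF set and a payload that fills the configured MTU (28 bytes of IP/ICMP headers)
  vmkping -d -s $((MTU - 28)) -I vmk99 "$TARGET" > /dev/null \
    && echo "VLAN $VLAN: OK" || echo "VLAN $VLAN: FAILED"
  # clean up again
  esxcli network ip interface remove -i vmk99
  esxcli network vswitch standard portgroup remove -v vSwitch0 -p "hc-vlan$VLAN"
done < vlanlist.csv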

Continue reading

HP Virtual Connect Firmware 4.01 released

After announcing it a while ago, HP released Virtual Connect firmware 4.01 to the public last week:
http://www.hp.com/swpublishing/MTX-312edceee05e4f13a316c9c504
The release notes can be found here.
You may also want to check out the User and Installation documentation:
HP Virtual Connect for c-Class BladeSystem Setup and Installation Guide Version 4.01 and later
HP Virtual Connect for c-Class BladeSystem Version 4.01 User Guide
HP Virtual Connect Manager Command Line Interface for c-Class BladeSystem Version 4.01 User Guide

Note that according to the most recent HP VMware Recipe document from April, the recommended VC firmware version for vSphere/ESX(i) environments is still 3.75.
Continue reading

nscd DNS caching and postfix

A few of our mail gateway servers running postfix/policyd-weight/amavis/spamassassin generate a lot of DNS queries to our DNS servers at times.
I’m not particularly concerned about that myself, but there were some discussions about whether and how we could decrease the volume of DNS queries.

One suggestion for an easy and convenient solution was to simply install the Name Service Cache Daemon (nscd) to cache responses locally on the mail servers. This was quickly implemented on one of the servers, but it didn’t really seem to work: the server still generated loads of queries, and the nscd statistics gave no indication that caching was happening. The output of nscd -g always showed a 0% cache hit ratio and 0 cache hits on positive entries.
So the idea was just as quickly abandoned without digging into it any deeper, and the whole plan was more or less forgotten, since we didn’t have any real issues in the first place.
Other options discussed were setting up dedicated caching-only resolvers (on the hosts themselves), which wouldn’t have been difficult either.

Fast forward a few months: the so-called “issue” of too many DNS queries came up again recently, and I decided to have a look at nscd myself to find out why it supposedly wasn’t working.
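For reference, host-name caching in nscd is controlled by the hosts entries in /etc/nscd.conf; something along the following lines enables it (the TTL values here are just examples), and nscd -g then shows whether the cache actually gets any hits:

# /etc/nscd.conf (excerpt)
enable-cache            hosts   yes
positive-time-to-live   hosts   600
negative-time-to-live   hosts   20

# nscd -g | grep -A 20 'hosts cache'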

Continue reading

Check Point R75.45 released

Last week, Check Point released the first major update to R75.40, the version that introduced Gaia. R75.45 mainly brings enhancements and fixes for the new Gaia platform:
R75.45 Release Notes
R75.45 Resolved Issues
R75.45 Known Limitations

R75.45 includes this R75.40 hotfix.
Among the new features provided, support for 6in4 tunneling and policy-based routing appear to be the most intriguing (at least to me).
The long list of fixes addresses important issues such as long policy installation times, kernel memory leaks, and IPS not exiting bypass mode after CPU utilization returns to normal.
Only direct upgrades from R75.40 are supported, not from earlier versions.

I haven’t tried this new release yet, but given the list of fixes and the potential early rough edges of the still-young Gaia platform, I’ll check it out sooner rather than later.

Running Check Point ClusterXL member interfaces on different subnets than the virtual interfaces

We have a couple of 802.1Q VLAN interfaces on our Check Point firewalls for a number of small public DMZ networks, providing full layer 2 isolation for usually only one or two related servers in each of those networks. Using a /29 subnet for each VLAN means that, minus the network and broadcast addresses, we can assign 6 (public) IP addresses per network. We also need an IP for the virtual ClusterXL interface, to be used as the gateway by the servers. Up until recently, we also used 2 more IPs of each subnet for the members’ VLAN interfaces, so they can communicate via Check Point’s Cluster Control Protocol (CCP). This left us with merely 3 public IPv4 addresses per /29 subnet for actual servers inside those VLANs.
Wasting IPs like this on interfaces that are transparent to your actual services is not really practical in times of scarce IPv4 address space, though. So during our upgrade to R75.40 and Gaia on new systems, we stopped assigning addresses from the public subnet to the VLAN interfaces of the individual cluster members. This configuration has been fully supported for a long time, as per the ClusterXL Admin Guide:

Configuring Cluster Addresses on Different Subnets
Only one routable IP address is required in a ClusterXL cluster, for the virtual cluster interface that faces the Internet. All cluster member physical IP addresses can be non-routable. Configuring different subnets for the cluster IP addresses and the member addresses is useful in order to:
– Enable a multi-machine cluster to replace a single-machine gateway in a pre-configured network, without the need to allocate new addresses to the cluster members.
– Allow organizations to use only one routable address for the ClusterXL Gateway Cluster. This saves routable addresses.
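To illustrate, the addressing for one of those DMZ VLANs can then look roughly like this (all addresses and interface names below are made up for the example):

Public /29 for the DMZ:          203.0.113.8/29
  Cluster virtual IP (VIP):      203.0.113.9      (default gateway for the servers)
  Servers:                       203.0.113.10-14  (5 usable addresses instead of 3)
Member VLAN interfaces (non-routable, only used for CCP):
  Member A, eth2.100:            192.168.100.1/29
  Member B, eth2.100:            192.168.100.2/29

In Gaia clish on member A this would be something along the lines of:

add interface eth2 vlan 100
set interface eth2.100 ipv4-address 192.168.100.1 mask-length 29
set interface eth2.100 state on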

Continue reading

Linux not responding to ARP-Request from Juniper Firewall with subnet IP

We recently got a new firewalled subnet and VLAN for servers of a certain department and ran into a strange issue when we deployed our first system with Linux in there.
The system would seem to become randomly unreachable for an arbitrary amount of time, ranging from a few minutes to hours. Plugging a 2nd VM into the VLAN (on a different host and a different switch) confirmed that connectivity inside the VLAN was working as usual without issues, so the cause would have to be with the gateway or routing to the subnet.
Note that the fact that these are VMs is completely irrelevant to this issue; it would have happened with physical servers too.

When the issue was brought to my attention, a quick trace of the network traffic and some tests with the 2nd VM revealed what was going wrong:
The Juniper firewall, acting as the gateway in that subnet, was sending ARP requests with the network address of the subnet as the sender (“tell”) address, and the Linux OS wouldn’t respond to them.

We have a plain, simple subnet of 10.9.24.0/24, with the Linux box in this case having the IP 10.9.24.56/24. The gateway is supposed to use 10.9.24.1/24, which it does. But when sending ARP requests in order to forward incoming traffic, the gateway uses the subnet’s network address as the sender IP in the request (“who-has 10.9.24.56 tell 10.9.24.0”). What?
Linux, unlike Windows, does not seem to like this and simply ignores the ARP request without even thinking of replying. Consequently, the gateway is never able to resolve the MAC address and thus never forwards traffic to that IP.

A quick confirmation of this behavior from the viewpoint of a Linux box in this subnet:

# ifconfig eth0 10.9.24.56/24
# tcpdump -i eth0 -nnev arp or rarp
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
  12:39:32.403516 f8:c0:01:12:5d:48 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Request who-has 10.9.24.56 tell 10.9.24.0, length 46
  12:39:34.635769 f8:c0:01:12:5d:48 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Request who-has 10.9.24.56 tell 10.9.24.0, length 46
  12:39:36.635680 f8:c0:01:12:5d:48 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Request who-has 10.9.24.56 tell 10.9.24.0, length 46

The Juniper keeps sending ARPs with the subnet IP as the sender protocol address, but the Linux OS ignores them and doesn’t bother to respond.

Now we do the following:

# ifconfig eth0 10.9.24.56/16
# tcpdump -i eth0 -nnev arp or rarp
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
  12:40:22.833832 f8:c0:01:12:5d:48 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Request who-has 10.9.24.56 tell 10.9.24.0, length 46
  12:40:22.833871 00:50:56:84:00:a2 > f8:c0:01:12:5d:48, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Reply 10.9.24.56 is-at 00:50:56:84:00:a2, length 28

With a /16 subnet mask, for example, the IP 10.9.24.0 is just like any other host IP from the box’s point of view, and the Linux OS responds to the ARPs properly.

Workaround

Not being in charge of the networking side, I raised the issue with our network guys and in the meantime worked around it by periodically broadcasting an ARP reply for our IP with the arping utility. This is a great resource on ARP basics, with arping counterparts for your experimenting pleasure.

The following command will broadcast a single ARP-Reply for our IP 10.9.24.56 through interface eth0:

# arping -c 1 -I eth0 -A 10.9.24.56
# tcpdump -i eth0 -nnev arp or rarp
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
  12:43:42.897764 00:50:56:84:00:a2 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Reply 10.9.24.56 is-at 00:50:56:84:00:a2, length 28

So there we have our more or less dirty but necessary workaround in the form of a gratuitous ARP. I put this in a cron job (see the example entry below) and the issue was gone.
Afterwards, the networking guys set a static ARP entry on the Juniper, but I (and probably they) still have no idea why it sent ARPs with the subnet’s network address in the first place. We have other Juniper-firewalled subnets where the firewall properly uses a physical host IP.
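For completeness, a root crontab line along these lines does the job (the schedule and the path are just an example; use the full path to arping as appropriate for your distribution):

* * * * *  /sbin/arping -q -c 1 -I eth0 -A 10.9.24.56 >/dev/null 2>&1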

Again, Windows (tested with 2008 R2) seems to be oblivious to those requests and answers them anyway. I confirmed the behavior with RHEL 6 and Fedora 17, but I would be surprised if any other Linux with a halfway recent kernel handled this any differently. Also, this issue is NOT specific to VMware ESXi VMs or virtualization in general.
Skimming briefly through the ARP RFC (RFC 826), I haven’t found anything definitive saying that a system should not reply to requests coming from the subnet’s network address, though.

ESX host notify switches behavior

This section is completely unrelated to the issue at hand, but while on the topic of ARP, I was also curious how exactly the frames look that an ESX(i) host sends with the “notify switches” option when you vMotion a VM. I was always under the impression it was a gratuitous ARP like the one I sent with arping above, but it turns out they are just Reverse ARP (RARP) broadcast frames. Not that it matters much, since they do the job of updating the switches’ MAC address tables (which are not related to the ARP tables; those stay unaffected anyway).
Just found this too: This KB Article on NLB Unicast Issues actually explains pretty well that it is RARP.

# tcpdump -i eth0 -nnev arp or rarp
  tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
  13:22:57.447460 00:50:56:84:03:78 > ff:ff:ff:ff:ff:ff, ethertype Reverse ARP (0x8035), length 60: Ethernet (len 6), IPv4 (len 4), Reverse Request who-is 00:50:56:84:03:78 tell 00:50:56:84:03:78, length 46
  13:22:57.842676 00:50:56:84:03:78 > ff:ff:ff:ff:ff:ff, ethertype Reverse ARP (0x8035), length 60: Ethernet (len 6), IPv4 (len 4), Reverse Request who-is 00:50:56:84:03:78 tell 00:50:56:84:03:78, length 46
  13:22:58.842661 00:50:56:84:03:78 > ff:ff:ff:ff:ff:ff, ethertype Reverse ARP (0x8035), length 60: Ethernet (len 6), IPv4 (len 4), Reverse Request who-is 00:50:56:84:03:78 tell 00:50:56:84:03:78, length 46
  13:22:59.842650 00:50:56:84:03:78 > ff:ff:ff:ff:ff:ff, ethertype Reverse ARP (0x8035), length 60: Ethernet (len 6), IPv4 (len 4), Reverse Request who-is 00:50:56:84:03:78 tell 00:50:56:84:03:78, length 46