Linux not responding to ARP-Request from Juniper Firewall with subnet IP

We recently got a new firewalled subnet and VLAN for servers of a certain department and ran into a strange issue when we deployed our first system with Linux in there.
The system would seem to become randomly unreachable for an arbitrary amount of time, ranging from a few minutes to hours. Plugging a 2nd VM into the VLAN (on a different host and a different switch) confirmed that connectivity inside the VLAN was working as usual without issues, so the cause would have to be with the gateway or routing to the subnet.
Note that the fact that these are “VMs” is completely irrelevant to this issue; it would have happened with physical servers too.

When the issue was brought to my attention, a quick trace of the network traffic and tests with the 2nd VM quickly revealed what was going wrong:
The Juniper firewall, acting as gateway in that subnet, sent ARP-Requests asking to “tell” the answer to the network address of the subnet, and Linux wouldn’t respond to them.

We have a plain, simple subnet of 10.9.24.0/24, with the Linux box in this case having the IP 10.9.24.56/24. The gateway is supposed to run with 10.9.24.1/24, which it does. But when sending ARP-Requests in order to forward incoming traffic, the gateway uses the subnet’s network address as the sender IP in the ARP-Request (“who-has 10.9.24.56 tell 10.9.24.0”). What?
Linux, unlike Windows, does not seem to like this and just ignores the ARP-Request without even thinking of replying. Consequently, the gateway is never able to resolve the MAC and thus never forwards traffic to that IP.

A quick confirmation of this behavior from the viewpoint of a Linux box in this subnet:

# ifconfig eth0 10.9.24.56/24
# tcpdump -i eth0 -nnev arp or rarp
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
  12:39:32.403516 f8:c0:01:12:5d:48 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Request who-has 10.9.24.56 tell 10.9.24.0, length 46
  12:39:34.635769 f8:c0:01:12:5d:48 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Request who-has 10.9.24.56 tell 10.9.24.0, length 46
  12:39:36.635680 f8:c0:01:12:5d:48 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Request who-has 10.9.24.56 tell 10.9.24.0, length 46

>The Juniper keeps sending ARPs with the subnet IP as Sender protocol address, but the Linux OS ignores them and doesn’t bother to respond.
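To watch only the offending requests, a capture filter can match on the sender protocol address field of the ARP header. The following is a minimal sketch and was not part of the original troubleshooting: the helper name arp_sender_filter is made up for illustration, and the byte offsets assume ARP for IPv4 over Ethernet (opcode at offset 6, sender protocol address at offset 14).

```shell
#!/bin/sh
# Print a tcpdump/pcap filter that matches ARP requests (opcode 1)
# whose sender protocol address equals the dotted-quad IPv4 address $1.
# arp_sender_filter is a hypothetical helper name.
arp_sender_filter() {
    # Split the address on dots into the positional parameters.
    old_ifs=$IFS; IFS=.
    set -- $1
    IFS=$old_ifs
    # arp[6:2]  = operation field, 1 means request
    # arp[14:4] = sender protocol address (Ethernet/IPv4 ARP layout)
    printf 'arp and arp[6:2] = 1 and arp[14:4] = 0x%02x%02x%02x%02x\n' \
        "$1" "$2" "$3" "$4"
}

# Example use (needs root, so it is left commented out here):
#   tcpdump -i eth0 -nne "$(arp_sender_filter 10.9.24.0)"
```

For 10.9.24.0 the helper prints `arp and arp[6:2] = 1 and arp[14:4] = 0x0a091800`, which narrows the capture down to requests claiming to come from the network address.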

Now we do the following:

# ifconfig eth0 10.9.24.56/16
# tcpdump -i eth0 -nnev arp or rarp
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
  12:40:22.833832 f8:c0:01:12:5d:48 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Request who-has 10.9.24.56 tell 10.9.24.0, length 46
  12:40:22.833871 00:50:56:84:00:a2 > f8:c0:01:12:5d:48, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Reply 10.9.24.56 is-at 00:50:56:84:00:a2, length 28

>Now running with a /16 subnet mask, for example, the IP 10.9.24.0 is just like any other host IP to us, and Linux responds to the ARP-Requests properly.
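The difference between the two captures comes down to whether the sender IP is the network address of the subnet configured on the interface. A small sketch of that arithmetic in plain shell; the function names are hypothetical, and this only illustrates the address math, not how the kernel actually makes its decision:

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS; IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Print the network address of the subnet containing IP $1 with prefix length $2.
network_of() {
    ip=$(ip_to_int "$1")
    mask=$(( (0xFFFFFFFF << (32 - $2)) & 0xFFFFFFFF ))
    net=$(( ip & mask ))
    echo "$(( (net >> 24) & 255 )).$(( (net >> 16) & 255 )).$(( (net >> 8) & 255 )).$(( net & 255 ))"
}

# Succeed if sender IP $1 is the network address of interface IP $2 with prefix $3.
is_network_address() {
    [ "$1" = "$(network_of "$2" "$3")" ]
}
```

With these definitions, `network_of 10.9.24.56 24` prints 10.9.24.0 and `network_of 10.9.24.56 16` prints 10.9.0.0. So `is_network_address 10.9.24.0 10.9.24.56 24` succeeds while the same check with a prefix of 16 fails, which matches the ignored and answered requests above.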

Workaround

Not being in charge of the networking side, I raised the issue with our network guys and in the meantime worked around it by periodically broadcasting an ARP-Reply for our IP with the arping utility. This is a great resource on ARP basics with arping counterparts for your experimenting pleasure.

The following command will broadcast a single ARP-Reply for our IP 10.9.24.56 through interface eth0:

# arping -c 1 -I eth0 -A 10.9.24.56
# tcpdump -i eth0 -nnev arp or rarp
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
  12:43:42.897764 00:50:56:84:00:a2 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Reply 10.9.24.56 is-at 00:50:56:84:00:a2, length 28

So here we have our more or less dirty but necessary workaround in the form of a Gratuitous ARP. I cronjob’d this and the issue was gone.
Afterwards, the networking guys set a static ARP entry on the Juniper, but I (and probably they too) still have no idea why it sent ARPs with the subnet IP in the first place. We have other Juniper-firewalled subnets where they properly use a real host IP.
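The cron job itself is not shown above, so here is a minimal sketch of what such a wrapper could look like. The script name, its location, the one-minute interval and the DRY_RUN switch are assumptions for illustration; only the arping invocation is taken from the command above.

```shell
#!/bin/sh
# announce-arp.sh (hypothetical name): broadcast one gratuitous ARP reply.
# Meant to be called from cron, for example once a minute (assumed interval):
#   * * * * * root /usr/local/sbin/announce-arp.sh eth0 10.9.24.56

# Send one unsolicited ARP reply for address $2 out of interface $1.
# With DRY_RUN=1 the command is only printed, which is handy for testing.
announce() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "arping -c 1 -I $1 -A $2"
    else
        arping -c 1 -I "$1" -A "$2" >/dev/null 2>&1
    fi
}

# Only act when called with exactly an interface and an address.
if [ "$#" -eq 2 ]; then
    announce "$1" "$2"
fi
```

Running it as `DRY_RUN=1 ./announce-arp.sh eth0 10.9.24.56` prints the arping command instead of sending anything, so the cron entry can be checked without root.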

Again, Windows (tested with 2008 R2) seems to be oblivious to the odd sender IP in those requests and answers anyway. I confirmed the Linux behavior with RHEL6 and Fedora 17, but I would be surprised if any other Linux with a halfway recent kernel handled this any differently. Also, this issue is NOT specific to a VMware ESXi VM or virtualization in general.
Skimming through the ARP RFC briefly, I haven’t found anything definite saying that a system should not reply to a request coming from the subnet IP.

ESX host notify switches behavior

This section is completely unrelated to the issue at hand here, but on the topic of ARPs, I was also curious what exactly the frames look like that an ESX(i) host sends with the “notify switches” option when you vMotion a VM. I was always under the impression it was a Gratuitous ARP like the one I send with arping above, but it turns out it’s just Reverse-ARP broadcast frames. Not that it matters much, since it does the job of updating the switching tables (which are NOT related to ARP tables; those stay unaffected anyway).
Just found this too: This KB Article on NLB Unicast Issues actually explains pretty well that it is RARP.

# tcpdump -i eth0 -nnev arp or rarp
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
  13:22:57.447460 00:50:56:84:03:78 > ff:ff:ff:ff:ff:ff, ethertype Reverse ARP (0x8035), length 60: Ethernet (len 6), IPv4 (len 4), Reverse Request who-is 00:50:56:84:03:78 tell 00:50:56:84:03:78, length 46
  13:22:57.842676 00:50:56:84:03:78 > ff:ff:ff:ff:ff:ff, ethertype Reverse ARP (0x8035), length 60: Ethernet (len 6), IPv4 (len 4), Reverse Request who-is 00:50:56:84:03:78 tell 00:50:56:84:03:78, length 46
  13:22:58.842661 00:50:56:84:03:78 > ff:ff:ff:ff:ff:ff, ethertype Reverse ARP (0x8035), length 60: Ethernet (len 6), IPv4 (len 4), Reverse Request who-is 00:50:56:84:03:78 tell 00:50:56:84:03:78, length 46
  13:22:59.842650 00:50:56:84:03:78 > ff:ff:ff:ff:ff:ff, ethertype Reverse ARP (0x8035), length 60: Ethernet (len 6), IPv4 (len 4), Reverse Request who-is 00:50:56:84:03:78 tell 00:50:56:84:03:78, length 46
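To tell the frame types apart when skimming a longer capture, it is enough to look at the ethertype that tcpdump prints with -e. A small sketch, assuming the tcpdump output format seen in the captures in this post; the function name classify_frame is hypothetical:

```shell
#!/bin/sh
# Classify one line of "tcpdump -e" output by ethertype and ARP operation.
# Prints one of: rarp, arp-reply, arp-request, other.
classify_frame() {
    case "$1" in
        *"ethertype Reverse ARP (0x8035)"*)     echo "rarp" ;;
        *"ethertype ARP (0x0806)"*" Reply "*)   echo "arp-reply" ;;
        *"ethertype ARP (0x0806)"*" Request "*) echo "arp-request" ;;
        *)                                      echo "other" ;;
    esac
}

# Example use on a live capture (needs root, so it is left commented out):
#   tcpdump -i eth0 -l -nne arp or rarp | while read -r line; do
#       classify_frame "$line"
#   done
```

Fed with the lines above, the vMotion frames come out as rarp, while the arping broadcast from the workaround section comes out as arp-reply.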

7 thoughts on “Linux not responding to ARP-Request from Juniper Firewall with subnet IP”

  1. Hello, I think I’m seeing the same issue with Juniper arp protocol and a Linux server. I assume you don’t have any further news on this?

    • No, I don’t, unfortunately. This went down the networking guys’ “so we made a static ARP entry on the Juniper” road, with no clear solution to it.
      But it’s cool to see we’re not alone with this weird issue. What adds to the confusion is that we have other (older) Juniper-firewalled subnets that don’t exhibit this behavior. I’m not sure what exactly the differences are as I only get very sparse information here.
      Have you raised an issue with Juniper or something on this?

      • Like you, I’m not from the network team, so don’t have a way to contact Juniper. And I’m still being treated with pretty much denial regarding my report. The static arp on router or more likely arping on server are the likely ways to proceed here as well, but I’ll try to work with the network team for a while longer.

  2. Some news; looks like there was a change of wind in the network team regarding this.

    Please see “http://serverfault.com/questions/423343/arp-requests-with-odd-source-ip-go-unanswered” for details.

    In short, there was a configuration issue with the Juniper firewall cluster (configuration not getting updated to all cluster parties, I think), and also something odd with the management application of the said FW cluster.

    • Thanks for the follow-up! What you describe there matches my case exactly, with the different MACs and having a firewall cluster in place too. Now I just need to get them to check their configuration again.

  3. Pingback: Connected to router but cannot access internet/ssh/ping e.t.c

  4. I ran into the exact same issue. Turned out that someone had set the manage-ip value to the subnet address:

    ____SNIP____
    Cluster:name(M)-> get config | inc aggregate4.175
    set interface aggregate10.200 ip x.x.x.x.225/28
    set interface aggregate10.200 manage-ip x.x.x.224
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ____SNIP____

    To fix:

    ____SNIP____
    unset interface aggregate10.200 manage-ip
    ____SNIP____

    This was a misconfiguration in our case.
