Bug 178700 - Gratuitous arp sent by gateway ignored with bonding driver configured in active/backup
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: i686 Linux
Priority: medium
Severity: medium
Assigned To: Neil Horman
QA Contact: Brian Brock
Depends On:
Blocks: 170416
Reported: 2006-01-23 11:43 EST by Jon Stanley
Modified: 2012-05-29 22:57 EDT (History)
Doc Type: Bug Fix
Last Closed: 2006-06-15 09:36:11 EDT

Attachments
Basic network diagram (2.10 KB, image/png), 2006-01-23 15:03 EST, Jon Stanley
rpm of kernel with code enabled to accept gratuitous arps (9.94 MB, application/octet-stream), 2006-01-30 16:54 EST, Neil Horman
patch 1/2 to enable reception of unsolicited arps (813 bytes, patch), 2006-01-31 08:06 EST, Neil Horman
patch to allow the acceptance of gratuitous arps (2.19 KB, patch), 2006-06-14 08:03 EDT, Neil Horman

Description Jon Stanley 2006-01-23 11:43:25 EST
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)

Description of problem:
When a CheckPoint firewall fails over, it will send a gratuitous ARP with the new hardware address and its VIP.

On a system that is configured with bonding (mode 1, using arp_ip_target/arp_interval), the gratuitous ARP from the firewall seems to be ignored, and the ARP cache entry must time out before any traffic is sent to the new firewall.  This sometimes takes around 20-30 seconds (depending on the age of the ARP entry).

This problem is not reproducible on a system without bonding configured.

Version-Release number of selected component (if applicable):
kernel-2.6.9-22.0.1

How reproducible:
Always

Steps to Reproduce:
1.  Configure system with bonding (mode 1, using arp monitoring)
2.  Install Checkpoint firewall as gateway (or anything that handles failover by changing MAC address to the physical address of the new NIC and sending gratuitous ARP)
3.  Failover gateway.

Actual Results:  Delay of 20-30 seconds

Expected Results:  No delay, since the ARP cache should be updated with the new MAC address.

Additional info:
Comment 1 Jon Stanley 2006-01-23 11:52:00 EST
Also note that the kernel running on these systems is not exactly
2.6.9-22.0.1.  It has patches for bug 167630 and bug 165018 in there (neither of
which affects the ARP monitoring function of the bonding driver, AFAIK).
Otherwise it is identical to 2.6.9-22.0.1.
Comment 2 John W. Linville 2006-01-23 13:26:02 EST
Could you provide a diagram of the network topology (where switches are, how 
they are connected, etc)? 
 
Have you tried using miimon instead of arp_ip_target? 
Comment 3 Jon Stanley 2006-01-23 15:03:51 EST
Created attachment 123588 [details]
Basic network diagram

OK, here's a basic network diagram.  Just in case it's not clear:

The two hosts are attached to the same switch; however, they're on different
VLANs.  There is a cross-link between the two switches.  The firewalls are
also attached to the same pair of switches.  One firewall is attached entirely
to one switch, and the other firewall is attached entirely to the other switch.
They run some HA protocol by which the active firewall publishes its
(physical) MAC as the gateway.  So on failover, the MAC of the gateway changes,
and the newly active firewall sends a gratuitous ARP in order to inform all of
the attached machines of the newly active firewall.

Let me know if that's not clear.
Comment 4 John W. Linville 2006-01-23 15:23:43 EST
Some clarifying questions: 
 
What is the IP address shared by the firewalls?  What are the netmasks for all 
the IP addresses? 
 
Do the two lines from each host represent each host having a bond in 
active-backup mode, distributed across the switches?  (Just making sure...) 
 
Could you explain more about your VLAN configuration?  Do the hosts route to 
get to the firewall(s)?  Or does each firewall box have interfaces on both 
VLANs? 
 
Please do still try the miimon alternative as well...thanks! 
Comment 5 Jon Stanley 2006-01-23 16:08:53 EST
Sure.  Those are bogus IP addresses.  I can't give the real ones in a public
forum since they would identify my customer (I can e-mail you very detailed info
privately for you to put in here as a private note, though).  The firewalls do
have interfaces on each segment (and they're physical interfaces, not VLAN
tagged or anything).

Yes, the two lines represent active/backup bonding on hosts on both sides.

I'll try the miimon option, but I'm still trying to get a lab to do this in, so
I don't have to do it in production (I have the systems, but the firewalls are
another thing).  If you know of something that would simulate the same thing
(suddenly changing MAC with gratuitous ARP), that would be wonderful!
Comment 6 Jon Stanley 2006-01-23 17:11:15 EST
I actually got to try this in production.  Changing from arp monitoring to mii
monitoring does indeed make the problem go away.  Problem is, ARP monitoring
provides a much more "functional" view of the interface than simply "is there
something on the other side".  Still a bug, but at least we've narrowed down
where it's at.
Comment 7 Jon Stanley 2006-01-23 18:57:06 EST
I've privately e-mailed a more detailed topology and the requested network info.
 I've also posted them in the SR that I have open on this issue and have put a
reference in that SR to this BZ.
Comment 8 Jon Stanley 2006-01-24 00:41:21 EST
There was also some additional information in that email that is not
confidential, so I figured that I'd post it here for the benefit of the community.

One thing that was there that may or may not be of significance is that the
firewall is used as the gateway for the segment that is sourcing the traffic -
i.e. I'm ssh'ing from system 1 to system 2.  However, it gets to system 2 by
way of routing to another router, and that router is system 2's gateway.
System 2 uses the router as its gateway, not the firewall.

There's also a third network interface on these systems (non-bonded) that's used
for administration/management.  When logged in via that interface to the system
on the segment that has the CheckPoint as its gateway, I did a 'watch --interval=1
arp -an' with both MII monitoring and ARP monitoring.  With MII
monitoring, I observed that the ARP entry changed immediately, as expected.  With
ARP monitoring, I verified that the entry in the ARP cache
did not change.

We've also observed instances (it happens a lot, since we have arp_interval set
to 1000) where the bonding driver sends its directed ARP towards the gateway,
the firewall responds appropriately, and the ARP cache *still* does not get updated.
Comment 10 John W. Linville 2006-01-24 10:52:07 EST
It certainly seems as if the bond is eating the incoming ARP reply.  I'll 
continue to look into why that might be happening. 
 
Can you educate me on how the ARP monitoring is more useful to you than
miimon?  Most seem to regard miimon as preferable, with the ARP option
serving only either for NICs that don't properly report their link status or
for other odd circumstances (usually topology-related) that prevent miimon
from properly reflecting the usability of the links in the bond.  (This is for
my own curiosity, and doesn't strictly relate to this issue.)
Comment 11 Jon Stanley 2006-01-24 11:06:20 EST
Mainly the reasoning was due to how I *thought* the ARP monitor worked, and
later found that it doesn't work like this (and have an RFE ticket in to try and
make it work like that).

What I thought happened was that the bonding driver sent out an arp and expected a
*valid* reply for it.  This confirmed connectivity to my gateway.  How it
*actually* works is that it sends an arp, and expects the tx/rx counters to
increment.  That still protects against some remote switch failures, so we decided to
go with it.

I have a request in to attempt another test.  I would switch the arp_ip_target
to something *other* than the gateway on the segment.  The theory here is that
the kernel is trying to protect itself from an ARP flood (I did notice the
sysctl net/ipv4/neigh/<int>/locktime, which is set to 99, measured in jiffies
from what I can tell - so it *looks* good, but we'll see)
Comment 12 Neil Horman 2006-01-25 10:56:15 EST
Hey there, I'm helping John out with this, and I was looking at the dumps that you
sent in.  Looking at them I see no arp requests or responses in any of the
traces, using miimon or not.  Can you comment on why that might be?  I would
have expected to at least see gratuitous arps from the firewall in the miimon
test, where you said the firewall failover worked.
Comment 13 Jon Stanley 2006-01-25 12:09:21 EST
I thought that might be the case :-(

I realized after I did it that I did 'tcpdump -i bond0 -w <file> host <host>'.
I do have other dumps that *do* have the ARPs.  I'll send those to you via email.
Comment 14 Neil Horman 2006-01-27 16:20:23 EST
I think we found the problem: the kernel is currently configured not to accept
unsolicited ARP messages at all (CONFIG_IP_ACCEPT_UNSOLICITED_ARP is not set).
I've got the reproducer set up here, and I'll test a kernel with it enabled and
get you a copy early next week.
Comment 15 Jon Stanley 2006-01-27 16:33:26 EST
Interesting that the machine without bonding, and the one using bonding with miimon,
seem to work.  Is it just a side effect of the ARP flood that ARP monitoring
generates, or am I just *extremely* lucky? :-)  Maybe I'll go buy a lottery
ticket or something :-)
Comment 16 Neil Horman 2006-01-30 09:46:28 EST
I can only imagine it must be one or the other :), since via code inspection I
see no way for a gratuitous arp to update the arp table unless a previous arp
request has populated the table with an incomplete entry.  I'm building a kernel
with this feature enabled now, and will provide it shortly.
Comment 17 Neil Horman 2006-01-30 16:54:37 EST
Created attachment 123889 [details]
rpm of kernel with code enabled to accept gratuitous arps

Here's the kernel I promised.  Could you please give it a try and let me know
the results?  Thanks!
Comment 18 Jon Stanley 2006-01-30 16:59:15 EST
Sure.  What kind of kernel is this, and for what architecture?

Also, could you post the patch here as well?  I'd like to build it into a
mainline 2.6.9-22.0.2 kernel.

Also, out of curiosity, what is your test setup?  I don't have the firewalls
immediately at my disposal in the lab at the moment, so I'd like to try
something else...
Comment 19 Neil Horman 2006-01-31 08:06:50 EST
Created attachment 123908 [details]
patch 1/2 to enable reception of unsolicited arps

Sure, here's the patch.  The RPM is an smp-i686 kernel, as the name of the file
indicates.

Note I've marked the patch as 1/2.  I'm doing that as I think there is some
more work to be done to make this fully functional.  It's in fact somewhat
advantageous that you not test with just the firewall at the moment.  My setup
here consists of two machines, one with a bonded set of physical interfaces
connecting to a second machine on a private network.  On the second machine,
I've hacked up a copy of arping to send out customized gratuitous arps so that
I can toggle the MAC address that I'm sending out in the frame, and watch it
update on the bonded host.  This is where _not_ using the firewall boxes to
test with is advantageous.  John and I have been researching this over the past
few days, and have come to find that there is some general disagreement over
how a gratuitous arp should be constructed.  One camp asserts that the OP field
of the ARP message should be set to ARP_REQUEST, others think ARP_REPLY.  Linux
and your firewall box differ in this regard, and so even with this patch, your
firewall is not likely to work.  However, if you test with the setup I
describe above, you should be able to see your bonded host update its arp table
in response to arp messages sent with a modified arping tool that explicitly
sets the OP field to ARP_REPLY.  I'm going to start working on a patch to
control which messages are counted as gratuitous.
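
For readers who want to see the two gratuitous arp variants discussed above side by side, here is a minimal Python sketch that only builds the raw Ethernet/ARP frame bytes. It is not the modified arping tool mentioned in this comment. The function names, the example MAC 02:00:00:00:00:01 and the address 192.0.2.1 are placeholders, and the choice of target hardware address (broadcast for the reply form, zeros for the request form) is one common convention assumed here, since implementations vary.

```python
import struct

# ARP opcode values as defined in RFC 826.
ARPOP_REQUEST = 1
ARPOP_REPLY = 2
BROADCAST = b"\xff" * 6


def mac_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def ip_bytes(ip: str) -> bytes:
    """Convert a dotted-quad IPv4 address into 4 raw bytes."""
    return bytes(int(part) for part in ip.split("."))


def gratuitous_arp(mac: str, ip: str, op: int = ARPOP_REPLY) -> bytes:
    """Build a broadcast Ethernet frame carrying a gratuitous ARP.

    Sender and target protocol addresses are both set to `ip`, which is
    what makes the message gratuitous. Only the opcode (and, by the
    convention assumed here, the target hardware address) differs
    between the request and reply forms.
    """
    sender_mac = mac_bytes(mac)
    addr = ip_bytes(ip)
    # Ethernet header: destination, source, EtherType 0x0806 (ARP).
    eth_header = BROADCAST + sender_mac + struct.pack("!H", 0x0806)
    # Assumed convention: broadcast target MAC for replies, zeros for requests.
    target_mac = BROADCAST if op == ARPOP_REPLY else b"\x00" * 6
    # ARP header: Ethernet (1), IPv4 (0x0800), hlen 6, plen 4, opcode.
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, op)
    arp += sender_mac + addr + target_mac + addr
    return eth_header + arp


reply_frame = gratuitous_arp("02:00:00:00:00:01", "192.0.2.1", ARPOP_REPLY)
request_frame = gratuitous_arp("02:00:00:00:00:01", "192.0.2.1", ARPOP_REQUEST)
```

The opcode sits at byte offsets 20 and 21 of the frame, so the two variants can be told apart in a capture by that field alone. Actually transmitting such a frame needs a raw packet socket and root privileges, which this sketch deliberately leaves out.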
Comment 20 Neil Horman 2006-01-31 14:50:12 EST
One more thing.  I've been reading this:
http://linux-ip.net/html/ether-arp.html
and the section that differentiates between a gratuitous arp and an unsolicited
request seems to make a good deal of sense to me.  It appears to be the
mechanism that Linux follows, and before I go modifying it, I was wondering if
you could find any documentation justifying the arp requests your firewall sends
out.  By all rights the firewall is sending out gratuitous arp replies, except
for the fact that in the capture, the arp operation is set to REQUEST rather
than REPLY.  I'm starting to think that that is actually a bug in the firewall
code, rather than in how we handle those frames (minus the need to actually
enable the code which processes those frames, which is still our bug).  Can you
look into that with your firewall provider, please?
Comment 21 Jon Stanley 2006-02-05 23:26:24 EST
Sorry for not getting back to you sooner.  So here's something weird.  On
Friday, I finally got the lab environment ready.  I have a simpler network than
in production, but I wouldn't think that would matter with the characterization
of this bug.  But it can't be reproduced (with the "stock" kernel).  I can fail
over the firewalls to my heart's content in various ways, and the MAC gets
updated on my system with bonding.  There's no substantial difference between the
systems in production and this one.

My suggestion (not being a kernel programmer) is to add some printk() calls at
various locations in the ARP handling code, so that we can see exactly what it's
doing and what gets called where.  This is absolutely baffling to me that two
systems, with as near to identical setups as I can get, don't experience the
same issues.

Also, I can see what you said in that the firewall is sending ARP requests
rather than replies.  I don't know what's going on with that, I'll ask the
security guys to take that up with the vendor.
Comment 22 Neil Horman 2006-02-06 09:40:12 EST
No, I don't really think that's a good idea, primarily because
there is no way that this should ever work with the stock kernel.  The only way
that this will work without gratuitous arp support enabled is if some mechanism
is consistently clearing the respective arp entry in your cache, so that a
subsequent arp reply from your firewall in your test environment updates the
cache.  A few questions:

1) When you say stock kernel, are you referring to the kernel that I provided
you, or one of the kernels that originally shipped as GA or an update to RHEL4?
Sysreports from both your test host and the production host would be helpful here.

2) The tcpdump that you originally included, I have been assuming that it was
taken from the host that was having trouble failing over the firewall.  Is that
true, or was it taken from another point on the network (the other side of the
router between the host and the firewall perhaps)?  A tcpdump of your test
environment would be helpful in comparison here.

3) There is one condition I can think of in which this may work, for which there
is a tunable that may explain your discrepancy.  If an arp table entry has not
been recently updated, but does exist, a reply will update the table with
whatever mac address is in the arp frame.  The border between recent and not
recent is defined by the sysctl /proc/sys/net/ipv4/neigh/<interface>/locktime.
If this value is small in the test system, but large in the production system,
this would explain the discrepancy.  Bear in mind though, merely reducing the
locktime is not a complete solution, as it does not handle the case in which
an arp entry does not yet exist in the table (the true gratuitous arp case).  To
handle that you still need the updated kernel that I provided you.
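
The update rules described in point 3 (together with the incomplete-entry case from comment 16) can be written down as a small decision model. This is only a sketch of the behaviour as stated in this thread, not a transcription of the kernel's ARP code. The Entry class, the handle_arp_reply function and the accept_unsolicited flag (standing in for CONFIG_IP_ACCEPT_UNSOLICITED_ARP) are hypothetical names, and times are plain seconds rather than jiffies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    mac: Optional[str]  # None models an incomplete entry awaiting a reply
    updated: float      # time of the last update, in seconds


def handle_arp_reply(entry: Optional[Entry], new_mac: str, now: float,
                     locktime: float,
                     accept_unsolicited: bool = False) -> Optional[Entry]:
    """Return the cache entry after an ARP reply, per the thread's description."""
    if entry is None:
        # True gratuitous case: no entry exists yet. Only a kernel built to
        # accept unsolicited ARP would create one.
        return Entry(new_mac, now) if accept_unsolicited else None
    if entry.mac is None:
        # Incomplete entry left by an earlier request: the reply fills it in.
        return Entry(new_mac, now)
    if now - entry.updated > locktime:
        # Entry exists but was not recently updated: the reply overrides it.
        return Entry(new_mac, now)
    # Entry was updated within locktime: the reply is ignored.
    return entry


stale = handle_arp_reply(Entry("old-mac", 0.0), "new-mac", now=10.0, locktime=0.99)
fresh = handle_arp_reply(Entry("old-mac", 9.5), "new-mac", now=10.0, locktime=0.99)
print(stale.mac, fresh.mac)
```

In this model a reply arriving 10 seconds after the last update replaces the address, while one arriving 0.5 seconds after it is dropped, which is why a host that refreshes the entry every second could keep ignoring a new MAC.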
Comment 23 Jon Stanley 2006-02-06 12:11:45 EST
Some answers to your questions:

1)  The stock kernel is 2.6.9-22.0.1 with patches for bug 167630 and bug 165018
applied to it and recompiled.  No other changes were made to the default.

2)  Yes, the tcpdumps originally provided were from the host that was directly
attached to the firewall (actually both sides were provided).  I've uploaded
both sides to enterprise.redhat.com from the test environment (test3.cap and
test3external.cap).  The external one is the one with bonding, the internal one
does not have bonding.  Just for reference, here's the addressing of the lab:

10.12.44.45 (external VIP of firewall)
10.12.55.45 (internal VIP of firewall)
B01-SPLAT:

eth0: 10.12.44.41/24 (internal)
eth5: 10.12.55.41/24 (external)
eth6: 10.21.0.21/30 (state)

B02-SPLAT:

eth0: 10.12.44.42/24 (internal)
eth4: 10.12.55.42/24 (external)
eth5: 10.21.0.22/30 (state)

The hosts are 10.12.44.46 and 10.12.55.46.  Each has a static route to reach the
other via its respective gateway.

3)  That tunable was what I was thinking may have been causing the original
problem.  That number is set to 99 on both systems, which as far as I know is
measured in 1/100th of a second.  So it's about 1 second.  However we've seen
delays in the 20-30 second range.
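
As a quick check of the arithmetic above, assuming (as stated in this comment) that the locktime value is expressed in 1/100ths of a second; the constant and function names are only illustrative:

```python
TICKS_PER_SECOND = 100  # assumption from this comment: locktime is in 1/100 s


def locktime_seconds(ticks: int, ticks_per_second: int = TICKS_PER_SECOND) -> float:
    """Convert a locktime value in ticks to seconds."""
    return ticks / ticks_per_second


lock = locktime_seconds(99)
observed_delay = (20, 30)  # seconds, as reported in this bug

print(f"locktime = {lock} s")
print(f"shortest observed delay is {observed_delay[0] / lock:.1f}x the locktime")
```

Under that assumption the lock window is just under one second, so on its own it does not account for a 20-30 second delay.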
Comment 24 Jon Stanley 2006-02-06 12:13:16 EST
Oh, and the sysreport from the test system is on enterprise.redhat.com as
jstanley.tar.bz2 as well.  Sorry I forgot that....
Comment 25 Neil Horman 2006-02-06 13:26:02 EST
thank you.  I'll look at that shortly.  If you could get me a tcpdump from the
test environment, I'd appreciate it.  Thanks.
Comment 26 Jon Stanley 2006-02-28 13:09:21 EST
Neil,

Sorry to not have gotten back to you in so long.  Have you had a chance to look
at the tcpdumps in comment #23 yet?
Comment 27 Neil Horman 2006-02-28 14:01:44 EST
Sorry, no, I didn't see that update in time and they were auto-scrubbed from the
ftp server.  If you upload them again, I'll get them immediately, or you can
just mail them to me.  Just so that we're on the same page, these are traces
from the test environment, right?  The setup that you have working properly?
I'd like to compare those to the first set that you mailed to John to see what's
different.  If that timing value isn't set inappropriately, I suspect that
the difference will be that the ARP_OP field in the working setup is set to
ARP_REPLY while the failing setup is using ARP_REQUEST, but that's just a guess.
Comparing the working and non-working traces should give us some more
visibility into this problem.  Thanks
Comment 28 Jon Stanley 2006-02-28 14:17:35 EST
My bad.

Yes, the new set of captures is from the test environment that seems to be
working properly.  I just uploaded them again.  Sorry for the delay!
Comment 29 Neil Horman 2006-02-28 16:10:33 EST
I've got the traces thanks, and I'm looking at them.

One thing that I've noticed already is not the gratuitous arp requests in the
traces, but rather the unsolicited arp replies.  The traces from the production
network show unsolicited arp replies (UARs) at 30-60 second intervals, while
the test network shows them at 1 second intervals, which matches the arp cache
update time discrepancy we've been discussing here.  Since the bonding driver is
using arp_ip_target monitoring to update the bond state, it should treat these
unsolicited replies like regular replies.  I'm wondering perhaps if the heavy
use of arp_ip_target by several hosts isn't triggering some throttle mechanism
on your firewall to limit the number of arp replies it sends in an effort to
prevent a DoS attack on itself.  Perhaps the only reason your test environment
is working so well is because you don't have enough hosts running arp_ip_target
to trigger the throttle, or your test firewall isn't configured to throttle arp
replies at all.  Could you take a look and let me know if your firewall has any
such mechanism?
Comment 30 Neil Horman 2006-03-13 12:54:13 EST
I've got the gratuitous arp fix in upstream, so I'm going to propose it for RHEL
shortly, but I'd like to hear feedback on my comments above first please....
Comment 31 Neil Horman 2006-06-14 08:03:04 EDT
Created attachment 130829 [details]
patch to allow the acceptance of gratuitous arps

Given that accepting gratuitous arps is required in several
situations, and that the customer seems to have gone silent on this issue, I'm
going to post this internally, and move forward.  This is a backport of the
patch which has been accepted upstream.
Comment 32 Neil Horman 2006-06-15 09:36:11 EDT
I'm afraid a reviewer of this patch has pointed out a rather glaring kABI
breakage in this patch.  While that's not a problem for upstream, it is a big
problem for RHEL releases, and any workaround for it would introduce rather
large hacks to our kernel which would make future upgrade work on this area of
code far more difficult.  As such I'm afraid the consensus on this bug has
become WONTFIX.
