Red Hat Bugzilla – Bug 118962
bonding failover with arp ping and tg3 driver isn't working
Last modified: 2007-11-30 17:07:00 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4)
Description of problem:
After setting up bonding over two 1000 Mbit Broadcom cards using the tg3
driver with ARP ping as the failover monitor, the bonding driver doesn't
initiate a failover after the link is lost and/or the switch is turned
off. With the same tg3 driver and the same hardware, failover works when
miimon (link detection) is used instead.
To make matters worse, the same scenario works with ARP ping when the
original Broadcom drivers are used. For testing I used the
bcm5700-7.1.9-1.src.rpm package from Broadcom and built it on RHEL3.
These are open source drivers published under the terms of the GPL.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Set up bonding with two Broadcom cards.
2. Use these options in /etc/modules.conf:
alias eth0 bcm5700
alias eth1 bcm5700
alias bond0 bonding
options bond0 mode=1 multicast=1 arp_interval=1000
options bcm5700 line_speed=1000,1000 full_duplex=1,1
3. Now try to break the link for eth0.
Actual Results: No failover happens.
Expected Results: A failover to eth1 should be initiated.
With the same bonding options, the same hardware, and the original
Broadcom driver, everything works as expected.
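For comparison, the failing tg3 case described above would presumably use a
modules.conf along these lines. This is only a sketch reconstructed from the
description, not the reporter's actual file: the tg3 aliases are inferred
from the text, and the arp_ip_target address and the miimon value are
placeholders that do not appear in the report.

```
# /etc/modules.conf -- sketch of the failing case (tg3 + ARP monitor)
alias eth0 tg3
alias eth1 tg3
alias bond0 bonding

# mode=1 is active-backup; arp_interval is given in milliseconds.
# arp_ip_target is NOT in the original report; 192.168.0.1 is a
# placeholder for whatever peer the ARP monitor should probe.
options bond0 mode=1 multicast=1 arp_interval=1000 arp_ip_target=192.168.0.1

# Variant reported to work with the same tg3 driver: MII link monitoring
# instead of the ARP monitor. The value 100 (ms) is an assumption.
# options bond0 mode=1 multicast=1 miimon=100
```

Only one of the two "options bond0" lines is meant to be active at a time,
so the two monitoring methods can be compared on otherwise identical
settings.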
Is this related to BZ 116916?
Created attachment 102400
Backport of latest bonding driver...
Created attachment 102401
Backport of latest tg3 driver...
The above two patches (at least together) seem to avoid this problem.
However, when using the ARP monitor, the link does not work until after
at least one link has been pulled. Still working on that one...
Hmmm...actually it seems to be specifying "primary=" in the bonding
options that causes the initial failure...
OK...please ignore previous two comments...I must have been doing
something wrong. I think my configs didn't match at each end...
Current status should return to "Above two patches (at least together)
seem to avoid this problem."
Fixes for this problem have just been committed to the RHEL3 U4
patch pool this evening (in kernel version 2.4.21-20.4.EL).
An errata has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.