From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030922 Galeon/1.3.10

Description of problem:
I set up bonding over two 1000 Mbit Broadcom cards using the tg3 driver, with ARP monitoring (arp ping) as the failover monitor. In this setup the bonding driver doesn't initiate a failover after the link is lost or the switch is turned off. With the same tg3 driver and the same hardware, failover works when miimon (MII link detection) is used instead. To make it worse, with the original Broadcom drivers the same scenario works with ARP monitoring. For testing I used the bcm5700-7.1.9-1.src.rpm package from Broadcom and built it on RHEL3. These drivers are open source and published under the terms of the GPL.

Version-Release number of selected component (if applicable):
kernel-2.4.21-9.0.1.EL

How reproducible:
Always

Steps to Reproduce:
1. Set up bonding with two Broadcom cards.
2. Use these options in /etc/modules.conf:
   alias eth0 bcm5700
   alias eth1 bcm5700
   alias bond0 bonding
   options bond0 mode=1 multicast=1 arp_interval=1000 arp_ip_target=192.168.48.10 primary=eth0
   options bcm5700 line_speed=1000,1000 full_duplex=1,1
3. Now try to break the link for eth0.

Actual Results:
No failover happens.

Expected Results:
A failover to eth1 should be initiated.

Additional info:
With the same bonding options, the original Broadcom driver, and the same hardware, everything works as expected.
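The report hinges on how the ARP monitor decides that a slave is dead. As a rough mental model only (this is not the bonding driver's code), an ARP-monitored active-backup bond (mode=1) can be sketched as follows. A slave counts as up if it has received traffic, such as replies to the ARP probes sent to arp_ip_target, within the last arp_interval. If the active slave stops receiving, the monitor switches to a slave that is still up, and primary=eth0 makes eth0 the preferred slave whenever it looks healthy. The class names, the last_rx_ms field, and the single-interval threshold are hypothetical simplifications.

```python
# Hypothetical model of ARP-monitor failover in an active-backup (mode=1) bond.
# Not the bonding driver's code: names and the threshold rule are assumptions.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slave:
    name: str
    last_rx_ms: int = 0  # when this slave last received traffic (e.g. an ARP reply)


@dataclass
class Bond:
    slaves: List[Slave]
    arp_interval_ms: int = 1000    # plays the role of arp_interval=1000
    primary: Optional[str] = None  # plays the role of primary=eth0
    active: Optional[str] = None   # currently active slave, if any

    def is_up(self, slave: Slave, now_ms: int) -> bool:
        # Assumed rule: "up" means something was received within one interval.
        return now_ms - slave.last_rx_ms <= self.arp_interval_ms

    def monitor_tick(self, now_ms: int) -> Optional[str]:
        """Run one monitor pass and return the name of the active slave."""
        up = [s.name for s in self.slaves if self.is_up(s, now_ms)]
        if self.primary in up:
            # The primary slave is preferred whenever it looks healthy.
            self.active = self.primary
        elif self.active not in up:
            # The active slave stopped receiving: fail over to any live slave.
            self.active = up[0] if up else None
        return self.active


if __name__ == "__main__":
    eth0, eth1 = Slave("eth0"), Slave("eth1")
    bond = Bond([eth0, eth1], arp_interval_ms=1000, primary="eth0")

    print(bond.monitor_tick(0))     # eth0: both slaves just received traffic

    # Pull eth0's link: only eth1 keeps seeing ARP replies.
    eth1.last_rx_ms = 2000
    print(bond.monitor_tick(2000))  # eth1: failover happened
```

In this model, failover depends entirely on the receive timestamps being accurate, so whatever feeds them (here, the slave's network driver) matters. The kernel's bonding documentation similarly notes that the ARP monitor relies on the slave driver keeping dev->last_rx and dev->trans_start up to date.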
Is this related to BZ 116916?
Created attachment 102400: bonding-update.patch
Backport of latest bonding driver...
Created attachment 102401: tg3-update.patch
Backport of latest tg3 driver...
The above two patches (at least together) seem to avoid this problem. However, when using the ARP monitor, the link does not work until at least one link has been pulled. Still working on that one...
Hmm... actually, it seems to be specifying "primary=" in the bonding options that causes the initial failure...
OK... please ignore the previous two comments; I must have been doing something wrong. I think my configs didn't match at each end. The current status should return to "Above two patches (at least together) seem to avoid this problem."
Fixes for this problem have just been committed to the RHEL3 U4 patch pool this evening (in kernel version 2.4.21-20.4.EL).
An erratum has been issued that should help with the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHBA-2004-550.html