Bug 118962 - bonding failover with arp ping and tg3 driver isn't working
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
i686 Linux
Priority: high, Severity: medium
Assigned To: John W. Linville
Brian Brock
Reported: 2004-03-23 04:13 EST by Niels Happel
Modified: 2007-11-30 17:07 EST (History)
Doc Type: Bug Fix
Last Closed: 2004-12-20 15:55:00 EST

Attachments
bonding-update.patch (307.52 KB, patch), 2004-08-03 15:27 EDT, John W. Linville
tg3-update.patch (32.66 KB, patch), 2004-08-03 15:27 EDT, John W. Linville

External Trackers
Tracker: Red Hat Product Errata
ID: RHBA-2004:550
Priority: normal
Status: SHIPPED_LIVE
Summary: Updated kernel packages available for Red Hat Enterprise Linux 3 Update 4
Last Updated: 2004-12-20 00:00:00 EST

Description Niels Happel 2004-03-23 04:13:33 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4)
Gecko/20030922 Galeon/1.3.10

Description of problem:
After setting up bonding over two 1000 Mbit Broadcom cards using the tg3
driver with ARP ping as the failover monitor, the bonding driver doesn't
initiate a failover after losing the link and/or turning the switch
off. Using the same tg3 driver and the same hardware with miimon (link
detection), it works.
To make matters worse: with the original Broadcom drivers the same scenario
works with ARP ping. For testing I used the
bcm5700-7.1.9-1.src.rpm package from Broadcom and built it on RHEL 3.
These are open source drivers published under the terms of the GPL.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Set up bonding with two Broadcom cards.
2. Use these options in /etc/modules.conf:
alias eth0 bcm5700
alias eth1 bcm5700
alias bond0 bonding
options bond0 mode=1 multicast=1 arp_interval=1000 arp_ip_target= primary=eth0
options bcm5700 line_speed=1000,1000 full_duplex=1,1
3. Now try to break the link for eth0.
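
When comparing test configurations, it can help to check mechanically which link monitor an `options` line selects. The following is a minimal sketch and not part of the original report: `parse_bonding_options` and `monitor_mode` are hypothetical helper names, and the `miimon=100` value in the second call is an assumed example, since the report does not give the miimon setting that was used.

```python
def parse_bonding_options(line):
    """Split a modules.conf 'options <alias> key=value ...' line into a dict."""
    fields = line.split()
    if len(fields) < 2 or fields[0] != "options":
        raise ValueError("not an options line: %r" % line)
    params = {}
    for field in fields[2:]:
        key, _, value = field.partition("=")
        params[key] = value
    return fields[1], params


def monitor_mode(params):
    """Report which link monitor the parameters enable.

    In the bonding driver a nonzero arp_interval enables ARP monitoring
    and a nonzero miimon enables MII link monitoring.
    """
    if int(params.get("arp_interval", "0") or "0") > 0:
        return "arp"
    if int(params.get("miimon", "0") or "0") > 0:
        return "mii"
    return "none"


# The options line from step 2; arp_ip_target is left empty exactly as in the report.
alias, params = parse_bonding_options(
    "options bond0 mode=1 multicast=1 arp_interval=1000 arp_ip_target= primary=eth0"
)
print(alias, monitor_mode(params))

# Assumed miimon variant for comparison (the value 100 is illustrative only).
_, mii_params = parse_bonding_options("options bond0 mode=1 miimon=100 primary=eth0")
print(monitor_mode(mii_params))
```

The helper only classifies the configuration; it does not validate that the ARP target is reachable or that the driver honours the setting.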

Actual Results: No failover happens.

Expected Results: A failover to eth1 should be initiated.

Additional info:

When using the same bonding options with the original Broadcom driver
and the same hardware, everything works as expected.
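
To confirm whether the expected failover to eth1 takes place, the active slave can be compared before and after the link is broken. This is a rough sketch under stated assumptions: the helper names are hypothetical, the snapshots are illustrative text rather than output captured from the reporter's system, and the `Currently Active Slave:` line follows the format used by the bonding status file in later driver versions, which may differ on the kernel in this report.

```python
import re


def active_slave(status_text):
    """Return the interface named on the 'Currently Active Slave' line, or None."""
    match = re.search(r"^Currently Active Slave:\s*(\S+)", status_text, re.MULTILINE)
    return match.group(1) if match else None


def failover_happened(before_text, after_text):
    """True if the active slave changed between two status snapshots."""
    before, after = active_slave(before_text), active_slave(after_text)
    return before is not None and after is not None and before != after


# Illustrative snapshots, not output captured from the reporter's system.
BEFORE = "Bonding Mode: fault-tolerance (active-backup)\nCurrently Active Slave: eth0\n"
AFTER = "Bonding Mode: fault-tolerance (active-backup)\nCurrently Active Slave: eth1\n"

print(failover_happened(BEFORE, AFTER))
```

On a live system the snapshots would come from reading the bonding status file, for example /proc/net/bonding/bond0 on drivers that provide it; whether the RHEL 3 kernel discussed here exposes that path and format is an assumption to verify.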
Comment 1 Don Howard 2004-03-31 13:32:33 EST
Is this related to BZ 116916? 
Comment 2 John W. Linville 2004-08-03 15:27:09 EDT
Created attachment 102400

Backport of latest bonding driver...
Comment 3 John W. Linville 2004-08-03 15:27:51 EDT
Created attachment 102401

Backport of latest tg3 driver...
Comment 4 John W. Linville 2004-08-03 15:30:11 EDT
Above two patches (at least together) seem to avoid this problem. 
However, when using arp monitor the link does not work until after at
least one link has been pulled.  Still working on that one...
Comment 5 John W. Linville 2004-08-03 15:44:56 EDT
Hmmm...actually seems to be specifying "primary=" on the bonding
options that causes the initial failure...
Comment 6 John W. Linville 2004-08-04 14:22:13 EDT
OK...please ignore previous two comments...I must have been doing
something wrong.  I think my configs didn't match at each end...

Current status should return to "Above two patches (at least together)
seem to avoid this problem."
Comment 7 Ernie Petrides 2004-09-09 20:57:28 EDT
Fixes for this problem have just been committed to the RHEL3 U4
patch pool this evening (in kernel version 2.4.21-20.4.EL).
Comment 8 John Flanagan 2004-12-20 15:55:00 EST
An errata has been issued which should help the problem 
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

