Bug 118962 - bonding failover with arp ping and tg3 driver isn't working
Summary: bonding failover with arp ping and tg3 driver isn't working
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: i686
OS: Linux
Target Milestone: ---
Assignee: John W. Linville
QA Contact: Brian Brock
Depends On:
Reported: 2004-03-23 09:13 UTC by Niels Happel
Modified: 2007-11-30 22:07 UTC (History)

Clone Of:
Last Closed: 2004-12-20 20:55:00 UTC

Attachments
bonding-update.patch (307.52 KB, patch)
2004-08-03 19:27 UTC, John W. Linville
tg3-update.patch (32.66 KB, patch)
2004-08-03 19:27 UTC, John W. Linville

External Trackers
Tracker: Red Hat Product Errata
ID: RHBA-2004:550
Priority: normal
Status: SHIPPED_LIVE
Summary: Updated kernel packages available for Red Hat Enterprise Linux 3 Update 4
Last Updated: 2004-12-20 05:00:00 UTC

Description Niels Happel 2004-03-23 09:13:33 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4)
Gecko/20030922 Galeon/1.3.10

Description of problem:
After setting up bonding over two 1000 Mbit Broadcom cards using the tg3
driver, with ARP ping as the failover monitor, the bonding driver doesn't
initiate a failover after losing the link and/or turning the switch
off. With the same tg3 driver and the same hardware, failover works when
using miimon (link detection).
To make matters worse: with the original Broadcom drivers the same
scenario works with ARP ping. For testing I used the
bcm5700-7.1.9-1.src.rpm package from Broadcom and built it on RHEL3.
These are open source drivers published under the terms of the GPL.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Set up bonding with two Broadcom cards.
2. Use these options in /etc/modules.conf:
alias eth0 bcm5700
alias eth1 bcm5700
alias bond0 bonding
options bond0 mode=1 multicast=1 arp_interval=1000 arp_ip_target= primary=eth0
options bcm5700 line_speed=1000,1000 full_duplex=1,1
3. Now try to break the link for eth0.

Actual Results:  no failover will happen

Expected Results:  a failover to eth1 should be initiated
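
One way to confirm whether the expected failover actually took place is to compare the bonding status before and after breaking the link. The following is only a minimal sketch: it assumes the bonding driver exposes a status file such as /proc/net/bonding/bond0 containing a "Currently Active Slave:" line, which has not been verified against the kernel in this report. The function names are hypothetical.

```python
import re


def parse_active_slave(status_text):
    """Return the interface named on the 'Currently Active Slave:' line, or None.

    Assumption: the status text follows the layout commonly printed by the
    bonding driver; other driver versions may use different wording.
    """
    match = re.search(r"^Currently Active Slave:\s*(\S+)", status_text, re.MULTILINE)
    return match.group(1) if match else None


def failover_happened(before_text, after_text):
    """True if both snapshots name an active slave and the two names differ."""
    before = parse_active_slave(before_text)
    after = parse_active_slave(after_text)
    return before is not None and after is not None and before != after
```

To use it, read the status file once before step 3 and once a few ARP intervals after it, then pass both snapshots to failover_happened(). With the behaviour described in this report the result would be False, since the active slave stays on eth0.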

Additional info:

while using the same bonding options with the original broadcom driver
and the same hardware, everything works fine and as expected.

Comment 1 Don Howard 2004-03-31 18:32:33 UTC
Is this related to BZ 116916? 

Comment 2 John W. Linville 2004-08-03 19:27:09 UTC
Created attachment 102400

Backport of latest bonding driver...

Comment 3 John W. Linville 2004-08-03 19:27:51 UTC
Created attachment 102401

Backport of latest tg3 driver...

Comment 4 John W. Linville 2004-08-03 19:30:11 UTC
The above two patches (at least together) seem to avoid this problem.
However, when using the ARP monitor the link does not work until after at
least one link has been pulled.  Still working on that one...

Comment 5 John W. Linville 2004-08-03 19:44:56 UTC
Hmmm...actually seems to be specifying "primary=" on the bonding
options that causes the initial failure...

Comment 6 John W. Linville 2004-08-04 18:22:13 UTC
OK...please ignore previous two comments...I must have been doing
something wrong.  I think my configs didn't match at each end...

Current status should return to "Above two patches (at least together)
seem to avoid this problem."

Comment 7 Ernie Petrides 2004-09-10 00:57:28 UTC
Fixes for this problem have just been committed to the RHEL3 U4
patch pool this evening (in kernel version 2.4.21-20.4.EL).

Comment 8 John Flanagan 2004-12-20 20:55:00 UTC
An errata has been issued which should help the problem 
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

