Bug 98464 - bonding 802.3ad long failover time under heavy stress
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Hardware: i686 Linux
Priority: medium
Severity: medium
Assigned To: Jeff Garzik
Brian Brock
Reported: 2003-07-02 14:04 EDT by Need Real Name
Modified: 2013-07-02 22:12 EDT (History)

See Also:
Fixed In Version: 2.4.21-1.1931.2.349.2.2.ent
Doc Type: Bug Fix
Last Closed: 2003-08-03 09:55:37 EDT

Description Need Real Name 2003-07-02 14:04:49 EDT
Description of problem:
When using bonding in 802.3ad mode under very heavy stress, removing the last
slave of the active aggregator might result in a long failover to another
aggregator (up to 90 sec.) because LACPDU packets are dropped from the slaves'
tx queues. The solution is to send such packets with the highest priority. The
solution was verified to work with 10/100/1000Mbps adapters, but might not be
good enough when using 10/100 adapters, in which case it will be necessary to
wait the entire timeout defined by the IEEE standard.
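
To illustrate why raising the priority helps, here is a minimal toy model of a
transmit queue with strict-priority bands. It is only a sketch, not the bonding
driver's code: the class name, the number of bands, and the per-band limit are
all hypothetical, and the only thing taken from the report is the idea that a
LACPDU queued behind stress traffic can be dropped while one sent at the
highest priority is not.

```python
from collections import deque


class BandedTxQueue:
    """Toy transmit queue with strict-priority bands (band 0 is served first).

    Hypothetical model for illustration only; it does not reproduce the
    kernel's queueing code.
    """

    def __init__(self, bands=3, limit=4):
        self.bands = [deque() for _ in range(bands)]
        self.limit = limit      # per-band capacity (assumed value)
        self.dropped = []       # frames that were tail-dropped

    def enqueue(self, frame, band):
        """Queue a frame in the given band; tail-drop it if the band is full."""
        queue = self.bands[band]
        if len(queue) >= self.limit:
            self.dropped.append(frame)
            return False
        queue.append(frame)
        return True

    def dequeue(self):
        """Return the next frame to transmit, lowest band number first."""
        for queue in self.bands:
            if queue:
                return queue.popleft()
        return None


def send_under_stress(lacpdu_band):
    """Saturate the data band, then try to send one LACPDU in lacpdu_band.

    Returns (was the LACPDU queued?, first frame that would be transmitted).
    """
    txq = BandedTxQueue()
    for i in range(10):
        txq.enqueue(f"data-{i}", band=1)    # stress traffic fills band 1
    queued = txq.enqueue("LACPDU", band=lacpdu_band)
    return queued, txq.dequeue()
```

In this model, `send_under_stress(1)` drops the LACPDU because the data band is
already full, while `send_under_stress(0)` queues it and transmits it ahead of
all pending data frames.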

Version-Release number of selected component (if applicable):

How reproducible:
Configure a bonding team in 802.3ad mode with 2 or more gig adapters and 2 or 
more 10/100 adapters. Start heavy bi-directional stress traffic between the 
server and the clients and remove the gig slaves from the bond one by one. Once 
the last gig slave is removed, traffic may stall until the new aggregator is 
selected (may vary between switches).

Steps to Reproduce:
1. insmod bonding mode=4
2. ifconfig bond0 <ip-addr>
3. ifenslave bond0 eth0 eth1 eth2 eth5 eth6 eth7
4. run stress traffic (e.g. iperf, netperf, etc.)
5. detach all gig slaves with ifenslave -d
6. monitor traffic on remaining slaves

Actual results:
Traffic stalls for up to 90 sec. until a new aggregator is selected.
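
The 90 sec. figure appears to match the long LACP timeout. As a rough sketch,
assuming the usual 802.3ad timer values (a 30 sec. slow periodic interval, a
1 sec. fast periodic interval, and a timeout of three missed periods), the
arithmetic works out as follows. The constant and function names are chosen
here for illustration and are not taken from the report.

```python
# Assumed 802.3ad timer values, in seconds (not stated in the report itself).
SLOW_PERIODIC_TIME = 30   # LACPDU interval when the partner uses long timeouts
FAST_PERIODIC_TIME = 1    # LACPDU interval when the partner uses short timeouts
MISSED_PERIODS = 3        # periods without a LACPDU before the info expires


def lacp_timeout(periodic_time, missed_periods=MISSED_PERIODS):
    """Seconds a port waits without receiving a LACPDU before timing out."""
    return periodic_time * missed_periods


long_timeout = lacp_timeout(SLOW_PERIODIC_TIME)    # 30 * 3 = 90
short_timeout = lacp_timeout(FAST_PERIODIC_TIME)   # 1 * 3 = 3
print(f"long timeout: {long_timeout} s, short timeout: {short_timeout} s")
```

Under these assumptions, a port whose LACPDUs keep being dropped would have to
wait out the full long timeout before a new aggregator can take over.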

Expected results:
Traffic restarts immediately using another aggregator.

Additional info:
I sent a bug fix patch on June 26th to the bond-devel, linux-net and
linux-netdev lists. It has already been accepted by Jeff Garzik into his
net-drivers-2.4 BK tree. It is a further fix for the original problem reported
by Jay Vosburgh, which concerned the same failure without stress traffic.
Comment 1 Larry Troan 2003-07-16 09:55:44 EDT
Comment 2 Rik van Riel 2003-07-16 09:59:11 EDT
Jeff, do we have patches for this in Taroon already, or are they still in your
queue?
Comment 3 Need Real Name 2003-07-23 13:14:01 EDT
The fix appears to be implemented in the RHEL 3 B1 candidate kernel (version 2.4.21-
