Bug 98462

Summary: bonding TLB load sharing fails under heavy UDP Tx stress
Product: Red Hat Enterprise Linux 3
Reporter: Need Real Name <shmulik.hen>
Component: kernel
Assignee: Jeff Garzik <jgarzik>
Status: CLOSED RAWHIDE
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 3.0
CC: peterm, riel, tao
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
URL: http://sourceforge.net/projects/bonding/
Fixed In Version: 2.4.21-1.1931.2.349.2.2.ent
Doc Type: Bug Fix
Last Closed: 2003-08-03 13:54:50 UTC

Description Need Real Name 2003-07-02 17:46:34 UTC
Description of problem:
When running very heavy UDP Tx stress traffic over 10/100 adapters, load
sharing collapses onto a single slave after a few seconds. The cause is an
unsigned/signed cast error in the TLB code.
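
For illustration only, here is a minimal C sketch of how a cast error of this
kind could produce the symptom. It is not the bonding driver's code: the
struct, the field names and both functions are hypothetical, and the sketch
assumes the balancer picks the slave with the largest gap between link
capacity and current load. If that gap is held in an unsigned type, a slave
loaded beyond its capacity wraps around to a huge value and keeps winning the
selection.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-slave state; not the driver's real structures. */
struct slave_info {
    uint32_t speed_mbps;  /* link speed in Mbit/s, e.g. 100 */
    uint32_t load_bytes;  /* bytes assigned for Tx in the current interval */
};

/* Buggy variant (assumed): the gap is computed and compared as unsigned. */
size_t least_loaded_unsigned(const struct slave_info *s, size_t n)
{
    size_t best = 0;
    uint32_t max_gap = 0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        /* Wraps to a very large value when the load exceeds the capacity. */
        uint32_t gap = s[i].speed_mbps * 1000000u - s[i].load_bytes * 8u;
        if (gap > max_gap) {
            max_gap = gap;
            best = i;
        }
    }
    return best;
}

/* Signed variant: an overloaded slave gets a negative gap and loses. */
size_t least_loaded_signed(const struct slave_info *s, size_t n)
{
    size_t best = 0;
    int64_t max_gap = INT64_MIN;

    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        int64_t gap = (int64_t)s[i].speed_mbps * 1000000
                    - (int64_t)s[i].load_bytes * 8;
        if (gap > max_gap) {
            max_gap = gap;
            best = i;
        }
    }
    return best;
}
```

With three 100 Mbit/s slaves carrying 13,000,000, 0 and 5,000,000 bytes, the
unsigned variant selects the first (already overloaded) slave, while the
signed variant selects the idle one. This would also explain why the problem
shows up on 10/100 adapters under heavy stress, where the load can exceed the
link capacity.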

Version-Release number of selected component (if applicable):
kernel-2.4.20-1.1931.2.231.2.11.ent

How reproducible:
Configure a bond team with only 10/100 adapters and run very heavy UDP Tx 
stress traffic to many clients. Monitor Tx/Rx activity of the slaves.

Steps to Reproduce:
1. insmod bonding mode=5
2. ifconfig bond0 <ip-addr>
3. ifenslave bond0 eth0 eth1 eth2
4. Start a stress application (e.g. iperf or netperf)
    
Actual results:
After a few seconds, only one slave takes part in load sharing while the others
stay idle. Traffic may move from slave to slave at 10-second intervals (the
re-balance timeout).

Expected results:
All slaves continuously take part in load sharing.

Additional info:
I sent a bug-fix patch on June 26th to the bond-devel, linux-net and
linux-netdev lists. Jeff Garzik has already accepted it into his
net-drivers-2.4 BK tree.

Comment 1 Larry Troan 2003-07-16 13:51:59 UTC
ISSUE TRACKER 25886 opened as sev 1

Comment 2 Rik van Riel 2003-07-16 13:57:07 UTC
Jeff, does Taroon already have the patch for this, or is it still in your queue?

Comment 3 Need Real Name 2003-07-23 17:11:08 UTC
The fix appears to be implemented in the RHEL 3 B1 candidate kernel (version
2.4.21-1.1931.2.349.2.2.ent).