Bug 278641 - 802.3ad bonding does not work as expected on xmit
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Hardware: x86_64 Linux
Priority: medium  Severity: low
Assigned To: Andy Gospodarek
Martin Jenner
Depends On:
Reported: 2007-09-05 11:36 EDT by Chuck Mead
Modified: 2014-06-29 18:59 EDT (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2007-09-14 16:01:24 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Description Chuck Mead 2007-09-05 11:36:09 EDT
Description of problem: Bonded network interface using 802.3ad does not balance
outbound traffic to the local network.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Create /etc/modprobe.conf as follows:

alias eth0 tg3
alias eth1 tg3
alias scsi_hostadapter cciss
alias eth2 e1000
alias eth3 e1000
alias eth4 e1000
alias eth5 e1000
alias usb-controller ohci-hcd
alias bond0 bonding 
options bond0 mode=4 miimon=100

2.  Create /etc/sysconfig/network-scripts/ifcfg-eth0 as follows:
ETHTOOL_OPTS="speed 1000 duplex full autoneg off"

3.  Create /etc/sysconfig/network-scripts/ifcfg-eth1 as follows:
ETHTOOL_OPTS="speed 1000 duplex full autoneg off"

4. Create /etc/sysconfig/network-scripts/ifcfg-bond0 as follows:

Actual results: Once put into service, the network devices receive traffic in a
"balanced way", but all transmitted traffic seems to go out the eth0 interface.

Expected results: After reading the
/usr/share/doc/kernel-doc-2.6.9/Documentation/networking/bonding.txt file with
attention paid to the 802.3ad section numbered 13.1 we verified that the machine
was talking to hosts on the local segment and not through a gateway. Based on
the documentation we expected that outbound traffic would have been balanced
instead of all going out through eth0.

Additional info:
Comment 1 Andy Gospodarek 2007-09-13 10:42:36 EDT
This is interesting.  Can you post the contents of /proc/net/bonding/bond0?
Comment 2 Chuck Mead 2007-09-13 10:55:50 EDT
nymsgs21 # cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v2.6.3-rh (June 8, 2005)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: slow
Active Aggregator Info:
        Aggregator ID: 2
        Number of ports: 2
        Actor Key: 17
        Partner Key: 3
        Partner Mac Address: 00:19:07:a0:8c:00

Slave Interface: eth0
MII Status: up
Link Failure Count: 3
Permanent HW addr: 00:17:a4:a7:95:bc
Aggregator ID: 2

Slave Interface: eth1
MII Status: up
Link Failure Count: 4
Permanent HW addr: 00:17:a4:a7:95:bb
Aggregator ID: 2
Comment 3 Andy Gospodarek 2007-09-13 11:18:14 EDT
Well that looks fine.

One thing you can try is the 'xmit_hash_policy' module parameter.  RHEL4.5
should support options of '0' or 'layer2' for mac-based hashing and '1' or
'layer3+4' for hashing based on host and tcp/udp port info.
Comment 4 Chuck Mead 2007-09-13 11:46:11 EDT
We are concerned that this will make the setup non-compliant with the 802.3ad
spec (according to the docs). The applications are sensitive to packet
retransmissions, which is what this is likely to cause. Notably, the external
network is Cisco-switched and set up to support 802.3ad.
Comment 5 Andy Gospodarek 2007-09-13 12:05:23 EDT
How is choosing a different transmit hashing algorithm going to make it
non-standards compliant?  Please attach or provide a link to the document that
claims anything but MAC-based hashing isn't going to comply fully with the
standard.  Many switch manufacturers use hashing based on L2/L4/L4 info and I
believe some of them even allow the type of hashing to be configured.

This form of hashing should not cause retransmissions due to out of order
packets because each session (tcp/udp) will always hash to the same outgoing
interface.  The advantage to the layer3+4 hashing is that if the first tcp
session to a given host chooses eth0, the next tcp session to the exact same
host has the chance to pick eth1 for its traffic.
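
To illustrate the point about sessions sticking to one interface, here is a minimal userspace sketch of a layer3+4 style hash. It assumes the formula described in the kernel's bonding documentation (source port XOR destination port, XORed with the low 16 bits of source IP XOR destination IP, modulo the slave count); the function name l34_hash and its parameters are made up for this example and are not the driver's actual code.

```c
#include <stdint.h>

/* Hypothetical userspace sketch of a layer3+4 transmit hash.
 * Assumed formula (from the bonding documentation, not the driver source):
 *   ((src_port ^ dst_port) ^ ((src_ip ^ dst_ip) & 0xffff)) % slave_count
 * IPs and ports are plain host-order integers here for simplicity.
 * slave_count must be non-zero. */
static unsigned l34_hash(uint32_t src_ip, uint32_t dst_ip,
                         uint16_t src_port, uint16_t dst_port,
                         unsigned slave_count)
{
        uint32_t l4 = (uint32_t)(src_port ^ dst_port);
        uint32_t l3 = (src_ip ^ dst_ip) & 0xffff;

        return (l4 ^ l3) % slave_count;
}
```

Every packet of one TCP or UDP session carries the same addresses and ports, so it always gets the same result, while a second session to the same host from a different source port can land on the other slave.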
Comment 6 Andy Gospodarek 2007-09-13 12:07:45 EDT
An example of cisco allowing you to choose the hashing algorithm.


Comment 7 Chuck Mead 2007-09-13 13:57:19 EDT
per the doc: 


IEEE 802.3ad Dynamic link aggregation...
Slave selection for outgoing traffic is done according to the transmit hash 

Note that not all transmit policies may be 802.3ad compliant, particularly in
regards to the packet mis-ordering requirements of section 43.2.4 of the 
802.3ad standard.  Differing peer implementations will have varying tolerances 
for noncompliance.
This algorithm is not fully 802.3ad compliant. A single TCP or UDP conversation
containing both fragmented and unfragmented packets will see packets striped
across two interfaces.  This may result in out of order delivery.  Most traffic
types will not meet this criteria, as TCP rarely fragments traffic, and most UDP
traffic is not involved in extended conversations.  Other implementations of
802.3ad may or may not tolerate this noncompliance.
Comment 8 Andy Gospodarek 2007-09-13 15:16:55 EDT
I think the statements referenced in comment #7 are overblown, but if you are
concerned we can address why layer2 hashing doesn't work.

So you say that all the traffic always flows on eth0.  Do you have locally
administered MAC addresses on this network?  I ask because the layer2 hashing is
pretty simple:

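static int bond_xmit_hash_policy_l2(struct sk_buff *skb,
                                   struct net_device *bond_dev, int count)
{
        /* The Ethernet header sits at the start of the packet data. */
        struct ethhdr *data = (struct ethhdr *)skb->data;

        /* XOR the last byte of the destination MAC with the last byte of
         * the bond's MAC, then reduce modulo the number of slaves. */
        return (data->h_dest[5] ^ bond_dev->dev_addr[5]) % count;
}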

So if you've got preset MACs on that network where the last byte on all of
them is the same, then they will always XOR to the same value and always pick
the same slave (count is simply the number of members in the bond).
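
A quick way to check this by hand is a userspace copy of the layer2 hash quoted above. This is only a sketch: l2_hash is a made-up name, and it assumes the bond device uses eth0's address (last byte 0xbc in the /proc output from comment #2), which may not hold on every setup.

```c
#include <stdint.h>

/* Userspace sketch of the layer2 transmit hash quoted above.
 * dest_mac and bond_mac are 6-byte Ethernet addresses; slave_count is
 * the number of slaves in the bond and must be non-zero.
 * Only the last byte of each address takes part in the hash. */
static int l2_hash(const uint8_t dest_mac[6], const uint8_t bond_mac[6],
                   int slave_count)
{
        return (dest_mac[5] ^ bond_mac[5]) % slave_count;
}
```

With two slaves the result depends only on whether the last bytes of the two MACs differ in their lowest bit, so any traffic addressed to a single destination MAC (for example, one gateway) always leaves on the same slave.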
Comment 9 Chuck Mead 2007-09-14 10:51:45 EDT
Okay.... upon further review it looks like almost all of the traffic is coming
from the switched gateway... if that's the case the MAC issue would definitely
come into play... if so then we can almost certainly mark this "Not A BUG"?
Comment 10 Andy Gospodarek 2007-09-14 16:01:24 EDT
Sounds good.  Feel free to re-open if you have any more problems.
