Description of problem:
I have a Dell PE2850 (2x CPU, 8GB RAM) with 4 gigabit cards (e1000) attached to a single Cisco 4506. I am trying to create bonding interfaces with 2 slaves each. I am testing with balance-alb, but the behaviour is similar with balance-tlb and balance-rr. I do not have any special switch settings.

I am able to receive 2x 1Gbps streams on bond0 (balance-alb) from two other hosts.
I am able to send a 1Gbps stream to one other host via bond0.
I am able to send 2x 1Gbps streams to host1 via bond0 and host3 via bond1.
I am only able to send 2x 0.5Gbps streams to host1 and host2 via bond0 (both slaves are used, each carrying traffic to a single host, but only at 0.5Gbps).

Is there any reason why the outgoing traffic via bond0 doesn't go over 1Gbps? The host utilization is well under its limits.

Version-Release number of selected component (if applicable):
kernel-smp-2.6.9-42.0.3.EL

How reproducible:
always

Steps to Reproduce:
1. Create bond0 with 2 e1000 gigabit slaves, balance-alb, MII monitoring.
2. Generate network traffic to one directly connected host. Watch the network traffic on bond0 and on both of its slaves (I observe 1Gbps on bond0 and eth0, 0Gbps on eth1).
3. Generate network traffic to another host via bond0. Watch the traffic on bond0 and its slaves (I observe 1Gbps on bond0 and 0.5Gbps on each of eth0 and eth1).

Actual results:
Total traffic doesn't go over 1Gbps.

Expected results:
Total traffic should be 2Gbps, with 1Gbps to each client, because each host uses a different slave interface.

Additional info:
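The interface config files follow the standard RHEL 4 bonding pattern; roughly like the sketch below (placeholder addresses, not the real ones):

# /etc/sysconfig/network-scripts/ifcfg-bond0  (illustrative address)
DEVICE=bond0
IPADDR=192.168.1.10
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none

# /etc/sysconfig/network-scripts/ifcfg-eth0  (and likewise for the second slave)
DEVICE=eth0
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none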
Do you have any idea why this is happening? Are you able to reproduce this behaviour? Or could this be just some sort of configuration problem?
> I am able to receive 2x 1Gbps streams on bond0 (balance-alb) from two other hosts.

This is working as expected for balance-alb, but rr mode would be limited to 1Gbps since the same MAC/IP combo would be used on all systems.

> I am able to send 1Gbps stream to one other host via bond0.

This is working as expected.

> I am able to send 2x 1Gbps streams to host1 via bond0 and host3 via bond1.

This is working as expected.

> I am only able to send 2x 0.5Gbps streams to host1 and host2 via bond0 (both
> slaves are used, each having traffic to a single host, but only 0.5Gbps).

Are you using TCP or UDP? Does this change when switching to rr mode and UDP?

> Is there any reason why the outgoing traffic via bond0 doesn't go over 1Gbps?
> The host utilization is well under its limits.

There are no hard limitations in the driver that cap speeds at 1G. There are definitely some limitations in bonding on how much you can transmit and receive from a single host -- generally the limitation is on reception, since the switch can't learn the destination MAC on multiple interfaces and stripe the traffic across them. This limitation is lifted with 802.3ad, xor, and balance-alb, since the switch can hash different connections over different interfaces, but each TCP/UDP stream (and, in alb's case, each host) will still be limited to the speed of a single slave interface.
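If you want to see how the transmit load is actually being spread, you can sample the per-slave TX counters while a test runs. A rough sketch (it assumes bash, and that eth0 and eth1 are the bond0 slaves):

#!/bin/bash
# Sample TX byte counters from /proc/net/dev twice, 10 seconds apart,
# and print the per-interface transmit rate over that window.
txbytes() {
    sed 's/:/ /' /proc/net/dev | \
        awk '$1=="bond0" || $1=="eth0" || $1=="eth1" {print $1, $10}' | sort
}
txbytes > /tmp/tx.before
sleep 10
txbytes > /tmp/tx.after
join /tmp/tx.before /tmp/tx.after | \
    awk '{printf "%-6s %8.1f Mbit/s\n", $1, ($3-$2)*8/10/1000000}'

In balance-alb each destination host should stick to one slave, so with two destinations you would expect each slave near line rate rather than both at half.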
I'm using TCP (with alb, tlb and rr). Actually I have 2 identical setups on two PE2850s. My modprobe.conf on host0:

alias bond0 bonding
alias bond1 bonding
options bond0 miimon=50 mode=balance-alb max_bonds=2
options bond1 miimon=50 mode=balance-alb

and on host1:

install bond0 modprobe e1000; modprobe bonding --ignore-install -o bond0 \
    mode=balance-alb miimon=50 primary=eth2
install bond1 modprobe e1000; modprobe bonding --ignore-install -o bond1 \
    mode=balance-alb miimon=50

They are a little bit different because I did some testing there (assigning different physical interfaces to bonds, parameters for e1000, etc.). On 2.6.9-42.0.3.EL both behaved the same. Yesterday I upgraded to 2.6.9-42.0.8.ELsmp and host1 still behaves the same (config above), while host0 now behaves correctly: with 2 outgoing connections I get over 1.8Gbps. host1 is now using the 2 onboard e1000s for bond0 (which I test), while host0 uses one onboard port and one port of the dual-port card in the PCI-X slot (eth0+eth2). But I do not think that this is the problem, because I am (was) able to send at full speed with any two cards out of my four. I do not know when I'll be able to switch the config on host1 to see whether a different modprobe.conf will help (the machines are used by other people too). Is the configuration of host1 wrong? Actually this is the only way I know of to have two different bonding algorithms on bond0 and bond1.
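To spell out what I mean by that, the generic pattern would be something like the following sketch (balance-rr on bond1 is just a stand-in for "a different mode"):

# /etc/modprobe.conf -- one bonding instance per bond, each loaded under
# its own name via -o so each can get its own mode/miimon options
install bond0 modprobe bonding --ignore-install -o bond0 \
    mode=balance-alb miimon=50
install bond1 modprobe bonding --ignore-install -o bond1 \
    mode=balance-rr miimon=50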
Can you try using netcat (nc) with UDP traffic? One problem with TCP is that you often don't know whether the limitation is on rx or tx, since TCP will make the traffic back off when the maximum throughput can't be reached. Glancing at your config, it looks fine, though you should probably remove 'modprobe e1000;' from the 'install' lines on host1. Could you also send the output of /proc/net/bonding/bond0 and /proc/net/bonding/bond1 on these systems as well?
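Something along these lines is enough to generate a steady UDP stream (a sketch only; option syntax differs a bit between netcat variants, and older ones want -p for the listen port):

# On the receiving host, discard everything arriving on UDP port 5001:
nc -u -l -p 5001 > /dev/null

# On the sending host, push zeros at the receiver through bond0
# (1400-byte writes keep each datagram under the Ethernet MTU):
dd if=/dev/zero bs=1400 count=1000000 | nc -u <receiver-ip> 5001

With UDP you can watch the slave counters on the sender without TCP congestion control backing things off, which makes it clearer whether the cap is on tx or rx.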
I am now testing with both UDP and TCP, same results. I upgraded the BIOS on both PE2850s to be the same (A06), no change. I replugged the cables on host1 (not working) to have the same assignments to bonds as on host0 (working), no change (they were different because of my previous testing of this issue). I changed /etc/modprobe.conf to be the same as on host0, loading only one "bonding" with max_bonds=2, no change. I am now confused because I have two configurations that are very much the same, and one works as I expect while the other does not.

Here is the output of /proc on (working) host0:

[root@paris ~]# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v2.6.3 (June 8, 2005)

Bonding Mode: adaptive load balancing
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 50
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:04:23:d8:30:4a

Slave Interface: eth2
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:13:72:54:99:81

[root@paris ~]# cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v2.6.3 (June 8, 2005)

Bonding Mode: adaptive load balancing
Primary Slave: None
Currently Active Slave: eth1
MII Status: up
MII Polling Interval (ms): 50
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth1
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:04:23:d8:30:4b

Slave Interface: eth3
MII Status: down
Link Failure Count: 0
Permanent HW addr: 00:13:72:54:99:82

[root@paris ~]#

And on host1 (not working):

[root@sofia ~]# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v2.6.3 (June 8, 2005)

Bonding Mode: adaptive load balancing
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 50
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:04:23:d8:2c:3a

Slave Interface: eth2
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:11:43:d4:94:a2

[root@sofia ~]# cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v2.6.3 (June 8, 2005)

Bonding Mode: adaptive load balancing
Primary Slave: None
Currently Active Slave: eth1
MII Status: up
MII Polling Interval (ms): 50
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth1
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:04:23:d8:2c:3b

Slave Interface: eth3
MII Status: down
Link Failure Count: 0
Permanent HW addr: 00:11:43:d4:94:a3

[root@sofia ~]#

I'm testing on bond0, of course. The modprobe.conf on both nodes contains:

alias eth0 e1000
alias eth1 e1000
alias eth2 e1000
alias eth3 e1000
options e1000 FlowControl=1
alias bond0 bonding
alias bond1 bonding
options bond0 miimon=50 mode=balance-alb max_bonds=2
options bond1 miimon=50 mode=balance-alb

The kernel on both nodes is 2.6.9-42.0.8.ELsmp. eth0 and eth1 are on the PCI-X dual-port network card, eth2 and eth3 are onboard. Neither of these two hosts is overloaded when I do the tests. sysctl.conf is the same on both nodes.
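For completeness, a rough way to confirm that nothing else differs between the two boxes is to diff the relevant state side by side (a sketch only; it assumes bash, ssh access from a third machine, and that paris/sofia are host0/host1):

#!/bin/bash
# Diff bonding state and module/sysctl config between the two hosts
# (the permanent HW addresses will of course differ).
for f in /proc/net/bonding/bond0 /etc/modprobe.conf /etc/sysctl.conf; do
    echo "=== $f ==="
    diff <(ssh paris cat $f) <(ssh sofia cat $f)
done
# Compare driver, firmware and negotiated link settings of the bond0 slaves.
for dev in eth0 eth2; do
    echo "=== $dev ==="
    diff <(ssh paris "ethtool $dev; ethtool -i $dev") \
         <(ssh sofia "ethtool $dev; ethtool -i $dev")
done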
It seems the problem is not RH related, but some limitation of the Cisco Catalyst 4506. If I plug the cables into Catalyst ports that are not close together (i.e. in different blocks of 8 ports, as labeled on the Cisco board), I can get 2x 1Gbps on both machines. Please close this as not-a-bug (at least for Red Hat :)
Thanks for the update, David. I'll close this one out, but I'll remember that switches can cause problems sometimes too! :)