Description of problem:
Since I updated to kernel 2.6.18-164.el5 (from CentOS), my bond interfaces in mode 6 (adaptive load balancing) don't work anymore (except bond0, which works fine !): /proc/net/bonding/bond{1,2} have the line "Currently Active Slave: none" despite having two physical ifaces up. It worked fine on previous kernel releases (128 ...)

(bug originally opened on CentOS bugtracker, http://bugs.centos.org/view.php?id=3848 )

Version-Release number of selected component (if applicable):
2.6.18-164.el5

How reproducible:

Steps to Reproduce:
1. Set up several bond interfaces, set them *all* to mode 6 (to ensure not being bitten by the bug 524206 :)
2. Reboot
3. cat /proc/net/bonding/bond1 or ping your gateways on bond1 (and following)

Actual results:
cat /proc/net/bonding/bond1
Currently Active Slave: none

Expected results:
cat /proc/net/bonding/bond1
Currently Active Slave: eth2

Additional info:

modprobe.conf:
options ipv6 "disable=1"
alias net-pf-10 off
alias ipv6 off
alias eth0 bnx2x
alias eth1 bnx2x
alias eth2 bnx2x
alias eth3 bnx2x
alias eth4 bnx2x
alias eth5 bnx2x
alias eth6 bnx2x
alias eth7 bnx2x
alias scsi_hostadapter cciss
alias bond0 bonding
options bond0 mode=6 miimon=80
alias bond1 bonding
options bond1 mode=6 miimon=80
alias bond2 bonding
options bond2 mode=6 miimon=80

dmesg:
IPv6: Loaded, but administratively disabled, reboot required to enable
Ethernet Channel Bonding Driver: v3.4.0 (October 7, 2008)
bonding: In ALB mode you might experience client disconnections upon reconnection of a link if the bonding module updelay parameter (0 msec) is incompatible with the forwarding delay time of the switch
bonding: MII link monitoring set to 80 ms
bonding: bond0: Adding slave eth0.
bnx2x: eth0: using MSI-X IRQs: sp 146 fp 154 - 178
bonding: bond0: enslaving eth0 as an active interface with a down link.
bnx2x: eth0 NIC Link is Down
bonding: bond0: Adding slave eth1.
bnx2x: eth1: using MSI-X IRQs: sp 186 fp 194 - 218
bnx2x: eth0 NIC Link is Down
bnx2x: eth0 NIC Link is Up, 10000 Mbps full duplex, receive & transmit flow control ON
bonding: bond0: enslaving eth1 as an active interface with a down link.
bonding: bond0: link status definitely up for interface eth0.
bonding: bond0: making interface eth0 the new active one.
bnx2x: eth1 NIC Link is Down
bonding: bond0: first active interface up!
bonding: bond0: link status definitely up for interface eth1.
bonding: bond0: link status definitely down for interface eth1, disabling it
device eth0 entered promiscuous mode
type=1700 audit(1253260923.120:3): dev=eth0 prom=256 old_prom=0 auid=4294967295 ses=4294967295
bnx2x: eth1 NIC Link is Down
bnx2x: eth1 NIC Link is Up, 10000 Mbps full duplex, receive & transmit flow control ON
bonding: bond0: link status definitely up for interface eth1.
bonding: bond1 is being created...
bonding: bond1: Adding slave eth2.
bnx2x: eth2: using MSI-X IRQs: sp 226 fp 234 - 67
bnx2x: eth2 NIC Link is Up, 10000 Mbps full duplex, receive & transmit flow control ON
bonding: bond1: Warning: failed to get speed and duplex from eth2, assumed to be 100Mb/sec and Full.
bonding: bond1: enslaving eth2 as an active interface with an up link.
bonding: bond1: Adding slave eth3.
bnx2x: eth3: using MSI-X IRQs: sp 75 fp 83 - 107
bnx2x: eth3 NIC Link is Up, 10000 Mbps full duplex, receive & transmit flow control ON
bonding: bond1: Warning: failed to get speed and duplex from eth3, assumed to be 100Mb/sec and Full.
bonding: bond1: enslaving eth3 as an active interface with an up link.
bonding: bond2 is being created...
bonding: bond2: Adding slave eth4.
bnx2x: eth4: using MSI-X IRQs: sp 115 fp 123 - 147
bnx2x: eth4 NIC Link is Up, 10000 Mbps full duplex, receive & transmit flow control ON
bonding: bond2: Warning: failed to get speed and duplex from eth4, assumed to be 100Mb/sec and Full.
bonding: bond2: enslaving eth4 as an active interface with an up link.
bonding: bond2: Adding slave eth5.
bnx2x: eth5: using MSI-X IRQs: sp 155 fp 163 - 187
bnx2x: eth5 NIC Link is Up, 10000 Mbps full duplex, receive & transmit flow control ON
bonding: bond2: Warning: failed to get speed and duplex from eth5, assumed to be 100Mb/sec and Full.
bonding: bond2: enslaving eth5 as an active interface with an up link.
device eth0 left promiscuous mode
type=1700 audit(1253260933.040:4): dev=eth0 prom=0 old_prom=256 auid=4294967295 ses=4294967295

# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.4.0 (October 7, 2008)

Bonding Mode: adaptive load balancing
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 80
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:23:7d:f1:7c:c0

Slave Interface: eth1
MII Status: up
Link Failure Count: 1
Permanent HW addr: 00:23:7d:f1:7c:c4

# cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v3.4.0 (October 7, 2008)

Bonding Mode: adaptive load balancing
Primary Slave: None
Currently Active Slave: None
MII Status: up
MII Polling Interval (ms): 80
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth2
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:23:7d:f1:7c:c1

Slave Interface: eth3
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:23:7d:f1:7c:c5

# cat /proc/net/bonding/bond2
Ethernet Channel Bonding Driver: v3.4.0 (October 7, 2008)

Bonding Mode: adaptive load balancing
Primary Slave: None
Currently Active Slave: None
MII Status: up
MII Polling Interval (ms): 80
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth4
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:23:7d:f1:7c:c2

Slave Interface: eth5
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:23:7d:f1:7c:c6

# ethtool -i eth0
driver: bnx2x
version: 1.48.105
firmware-version: BC:4.8.0 PHY:baa0:0105
bus-info: 0000:02:00.0

# ethtool -i eth1
driver: bnx2x
version: 1.48.105
firmware-version: BC:4.8.0 PHY:baa0:0105
bus-info: 0000:02:00.1

# ethtool -i eth2
driver: bnx2x
version: 1.48.105
firmware-version: BC:4.8.0
bus-info: 0000:02:00.2

# ethtool -i eth3
driver: bnx2x
version: 1.48.105
firmware-version: BC:4.8.0
bus-info: 0000:02:00.3

(same for eth4/eth5)

# ethtool eth0
Settings for eth0:
Supported ports: [ FIBRE ]
Supported link modes: 1000baseT/Full 2500baseX/Full
Supports auto-negotiation: Yes
Advertised link modes: 1000baseT/Full 2500baseX/Full 10000baseT/Full
Advertised auto-negotiation: Yes
Speed: 1000Mb/s
Duplex: Full
Port: FIBRE
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
Supports Wake-on: g
Wake-on: g
Current message level: 0x00000000 (0)
Link detected: yes

# ethtool eth1
Settings for eth1:
Supported ports: [ FIBRE ]
Supported link modes: 1000baseT/Full 2500baseX/Full
Supports auto-negotiation: Yes
Advertised link modes: 1000baseT/Full 2500baseX/Full 10000baseT/Full
Advertised auto-negotiation: Yes
Speed: 1000Mb/s
Duplex: Full
Port: FIBRE
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
Supports Wake-on: g
Wake-on: g
Current message level: 0x00000000 (0)
Link detected: yes

# ethtool eth2
Settings for eth2:
Supported ports: [ FIBRE ]
Supported link modes: 1000baseT/Full 2500baseX/Full
Supports auto-negotiation: Yes
Advertised link modes: 1000baseT/Full 2500baseX/Full 10000baseT/Full
Advertised auto-negotiation: Yes
Speed: Unknown! (4500)
Duplex: Full
Port: FIBRE
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
Supports Wake-on: d
Wake-on: d
Current message level: 0x00000000 (0)
Link detected: yes

# ethtool eth3
Settings for eth3:
Supported ports: [ FIBRE ]
Supported link modes: 1000baseT/Full 2500baseX/Full
Supports auto-negotiation: Yes
Advertised link modes: 1000baseT/Full 2500baseX/Full 10000baseT/Full
Advertised auto-negotiation: Yes
Speed: Unknown! (4500)
Duplex: Full
Port: FIBRE
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
Supports Wake-on: d
Wake-on: d
Current message level: 0x00000000 (0)
Link detected: yes

(same for eth4/eth5)

# ifenslave -c bond1 eth2
Master 'bond1', Slave 'eth2': Error: Change active failed

# strace ...
ioctl(3, SIOCETHTOOL, 0x7fff5b3cd360) = 0
ioctl(3, SIOCGIFMTU, {ifr_name="bond1", ifr_mtu=1500}) = 0
ioctl(3, SIOCGIFFLAGS, {ifr_name="bond1", ifr_flags=IFF_UP|IFF_BROADCAST|IFF_RUNNING|IFF_MASTER|IFF_MULTICAST}) = 0
ioctl(3, SIOCGIFHWADDR, {ifr_name="bond1", ifr_hwaddr=00:23:7d:f1:7c:c1}) = 0
ioctl(3, SIOCGIFFLAGS, {ifr_name="eth2", ifr_flags=IFF_UP|IFF_BROADCAST|IFF_RUNNING|IFF_SLAVE|IFF_MULTICAST}) = 0
ioctl(3, SIOCBONDCHANGEACTIVE, 0x7fff5b3cd330) = -1 EINVAL (Invalid argument)
ioctl(3, 0x89fd, 0x7fff5b3cd330) = -1 EINVAL (Invalid argument)
write(2, "Master 'bond1', Slave 'eth2': Er"..., 58Master 'bond1', Slave 'eth2': Error: Change active failed
) = 58
close(3) = 0
- mode 5 (and 6) don't work, mode 0 and 1 do.
- bond0 always works, whatever the mode.
- fully enabling ipv6 doesn't help for modes 5 and 6.
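A quick way to check every bond at once, rather than reading each /proc file by hand, is sketched below. The function names active_slave and check_bonds are made up for this example. It assumes the status files carry a "Currently Active Slave:" line as in the output above; modes that do not track an active slave may not print that line at all, so the sketch reports that case separately.

```sh
#!/bin/sh
# Print the value of the "Currently Active Slave:" line of one bonding
# status file (empty output if the line is absent).
active_slave() {
    sed -n 's/^Currently Active Slave:[[:space:]]*//p' "$1" | head -n 1
}

# Walk all bond* files in a directory (default: /proc/net/bonding) and
# report the active slave of each. Returns 1 if any bond reports "None".
check_bonds() {
    dir=${1:-/proc/net/bonding}
    rc=0
    for f in "$dir"/bond*; do
        [ -e "$f" ] || continue
        slave=$(active_slave "$f")
        case "$slave" in
            "")
                # Line missing: possibly a mode without an active slave.
                echo "${f##*/}: no 'Currently Active Slave' line"
                ;;
            [Nn]one)
                echo "${f##*/}: no active slave"
                rc=1
                ;;
            *)
                echo "${f##*/}: active slave $slave"
                ;;
        esac
    done
    return $rc
}
```

Calling `check_bonds` with no argument reads the live files; passing a directory of saved copies lets the same check run against output collected from another host.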
Looks like the same problem as http://lkml.org/lkml/2009/3/15/114
Have you moved your bonding configuration out of /etc/modprobe.conf and into a section called BONDING_OPTS in /etc/sysconfig/network-scripts/ifcfg-bond1? I suspect that is the problem. The RHEL4 and FC<7 way of configuring bonding is no longer supported and we suggest using the bonding module options as: BONDING_OPTS="mode=6 miimon=80" in ifcfg-bondX files. I will leave this bug open as I'm not sure this is exactly the problem, but please test based on my suggestions and report whether this bug can be closed. Thanks!
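For reference, the BONDING_OPTS layout suggested above might look like the following. This is a minimal sketch, not the reporter's actual configuration: the ifcfg-eth2 file and the BOOTPROTO, ONBOOT, MASTER and SLAVE settings follow the usual initscripts conventions as I understand them, the IPADDR and NETMASK values are placeholders from the documentation address range, and eth2 is shown only because it is the first slave of bond1 in this report.

```sh
# --- /etc/sysconfig/network-scripts/ifcfg-bond1 (sketch) ---
DEVICE=bond1
BOOTPROTO=none
ONBOOT=yes
IPADDR=192.0.2.10        # placeholder address, replace with the real one
NETMASK=255.255.255.0    # placeholder netmask
BONDING_OPTS="mode=6 miimon=80"

# --- /etc/sysconfig/network-scripts/ifcfg-eth2 (separate file, one per slave) ---
DEVICE=eth2
BOOTPROTO=none
ONBOOT=yes
MASTER=bond1
SLAVE=yes
```

With this layout the "options bond1 mode=6 miimon=80" line would be dropped from /etc/modprobe.conf; to my knowledge the "alias bond1 bonding" line is normally kept.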
Hi Andy, I've tested using BONDING_OPTS but it gives the same result. The strange thing is that bond0 works fine, which would imply a problem elsewhere (our network?) for bond1 and bond2. But then again, all ifaces were working fine with kernel 2.6.18-7.1 using mode 5 or mode 6.
This is an interesting one, David. These messages seemed a bit odd to me:

bonding: bond1: Warning: failed to get speed and duplex from eth2, assumed to be 100Mb/sec and Full.

until I looked at your ethtool output:

# ethtool eth2
Settings for eth2:
Supported ports: [ FIBRE ]
Supported link modes: 1000baseT/Full 2500baseX/Full
Supports auto-negotiation: Yes
Advertised link modes: 1000baseT/Full 2500baseX/Full 10000baseT/Full
Advertised auto-negotiation: Yes
Speed: Unknown! (4500)
Duplex: Full
Port: FIBRE
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
Supports Wake-on: d
Wake-on: d
Current message level: 0x00000000 (0)
Link detected: yes

The speed of 4500 is a bit odd, but I'm quite sure this is done when using a multi-function bnx2x-based device and ethtool will report max bandwidth available. Are you using a Broadcom 57711?

The initialization of bond1 and bond2 also looks different from bond0, and I suspect this comes down to a difference in the way link speed and duplex are detected on what I think are these multi-function 57711 devices.

Also, the patch in comment #2 does appear in the latest RHEL5 kernels, so I don't suspect needing that is a problem.

Is there any way you could take down bond1, switch the configuration for bond1 to active-backup, bring it back up and paste the logs here?
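The speed oddity described above can be spotted on other interfaces with a small filter over ethtool output, sketched here. The helper names link_speed and is_standard_speed are invented for this example, and the list of "standard" speeds (10, 100, 1000, 10000 Mb/s) is an assumption about what the bonding driver of this era accepts; the thread itself only shows that 4500 triggers the "failed to get speed and duplex" warning.

```sh
#!/bin/sh
# Read `ethtool <iface>` output on stdin and print the numeric link speed.
# Handles both "Speed: 1000Mb/s" and "Speed: Unknown! (4500)".
link_speed() {
    sed -n 's/^[[:space:]]*Speed:[^0-9]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Succeed if the given speed is one of the values assumed to be accepted.
# ASSUMPTION: 10/100/1000/10000 is the accepted set; not confirmed here.
is_standard_speed() {
    case "$1" in
        10|100|1000|10000) return 0 ;;
        *) return 1 ;;
    esac
}
```

A possible use is `speed=$(ethtool eth2 | link_speed); is_standard_speed "$speed" || echo "eth2: non-standard speed $speed"`, which on the reporter's eth2 would be expected to flag the 4500 value.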
57711 indeed, it's a HP BL460c G6 blade in c7000 enclosure, with Flex-10 VirtualConnect modules: 1 Gb for bond0 network, 4.5 for bond1 and 4.5 for bond2.

lspci -v for eth2:
02:00.3 Ethernet controller: Broadcom Corporation NetXtreme II BCM57711E 10Gigabit PCIe
Subsystem: Hewlett-Packard Company NC532i Dual Port 10GbE Multifunction BL-C Adapter
Flags: bus master, fast devsel, latency 0, IRQ 138
Memory at f8000000 (64-bit, non-prefetchable) [size=8M]
Memory at f7800000 (64-bit, non-prefetchable) [size=8M]
Capabilities: <access denied>

dmesg with bond1 in active-backup mode:
bonding: bond1 is being created...
bonding: bond1: setting mode to active-backup (1).
bonding: bond1: Setting MII monitoring interval to 80.
bonding: bond1: Adding slave eth2.
bnx2x: eth2: using MSI-X IRQs: sp 69 fp 70 - 77
bnx2x: eth2 NIC Link is Up, 10000 Mbps full duplex, receive & transmit flow control ON
bonding: bond1: Warning: failed to get speed and duplex from eth2, assumed to be 100Mb/sec and Full.
bonding: bond1: making interface eth2 the new active one.
bonding: bond1: first active interface up!
bonding: bond1: enslaving eth2 as an active interface with an up link.
bonding: bond1: Adding slave eth3.
bnx2x: eth3: using MSI-X IRQs: sp 78 fp 83 - 86
bnx2x: eth3 NIC Link is Up, 10000 Mbps full duplex, receive & transmit flow control ON
bonding: bond1: Warning: failed to get speed and duplex from eth3, assumed to be 100Mb/sec and Full.
bonding: bond1: enslaving eth3 as a backup interface with an up link.
Capabilities for the incomplete lspci output above:
Capabilities: [48] Power Management version 3
Capabilities: [50] Vital Product Data
Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable-
Capabilities: [a0] MSI-X: Enable+ Mask- TabSize=17
Capabilities: [ac] Express Endpoint IRQ 0
Capabilities: [100] Device Serial Number c4-7c-f1-fe-ff-7d-23-00
Capabilities: [110] Advanced Error Reporting
Capabilities: [150] Power Budgeting
Capabilities: [160] Virtual Channel
David, I was correct when I mentioned that the patch in comment #2 was in the latest RHEL5 kernels, but 2.6.18-164 is not the latest RHEL5 development kernel. That patch was added in 2.6.18-165. Would you mind building -164 with the patch from comment #2, or testing with the latest development kernel from:

http://people.redhat.com/dzickus/el5/

Those are obviously untested and unsupported, but it would probably be a quick way to see if that patch resolves the issue (which I suspect it will).
I somehow tried to rebuild the 164 with the patch a few weeks ago but was lost while trying to produce a proper patch for 164 with the myriad of patches already applied to vanilla :) (including full bonding driver updates)

Anyway, I just tested with http://people.redhat.com/dzickus/el5/165.el5/x86_64/kernel-2.6.18-165.el5.x86_64.rpm and it works fine !

Here's the dmesg for bond1 and bond2:
bonding: bond1 is being created...
bonding: bond1: setting mode to balance-tlb (5).
bonding: bond1: Setting MII monitoring interval to 80.
bonding: bond1: Adding slave eth2.
bnx2x: eth2: using MSI-X IRQs: sp 226 fp 234 - 67
bnx2x: eth2 NIC Link is Up, 10000 Mbps full duplex, receive & transmit flow control ON
bonding: bond1: Warning: failed to get speed and duplex from eth2, assumed to be 100Mb/sec and Full.
bonding: bond1: making interface eth2 the new active one.
bonding: bond1: first active interface up!
bonding: bond1: enslaving eth2 as an active interface with an up link.
bonding: bond1: Adding slave eth3.
bnx2x: eth3: using MSI-X IRQs: sp 75 fp 83 - 107
bnx2x: eth3 NIC Link is Up, 10000 Mbps full duplex, receive & transmit flow control ON
bonding: bond1: Warning: failed to get speed and duplex from eth3, assumed to be 100Mb/sec and Full.
bonding: bond1: enslaving eth3 as an active interface with an up link.
bonding: bond2 is being created...
bonding: bond2: setting mode to balance-alb (6).
bonding: bond2: Setting MII monitoring interval to 80.
bonding: bond2: Adding slave eth4.
bnx2x: eth4: using MSI-X IRQs: sp 115 fp 123 - 147
bnx2x: eth4 NIC Link is Up, 10000 Mbps full duplex, receive & transmit flow control ON
bonding: bond2: Warning: failed to get speed and duplex from eth4, assumed to be 100Mb/sec and Full.
bonding: bond2: making interface eth4 the new active one.
bonding: bond2: first active interface up!
bonding: bond2: enslaving eth4 as an active interface with an up link.
bonding: bond2: Adding slave eth5.
bnx2x: eth5: using MSI-X IRQs: sp 155 fp 163 - 187
bnx2x: eth5 NIC Link is Up, 10000 Mbps full duplex, receive & transmit flow control ON
bonding: bond2: Warning: failed to get speed and duplex from eth5, assumed to be 100Mb/sec and Full.
bonding: bond2: enslaving eth5 as an active interface with an up link.

Thanks for your help !
Excellent! Thanks for the quick testing and feedback. I would like to close this, but there needs to be some rework in the bonding code to deal with the different speeds that are being provided by the virtual function drivers. I would rather that bonding didn't print that ugly message saying it is presuming that the link is 100/FD. That will be a problem for some modes that consider the speed of the link for output port selection. I'll have to look up some more specs on that chassis to see if I can get a feel for what speeds are offered. It might make a difference for the changes that I make for bonding.
I'm going to go ahead and close this out, since bonding is working after applying the patch that was added to resolve bug 499884.

*** This bug has been marked as a duplicate of bug 499884 ***