Bug 524233 - bonding mode 6 not working since upgrade to kernel r164
Summary: bonding mode 6 not working since upgrade to kernel r164
Keywords:
Status: CLOSED DUPLICATE of bug 499884
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.4
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Assignee: Andy Gospodarek
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2009-09-18 13:36 UTC by David111
Modified: 2014-06-29 23:01 UTC
CC: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-11-30 16:28:59 UTC
Target Upstream Version:
Embargoed:




Links:
CentOS 3848 (http://bugs.centos.org/view.php?id=3848)

Description David111 2009-09-18 13:36:00 UTC
Description of problem:
 Since I updated to kernel 2.6.18-164.el5 (from CentOS), my bond interfaces in mode 6 (adaptive load balancing) no longer work (except bond0, which works fine!): /proc/net/bonding/bond{1,2} show the line "Currently Active Slave: None" despite having two physical interfaces up.

 It worked fine on previous kernel releases (2.6.18-128, ...).

 (bug originally opened on the CentOS bug tracker: http://bugs.centos.org/view.php?id=3848)

Version-Release number of selected component (if applicable):
 2.6.18-164.el5

How reproducible:


Steps to Reproduce:
1. Set up several bond interfaces and set them *all* to mode 6 (to make sure you're not bitten by bug 524206 :)
2. Reboot
3. cat /proc/net/bonding/bond1, or ping your gateways on bond1 (and the following bonds); a one-liner to check all bonds at once is shown below
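
A quick way to check the active slave on every bond at once (grep prefixes each matching line with the bond's file name):

# grep "Currently Active Slave" /proc/net/bonding/bond*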
  
Actual results:
 cat /proc/net/bonding/bond1 
 Currently Active Slave: none

Expected results:
 cat /proc/net/bonding/bond1 
 Currently Active Slave: eth2

Additional info:

modprobe.conf:

options ipv6 "disable=1"
alias net-pf-10 off
alias ipv6 off

alias eth0 bnx2x
alias eth1 bnx2x
alias eth2 bnx2x
alias eth3 bnx2x
alias eth4 bnx2x
alias eth5 bnx2x
alias eth6 bnx2x
alias eth7 bnx2x
alias scsi_hostadapter cciss

alias bond0 bonding
options bond0 mode=6 miimon=80
alias bond1 bonding
options bond1 mode=6 miimon=80
alias bond2 bonding
options bond2 mode=6 miimon=80

dmesg:

IPv6: Loaded, but administratively disabled, reboot required to enable
Ethernet Channel Bonding Driver: v3.4.0 (October 7, 2008)
bonding: In ALB mode you might experience client disconnections upon reconnection of a link if the bonding module updelay parameter (0 msec) is incompatible with the forwarding delay time of the switch
bonding: MII link monitoring set to 80 ms
bonding: bond0: Adding slave eth0.
bnx2x: eth0: using MSI-X  IRQs: sp 146  fp 154 - 178
bonding: bond0: enslaving eth0 as an active interface with a down link.
bnx2x: eth0 NIC Link is Down
bonding: bond0: Adding slave eth1.
bnx2x: eth1: using MSI-X  IRQs: sp 186  fp 194 - 218
bnx2x: eth0 NIC Link is Down
bnx2x: eth0 NIC Link is Up, 10000 Mbps full duplex, receive & transmit flow control ON
bonding: bond0: enslaving eth1 as an active interface with a down link.
bonding: bond0: link status definitely up for interface eth0.
bonding: bond0: making interface eth0 the new active one.
bnx2x: eth1 NIC Link is Down
bonding: bond0: first active interface up!
bonding: bond0: link status definitely up for interface eth1.
bonding: bond0: link status definitely down for interface eth1, disabling it
device eth0 entered promiscuous mode
type=1700 audit(1253260923.120:3): dev=eth0 prom=256 old_prom=0 auid=4294967295 ses=4294967295
bnx2x: eth1 NIC Link is Down
bnx2x: eth1 NIC Link is Up, 10000 Mbps full duplex, receive & transmit flow control ON
bonding: bond0: link status definitely up for interface eth1.
bonding: bond1 is being created...
bonding: bond1: Adding slave eth2.
bnx2x: eth2: using MSI-X  IRQs: sp 226  fp 234 - 67
bnx2x: eth2 NIC Link is Up, 10000 Mbps full duplex, receive & transmit flow control ON
bonding: bond1: Warning: failed to get speed and duplex from eth2, assumed to be 100Mb/sec and Full.
bonding: bond1: enslaving eth2 as an active interface with an up link.
bonding: bond1: Adding slave eth3.
bnx2x: eth3: using MSI-X  IRQs: sp 75  fp 83 - 107
bnx2x: eth3 NIC Link is Up, 10000 Mbps full duplex, receive & transmit flow control ON
bonding: bond1: Warning: failed to get speed and duplex from eth3, assumed to be 100Mb/sec and Full.
bonding: bond1: enslaving eth3 as an active interface with an up link.
bonding: bond2 is being created...
bonding: bond2: Adding slave eth4.
bnx2x: eth4: using MSI-X  IRQs: sp 115  fp 123 - 147
bnx2x: eth4 NIC Link is Up, 10000 Mbps full duplex, receive & transmit flow control ON
bonding: bond2: Warning: failed to get speed and duplex from eth4, assumed to be 100Mb/sec and Full.
bonding: bond2: enslaving eth4 as an active interface with an up link.
bonding: bond2: Adding slave eth5.
bnx2x: eth5: using MSI-X  IRQs: sp 155  fp 163 - 187
bnx2x: eth5 NIC Link is Up, 10000 Mbps full duplex, receive & transmit flow control ON
bonding: bond2: Warning: failed to get speed and duplex from eth5, assumed to be 100Mb/sec and Full.
bonding: bond2: enslaving eth5 as an active interface with an up link.
device eth0 left promiscuous mode
type=1700 audit(1253260933.040:4): dev=eth0 prom=0 old_prom=256 auid=4294967295 ses=4294967295



# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.4.0 (October 7, 2008)

Bonding Mode: adaptive load balancing
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 80
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:23:7d:f1:7c:c0

Slave Interface: eth1
MII Status: up
Link Failure Count: 1
Permanent HW addr: 00:23:7d:f1:7c:c4

# cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v3.4.0 (October 7, 2008)

Bonding Mode: adaptive load balancing
Primary Slave: None
Currently Active Slave: None
MII Status: up
MII Polling Interval (ms): 80
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth2
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:23:7d:f1:7c:c1

Slave Interface: eth3
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:23:7d:f1:7c:c5

# cat /proc/net/bonding/bond2
Ethernet Channel Bonding Driver: v3.4.0 (October 7, 2008)

Bonding Mode: adaptive load balancing
Primary Slave: None
Currently Active Slave: None
MII Status: up
MII Polling Interval (ms): 80
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth4
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:23:7d:f1:7c:c2

Slave Interface: eth5
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:23:7d:f1:7c:c6




# ethtool -i eth0
driver: bnx2x
version: 1.48.105
firmware-version: BC:4.8.0 PHY:baa0:0105
bus-info: 0000:02:00.0

# ethtool -i eth1
driver: bnx2x
version: 1.48.105
firmware-version: BC:4.8.0 PHY:baa0:0105
bus-info: 0000:02:00.1

# ethtool -i eth2
driver: bnx2x
version: 1.48.105
firmware-version: BC:4.8.0
bus-info: 0000:02:00.2

# ethtool -i eth3
driver: bnx2x
version: 1.48.105
firmware-version: BC:4.8.0
bus-info: 0000:02:00.3

(same for eth4/eth5)




# ethtool eth0
Settings for eth0:
        Supported ports: [ FIBRE ]
        Supported link modes:   1000baseT/Full
                                2500baseX/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  1000baseT/Full
                                2500baseX/Full
                                10000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: FIBRE
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: g
        Wake-on: g
        Current message level: 0x00000000 (0)
        Link detected: yes

# ethtool eth1
Settings for eth1:
        Supported ports: [ FIBRE ]
        Supported link modes:   1000baseT/Full
                                2500baseX/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  1000baseT/Full
                                2500baseX/Full
                                10000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: FIBRE
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: g
        Wake-on: g
        Current message level: 0x00000000 (0)
        Link detected: yes

# ethtool eth2
Settings for eth2:
        Supported ports: [ FIBRE ]
        Supported link modes:   1000baseT/Full
                                2500baseX/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  1000baseT/Full
                                2500baseX/Full
                                10000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: Unknown! (4500)
        Duplex: Full
        Port: FIBRE
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: d
        Wake-on: d
        Current message level: 0x00000000 (0)
        Link detected: yes

# ethtool eth3
Settings for eth3:
        Supported ports: [ FIBRE ]
        Supported link modes:   1000baseT/Full
                                2500baseX/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  1000baseT/Full
                                2500baseX/Full
                                10000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: Unknown! (4500)
        Duplex: Full
        Port: FIBRE
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: d
        Wake-on: d
        Current message level: 0x00000000 (0)
        Link detected: yes

(same for eth4/eth5)

# ifenslave -c bond1 eth2
Master 'bond1', Slave 'eth2': Error: Change active failed

# strace ...
ioctl(3, SIOCETHTOOL, 0x7fff5b3cd360) = 0
ioctl(3, SIOCGIFMTU, {ifr_name="bond1", ifr_mtu=1500}) = 0
ioctl(3, SIOCGIFFLAGS, {ifr_name="bond1", ifr_flags=IFF_UP|IFF_BROADCAST|IFF_RUNNING|IFF_MASTER|IFF_MULTICAST}) = 0
ioctl(3, SIOCGIFHWADDR, {ifr_name="bond1", ifr_hwaddr=00:23:7d:f1:7c:c1}) = 0
ioctl(3, SIOCGIFFLAGS, {ifr_name="eth2", ifr_flags=IFF_UP|IFF_BROADCAST|IFF_RUNNING|IFF_SLAVE|IFF_MULTICAST}) = 0
ioctl(3, SIOCBONDCHANGEACTIVE, 0x7fff5b3cd330) = -1 EINVAL (Invalid argument)
ioctl(3, 0x89fd, 0x7fff5b3cd330) = -1 EINVAL (Invalid argument)
write(2, "Master 'bond1', Slave 'eth2': Er"..., 58Master 'bond1', Slave 'eth2': Error: Change active failed
) = 58
close(3) = 0
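
For reference, bonding v3.4.0 also exposes per-bond attributes under sysfs, so the same change can be attempted without ifenslave (and presumably fails with the same EINVAL):

# echo eth2 > /sys/class/net/bond1/bonding/active_slave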

Comment 1 David111 2009-09-18 15:53:19 UTC
- Modes 5 and 6 don't work; modes 0 and 1 do.
- bond0 always works, whatever the mode.
- Fully enabling IPv6 doesn't help with modes 5 and 6.

Comment 2 David111 2009-09-18 16:09:39 UTC
Looks like the same problem as http://lkml.org/lkml/2009/3/15/114

Comment 3 Andy Gospodarek 2009-09-25 21:20:31 UTC
Have you moved your bonding configuration out of /etc/modprobe.conf and into a section called BONDING_OPTS in /etc/sysconfig/network-scripts/ifcfg-bond1?  I suspect that is the problem.

The RHEL4 and FC<7 way of configuring bonding is no longer supported; we suggest specifying the bonding module options as:

BONDING_OPTS="mode=6 miimon=80" 

in ifcfg-bondX files.
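
For example, a minimal pair of config files might look like this (the IP address below is just a placeholder):

/etc/sysconfig/network-scripts/ifcfg-bond1:
DEVICE=bond1
BONDING_OPTS="mode=6 miimon=80"
ONBOOT=yes
BOOTPROTO=none
IPADDR=192.168.1.10
NETMASK=255.255.255.0

/etc/sysconfig/network-scripts/ifcfg-eth2:
DEVICE=eth2
MASTER=bond1
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none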

I will leave this bug open as I'm not sure this is exactly the problem, but please test based on my suggestions and report whether this bug can be closed.  

Thanks!

Comment 4 David111 2009-09-28 09:57:31 UTC
Hi Andy, I've tested using BONDING_OPTS but it gives the same result.

The strange thing is that bond0 works fine, which would imply a problem elsewhere (our network?) for bond1 and bond2. But then again, all interfaces were working fine with kernel 2.6.18-7.1 using mode 5 or mode 6.

Comment 5 Andy Gospodarek 2009-09-30 02:46:28 UTC
This is an interesting one, David.  These messages seemed a bit odd to me:

bonding: bond1: Warning: failed to get speed and duplex from eth2, assumed to
be 100Mb/sec and Full.

until I looked at your ethtool output:

# ethtool eth2
Settings for eth2:
        Supported ports: [ FIBRE ]
        Supported link modes:   1000baseT/Full
                                2500baseX/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  1000baseT/Full
                                2500baseX/Full
                                10000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: Unknown! (4500)
        Duplex: Full
        Port: FIBRE
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: d
        Wake-on: d
        Current message level: 0x00000000 (0)
        Link detected: yes

The speed of 4500 is a bit odd, but I'm quite sure this happens when using a multi-function bnx2x-based device, where ethtool reports the maximum bandwidth available to that function. Are you using a Broadcom 57711?

The initialization of bond1 and bond2 also looks different from bond0's, and I suspect this comes down to a difference in the way link speed and duplex are detected on what I think are multi-function 57711 devices.

Also, the patch in comment #2 does appear in the latest RHEL5 kernels, so I don't suspect a missing patch is the problem.

Is there any way you could take down bond1, switch its configuration to active-backup, bring it back up, and paste the logs here?
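
Assuming you are using the BONDING_OPTS style from comment #3, something like the following should do it:

# ifdown bond1
# sed -i 's/mode=6/mode=1/' /etc/sysconfig/network-scripts/ifcfg-bond1
# ifup bond1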

Comment 6 David111 2009-09-30 07:47:48 UTC
57711 indeed; it's an HP BL460c G6 blade in a c7000 enclosure with Flex-10 VirtualConnect modules: 1 Gb for the bond0 network, 4.5 Gb for bond1, and 4.5 Gb for bond2.

lspci -v for eth2:

02:00.3 Ethernet controller: Broadcom Corporation NetXtreme II BCM57711E 10Gigabit PCIe
        Subsystem: Hewlett-Packard Company NC532i Dual Port 10GbE Multifunction BL-C Adapter
        Flags: bus master, fast devsel, latency 0, IRQ 138
        Memory at f8000000 (64-bit, non-prefetchable) [size=8M]
        Memory at f7800000 (64-bit, non-prefetchable) [size=8M]
        Capabilities: <access denied>


dmesg with bond1 in active-backup mode:

bonding: bond1 is being created...
bonding: bond1: setting mode to active-backup (1).
bonding: bond1: Setting MII monitoring interval to 80.
bonding: bond1: Adding slave eth2.
bnx2x: eth2: using MSI-X  IRQs: sp 69  fp 70 - 77
bnx2x: eth2 NIC Link is Up, 10000 Mbps full duplex, receive & transmit flow control ON
bonding: bond1: Warning: failed to get speed and duplex from eth2, assumed to be 100Mb/sec and Full.
bonding: bond1: making interface eth2 the new active one.
bonding: bond1: first active interface up!
bonding: bond1: enslaving eth2 as an active interface with an up link.
bonding: bond1: Adding slave eth3.
bnx2x: eth3: using MSI-X  IRQs: sp 78  fp 83 - 86
bnx2x: eth3 NIC Link is Up, 10000 Mbps full duplex, receive & transmit flow control ON
bonding: bond1: Warning: failed to get speed and duplex from eth3, assumed to be 100Mb/sec and Full.
bonding: bond1: enslaving eth3 as a backup interface with an up link.

Comment 7 David111 2009-09-30 07:50:32 UTC
Capabilities for the incomplete lspci output above:
        Capabilities: [48] Power Management version 3
        Capabilities: [50] Vital Product Data
        Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable-
        Capabilities: [a0] MSI-X: Enable+ Mask- TabSize=17
        Capabilities: [ac] Express Endpoint IRQ 0
        Capabilities: [100] Device Serial Number c4-7c-f1-fe-ff-7d-23-00
        Capabilities: [110] Advanced Error Reporting
        Capabilities: [150] Power Budgeting
        Capabilities: [160] Virtual Channel

Comment 8 Andy Gospodarek 2009-09-30 15:16:30 UTC
David, I was correct when I mentioned that the patch in comment #2 was in the latest RHEL5 kernels, but 2.6.18-164 is not the latest RHEL5 development kernel.

That patch was added in 2.6.18-165.  Would you mind building -164 with the patch from comment #2 or testing with the latest development kernel from:

http://people.redhat.com/dzickus/el5/

Those are obviously untested and unsupported, but it would probably be a quick way to see whether that patch resolves the issue (which I suspect it will).
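
Installing with rpm -i (rather than -U) keeps the current kernel installed as a fallback:

# rpm -ivh kernel-2.6.18-165.el5.x86_64.rpm
# reboot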

Comment 9 David111 2009-09-30 15:41:20 UTC
I tried to rebuild -164 with the patch a few weeks ago, but got lost trying to produce a proper patch for -164 on top of the myriad of patches already applied to vanilla :) (including full bonding driver updates)

Anyway, I just tested with http://people.redhat.com/dzickus/el5/165.el5/x86_64/kernel-2.6.18-165.el5.x86_64.rpm and it works fine!

Here's the dmesg for bond1 and bond2:
bonding: bond1 is being created...
bonding: bond1: setting mode to balance-tlb (5).
bonding: bond1: Setting MII monitoring interval to 80.
bonding: bond1: Adding slave eth2.
bnx2x: eth2: using MSI-X  IRQs: sp 226  fp 234 - 67
bnx2x: eth2 NIC Link is Up, 10000 Mbps full duplex, receive & transmit flow control ON
bonding: bond1: Warning: failed to get speed and duplex from eth2, assumed to be 100Mb/sec and Full.
bonding: bond1: making interface eth2 the new active one.
bonding: bond1: first active interface up!
bonding: bond1: enslaving eth2 as an active interface with an up link.
bonding: bond1: Adding slave eth3.
bnx2x: eth3: using MSI-X  IRQs: sp 75  fp 83 - 107
bnx2x: eth3 NIC Link is Up, 10000 Mbps full duplex, receive & transmit flow control ON
bonding: bond1: Warning: failed to get speed and duplex from eth3, assumed to be 100Mb/sec and Full.
bonding: bond1: enslaving eth3 as an active interface with an up link.
bonding: bond2 is being created...
bonding: bond2: setting mode to balance-alb (6).
bonding: bond2: Setting MII monitoring interval to 80.
bonding: bond2: Adding slave eth4.
bnx2x: eth4: using MSI-X  IRQs: sp 115  fp 123 - 147
bnx2x: eth4 NIC Link is Up, 10000 Mbps full duplex, receive & transmit flow control ON
bonding: bond2: Warning: failed to get speed and duplex from eth4, assumed to be 100Mb/sec and Full.
bonding: bond2: making interface eth4 the new active one.
bonding: bond2: first active interface up!
bonding: bond2: enslaving eth4 as an active interface with an up link.
bonding: bond2: Adding slave eth5.
bnx2x: eth5: using MSI-X  IRQs: sp 155  fp 163 - 187
bnx2x: eth5 NIC Link is Up, 10000 Mbps full duplex, receive & transmit flow control ON
bonding: bond2: Warning: failed to get speed and duplex from eth5, assumed to be 100Mb/sec and Full.
bonding: bond2: enslaving eth5 as an active interface with an up link.

Thanks for your help!

Comment 10 Andy Gospodarek 2009-09-30 17:05:28 UTC
Excellent!  Thanks for the quick testing and feedback.

I would like to close this, but there needs to be some rework in the bonding code to deal with the different speeds being provided by the virtual-function drivers. I would rather bonding not print that ugly message about presuming the link is 100/FD; that will be a problem for modes that consider link speed when selecting an output port.

I'll have to look up some more specs on that chassis to get a feel for what speeds are offered; it might make a difference for the changes I make to bonding.

Comment 11 Andy Gospodarek 2009-11-30 16:28:59 UTC
I'm going to go ahead and close this out, since bonding works after adding the patch that resolves bug 499884.

*** This bug has been marked as a duplicate of bug 499884 ***

