Bug 839714

Summary: bonding fails to start correctly
Product: Red Hat Enterprise Linux 6 Reporter: Kapetanakis Giannis <bilias>
Component: kernel Assignee: Veaceslav Falico <vfalico>
Status: CLOSED DUPLICATE QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high Docs Contact:
Priority: unspecified    
Version: 6.3 CC: nhorman, peterm
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-07-17 12:04:20 UTC Type: Bug

Description Kapetanakis Giannis 2012-07-12 15:56:23 UTC
Hi,

After updating to 6.3 and kernel 2.6.32-279.1.1.el6.x86_64,
bonding no longer works.

I have the following setup:

eth2-|       |-bond0.1----vlan1br
     |-bond0-|-bond0.2----vlan2br
eth3-|       |-bond0.100--vlan100br

All interfaces (eth2, eth3, bond0, bond0.X, vlanXbr) come up, but there is no network connectivity on the VLAN bridges.

Reverting back to 2.6.32-220.23.1 fixes the problem.

There is also a workaround:
# ifdown eth2; ifup eth2
# ifdown eth3; ifup eth3

This is my configuration:

# cat /etc/modprobe.d/bonding.conf 
#alias bond0 bonding
alias netdev-bond0 bonding
#options bond0 -o bond0 mode=balance-rr miimon=100

# cat ifcfg-eth2 
DEVICE="eth2"
ONBOOT="yes"
MASTER=bond0
SLAVE=yes

# cat ifcfg-eth3
DEVICE="eth3"
ONBOOT=yes
MASTER=bond0
SLAVE=yes

# cat ifcfg-bond0
DEVICE=bond0
ONBOOT=yes
IPV6INIT=yes
IPV6_AUTOCONF=no
DHCPV6=no
BONDING_OPTS="mode=1 miimon=100 downdelay=500 updelay=30000"

# cat ifcfg-bond0.100
DEVICE=bond0.100
VLAN=yes
ONBOOT=yes
IPV6INIT=yes
IPV6_AUTOCONF=no
DHCPV6=no
BRIDGE=vlan100br
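The relationships between these files (both slaves pointing at bond0, the VLAN device bond0.100 sitting on bond0) can be checked mechanically before suspecting the kernel. The following is a minimal sketch, not part of the original report: it assumes the ifcfg files use simple KEY=value lines with optional double quotes and no shell expansion, and the helper names `parse_ifcfg`, `bonding_opts` and `check_topology` are hypothetical. The bridge file for vlan100br is not shown in the report, so bridges are not checked.

```python
# Minimal sketch: sanity-check ifcfg-style files for a bond/VLAN stack.
# Assumes simple KEY=value lines (optional double quotes, no shell expansion).
# All helper names here are hypothetical, not part of initscripts.

def parse_ifcfg(text):
    """Return a dict of KEY -> value from ifcfg-style text."""
    cfg = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=value line
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]  # strip surrounding double quotes
        cfg[key.strip()] = value
    return cfg


def bonding_opts(cfg):
    """Split BONDING_OPTS ("mode=1 miimon=100 ...") into a dict."""
    return dict(opt.partition("=")[::2] for opt in cfg.get("BONDING_OPTS", "").split())


def check_topology(configs):
    """Report slaves whose MASTER, or VLANs whose parent, has no ifcfg file."""
    devices = {cfg.get("DEVICE") for cfg in configs.values()}
    problems = []
    for cfg in configs.values():
        dev = cfg.get("DEVICE", "?")
        if cfg.get("SLAVE") == "yes" and cfg.get("MASTER") not in devices:
            problems.append(f"{dev}: MASTER={cfg.get('MASTER')} has no ifcfg file")
        if cfg.get("VLAN") == "yes":
            # VLAN devices are named <parent>.<vlan id>, e.g. bond0.100
            parent, dot, vid = dev.rpartition(".")
            if not dot or not vid.isdigit():
                problems.append(f"{dev}: VLAN=yes but name is not <parent>.<id>")
            elif parent not in devices:
                problems.append(f"{dev}: parent {parent} has no ifcfg file")
    return problems


# The files from the report (IPv6 lines omitted for brevity).
FILES = {
    "ifcfg-eth2": 'DEVICE="eth2"\nONBOOT="yes"\nMASTER=bond0\nSLAVE=yes\n',
    "ifcfg-eth3": 'DEVICE="eth3"\nONBOOT=yes\nMASTER=bond0\nSLAVE=yes\n',
    "ifcfg-bond0": 'DEVICE=bond0\nONBOOT=yes\n'
                   'BONDING_OPTS="mode=1 miimon=100 downdelay=500 updelay=30000"\n',
    "ifcfg-bond0.100": "DEVICE=bond0.100\nVLAN=yes\nONBOOT=yes\nBRIDGE=vlan100br\n",
}
configs = {name: parse_ifcfg(text) for name, text in FILES.items()}
print(check_topology(configs))  # → []
print(bonding_opts(configs["ifcfg-bond0"]))
# → {'mode': '1', 'miimon': '100', 'downdelay': '500', 'updelay': '30000'}
```

On the files above the check finds nothing, which is consistent with the report's conclusion that the fault lies in the kernel rather than in the configuration.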

# ethtool eth2
Settings for eth2:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Full 
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Full 
        Advertised pause frame use: No
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        MDI-X: Unknown
        Supports Wake-on: g
        Wake-on: d
        Link detected: yes

# ethtool eth3
Settings for eth3:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Full 
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Full 
        Advertised pause frame use: No
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        MDI-X: Unknown
        Supports Wake-on: g
        Wake-on: d
        Link detected: yes

# egrep -i 'eth2|eth3|bond' /var/log/messages
Jul 12 18:32:08 server kernel: bnx2 0000:02:00.0: eth2: Broadcom NetXtreme II BCM5709 1000Base-T (C0) PCI Express found at mem da000000, IRQ 32, node addr bc:30:5b:dc:b0:c2
Jul 12 18:32:08 server kernel: bnx2 0000:02:00.1: eth3: Broadcom NetXtreme II BCM5709 1000Base-T (C0) PCI Express found at mem dc000000, IRQ 42, node addr bc:30:5b:dc:b0:c4
Jul 12 18:32:08 server kernel: Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)
Jul 12 18:32:08 server kernel: bonding: bond0: setting mode to active-backup (1).
Jul 12 18:32:08 server kernel: bonding: bond0: Setting MII monitoring interval to 100.
Jul 12 18:32:08 server kernel: bonding: bond0: Setting down delay to 500.
Jul 12 18:32:08 server kernel: bonding: bond0: Setting up delay to 30000.
Jul 12 18:32:08 server kernel: ADDRCONF(NETDEV_UP): bond0: link is not ready
Jul 12 18:32:08 server kernel: 8021q: adding VLAN 0 to HW filter on device bond0
Jul 12 18:32:08 server kernel: bonding: bond0: Adding slave eth2.
Jul 12 18:32:08 server kernel: bnx2 0000:02:00.0: eth2: using MSIX
Jul 12 18:32:08 server kernel: bonding: bond0: enslaving eth2 as a backup interface with a down link.
Jul 12 18:32:08 server kernel: bonding: bond0: Adding slave eth3.
Jul 12 18:32:08 server kernel: bnx2 0000:02:00.1: eth3: using MSIX
Jul 12 18:32:08 server kernel: bonding: bond0: enslaving eth3 as a backup interface with a down link.
Jul 12 18:32:08 server kernel: bnx2 0000:02:00.0: eth2: NIC Copper Link is Up, 1000 Mbps full duplex
Jul 12 18:32:08 server kernel: bonding: bond0: link status up for interface eth2, enabling it in 0 ms.
Jul 12 18:32:08 server kernel: bond0: link status definitely up for interface eth2, 1000 Mbps full duplex.
Jul 12 18:32:08 server kernel: bonding: bond0: making interface eth2 the new active one.
Jul 12 18:32:08 server kernel: bonding: bond0: first active interface up!
Jul 12 18:32:08 server kernel: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
Jul 12 18:32:08 server kernel: bnx2 0000:02:00.1: eth3: NIC Copper Link is Up, 1000 Mbps full duplex
Jul 12 18:32:08 server kernel: bonding: bond0: link status up for interface eth3, enabling it in 30000 ms.
Jul 12 18:32:08 server kernel: device bond0.100 entered promiscuous mode
Jul 12 18:32:08 server kernel: device bond0 entered promiscuous mode
Jul 12 18:32:08 server kernel: device eth2 entered promiscuous mode
Jul 12 18:32:08 server kernel: vlan2br: port 1(bond0.100) entering forwarding state
Jul 12 18:32:32 server kernel: bond0: link status definitely up for interface eth3, 1000 Mbps full duplex.


# ifdown eth2; ifup eth2
# ifdown eth3; ifup eth3

# egrep -i 'eth2|eth3|bond' /var/log/messages

Jul 12 18:37:16 server kernel: bonding: bond0: Removing slave eth2
Jul 12 18:37:16 server kernel: bonding: bond0: Warning: the permanent HWaddr of eth2 - bc:30:5b:dc:b0:c2 - is still in use by bond0. Set the HWaddr of eth2 to a different address to avoid conflicts.
Jul 12 18:37:16 server kernel: bonding: bond0: releasing active interface eth2
Jul 12 18:37:16 server kernel: device eth2 left promiscuous mode
Jul 12 18:37:16 server kernel: bonding: bond0: making interface eth3 the new active one.
Jul 12 18:37:16 server kernel: device eth3 entered promiscuous mode
Jul 12 18:37:21 server kernel: bonding: bond0: Adding slave eth2.
Jul 12 18:37:21 server kernel: bnx2 0000:02:00.0: eth2: using MSIX
Jul 12 18:37:21 server kernel: bonding: bond0: enslaving eth2 as a backup interface with a down link.
Jul 12 18:37:23 server kernel: bnx2 0000:02:00.0: eth2: NIC Copper Link is Up, 1000 Mbps full duplex
Jul 12 18:37:23 server kernel: bonding: bond0: link status up for interface eth2, enabling it in 30000 ms.
Jul 12 18:37:31 server kernel: bonding: bond0: Removing slave eth3
Jul 12 18:37:31 server kernel: bonding: bond0: releasing active interface eth3
Jul 12 18:37:31 server kernel: device eth3 left promiscuous mode
Jul 12 18:37:31 server kernel: bonding: bond0: making interface eth2 the new active one 7600 ms earlier.
Jul 12 18:37:31 server kernel: device eth2 entered promiscuous mode
Jul 12 18:37:35 server kernel: bonding: bond0: Adding slave eth3.
Jul 12 18:37:35 server kernel: bnx2 0000:02:00.1: eth3: using MSIX
Jul 12 18:37:35 server kernel: bonding: bond0: enslaving eth3 as a backup interface with a down link.
Jul 12 18:37:37 server kernel: bnx2 0000:02:00.1: eth3: NIC Copper Link is Up, 1000 Mbps full duplex
Jul 12 18:37:37 server kernel: bonding: bond0: link status up for interface eth3, enabling it in 30000 ms.

Removing downdelay=500 updelay=30000 from the bonding options makes no difference.

Running
# ifdown bond0; ifup bond0
also fixes the problem.
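To compare the bond's state before and after such a bounce, the bonding driver's status file /proc/net/bonding/bond0 reports the active slave and the per-slave MII status. Below is a minimal sketch of a parser for that file. The field labels are assumed to match the RHEL 6 bonding driver's output and should be checked against the real file; `SAMPLE` is illustrative text written to match the setup described above, not captured from the reporter's machine; `parse_bond_status` is a hypothetical name. In this report both links are logged as up, so this file would probably look healthy even while traffic fails, as the captures in Comment 5 suggest.

```python
# Minimal sketch: parse the text of /proc/net/bonding/<bond>.
# SAMPLE below is illustrative, not captured from the reporter's machine;
# field labels are assumed to match the RHEL 6 bonding driver output.

def parse_bond_status(text):
    """Return (bond_fields, [slave_fields, ...]) from bonding status text."""
    bond, slaves = {}, []
    current = bond  # fields go to the bond until the first "Slave Interface"
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            continue  # blank line
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = {key: value}  # start a new slave section
            slaves.append(current)
        else:
            current[key] = value
    return bond, slaves


SAMPLE = """\
Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth2
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 30000
Down Delay (ms): 500

Slave Interface: eth2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: bc:30:5b:dc:b0:c2

Slave Interface: eth3
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: bc:30:5b:dc:b0:c4
"""

bond, slaves = parse_bond_status(SAMPLE)
print(bond["Currently Active Slave"])  # → eth2
for slave in slaves:
    print(slave["Slave Interface"], slave["MII Status"])

# On a live system (requires the bonding module to be loaded):
# with open("/proc/net/bonding/bond0") as f:
#     bond, slaves = parse_bond_status(f.read())
```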

regards,

Giannis

Comment 2 Veaceslav Falico 2012-07-13 13:16:23 UTC
Hi,

What exactly do you mean by "no network"? Are no packets reaching the VLAN interfaces? Can you check whether packets arrive on the ethX interfaces (i.e. whether bonding is failing to forward them to the VLANs)?

Also, you mentioned the VLAN bridges: do you actually have a bridge on top of them (br0 or similar)? If so, can you try to rule it out?

Thank you!

Comment 3 Veaceslav Falico 2012-07-13 13:28:22 UTC
Also, this looks like the same issue as https://bugzilla.redhat.com/show_bug.cgi?id=834764 .

Comment 4 Kapetanakis Giannis 2012-07-13 15:31:08 UTC
The VMs attached to the bridges have no network connectivity.

Furthermore, I added an IP address to bond0.100 on the host
and still had no network.

I didn't check the counters on eth2/eth3/bond0/bond0.100,
nor did I run tcpdump to see what's going on.

I will try that later tonight.

If it's the same bug as https://bugzilla.redhat.com/show_bug.cgi?id=834764, will the fix only be available in 6.4?

thanks

Comment 5 Kapetanakis Giannis 2012-07-15 15:20:54 UTC
I did some more tests today:

# ifconfig bond0.100 192.168.1.200 netmask 255.255.255.0
# ping 192.168.1.1 (gw)

tcpdump on eth2 (the active slave of the bond):

18:13:58.961746 STP 802.1s, Rapid STP, CIST Flags [Learn, Forward, Agreement]
18:14:00.961742 STP 802.1s, Rapid STP, CIST Flags [Learn, Forward, Agreement]
18:14:02.986087 STP 802.1s, Rapid STP, CIST Flags [Learn, Forward, Agreement]
18:14:23.123827 ARP, Request who-has 192.168.1.1 tell 192.168.1.200, length 28
18:14:24.123827 ARP, Request who-has 192.168.1.1 tell 192.168.1.200, length 28

On bond0.100 and vlan100br I see only the ARP requests.
I have counters on all interfaces.

After the ifdown/ifup:

eth2:

18:17:06.034323 STP 802.1s, Rapid STP, CIST Flags [Learn, Forward, Agreement]
18:17:06.625641 IP 192.168.1.200 > 192.168.1.1: ICMP echo request, id 61986, seq 121, length 64
18:17:07.625650 IP 192.168.1.200 > 192.168.1.1: ICMP echo request, id 61986, seq 122, length 64

bond0.100:

18:17:43.719710 STP 802.1d, Config, Flags [none], bridge-id 8019.00:0c:ce:a8:df:80.8032, length 42
18:17:43.719721 STP 802.1d, Config, Flags [none], bridge-id 8019.00:0c:ce:a8:df:80.8032, length 42
18:17:43.922517 IP 192.168.1.200 > 192.168.1.1: ICMP echo request, id 37157, seq 1, length 64
18:17:43.923281 IP 192.168.1.1 > 192.168.1.200: ICMP echo reply, id 37157, seq 1, length 64

Same on vlan100br.
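The two captures differ in one telling way: before the bounce, the host's ARP requests for the gateway get no ARP reply on eth2 and no ICMP appears at all, which is consistent with the host never learning the gateway's MAC address; afterwards, the echo requests on bond0.100 are answered. A minimal sketch that tallies this from tcpdump's default one-line output follows. It assumes exactly the line format shown in the captures above and is not a general tcpdump parser; `summarize_capture` is a hypothetical helper name.

```python
import re

# Patterns for tcpdump's default one-line output, as seen in the captures above.
ARP_REQUEST = re.compile(r"ARP, Request who-has (\S+) tell (\S+)")
ARP_REPLY = re.compile(r"ARP, Reply (\S+) is-at")
ICMP_ECHO = re.compile(r"IP (\S+) > (\S+): ICMP echo (request|reply), id (\d+), seq (\d+)")


def summarize_capture(lines):
    """Count ARP requests/replies and list ICMP echo requests with no reply."""
    arp_requests = arp_replies = 0
    requests, replies = set(), set()
    for line in lines:
        if ARP_REQUEST.search(line):
            arp_requests += 1
        elif ARP_REPLY.search(line):
            arp_replies += 1
        m = ICMP_ECHO.search(line)
        if m:
            src, dst, kind, ident, seq = m.groups()
            if kind == "request":
                requests.add((src, dst, ident, seq))
            else:
                # A reply travels the opposite way with the same id and seq.
                replies.add((dst, src, ident, seq))
    return {
        "arp_requests": arp_requests,
        "arp_replies": arp_replies,
        "unanswered_pings": sorted(requests - replies),
    }


# Lines copied from the captures in this comment.
BEFORE_ETH2 = [
    "18:14:23.123827 ARP, Request who-has 192.168.1.1 tell 192.168.1.200, length 28",
    "18:14:24.123827 ARP, Request who-has 192.168.1.1 tell 192.168.1.200, length 28",
]
AFTER_BOND0_100 = [
    "18:17:43.922517 IP 192.168.1.200 > 192.168.1.1: ICMP echo request, id 37157, seq 1, length 64",
    "18:17:43.923281 IP 192.168.1.1 > 192.168.1.200: ICMP echo reply, id 37157, seq 1, length 64",
]
print(summarize_capture(BEFORE_ETH2))
# → {'arp_requests': 2, 'arp_replies': 0, 'unanswered_pings': []}
print(summarize_capture(AFTER_BOND0_100))
# → {'arp_requests': 0, 'arp_replies': 0, 'unanswered_pings': []}
```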

Comment 6 Kapetanakis Giannis 2012-07-15 17:12:58 UTC
One correction:

I was adding the IP address on vlan100br, not on bond0.100, for the tests.

Anyway, patch https://bugzilla.redhat.com/attachment.cgi?id=594187
worked for me too.

So you can close this one and mark it as a duplicate of #834764.

Comment 7 Veaceslav Falico 2012-07-17 12:04:20 UTC
Thanks for the quick response; closing.

*** This bug has been marked as a duplicate of bug 834764 ***