Bug 839714 - bonding fails to start correctly
Status: CLOSED DUPLICATE of bug 834764
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.3
Hardware: x86_64 Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Veaceslav Falico
QA Contact: Red Hat Kernel QE team

Reported: 2012-07-12 11:56 EDT by Kapetanakis Giannis
Modified: 2014-09-30 19:44 EDT
Doc Type: Bug Fix
Last Closed: 2012-07-17 08:04:20 EDT
Type: Bug

Attachments: None

Description Kapetanakis Giannis 2012-07-12 11:56:23 EDT
Hi,

After updating to 6.3 and kernel 2.6.32-279.1.1.el6.x86_64
bonding does not work.

I have the following setup:

eth2-|       |-bond0.1----vlan1br
     |-bond0-|-bond0.2----vlan2br
eth3-|       |-bond0.100--vlan100br

All interfaces (eth2, eth3, bond0, bond0.X, vlanXbr) come up, but there is no network connectivity on the VLAN bridges.

Reverting back to kernel 2.6.32-220.23.1 fixes the problem.

Also there is a workaround:
# ifdown eth2; ifup eth2
# ifdown eth3; ifup eth3
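
The manual workaround above could be wrapped in a small helper that re-cycles every slave of a bond in turn. This is only a sketch, not something taken from the report: it assumes the slave list can be read from the bonding driver's sysfs file /sys/class/net/<bond>/bonding/slaves, and the function name recycle_slaves and the SYSFS_ROOT and DRY_RUN variables are hypothetical, added so the loop can be previewed without touching the network.

```shell
#!/bin/sh
# Re-cycle every slave of a bond, mirroring "ifdown ethX; ifup ethX".
# Assumption: slaves are listed, space separated, in
#   /sys/class/net/<bond>/bonding/slaves
# SYSFS_ROOT and DRY_RUN are hypothetical knobs for previewing/testing.
recycle_slaves() {
    bond="$1"
    slaves_file="${SYSFS_ROOT:-/sys/class/net}/$bond/bonding/slaves"
    if [ ! -r "$slaves_file" ]; then
        echo "cannot read $slaves_file" >&2
        return 1
    fi
    # Read the list once; it changes while slaves are being released.
    for slave in $(cat "$slaves_file"); do
        if [ -n "${DRY_RUN:-}" ]; then
            # Only print what would be run.
            echo "ifdown $slave; ifup $slave"
        else
            ifdown "$slave" && ifup "$slave"
        fi
    done
}
```

Running `DRY_RUN=1 recycle_slaves bond0` first prints the commands instead of executing them; without DRY_RUN the function needs root and briefly removes each slave from the bond.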

This is my configuration:

# cat /etc/modprobe.d/bonding.conf 
#alias bond0 bonding
alias netdev-bond0 bonding
#options bond0 -o bond0 mode=balance-rr miimon=100

# cat ifcfg-eth2 
DEVICE="eth2"
ONBOOT="yes"
MASTER=bond0
SLAVE=yes

# cat ifcfg-eth3
DEVICE="eth3"
ONBOOT=yes
MASTER=bond0
SLAVE=yes

# cat ifcfg-bond0
DEVICE=bond0
ONBOOT=yes
IPV6INIT=yes
IPV6_AUTOCONF=no
DHCPV6=no
BONDING_OPTS="mode=1 miimon=100 downdelay=500 updelay=30000"

# cat ifcfg-bond0.100
DEVICE=bond0.100
VLAN=yes
ONBOOT=yes
IPV6INIT=yes
IPV6_AUTOCONF=no
DHCPV6=no
BRIDGE=vlan100br
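
To confirm that the BONDING_OPTS above were actually applied, the values can be read back from the bonding driver's sysfs directory. The following is a minimal sketch under that assumption; the function name show_bond_opts and the SYSFS_ROOT override are hypothetical and not part of the report.

```shell
#!/bin/sh
# Print the bonding parameters the kernel reports for a bond.
# The files under /sys/class/net/<bond>/bonding/ belong to the bonding
# driver's sysfs interface. SYSFS_ROOT is a hypothetical override so the
# function can be pointed at a copy of that tree.
show_bond_opts() {
    bond="$1"
    dir="${SYSFS_ROOT:-/sys/class/net}/$bond/bonding"
    for opt in mode miimon downdelay updelay active_slave slaves; do
        if [ -r "$dir/$opt" ]; then
            printf '%s=%s\n' "$opt" "$(cat "$dir/$opt")"
        else
            # File missing or unreadable (e.g. bond not created yet).
            printf '%s=<unreadable>\n' "$opt"
        fi
    done
}
```

Calling `show_bond_opts bond0` right after boot and again after the ifdown/ifup workaround makes it easy to compare the active slave and delays in both states.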

# ethtool eth2
Settings for eth2:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Full 
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Full 
        Advertised pause frame use: No
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        MDI-X: Unknown
        Supports Wake-on: g
        Wake-on: d
        Link detected: yes

# ethtool eth3
Settings for eth3:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Full 
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Full 
        Advertised pause frame use: No
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        MDI-X: Unknown
        Supports Wake-on: g
        Wake-on: d
        Link detected: yes

# egrep -i 'eth2|eth3|bond' /var/log/messages
Jul 12 18:32:08 server kernel: bnx2 0000:02:00.0: eth2: Broadcom NetXtreme II BCM5709 1000Base-T (C0) PCI Express found at mem da000000, IRQ 32, node addr bc:30:5b:dc:b0:c2
Jul 12 18:32:08 server kernel: bnx2 0000:02:00.1: eth3: Broadcom NetXtreme II BCM5709 1000Base-T (C0) PCI Express found at mem dc000000, IRQ 42, node addr bc:30:5b:dc:b0:c4
Jul 12 18:32:08 server kernel: Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)
Jul 12 18:32:08 server kernel: bonding: bond0: setting mode to active-backup (1).
Jul 12 18:32:08 server kernel: bonding: bond0: Setting MII monitoring interval to 100.
Jul 12 18:32:08 server kernel: bonding: bond0: Setting down delay to 500.
Jul 12 18:32:08 server kernel: bonding: bond0: Setting up delay to 30000.
Jul 12 18:32:08 server kernel: ADDRCONF(NETDEV_UP): bond0: link is not ready
Jul 12 18:32:08 server kernel: 8021q: adding VLAN 0 to HW filter on device bond0
Jul 12 18:32:08 server kernel: bonding: bond0: Adding slave eth2.
Jul 12 18:32:08 server kernel: bnx2 0000:02:00.0: eth2: using MSIX
Jul 12 18:32:08 server kernel: bonding: bond0: enslaving eth2 as a backup interface with a down link.
Jul 12 18:32:08 server kernel: bonding: bond0: Adding slave eth3.
Jul 12 18:32:08 server kernel: bnx2 0000:02:00.1: eth3: using MSIX
Jul 12 18:32:08 server kernel: bonding: bond0: enslaving eth3 as a backup interface with a down link.
Jul 12 18:32:08 server kernel: bnx2 0000:02:00.0: eth2: NIC Copper Link is Up, 1000 Mbps full duplex
Jul 12 18:32:08 server kernel: bonding: bond0: link status up for interface eth2, enabling it in 0 ms.
Jul 12 18:32:08 server kernel: bond0: link status definitely up for interface eth2, 1000 Mbps full duplex.
Jul 12 18:32:08 server kernel: bonding: bond0: making interface eth2 the new active one.
Jul 12 18:32:08 server kernel: bonding: bond0: first active interface up!
Jul 12 18:32:08 server kernel: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
Jul 12 18:32:08 server kernel: bnx2 0000:02:00.1: eth3: NIC Copper Link is Up, 1000 Mbps full duplex
Jul 12 18:32:08 server kernel: bonding: bond0: link status up for interface eth3, enabling it in 30000 ms.
Jul 12 18:32:08 server kernel: device bond0.100 entered promiscuous mode
Jul 12 18:32:08 server kernel: device bond0 entered promiscuous mode
Jul 12 18:32:08 server kernel: device eth2 entered promiscuous mode
Jul 12 18:32:08 server kernel: vlan2br: port 1(bond0.100) entering forwarding state
Jul 12 18:32:32 server kernel: bond0: link status definitely up for interface eth3, 1000 Mbps full duplex.


# ifdown eth2; ifup eth2
# ifdown eth3; ifup eth3

# egrep -i 'eth2|eth3|bond' /var/log/messages

Jul 12 18:37:16 server kernel: bonding: bond0: Removing slave eth2
Jul 12 18:37:16 server kernel: bonding: bond0: Warning: the permanent HWaddr of eth2 - bc:30:5b:dc:b0:c2 - is still in use by bond0. Set the HWaddr of eth2 to a different address to avoid conflicts.
Jul 12 18:37:16 server kernel: bonding: bond0: releasing active interface eth2
Jul 12 18:37:16 server kernel: device eth2 left promiscuous mode
Jul 12 18:37:16 server kernel: bonding: bond0: making interface eth3 the new active one.
Jul 12 18:37:16 server kernel: device eth3 entered promiscuous mode
Jul 12 18:37:21 server kernel: bonding: bond0: Adding slave eth2.
Jul 12 18:37:21 server kernel: bnx2 0000:02:00.0: eth2: using MSIX
Jul 12 18:37:21 server kernel: bonding: bond0: enslaving eth2 as a backup interface with a down link.
Jul 12 18:37:23 server kernel: bnx2 0000:02:00.0: eth2: NIC Copper Link is Up, 1000 Mbps full duplex
Jul 12 18:37:23 server kernel: bonding: bond0: link status up for interface eth2, enabling it in 30000 ms.
Jul 12 18:37:31 server kernel: bonding: bond0: Removing slave eth3
Jul 12 18:37:31 server kernel: bonding: bond0: releasing active interface eth3
Jul 12 18:37:31 server kernel: device eth3 left promiscuous mode
Jul 12 18:37:31 server kernel: bonding: bond0: making interface eth2 the new active one 7600 ms earlier.
Jul 12 18:37:31 server kernel: device eth2 entered promiscuous mode
Jul 12 18:37:35 server kernel: bonding: bond0: Adding slave eth3.
Jul 12 18:37:35 server kernel: bnx2 0000:02:00.1: eth3: using MSIX
Jul 12 18:37:35 server kernel: bonding: bond0: enslaving eth3 as a backup interface with a down link.
Jul 12 18:37:37 server kernel: bnx2 0000:02:00.1: eth3: NIC Copper Link is Up, 1000 Mbps full duplex
Jul 12 18:37:37 server kernel: bonding: bond0: link status up for interface eth3, enabling it in 30000 ms.

Removing downdelay=500 updelay=30000 from bonding options does not make any difference.

Doing
# ifdown bond0; ifup bond0
also fixes the problem.

regards,

Giannis
Comment 2 Veaceslav Falico 2012-07-13 09:16:23 EDT
Hi,

What do you mean by not having network? Are no packets reaching the VLAN interfaces? Can you check whether packets arrive on the ethX interfaces (i.e. whether bonding is failing to forward them correctly to the VLANs)?

Also, you've mentioned the VLAN bridges: do you actually have a bridge after them (br0 or whatever)? If so, can you try to rule it out?

Thank you!
Comment 3 Veaceslav Falico 2012-07-13 09:28:22 EDT
Also, this seems to be the same issue as https://bugzilla.redhat.com/show_bug.cgi?id=834764 .
Comment 4 Kapetanakis Giannis 2012-07-13 11:31:08 EDT
The VMs attached to the bridges don't have network.

Furthermore, I added an IP address on the host on bond0.100
and there was still no network.

I didn't watch the counters on eth2/eth3/bond0/bond0.100,
nor did I run tcpdump to see what's going on.

I will try that later tonight.

If it's the same bug as in https://bugzilla.redhat.com/show_bug.cgi?id=834764, will the fix be available only in 6.4?

thanks
Comment 5 Kapetanakis Giannis 2012-07-15 11:20:54 EDT
I did some more tests today:

# ifconfig bond0.100 192.168.1.200 netmask 255.255.255.0
# ping 192.168.1.1 (gw)

tcpdump on eth2 (active interface on bonding):

18:13:58.961746 STP 802.1s, Rapid STP, CIST Flags [Learn, Forward, Agreement]
18:14:00.961742 STP 802.1s, Rapid STP, CIST Flags [Learn, Forward, Agreement]
18:14:02.986087 STP 802.1s, Rapid STP, CIST Flags [Learn, Forward, Agreement]
18:14:23.123827 ARP, Request who-has 192.168.1.1 tell 192.168.1.200, length 28
18:14:24.123827 ARP, Request who-has 192.168.1.1 tell 192.168.1.200, length 28

On bond0.100 and vlan100br I see only the ARP requests.
I have counters on all interfaces.

After the ifdown/ifup

eth2:

18:17:06.034323 STP 802.1s, Rapid STP, CIST Flags [Learn, Forward, Agreement]
18:17:06.625641 IP 192.168.1.200 > 192.168.1.1: ICMP echo request, id 61986, seq 121, length 64
18:17:07.625650 IP 192.168.1.200 > 192.168.1.1: ICMP echo request, id 61986, seq 122, length 64

bond0.100:

18:17:43.719710 STP 802.1d, Config, Flags [none], bridge-id 8019.00:0c:ce:a8:df:80.8032, length 42
18:17:43.719721 STP 802.1d, Config, Flags [none], bridge-id 8019.00:0c:ce:a8:df:80.8032, length 42
18:17:43.922517 IP 192.168.1.200 > 192.168.1.1: ICMP echo request, id 37157, seq 1, length 64
18:17:43.923281 IP 192.168.1.1 > 192.168.1.200: ICMP echo reply, id 37157, seq 1, length 64

same on vlan100br
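
The per-layer comparison above (eth2, bond0.100, vlan100br) can also be approximated from the interface packet counters, which helps show at which layer replies stop arriving. This is a rough sketch, not part of the original test: it assumes the standard per-interface counters under /sys/class/net/<iface>/statistics/, and the function name show_counters and the SYSFS_ROOT override are hypothetical.

```shell
#!/bin/sh
# Print RX/TX packet counters for each interface given as an argument.
# Counters come from /sys/class/net/<iface>/statistics/{rx,tx}_packets.
# SYSFS_ROOT is a hypothetical override used to point at a test tree.
show_counters() {
    root="${SYSFS_ROOT:-/sys/class/net}"
    for ifc in "$@"; do
        stats="$root/$ifc/statistics"
        if [ -r "$stats/rx_packets" ] && [ -r "$stats/tx_packets" ]; then
            printf '%s rx=%s tx=%s\n' "$ifc" \
                "$(cat "$stats/rx_packets")" "$(cat "$stats/tx_packets")"
        else
            # Interface does not exist or exposes no statistics.
            printf '%s missing\n' "$ifc"
        fi
    done
}
```

For the setup in this report one would run `show_counters eth2 eth3 bond0 bond0.100 vlan100br` before and after a few pings; an interface whose rx value does not move while the one below it does is a likely place where frames are being dropped.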
Comment 6 Kapetanakis Giannis 2012-07-15 13:12:58 EDT
One correction:

I was adding the IP on vlan100br, not on bond0.100, for the tests.

Anyway, patch https://bugzilla.redhat.com/attachment.cgi?id=594187
worked for me too.

So you can close this one and mark it as a duplicate of #834764.
Comment 7 Veaceslav Falico 2012-07-17 08:04:20 EDT
Thanks for a quick response, closing.

*** This bug has been marked as a duplicate of bug 834764 ***
