Bug 442339 - Adding VLANs on a Bonding interface results in dropping a slave
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Hardware: All
OS: Linux
Priority: low
Severity: high
Target Milestone: rc
Assigned To: Red Hat Kernel Manager
Martin Jenner
Reported: 2008-04-14 10:13 EDT by Jasper Capel
Modified: 2008-05-21 07:47 EDT

Doc Type: Bug Fix
Last Closed: 2008-05-21 07:47:27 EDT

Attachments:
interface config (156 bytes, application/octet-stream), 2008-05-15 04:57 EDT, Jasper Capel

Description Jasper Capel 2008-04-14 10:13:51 EDT
Description of problem:
First slave of a bond loses link when a VLAN is added using a bond interface.

How reproducible:
Add a tagged VLAN to a bond interface.

Steps to Reproduce:
1. Create a bond interface (bond0). This bug is present in both active-backup
and balance-alb modes. Add two slaves (eth0 and eth1) to this bond interface
using the standard method in network-scripts.
2. Bring the bond interface up. 
3. Add a VLAN to the bond interface using the standard network-scripts method
(bond0.1000). Run "ifup bond0.1000".
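
The steps above can be sketched as a short shell sequence. This is only an illustration: it assumes the ifcfg files for bond0, its slaves and bond0.1000 already exist, and the run and reproduce_bond_vlan helpers are hypothetical names introduced here. With DRY_RUN=1 the commands are printed instead of executed.

```shell
# Sketch only: bond0 and VLAN ID 1000 are taken from the report.
# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

reproduce_bond_vlan() {
    run ifup bond0                     # step 2: bring the bond up
    run cat /proc/net/bonding/bond0    # slave state before adding the VLAN
    run ifup bond0.1000                # step 3: add the tagged VLAN
    run cat /proc/net/bonding/bond0    # slave state afterwards
}
```

Comparing the two /proc/net/bonding/bond0 dumps should show whether a slave changed state when the VLAN was added.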
Actual results:
The first slave device in the bond loses its link; in a single-link situation
this results in an unreachable bond interface.
Excerpt from /var/log/messages:
Apr 14 15:48:27 hostname kernel: bonding: bond0: link status definitely down for interface eth0, disabling it

The VLAN is added, but the interface is down (ethtool reports link down on the
slave). This can be corrected by running ifup eth0.
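
A small sketch for pulling the affected slave names out of the log, assuming the messages have the same form as the excerpt above. The function name bond_down_slaves is hypothetical; it reads log text on standard input.

```shell
# Print the name of every slave the bonding driver reported as down.
# Expects lines of the form:
#   ... bonding: bond0: link status definitely down for interface eth0, disabling it
bond_down_slaves() {
    sed -n 's/.*bonding: [^:]*: link status definitely down for interface \([^,]*\),.*/\1/p'
}
```

For example, `bond_down_slaves < /var/log/messages` lists the interfaces that may need an `ifup` afterwards.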

Expected results:
Both slaves should remain up and running and the VLAN should just be added.

Additional info:
None, but if you need anything, don't hesitate to contact me.
Comment 1 Dennis Marinus 2008-05-08 05:05:09 EDT
I can confirm that this bug exists and is reproducible.

The bug shows itself on machines using different NIC drivers. Tested with both e1000 (Intel) and tg3 (Broadcom) drivers. The
results are identical, which suggests the problem is in a generic, shared part of the networking code.
Comment 2 Andy Gospodarek 2008-05-12 17:08:49 EDT
Jasper and Dennis,

Can you please attach your configuration files that you use for ethernet
devices, bonding, and for the VLAN interfaces?  I'm not sure I need them, but I
would like to have yours to be sure I'm testing everything correctly.

I've also managed to recreate this on one of my systems. After setting up a
bond, if I type:

# vconfig add bond0 1000

it appears that a callback (most likely hotplug) calls ifup bond0.1000, which
may not be needed.

If I comment out the call to ifup-eth that is called from ifup, then it appears
we don't run into this issue.

This patch also seems to work around the issue, but I'm not sure it's the
correct solution:

--- ifup-eth.orig       2008-05-12 16:45:28.000000000 -0400
+++ ifup-eth    2008-05-12 17:09:32.000000000 -0400
@@ -112,6 +112,7 @@
         /sbin/ethtool -s ${REALDEVICE} $ETHTOOL_OPTS

+    /sbin/ip link set dev ${DEVICE} up
     exit 0

Adding notting to the cc-list so he can weigh in on this, too.
Comment 3 Bill Nottingham 2008-05-12 17:28:19 EDT
Setting HOTPLUG=no in the appropriate config file should fix this, FWIW. But I'd
still like to see the config files in use.
Comment 4 Andy Gospodarek 2008-05-13 13:45:16 EDT
Interesting...I see now where that hook is used.
Comment 5 Jasper Capel 2008-05-15 04:57:25 EDT
Created attachment 305455
interface config
Comment 6 Jasper Capel 2008-05-15 04:59:51 EDT
Oh, I can't add multiple attachments to one message, sorry, didn't mean to spam. ;-)

Anyway, here are my configs. I haven't gotten around to testing with HOTPLUG=no yet.

# Intel Corporation 82571EB Gigabit Ethernet Controller

# Intel Corporation 82571EB Gigabit Ethernet Controller

BONDING_OPTS="arp_ip_target=+ arp_ip_target=+

Comment 7 Bill Nottingham 2008-05-15 13:16:07 EDT
Do you not have a config for the vlan device?
Comment 8 Jasper Capel 2008-05-16 03:37:55 EDT
I was testing without one for the moment, as the vconfig command triggered the
error as well.

The vlan interface config file I used looked like this:

Comment 9 Bill Nottingham 2008-05-19 16:15:13 EDT
The simplest solution would be HOTPLUG=no in 'ifcfg-bondX'. Does that work for you?

Unfortunately, fixing this cleanly would require changing functionality that
other people may have come to rely on (hotplugging a network device
automatically finding the appropriate configuration and renaming the new device
to that configured device), which means it wouldn't really be appropriate for RHEL.
Comment 10 Jasper Capel 2008-05-20 04:11:56 EDT
Unfortunately not, my ifcfg-bond1 now looks like this:
BONDING_OPTS="arp_ip_target=+ arp_ip_target=+

Adding a VLAN either through vconfig or using ifup bond1.1000 still results in
the following messages (dmesg):
bonding: bond1: Interface nslave0 is already enslaved!
bonding: bond1: interface nslave0 is now down.
bonding: bond1: now running without any active interface !
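
To check which slaves the bond still considers up after messages like these, the per-slave state can be read from /proc/net/bonding. A minimal sketch, assuming the usual "Slave Interface:" and "MII Status:" lines in that file; the function name bond_slave_status is hypothetical and it reads the file contents on standard input.

```shell
# Print "<slave> <MII status>" for each slave listed in /proc/net/bonding/bondX.
# The bond-level "MII Status" line comes before any slave and is skipped.
bond_slave_status() {
    awk -F': ' '
        /^Slave Interface:/ { slave = $2 }
        /^MII Status:/ && slave != "" { print slave, $2; slave = "" }
    '
}
```

Usage would be along the lines of `bond_slave_status < /proc/net/bonding/bond1`.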
Comment 11 Bill Nottingham 2008-05-20 12:09:02 EDT
Argh, mistyped.

HOTPLUG=no needs to go in the *slave* interface configuration (i.e., wherever
the HWADDR is).
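
As a sketch, a slave config with this setting might look like the following. The file name, the HWADDR value and the other keys are assumptions for illustration, not the reporter's actual configuration.

```shell
# Hypothetical /etc/sysconfig/network-scripts/ifcfg-eth0 for a bond slave.
# The HWADDR value is a placeholder, not taken from the report.
DEVICE=eth0
HWADDR=00:11:22:33:44:55
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none
# Ask the hotplug hook to skip this device.
HOTPLUG=no
```

The same HOTPLUG=no line would go into each slave's file (eth1 as well in the two-slave setup described above).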
Comment 12 Jasper Capel 2008-05-21 02:04:19 EDT
Yes, that works for me. Thanks! :)
I don't seem to be missing any required functionality, so this
solution/workaround is fine.
Comment 13 Andy Gospodarek 2008-05-21 07:47:27 EDT
Excellent!  I will close this one out.
