Bug 442339 - Adding VLANs on a Bonding interface results in dropping a slave
Summary: Adding VLANs on a Bonding interface results in dropping a slave
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.1
Hardware: All
OS: Linux
Priority: low
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Red Hat Kernel Manager
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2008-04-14 14:13 UTC by Jasper Capel
Modified: 2008-05-21 11:47 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-05-21 11:47:27 UTC
Target Upstream Version:
Embargoed:


Attachments
interface config (156 bytes, application/octet-stream)
2008-05-15 08:57 UTC, Jasper Capel

Description Jasper Capel 2008-04-14 14:13:51 UTC
Description of problem:
First slave of a bond loses link when a VLAN is added using a bond interface.

How reproducible:
Add a tagged VLAN to a bond interface.

Steps to Reproduce:
1. Create a bond interface (bond0). This bug is present with both the
active-backup and balance-alb modes. Add two slaves (eth0 and eth1) to this
bond interface using the standard method in network-scripts.
2. Bring the bond interface up. 
3. Add a VLAN to the bond interface using the standard network-scripts method
(bond0.1000). Run "ifup bond0.1000".
  
Actual results:
The first slave device in the bond loses its link. In a single-link situation
this results in an unreachable bond interface.
Excerpt from /var/log/messages:
Apr 14 15:48:27 hostname kernel: bonding: bond0: link status definitely down for interface eth0, disabling it

The VLAN is added, but the interface is down (ethtool reports link down on the
slave). This can be corrected by running ifup eth0.

Expected results:
Both slaves should remain up and running and the VLAN should just be added.

Additional info:
None, but if you need anything, don't hesitate to contact me.
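
A minimal sketch for spotting the failure in a log, assuming the kernel message has the same wording as the excerpt above. The function name bond_down_slaves is hypothetical, and the default path is simply the log file named in this report.

```shell
#!/bin/sh
# Hypothetical helper: print "<bond> <slave>" for every slave that the
# bonding driver reported as down in the given log file.
bond_down_slaves() {
    log_file="${1:-/var/log/messages}"
    # Matches lines such as:
    #   ... kernel: bonding: bond0: link status definitely down for interface eth0, disabling it
    # Only entries kept on a single line (as syslog writes them) are matched.
    sed -n 's/.*bonding: \([^:]*\): link status definitely down for interface \([^,]*\),.*/\1 \2/p' "$log_file"
}
```

Running bond_down_slaves right after "ifup bond0.1000" should list the affected slave if the problem was triggered. On a live system, /proc/net/bonding/bond0 also shows the per-slave MII status.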

Comment 1 Dennis Marinus 2008-05-08 09:05:09 UTC
I can confirm that this bug exists and is reproducible.

The bug shows itself on machines using different NIC drivers. Tested with both e1000 (Intel) and tg3 (Broadcom) drivers. The 
results are identical which suggests the problem is in a generic shared part of the networking code.

Comment 2 Andy Gospodarek 2008-05-12 21:08:49 UTC
Jasper and Dennis,

Can you please attach your configuration files that you use for ethernet
devices, bonding, and for the VLAN interfaces?  I'm not sure I need them, but I
would like to have yours to be sure I'm testing everything correctly.

I've also managed to recreate this on one of my systems. After setting up a
bond, if I type:

# vconfig add bond0 1000

it appears that a callback (most likely hotplug) calls ifup bond0.1000, and that
may not be needed.

If I comment out the call to ifup-eth that is called from ifup, then it appears
we don't run into this issue.

This patch also seems to work-around the issue, but I'm not sure it's the
correct solution:

--- ifup-eth.orig       2008-05-12 16:45:28.000000000 -0400
+++ ifup-eth    2008-05-12 17:09:32.000000000 -0400
@@ -112,6 +112,7 @@
         /sbin/ethtool -s ${REALDEVICE} $ETHTOOL_OPTS
     fi

+    /sbin/ip link set dev ${DEVICE} up
     exit 0
 fi


Adding notting to the cc-list so he can weigh in on this, too.

Comment 3 Bill Nottingham 2008-05-12 21:28:19 UTC
Setting HOTPLUG=no in the appropriate config file should fix this, FWIW. But I'd
still like to see the config files in use.

Comment 4 Andy Gospodarek 2008-05-13 17:45:16 UTC
Interesting...I see now where that hook is used.


Comment 5 Jasper Capel 2008-05-15 08:57:25 UTC
Created attachment 305455
interface config

Comment 6 Jasper Capel 2008-05-15 08:59:51 UTC
Oh, I can't add multiple attachments to one message, sorry, didn't mean to spam. ;-)

Anyway, here are my configs. I haven't gotten around to testing with HOTPLUG=no yet.

# Intel Corporation 82571EB Gigabit Ethernet Controller
DEVICE=nslave0
HWADDR=00:15:17:63:48:E1
BOOTPROTO=none
ONBOOT=yes
USERCTL=no
SLAVE=yes
MASTER=bond1

# Intel Corporation 82571EB Gigabit Ethernet Controller
DEVICE=nslave1
HWADDR=00:15:17:63:4A:81
BOOTPROTO=none
ONBOOT=yes
USERCTL=no
SLAVE=yes
MASTER=bond1


DEVICE=bond1
BOOTPROTO=static
IPADDR=10.100.100.132
NETMASK=255.255.255.128
USERCTL=no
ONBOOT=yes
BONDING_OPTS="arp_ip_target=+10.100.100.129 arp_ip_target=+10.100.100.130 arp_interval=500"



Comment 7 Bill Nottingham 2008-05-15 17:16:07 UTC
Do you not have a config for the vlan device?

Comment 8 Jasper Capel 2008-05-16 07:37:55 UTC
I was testing without one for the moment, as the vconfig command triggered the
error as well.

The vlan interface config file I used looked like this:

DEVICE=bond1.1000
BOOTPROTO=static
VLAN=yes
IPADDR=172.18.0.1
NETMASK=255.255.255.0
ONPARENT=yes


Comment 9 Bill Nottingham 2008-05-19 20:15:13 UTC
The simplest solution would be HOTPLUG=no in 'ifcfg-bondX'. Does that work for you?

Unfortunately, fixing this cleanly would require changing functionality that
other people may have come to rely on (hotplugging a network device
automatically finding the appropriate configuration and renaming the new device
to that configured device), which means it wouldn't really be appropriate for RHEL.

Comment 10 Jasper Capel 2008-05-20 08:11:56 UTC
Unfortunately not, my ifcfg-bond1 now looks like this:
DEVICE=bond1
BOOTPROTO=static
IPADDR=10.100.100.132
NETMASK=255.255.255.128
USERCTL=no
ONBOOT=yes
HOTPLUG=no
BONDING_OPTS="arp_ip_target=+10.100.100.129 arp_ip_target=+10.100.100.130 arp_interval=500"


Adding a VLAN either through vconfig or using ifup bond1.1000 still results in
the following messages (dmesg):
bonding: bond1: Interface nslave0 is already enslaved!
bonding: bond1: interface nslave0 is now down.
bonding: bond1: now running without any active interface !


Comment 11 Bill Nottingham 2008-05-20 16:09:02 UTC
Argh, mistyped.

HOTPLUG=no needs to go in the *slave* interface configuration (i.e., wherever
the HWADDR is).
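
As an illustration of the suggestion above, this is roughly what one slave configuration would look like with the setting added. The contents are taken from the nslave0 configuration in Comment 6. The file name ifcfg-nslave0 is an assumption based on the DEVICE value, and the comment on the last setting reflects the hotplug theory from Comment 2, not a confirmed root cause.

```shell
# ifcfg-nslave0 (file name assumed from DEVICE; settings from Comment 6)
# Intel Corporation 82571EB Gigabit Ethernet Controller
DEVICE=nslave0
HWADDR=00:15:17:63:48:E1
BOOTPROTO=none
ONBOOT=yes
USERCTL=no
SLAVE=yes
MASTER=bond1
# Workaround: presumably stops the hotplug path from re-running ifup on
# this slave when a VLAN is added on top of the bond.
HOTPLUG=no
```

The same line would go into the second slave's file (nslave1) as well.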

Comment 12 Jasper Capel 2008-05-21 06:04:19 UTC
Yes, that works for me. Thanks! :)
I don't seem to be missing any required functionality, so this
solution/workaround is fine.
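
A small sketch for checking that every slave configuration carries the workaround. The function name is hypothetical, and the default directory /etc/sysconfig/network-scripts is assumed to be where the ifcfg files in this report live.

```shell
#!/bin/sh
# Hypothetical helper: list ifcfg files that declare SLAVE=yes but do not
# set HOTPLUG=no. The first argument overrides the config directory.
list_slaves_missing_hotplug_no() {
    config_dir="${1:-/etc/sysconfig/network-scripts}"
    for cfg in "$config_dir"/ifcfg-*; do
        # Skip the literal pattern when no ifcfg files exist.
        [ -f "$cfg" ] || continue
        # Only bonding slaves are of interest.
        grep -q '^SLAVE=yes' "$cfg" || continue
        # Print the file name when the workaround line is absent.
        grep -q '^HOTPLUG=no' "$cfg" || echo "$cfg"
    done
}
```

An empty result from list_slaves_missing_hotplug_no means every slave file found already sets HOTPLUG=no.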

Comment 13 Andy Gospodarek 2008-05-21 11:47:27 UTC
Excellent!  I will close this one out.

