Description of problem:
The first slave of a bond loses link when a VLAN is added on top of the bond interface.

How reproducible:
Add a tagged VLAN to a bond interface.

Steps to Reproduce:
1. Create a bond interface (bond0) and add two slaves (eth0 and eth1) to it using the standard network-scripts method. The bug is present with both active-backup and balance-alb modes.
2. Bring the bond interface up.
3. Add a VLAN to the bond interface using the standard network-scripts method (bond0.1000) and run "ifup bond0.1000".

Actual results:
The first slave device in the bond loses its link; in a single-link situation this results in an unreachable bond interface. Excerpt from /var/log/messages:

Apr 14 15:48:27 hostname kernel: bonding: bond0: link status definitely down for interface eth0, disabling it

The VLAN is added, but the interface is down (ethtool reports link down on the slave). This can be corrected by running ifup eth0.

Expected results:
Both slaves should remain up and running, and the VLAN should simply be added.

Additional info:
None, but if you need anything, don't hesitate to contact me.
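One way to confirm which slave went down after the VLAN is added is to read the per-slave MII status that the bonding driver exposes under /proc/net/bonding/. The following is a minimal sketch and not part of the original report: the helper name slave_mii_status is hypothetical, and it assumes the usual "Slave Interface:" / "MII Status:" line layout of that file.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): print "<slave> <status>"
# for every slave listed in a bonding status file.
# Assumes each "Slave Interface: <name>" line is followed by that slave's
# "MII Status: <up|down>" line, as in /proc/net/bonding/bondX.
slave_mii_status() {
    awk -F': ' '
        /^Slave Interface:/ { slave = $2; next }
        # The bond-level "MII Status" appears before any slave and is skipped.
        /^MII Status:/ && slave != "" { print slave, $2; slave = "" }
    ' "$1"
}

# Example (run before and after "ifup bond0.1000" and compare):
#   slave_mii_status /proc/net/bonding/bond0
```

Comparing the output before and after bringing up the VLAN should show whether the first slave changed state, without relying on ethtool for each device.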
I can confirm that this bug exists and is reproducible. It shows up on machines using different NIC drivers: I tested with both e1000 (Intel) and tg3 (Broadcom). The results are identical, which suggests the problem is in a generic, shared part of the networking code.
Jasper and Dennis,

Can you please attach the configuration files that you use for the ethernet devices, bonding, and the VLAN interfaces? I'm not sure I need them, but I would like to have yours to be sure I'm testing everything correctly.

I've also managed to recreate this on one of my systems. After setting up a bond, if I type:

# vconfig add bond0 1000

it appears that a callback (most likely hotplug) calls ifup bond0.1000, and that may not be needed. If I comment out the call to ifup-eth that is made from ifup, then it appears we don't run into this issue.

This patch also seems to work around the issue, but I'm not sure it's the correct solution:

--- ifup-eth.orig 2008-05-12 16:45:28.000000000 -0400
+++ ifup-eth 2008-05-12 17:09:32.000000000 -0400
@@ -112,6 +112,7 @@
 /sbin/ethtool -s ${REALDEVICE} $ETHTOOL_OPTS
 fi
+ /sbin/ip link set dev ${DEVICE} up
 exit 0
 fi

Adding notting to the cc-list so we can weigh in on this, too.
Setting HOTPLUG=no in the appropriate config file should fix this, FWIW. But I'd still like to see the config files in use.
Interesting...I see now where that hook is used.
Created attachment 305455: interface config
Oh, I can't add multiple attachments to one message, sorry, didn't mean to spam. ;-) Anyway, here are my configs. I haven't gotten around to testing with HOTPLUG=no yet.

# Intel Corporation 82571EB Gigabit Ethernet Controller
DEVICE=nslave0
HWADDR=00:15:17:63:48:E1
BOOTPROTO=none
ONBOOT=yes
USERCTL=no
SLAVE=yes
MASTER=bond1

# Intel Corporation 82571EB Gigabit Ethernet Controller
DEVICE=nslave1
HWADDR=00:15:17:63:4A:81
BOOTPROTO=none
ONBOOT=yes
USERCTL=no
SLAVE=yes
MASTER=bond1

DEVICE=bond1
BOOTPROTO=static
IPADDR=10.100.100.132
NETMASK=255.255.255.128
USERCTL=no
ONBOOT=yes
BONDING_OPTS="arp_ip_target=+10.100.100.129 arp_ip_target=+10.100.100.130 arp_interval=500"
Do you not have a config for the vlan device?
I was testing without one for the moment, as the vconfig command triggered the error as well. The VLAN interface config file I used looked like this:

DEVICE=bond1.1000
BOOTPROTO=static
VLAN=yes
IPADDR=172.18.0.1
NETMASK=255.255.255.0
ONPARENT=yes
The simplest solution would be HOTPLUG=no in 'ifcfg-bondX'. Does that work for you? Unfortunately, fixing this cleanly would require changing functionality that other people may have come to rely on (hotplugging a network device automatically finding the appropriate configuration and renaming the new device to that configured device), which means it wouldn't really be appropriate for RHEL.
Unfortunately not. My ifcfg-bond1 now looks like this:

DEVICE=bond1
BOOTPROTO=static
IPADDR=10.100.100.132
NETMASK=255.255.255.128
USERCTL=no
ONBOOT=yes
HOTPLUG=no
BONDING_OPTS="arp_ip_target=+10.100.100.129 arp_ip_target=+10.100.100.130 arp_interval=500"

Adding a VLAN, either through vconfig or using ifup bond1.1000, still results in the following messages (dmesg):

bonding: bond1: Interface nslave0 is already enslaved!
bonding: bond1: interface nslave0 is now down.
bonding: bond1: now running without any active interface !
Argh, mistyped. HOTPLUG=no needs to go in the *slave* interface configuration (i.e., wherever the HWADDR is).
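For anyone applying this to several hosts, the change can be scripted. The following is only a sketch under stated assumptions: the function name disable_hotplug_on_slaves is hypothetical, slave configs are recognized by a SLAVE=yes line (as in the configs posted above), GNU sed is available for in-place editing, and each config file ends with a newline.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): set HOTPLUG=no in every
# ifcfg-* file that is marked as a bonding slave (SLAVE=yes).
# $1: directory holding the ifcfg files; defaults to the usual
#     network-scripts location.
disable_hotplug_on_slaves() {
    cfg_dir=${1:-/etc/sysconfig/network-scripts}
    for cfg in "$cfg_dir"/ifcfg-*; do
        [ -f "$cfg" ] || continue
        # Leave bond masters and unrelated interfaces untouched.
        grep -q '^SLAVE=yes' "$cfg" || continue
        if grep -q '^HOTPLUG=' "$cfg"; then
            # Overwrite an existing HOTPLUG setting (GNU sed in-place edit).
            sed -i 's/^HOTPLUG=.*/HOTPLUG=no/' "$cfg"
        else
            # Assumes the file already ends with a newline.
            echo 'HOTPLUG=no' >> "$cfg"
        fi
    done
}

# Example: disable_hotplug_on_slaves /etc/sysconfig/network-scripts
```

With the configs from this report, that would add HOTPLUG=no to ifcfg-nslave0 and ifcfg-nslave1 and leave ifcfg-bond1 as it is.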
Yes, that works for me. Thanks! :) I don't seem to be missing any required functionality, so this solution/workaround is fine.
Excellent! I will close this one out.