Bug 432622
Summary: | Bonding interface always starts with one slave down | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 4 | Reporter: | Juanjo Villaplana <villapla> |
Component: | initscripts | Assignee: | initscripts Maintenance Team <initscripts-maint-list> |
Status: | CLOSED NEXTRELEASE | QA Contact: | Brock Organ <borgan> |
Severity: | medium | Docs Contact: | |
Priority: | low | ||
Version: | 4.6 | CC: | fleitner, jn, kajtzu, michael, notting, tao |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2009-03-18 21:15:40 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Attachments: |
Description
Juanjo Villaplana
2008-02-13 13:27:41 UTC
Created attachment 294772 [details]
/etc/sysconfig/network
Created attachment 294773 [details]
/etc/sysconfig/network-scripts/ifcfg-eth0
Created attachment 294774 [details]
/etc/sysconfig/network-scripts/ifcfg-eth1
Created attachment 294775 [details]
/etc/sysconfig/network-scripts/ifcfg-bond0
Created attachment 294776 [details]
/etc/modprobe.conf
Created attachment 294777 [details]
'service network start' kernel messages
Created attachment 294783 [details]
/etc/rc.d/init.d/network patch
After some testing we have found that the problem is related to this change
(from the initscripts changelog):
* Sat Jun 23 2007 Bill Nottingham <notting> - 7.93.30.EL-1
- init.d/network, network-functions: don't fiddle with hotplug settings
(#185569, #209307)
The attached patch restores the hotplug code in /etc/rc.d/init.d/network and
fixes this problem.
I'm not authorized to access bugs #185569 and #209307, so I guess this patch
may break something else, but this hotplug code was already present on
initscripts-7.93.29.EL-1 and it worked fine for us.
What happens if you change the slaves to be 'ONBOOT=no'?

Created attachment 294816 [details]
'service network start' kernel messages

Setting ONBOOT=no doesn't help:

# service network start
Setting network parameters: [ OK ]
Bringing up loopback interface: [ OK ]
Setting 802.1Q VLAN parameters: [ OK ]
Bringing up interface bond0: [ OK ]
Bringing up interface vlan8: [ OK ]
# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v2.6.3-rh (June 8, 2005)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth1
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth0
MII Status: down
Link Failure Count: 1
Permanent HW addr: 00:19:bb:c7:a8:72

Slave Interface: eth1
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:19:bb:c7:a8:70
# ifconfig etho
etho: error fetching interface information: Device not found
[root@clu108 bz432622]# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:19:BB:C7:A8:72
          BROADCAST SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:94 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:9595 (9.3 KiB)  TX bytes:412 (412.0 b)
          Interrupt:169 Memory:f8000000-f8012100

Note that this setup worked fine until the upgrade of initscripts.

I'm assuming eth0 does actually have a valid link, of course.
If you instrument /etc/hotplug/net.agent, is it actually being invoked?

Created attachment 294817 [details]
'ifconfig eth0 up' kernel messages

> I'm assuming eth0 does actually have a valid link, of course.

Yes:

# ifconfig eth0 up
# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v2.6.3-rh (June 8, 2005)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth1
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth0
MII Status: up
Link Failure Count: 1
Permanent HW addr: 00:19:bb:c7:a8:72

Slave Interface: eth1
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:19:bb:c7:a8:70

Created attachment 294818 [details]
Instrumented net.agent output

> If you instrument /etc/hotplug/net.agent, is it actually being invoked?

I added this on the top of /etc/hotplug/net.agent:

set -x
exec > /tmp/net.agent.$$ 2>&1

and attached the output generated by "service network start". Is this what you needed?

Can you attach your vlan config?

Created attachment 294820 [details]
/etc/sysconfig/network-scripts/ifcfg-vlan8
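
The manual check used in the comments above (reading /proc/net/bonding/bond0 and looking for a slave whose MII Status is down) could be scripted roughly as follows. This is only a sketch: down_slaves is a hypothetical helper name, not part of initscripts or the bonding driver, and it assumes the status file has the layout shown in the output pasted in this report.

```shell
#!/bin/sh
# Sketch: list bonding slaves whose MII status is "down".
# down_slaves is a hypothetical helper, not part of initscripts.
# It reads text in the /proc/net/bonding/<bond> layout on stdin.
down_slaves() {
    awk '
        # Remember which slave block we are in.
        /^Slave Interface:/ { slave = $3; next }
        # The bond-level "MII Status" line comes before any slave block,
        # so it is ignored while no slave has been seen yet.
        /^MII Status:/ && slave != "" && $3 == "down" { print slave }
    '
}

# Usage on a live system (bond0 is the bond name used in this report):
#   down_slaves < /proc/net/bonding/bond0
```

An empty result means every slave reports its link as up; with the first output pasted above it would name eth0.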
OK, the simple reason this is happening is that the bringing up of vlan8 is causing a hotplug event which is not delivered until well after /etc/init.d/network finishes. That doesn't make much sense at first glance. I suspect adding a 'sleep 5' before the 'touch /var/lock/subsys/network' in /etc/init.d/network will 'fix' it, but that's obviously not the right answer.

This hotplug event looks for the MAC address of vlan8 in your config to bring up the device; it matches eth0. So, 'ifup eth0' is run, which attempts to enslave the (already enslaved) device. The first step of enslavement is setting the link down; it then attempts to enslave it, which fails (due to already being enslaved.). But the link remains down.

There are a couple of options to 'fix' this; one would be to detach the device before enslaving it. That could cause a lot of bouncing of link status, though. We could check whether or not it's enslaved, but that is not practical with the bonding support in RHEL 4. Root causing why the hotplug event is so late might help, but there may still be a race there.

The simplest fix I can think of for your case is to add 'HOTPLUG=no' to ifcfg-eth0 and ifcfg-eth1; that should solve the problem.

I reverted ONBOOT=yes and added HOTPLUG=no to ifcfg-eth[01] and this solved the problem. Does setting HOTPLUG=no have any side effect we should care about?

Reverting the ONBOOT=yes shouldn't make a difference. HOTPLUG=no means that hotplug events (caused by adding/removing the device, or module) will be ignored. It would mean that you'd have to manually bring the interface up if you unloaded and reloaded the bnx2 module, for example.

OK. I will leave untouched /etc/rc.d/init.d/network and add HOTPLUG=no to ifcfg-eth[01] in order to fix this issue. Your (extremely fast) help is very appreciated.
Regards, Juanjo.

This issue persists on RHEL 4.7 (initscripts-7.93.33-1.el4).

*** Bug 159500 has been marked as a duplicate of this bug. ***

Given the existing workaround (HOTPLUG=no in configuration), and the current update status of RHEL 4, I'm closing this. It should work without configuration changes in RHEL 5.

With the goal of minimizing risk of change for deployed systems, and in response to customer and partner requirements, Red Hat takes a conservative approach when evaluating changes for inclusion in maintenance updates for currently deployed products. The primary objectives of update releases are to enable new hardware platform support and to resolve critical defects.

*** Bug 498480 has been marked as a duplicate of this bug. ***
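
For anyone applying the HOTPLUG=no workaround to several slave interfaces, a small helper along these lines might save some editing. It is a sketch under stated assumptions: set_hotplug_no is a hypothetical function name, not something shipped with initscripts, and it assumes the slave configs are plain KEY=value files such as the ifcfg-eth0 and ifcfg-eth1 files named in this report.

```shell
#!/bin/sh
# Sketch: make sure an ifcfg file ends up with exactly one HOTPLUG=no line.
# set_hotplug_no is a hypothetical helper, not part of initscripts.
set_hotplug_no() {
    cfg=$1
    [ -f "$cfg" ] || { echo "missing: $cfg" >&2; return 1; }
    tmp=$(mktemp) || return 1
    # Copy every line except an existing HOTPLUG= setting,
    # then append the desired value at the end.
    awk '!/^HOTPLUG=/ { print } END { print "HOTPLUG=no" }' "$cfg" > "$tmp" \
        || { rm -f "$tmp"; return 1; }
    # Overwrite the contents in place so owner and mode are kept.
    cat "$tmp" > "$cfg"
    rm -f "$tmp"
}

# Usage (paths as named in this report; run as root, back up first):
#   set_hotplug_no /etc/sysconfig/network-scripts/ifcfg-eth0
#   set_hotplug_no /etc/sysconfig/network-scripts/ifcfg-eth1
```

Running it twice on the same file leaves a single HOTPLUG=no line. As noted in the comments above, with this setting the interface has to be brought up manually after the NIC driver module is unloaded and reloaded.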