From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050513 Fedora/1.0.2-1.3.1 StumbleUpon/1.9993 Firefox/1.0.4

Description of problem:
If you use the 'HWADDR' option in ifcfg-eth[0-9]+ (or the underlying /sbin/nameif that it calls) and then create a VLAN'd interface on top of the base eth[0-9] device, vconfig rem won't let you remove it; it hangs forever with:
unregister_netdevice: waiting for eth0.2 to become free. Usage count = 7
This is a critical problem because the initscripts call vconfig rem when you ifdown a VLAN interface or when you shut down the system, and all interfaces hang.

Version-Release number of selected component (if applicable):
vconfig-1.8-4

How reproducible:
Always

Steps to Reproduce:
1. Create a base Ethernet device configuration in (for example) ifcfg-eth0
2. Make sure to use HWADDR in ifcfg-eth0 and specify a proper MAC address
3. Create a VLAN interface such as ifcfg-eth0.2
4. Do '/sbin/service network start'
5. Do '/sbin/service network stop'

Actual Results:
You will get something like:
unregister_netdevice: waiting for eth0.2 to become free. Usage count = 7
repeated forever. All networking-related tools on the system subsequently seem to hang. A 'ps' reveals that it is 'vconfig rem eth0.2' which is causing the problem.

Expected Results:
It should have removed the VLAN interface from the system without problems.

Additional info:
If I remove the HWADDR specifications from my ifcfg-eth[0-9] files then it works just fine. As soon as I add them back the problem returns.
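For anyone trying to reproduce this, the two files named in the steps above might look roughly like the sketch below. This is an illustration only: the MAC address, the IP address and netmask, and the scratch directory are placeholders, not values from the original report. The files are written to a temporary directory so that nothing on a live system is touched.

```shell
# Hypothetical scratch directory; on a real system these files would live
# in /etc/sysconfig/network-scripts/.
cfgdir=$(mktemp -d)

# Base Ethernet device with HWADDR set (placeholder MAC address).
cat > "$cfgdir/ifcfg-eth0" <<'EOF'
DEVICE=eth0
HWADDR=00:11:22:33:44:55
ONBOOT=yes
BOOTPROTO=none
EOF

# VLAN 2 on top of eth0 (placeholder address from the documentation range).
cat > "$cfgdir/ifcfg-eth0.2" <<'EOF'
DEVICE=eth0.2
VLAN=yes
ONBOOT=yes
BOOTPROTO=none
IPADDR=192.0.2.10
NETMASK=255.255.255.0
EOF

# Show what was written, prefixed with the file name.
grep -H . "$cfgdir/ifcfg-eth0" "$cfgdir/ifcfg-eth0.2"
```

With files like these in place, steps 4 and 5 bring the VLAN interface up and then try to remove it again, which is where the hang was reported.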
Hmm... It seems that I get the 'unregister_netdevice' messages even without the ifcfg HWADDR option being set. I had them unset and performed a reboot and it hung on shutdown again. There is still something not right with the system, but it is less clear to me what the triggering factor is.
I have two machines: one running kernel 2.6.9-11 has the same problem, while the other, running kernel 2.6.9-5.0.5, doesn't seem to have it.
From the descriptions of the problem, I strongly suspect that this is more likely to be a kernel-related problem than a userland vconfig problem. Reassigning bug to kernel component. Read ya, Phil
Hi, do you think the following information is helpful?
http://lists.osdl.org/pipermail/bugme-new/2005-April/004073.html
http://bugs.gentoo.org/show_bug.cgi?id=87495
http://www.ussg.iu.edu/hypermail/linux/kernel/0501.3/2226.html
we have the same problem with a machine that's got VLANs configured on top of a bonded device. the network rc script is able to shut down a few of them, but when it tries to shut down the interface that's configured with the machine's default route, it hangs with a usage count of 1. (i'm not saying that distinguisher is in any way germane to the problem; it has always been that particular interface that brings the process to a halt, but i'm not sure the interfaces following it wouldn't do the same.)
this RH4 box replaced a Red Hat 3 machine that had vconfig-ed interfaces directly ``atop'' the physical device, and that never exhibited such problems, FWIW. i thought it was maybe an interaction between bonding and VLANs, but from the other reports here it sounds like maybe not.
(In reply to comment #6) should have mentioned the machine is 32-bit ix86, not x86_64, like the original submitter's
I've seen this problem on two different RHEL4 systems with VLAN interfaces. Only one of the two was using HWADDR on the underlying interface, and both of them are running the 32-bit kernel. The latest incident was this morning, running this kernel:
2.6.9-42.0.3.ELsmp #1 SMP Mon Sep 25 17:28:02 EDT 2006 i686 athlon i386 GNU/Linux
The machine hung on shutdown with:
unregister_netdevice: waiting for eth0.15 to become free. Usage count = 1
I just saw this problem again this morning on the same kernel. 2.6.9-42.0.3.ELsmp #1 SMP Mon Sep 25 17:28:02 EDT 2006 i686 athlon i386 GNU/Linux I'll upgrade to the latest version but I don't think there's any indication that this will be fixed. :(
As is often the case I bet ipv6 is to blame here. Here is a test people can run to verify, but please be careful :) Find the ipv6.ko module for the kernel you are running, and move it out of the way (for example "mv ipv6.ko ipv6.ko.BAK") then reboot. See if that makes the problem go away. If not, then bonding is likely the next thing that should be investigated.
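A rough sketch of that test follows, for anyone who wants to try it. The function names are made up for illustration, and the /lib/modules/$(uname -r) location is an assumption about where the running kernel's modules are installed. The only thing the snippet does on its own is list matching files; the rename has to be invoked by hand, as root, on the path it prints.

```shell
# find_ipv6_module DIR: print the path of any ipv6.ko found under DIR.
find_ipv6_module() {
    find "$1" -type f -name 'ipv6.ko' 2>/dev/null
}

# set_aside FILE: rename FILE to FILE.BAK, as suggested above.
# Rename it back (mv FILE.BAK FILE) to undo the test.
set_aside() {
    mv -- "$1" "$1.BAK"
}

# Harmless on its own: only lists the module for the running kernel, if any.
find_ipv6_module "/lib/modules/$(uname -r)" || true
```

After renaming the module and rebooting, the shutdown that used to hang is the thing to retry.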
Unfortunately, I am not certain I'll be able to test this any time soon. The machines where I was regularly observing this were ones I worked on at a prior job. That machine wasn't using IPv6, but it did have the module loaded. No bonding was in use. I do believe I just recently saw this with a CentOS machine though which used tagged VLANs on a bonded (active-passive) interface. I'll see if I can find an opportunity to reproduce it and narrow it down. If someone else on this ticket reliably sees this failure and can do this test I'd appreciate it.
Created attachment 150261: bond-2.6.13-ref-leak.patch
Archana, please post the patch for inclusion; it looks fine.
This patch seems fine -- I'll add it to my test kernels.
commit ed4b9f8014db4f343e89b44b7c5ca355f439ce36
Author: Jay Vosburgh <fubar.com>
Date:   Wed Sep 14 14:52:09 2005 -0700

    [PATCH] bonding: plug reference count leak

    Bonding leaks route structures when the ARP monitor is configured to
    send probes over VLANs. Originally reported by Ian Abel <ian.abel>;
    his original fix was modified by Jay Vosburgh to correct coding style
    and to close a leak it missed.

    Signed-off-by: Jay Vosburgh <fubar.com>
    Signed-off-by: Jeff Garzik <jgarzik>
I just hit this problem again this morning on the latest RHEL4 kernel.
Shutting down interface eth0.13: [ OK ]
Shutting down interface eth0.15: unregister_netdevice: waiting for eth0.15 to become free. Usage count = 1
unregister_netdevice: waiting for eth0.15 to become free. Usage count = 1
...and so on....
Kernel info:
2.6.9-42.0.10.ELsmp #1 SMP Fri Feb 16 17:17:21 EST 2007 i686 athlon i386 GNU/Linux
Note that I am *not* using the bonding driver, so this problem is not specific to bonding. These are just VLAN interfaces on top of an e1000 interface.
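When comparing reports like the ones in this bug, it can help to pull the device name and the usage count out of each message. The sketch below does that with sed; the function name is made up for illustration, and the pattern only assumes the message wording quoted in this thread.

```shell
# parse_unregister LINE: print "<device> <usage count>" if LINE contains
# the unregister_netdevice message; print nothing otherwise.
parse_unregister() {
    printf '%s\n' "$1" | sed -n \
        's/.*unregister_netdevice: waiting for \([^ ]*\) to become free\. Usage count = \([0-9][0-9]*\).*/\1 \2/p'
}

parse_unregister 'unregister_netdevice: waiting for eth0.15 to become free. Usage count = 1'
# prints: eth0.15 1
```

The same sed expression could be applied to saved console or syslog output to see whether it is always the same interface that gets stuck.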
Thanks for pointing that out, Steve. Is this something you can pretty easily recreate or was this just a freak occurrence?
(In reply to comment #28)
> Thanks for pointing that out, Steve. Is this something you can pretty easily
> recreate or was this just a freak occurrence?
Unfortunately I can't recreate it on demand, but it's been happening on one of my production firewalls about once a month for the past 4-5 months when the system tries to do an automatic reboot for software maintenance reasons.
Thanks for the feedback, Steve. Though it isn't a direct reproducer, it helps to know this system is used as a firewall, so the conntrack code might be problematic as well.
Bonding fix added to my test kernels available here: http://people.redhat.com/agospoda/#rhel4 I'll continue to investigate other possible patches, but please test this one against bonding configurations that are problematic.
I have disabled the IPv6 module on my problem system to see if that has any effect.
committed in stream U6 build 55.9. A test kernel with this patch is available from http://people.redhat.com/~jbaron/rhel4/
I think the -55 kernels may have actually fixed this for me, but I'm a little confused because I thought the patch that went in was only for the bonding driver, which I'm not using.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0791.html