Description of problem:
Both trying to restart a xen box (with no guests running) and trying to restart xen hvm guests result in the following message every 10 seconds for all eternity:

unregister_netdevice: waiting for tap1 to become free. Usage count = 1

Version-Release number of selected component (if applicable):
kernel-xen-2.6.18-58.el5

Additional info:
I've been seeing this on ia64, but clalance tells me he's hit it on x86 recently as well.
Actually, two slightly different errors... On restarting the box, it's virbr0 in the message, as opposed to tap*. That's the same thing Chris saw. Shutting down hvm domains is what leads to the tap* version for me. Also of interest on the console during shutdown is the fact that libvirtd failed to shut down for some reason...
OK, testing locally on i686 reveals:

kernel 2.6.18-58 - reboots fine
kernel 2.6.18-59 - fails to reboot with similar message
kernel 2.6.18-58 w/ 2.6.18-59 HV - reboots fine

So the problem is clearly in the kernel, not the HV.

Chris Lalancette
We have a winner:

linux-2.6-net-ipv6-backport-optimistic-dad.patch
- [net] ipv6: backport optimistic DAD (Neil Horman) [246723]

Kernel build w/everything in -59 minus that patch eliminates the libvirtd shutdown failures for me. Off to flag that bug and get Neil's attention... :)
Can't anything ever be easy....

Best guess is that something about the tap driver sends us through a path that takes a reference to the interface but doesn't release it. Visual inspection says that the most likely candidate is in ndisc_send_rs in the 'if (send_sllao)' clause. It's been a while since I wrote this (months, in fact). Jarod, can you modify the kernel such that the clause in question looks like this:

======================================================
if (send_sllao) {
        ifp = ipv6_get_ifaddr(saddr, dev, 1);
        if (ifp) {
                if (ifp->flags & IFA_F_OPTIMISTIC) {
                        send_sllao = 0;
                }
                in6_ifa_put(ifp);
        } else {
                send_sllao = 0;
        }
}
======================================================

That should force the reference count on the ipv6 address to be decremented, which seems like it should be the case anyway, and so ensure that the reference on the interface always gets dropped. In fact, I'm sure that's it. I remember that had to be fixed several weeks ago upstream, and I never backported the fix. Please confirm that, and I'll post the fix against this bug. Thanks!
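For anyone following along, here is a small userspace sketch of why a missing put leaves the usage count stuck. It is only a model: the sketch_* names, the flag value, and the "leaky" variant are made up for illustration and are not the kernel's definitions or the literal -59 code. It assumes the lookup hands back the address with an extra reference held, which the caller must drop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel objects; not the real definitions. */
#define SKETCH_F_OPTIMISTIC 0x04u /* illustrative flag value only */

struct sketch_ifaddr {
    int refcnt;         /* references currently held on this address */
    unsigned int flags; /* address flags, e.g. SKETCH_F_OPTIMISTIC */
};

/* Models a lookup that returns the address with one extra reference held,
 * or NULL when no matching address exists. */
struct sketch_ifaddr *sketch_get_ifaddr(struct sketch_ifaddr *candidate)
{
    if (candidate != NULL)
        candidate->refcnt++;
    return candidate;
}

/* Models dropping a reference obtained from sketch_get_ifaddr(). */
void sketch_ifa_put(struct sketch_ifaddr *ifp)
{
    assert(ifp != NULL && ifp->refcnt > 0);
    ifp->refcnt--;
}

/* Assumed shape of the bug: the lookup result is inspected but the
 * reference is never dropped, so refcnt grows by one per call. */
int send_sllao_leaky(struct sketch_ifaddr *addr)
{
    int send_sllao = 1;
    struct sketch_ifaddr *ifp = sketch_get_ifaddr(addr);

    if (ifp != NULL) {
        if (ifp->flags & SKETCH_F_OPTIMISTIC)
            send_sllao = 0;
        /* no put here: the reference is leaked */
    } else {
        send_sllao = 0;
    }
    return send_sllao;
}

/* Balanced variant: every successful lookup is paired with a put,
 * whether or not the address is optimistic. */
int send_sllao_balanced(struct sketch_ifaddr *addr)
{
    int send_sllao = 1;
    struct sketch_ifaddr *ifp = sketch_get_ifaddr(addr);

    if (ifp != NULL) {
        if (ifp->flags & SKETCH_F_OPTIMISTIC)
            send_sllao = 0;
        sketch_ifa_put(ifp);
    } else {
        send_sllao = 0;
    }
    return send_sllao;
}
```

In this model, calling send_sllao_leaky() on an address that starts with one reference leaves it at two, and nothing ever brings it back down, which is the kind of imbalance that would keep a device from being unregistered. send_sllao_balanced() returns the same decision and leaves the count where it started.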
Building a test kernel right now, should be able to verify the fix within the hour...
Fix confirmed, thanks Neil!
in 2.6.18-62.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5
I am not sure if this is the same bug, but I see something similar in the latest rawhide kernels (kernel-PAE-2.6.24-0.167.rc8.git4.fc9 does it, not sure where it began, but definitely within the last two weeks). Could that be the same bug or something different?
Could have been the same breakage in the same netpoll code, but Neil has got that fixed upstream already... If it's still happening, I'd file a new bug.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2008-0314.html