Bug 423791 - unregister_netdevice: waiting for tap1 to become free. Usage count = 1
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.2
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
Target Release: ---
Assigned To: Neil Horman
QA Contact: Martin Jenner
Keywords: Regression
Depends On: 246723
Reported: 2007-12-13 13:04 EST by Jarod Wilson
Modified: 2008-07-02 08:09 EDT (History)
Fixed In Version: RHBA-2008-0314
Doc Type: Bug Fix
Last Closed: 2008-05-21 11:03:55 EDT


Attachments: None
Description Jarod Wilson 2007-12-13 13:04:04 EST
Description of problem:
Both restarting a xen box (with no guests running) and restarting xen hvm
guests result in the following message every 10 seconds for all eternity:

unregister_netdevice: waiting for tap1 to become free. Usage count = 1

Version-Release number of selected component (if applicable):
kernel-xen-2.6.18-58.el5

Additional info:
I've been seeing this on ia64, but clalance tells me he's hit it on x86 recently
as well.
Comment 1 Jarod Wilson 2007-12-13 13:37:44 EST
Actually, two slightly different errors... On restarting the box, it's virbr0 in
the message, as opposed to tap*. That's the same thing Chris saw. Shutting down
hvm domains is what leads to the tap* version for me.

Also of interest on the console during shutdown is the fact that libvirtd failed
to shut down for some reason...
Comment 2 Chris Lalancette 2007-12-13 14:06:26 EST
OK, testing locally on i686 reveals:

kernel 2.6.18-58 - reboots fine
kernel 2.6.18-59 - fails to reboot with similar message
kernel 2.6.18-58 w/ 2.6.18-59 HV - reboots fine

So the problem is clearly in the kernel, not the HV.

Chris Lalancette
Comment 3 Jarod Wilson 2007-12-13 18:50:02 EST
We have a winner:

linux-2.6-net-ipv6-backport-optimistic-dad.patch
- [net] ipv6: backport optimistic DAD (Neil Horman ) [246723]

Kernel build w/everything in -59 minus that patch eliminates the libvirtd shutdown failures for me. Off to 
flag that bug and get Neil's attention... :)
Comment 4 Neil Horman 2007-12-13 22:41:00 EST
Can't anything ever be easy....

Best guess is that something about the tap driver sends us through a path that
takes a reference to the interface but doesn't release it.  Visual inspection
says that the most likely candidate is in ndisc_send_rs in the 'if (send_sllao)'
clause.  It's been a while since I wrote this (months in fact).

Jarod, can you modify the kernel such that the clause in question looks like this:
======================================================
if (send_sllao) {
        ifp = ipv6_get_ifaddr(saddr, dev, 1);
        if (ifp) {
                if (ifp->flags & IFA_F_OPTIMISTIC) {
                        send_sllao = 0;
                }
                /* always drop the reference taken by ipv6_get_ifaddr() */
                in6_ifa_put(ifp);
        } else {
                send_sllao = 0;
        }
}
======================================================

That should force the reference count on the IPv6 address to be decremented
whenever ipv6_get_ifaddr() returns an address, not only when IFA_F_OPTIMISTIC is
set, which seems like it should be the case anyway.

In fact, I'm sure that's it.  I remember that had to be fixed several weeks ago
upstream, and I never backported the fix.  Please confirm that, and I'll post
the fix against this bug.  Thanks!
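
For anyone following along, here is a rough userspace sketch of why an unbalanced get/put would leave the usage count stuck at 1. It is not kernel code. The struct names, addr_get(), addr_put(), FLAG_OPTIMISTIC and usage_after() are all made up for illustration, and the model assumes (as a simplification) that a held address reference directly pins the device.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; not the kernel's types. */
#define FLAG_OPTIMISTIC 0x1u            /* plays the role of IFA_F_OPTIMISTIC */

struct model_dev  { const char *name; int usage; };
struct model_addr { struct model_dev *dev; unsigned flags; };

/* Lookup: a successful lookup takes a reference the caller must drop. */
static struct model_addr *addr_get(struct model_addr *a)
{
        if (a)
                a->dev->usage++;
        return a;
}

/* Drop a reference previously taken by addr_get(). */
static void addr_put(struct model_addr *a)
{
        assert(a->dev->usage > 0);
        a->dev->usage--;
}

/* Leaky variant: the put only happens when the address is optimistic. */
static int send_rs_leaky(struct model_addr *a, int send_sllao)
{
        if (send_sllao) {
                struct model_addr *ifp = addr_get(a);
                if (ifp) {
                        if (ifp->flags & FLAG_OPTIMISTIC) {
                                send_sllao = 0;
                                addr_put(ifp);
                        }
                } else {
                        send_sllao = 0;
                }
        }
        return send_sllao;
}

/* Balanced variant: every successful lookup is paired with a put. */
static int send_rs_balanced(struct model_addr *a, int send_sllao)
{
        if (send_sllao) {
                struct model_addr *ifp = addr_get(a);
                if (ifp) {
                        if (ifp->flags & FLAG_OPTIMISTIC)
                                send_sllao = 0;
                        addr_put(ifp);
                } else {
                        send_sllao = 0;
                }
        }
        return send_sllao;
}

/* Run one variant once against a fresh device and return the usage
 * count it leaves behind. */
static int usage_after(int (*send_rs)(struct model_addr *, int), unsigned flags)
{
        struct model_dev tap1 = { "tap1", 0 };
        struct model_addr addr = { &tap1, flags };

        send_rs(&addr, 1);
        return tap1.usage;
}
```

In this model, usage_after(send_rs_leaky, 0) leaves one reference outstanding for a non-optimistic address, while usage_after(send_rs_balanced, 0) returns the count to zero. A leftover count like that is the kind of thing that would keep unregister_netdevice waiting.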
Comment 5 Jarod Wilson 2007-12-14 09:20:22 EST
Building a test kernel right now, should be able to verify the fix within the hour...
Comment 6 Jarod Wilson 2007-12-14 10:16:12 EST
Fix confirmed, thanks Neil!
Comment 8 Don Zickus 2007-12-21 15:18:44 EST
in 2.6.18-62.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5
Comment 10 Ralf Ertzinger 2008-01-26 11:56:08 EST
I am not sure if this is the same bug, but I see something similar in the latest
rawhide kernels (kernel-PAE-2.6.24-0.167.rc8.git4.fc9 does it, not sure where it
began, but definitely within the last two weeks). Could that be the same bug or
something different?
Comment 11 Jarod Wilson 2008-02-12 00:28:11 EST
Could have been the same breakage in the same netpoll code, but Neil has got
that fixed upstream already... If it's still happening, I'd file a new bug.
Comment 13 errata-xmlrpc 2008-05-21 11:03:55 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0314.html
