Bug 423791

Summary: unregister_netdevice: waiting for tap1 to become free. Usage count = 1
Product: Red Hat Enterprise Linux 5
Reporter: Jarod Wilson <jarod>
Component: kernel-xen
Assignee: Neil Horman <nhorman>
Status: CLOSED ERRATA
QA Contact: Martin Jenner <mjenner>
Severity: low
Docs Contact:
Priority: low
Version: 5.2
CC: nhorman, redhat-bugzilla, villapla, xen-maint
Target Milestone: rc
Keywords: Regression
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version: RHBA-2008-0314
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2008-05-21 15:03:55 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 246723
Bug Blocks:

Description Jarod Wilson 2007-12-13 18:04:04 UTC
Description of problem:
Both trying to restart a xen box (with no guests running) and trying to restart
xen hvm guests result in the following message every 10 seconds for all eternity:

unregister_netdevice: waiting for tap1 to become free. Usage count = 1

Version-Release number of selected component (if applicable):
kernel-xen-2.6.18-58.el5

Additional info:
I've been seeing this on ia64, but clalance tells me he's hit it on x86 recently
as well.
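
The repeating message is consistent with a wait loop that polls the device's
reference count and warns periodically for as long as the count stays non-zero.
Below is a minimal user-space sketch of that behaviour, not the kernel code
itself. The names fake_netdev and fake_wait_allrefs are hypothetical, the
10-second warning interval comes from the report above, and the 250 ms poll
interval, the simulated clock and the give-up limit are assumptions made so the
example terminates.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a network device with a usage count. */
struct fake_netdev {
    const char *name;
    int refcnt;
};

/*
 * Simulate waiting for the last reference on dev to be dropped.
 * release_at_ms: simulated time at which the last reference is released,
 *                or -1 if it is never released (the leak case).
 * give_up_ms:    simulated time at which this sketch stops waiting; it
 *                exists only so that the example terminates.
 * Returns the number of warnings printed.
 */
static int fake_wait_allrefs(struct fake_netdev *dev, long release_at_ms,
                             long give_up_ms)
{
    const long poll_ms = 250;    /* assumed poll interval */
    const long warn_ms = 10000;  /* one warning every 10 seconds */
    long now = 0;
    long last_warn = 0;
    int warnings = 0;

    while (dev->refcnt != 0 && now < give_up_ms) {
        now += poll_ms;  /* stands in for sleeping between polls */

        if (release_at_ms >= 0 && now >= release_at_ms) {
            dev->refcnt = 0;  /* the holder finally drops its reference */
        } else if (now - last_warn >= warn_ms) {
            printf("unregister_netdevice: waiting for %s to become free. "
                   "Usage count = %d\n", dev->name, dev->refcnt);
            last_warn = now;
            warnings++;
        }
    }
    return warnings;
}
```

With a device whose count starts at 1 and is never released, calling
fake_wait_allrefs(&dev, -1, 35000) prints the message at the 10, 20 and 30
second marks of simulated time and leaves the count at 1. If the reference is
released at 500 ms instead, the loop exits without printing anything.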

Comment 1 Jarod Wilson 2007-12-13 18:37:44 UTC
Actually, two slightly different errors... On restarting the box, it's virbr0 in
the message, as opposed to tap*. That's the same thing Chris saw. Shutting down hvm
domains is what leads to the tap* version for me.

Also of interest on the console during shutdown is the fact that libvirtd failed
to shut down for some reason...

Comment 2 Chris Lalancette 2007-12-13 19:06:26 UTC
OK, testing locally on i686 reveals:

kernel 2.6.18-58 - reboots fine
kernel 2.6.18-59 - fails to reboot with similar message
kernel 2.6.18-58 w/ 2.6.18-59 HV - reboots fine

So the problem is clearly in the kernel, not the HV.

Chris Lalancette

Comment 3 Jarod Wilson 2007-12-13 23:50:02 UTC
We have a winner:

linux-2.6-net-ipv6-backport-optimistic-dad.patch
- [net] ipv6: backport optimistic DAD (Neil Horman) [246723]

Kernel build w/everything in -59 minus that patch eliminates the libvirtd
shutdown failures for me. Off to flag that bug and get Neil's attention... :)

Comment 4 Neil Horman 2007-12-14 03:41:00 UTC
Can't anything ever be easy....

Best guess is that something about the tap driver sends us through a path that
takes a reference to the interface but doesn't release it.  Visual inspection
says that the most likely candidate is in ndisc_send_rs in the 'if (send_sllao)'
clause.  It's been a while since I wrote this (months in fact).

Jarod, can you modify the kernel such that the clause in question looks like this:
======================================================
        if (send_sllao) {
                ifp = ipv6_get_ifaddr(saddr, dev, 1);
                if (ifp) {
                        if (ifp->flags & IFA_F_OPTIMISTIC) {
                                send_sllao = 0;
                        }
                        /* drop the reference taken by ipv6_get_ifaddr() */
                        in6_ifa_put(ifp);
                } else {
                        send_sllao = 0;
                }
        }

========================================

That should force the reference count on the IPv6 address to be decremented,
which seems like it should be the case anyway. Compare that with the clause
below, where in6_ifa_put() sits inside the IFA_F_OPTIMISTIC branch and so is
skipped for any address that is not optimistic:

        if (send_sllao) {
                ifp = ipv6_get_ifaddr(saddr, dev, 1);
                if (ifp) {
                        if (ifp->flags & IFA_F_OPTIMISTIC) {
                                send_sllao = 0;
                                in6_ifa_put(ifp);
                        }
                } else {
                        send_sllao = 0;
                }
        }

Calling in6_ifa_put() whenever the lookup succeeds, as in the first version,
should ensure that the refcount on the interface always gets decremented.

In fact I'm sure that's it.  I remember that had to be fixed several weeks ago
upstream, and I never backported the fix.  Please confirm that, and I'll post
the fix against this bug.  Thanks!
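
To make the imbalance concrete, here is a small user-space model of the two
placements of in6_ifa_put() discussed above. It is only a sketch:
fake_ifaddr, fake_get_ifaddr, fake_ifa_put and the FAKE_IFA_F_OPTIMISTIC value
are hypothetical stand-ins, and the one assumption carried over from the kernel
is that a successful ipv6_get_ifaddr() lookup takes a reference that the caller
must drop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's per-address structure. */
struct fake_ifaddr {
    int refcnt;          /* models the address reference count */
    unsigned int flags;
};

#define FAKE_IFA_F_OPTIMISTIC 0x04u  /* illustrative flag value only */

/* Models ipv6_get_ifaddr(): a successful lookup takes a reference. */
static struct fake_ifaddr *fake_get_ifaddr(struct fake_ifaddr *ifa)
{
    if (ifa)
        ifa->refcnt++;
    return ifa;
}

/* Models in6_ifa_put(): drops one reference. */
static void fake_ifa_put(struct fake_ifaddr *ifa)
{
    assert(ifa->refcnt > 0);
    ifa->refcnt--;
}

/* Put only inside the optimistic branch: leaks for every other address. */
static int sllao_put_in_branch(struct fake_ifaddr *addr, int send_sllao)
{
    if (send_sllao) {
        struct fake_ifaddr *ifp = fake_get_ifaddr(addr);
        if (ifp) {
            if (ifp->flags & FAKE_IFA_F_OPTIMISTIC) {
                send_sllao = 0;
                fake_ifa_put(ifp);
            }
        } else {
            send_sllao = 0;
        }
    }
    return send_sllao;
}

/* Put after every successful lookup: the count always returns to its start. */
static int sllao_put_always(struct fake_ifaddr *addr, int send_sllao)
{
    if (send_sllao) {
        struct fake_ifaddr *ifp = fake_get_ifaddr(addr);
        if (ifp) {
            if (ifp->flags & FAKE_IFA_F_OPTIMISTIC)
                send_sllao = 0;
            fake_ifa_put(ifp);
        } else {
            send_sllao = 0;
        }
    }
    return send_sllao;
}
```

For an address with no flags set and a starting count of 0,
sllao_put_in_branch() leaves the count at 1 while sllao_put_always() leaves it
at 0. For an optimistic address both variants leave the count at 0, which is
why the leak only shows up on the non-optimistic path.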

Comment 5 Jarod Wilson 2007-12-14 14:20:22 UTC
Building a test kernel right now, should be able to verify the fix within the hour...

Comment 6 Jarod Wilson 2007-12-14 15:16:12 UTC
Fix confirmed, thanks Neil!

Comment 8 Don Zickus 2007-12-21 20:18:44 UTC
in 2.6.18-62.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 10 Ralf Ertzinger 2008-01-26 16:56:08 UTC
I am not sure if this is the same bug, but I see something similar in the latest
rawhide kernels (kernel-PAE-2.6.24-0.167.rc8.git4.fc9 does it, not sure where it
began, but definitely within the last two weeks). Could that be the same bug or
something different?

Comment 11 Jarod Wilson 2008-02-12 05:28:11 UTC
Could have been the same breakage in the same netpoll code, but Neil has got
that fixed upstream already... If it's still happening, I'd file a new bug.

Comment 13 errata-xmlrpc 2008-05-21 15:03:55 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0314.html