423791 – unregister_netdevice: waiting for tap1 to become free. Usage count = 1

Summary: unregister_netdevice: waiting for tap1 to become free. Usage count = 1

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel-xen
Sub Component:
Version:	5.2
Hardware:	All
OS:	Linux
Priority:	low
Severity:	low
Target Milestone:	rc
Target Release:	---
Assignee:	Neil Horman
QA Contact:	Martin Jenner
Docs Contact:
URL:
Whiteboard:
Depends On:	246723
Blocks:

Reported:	2007-12-13 18:04 UTC by Jarod Wilson
Modified:	2008-07-02 12:09 UTC
CC List:	4 users
Fixed In Version:	RHBA-2008-0314
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2008-05-21 15:03:55 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2008:0314	0	normal	SHIPPED_LIVE	Updated kernel packages for Red Hat Enterprise Linux 5.2	2008-05-20 18:43:34 UTC

Description Jarod Wilson 2007-12-13 18:04:04 UTC

Description of problem:
Both restarting a xen box (with no guests running) and restarting xen hvm
guests result in the following message every 10 seconds for all eternity:

unregister_netdevice: waiting for tap1 to become free. Usage count = 1

Version-Release number of selected component (if applicable):
kernel-xen-2.6.18-58.el5

Additional info:
I've been seeing this on ia64, but clalance tells me he's hit it on x86 recently
as well.
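
For context, the repeating message comes from the kernel's device-unregister path, which waits for the device's reference count to drop to zero and logs a warning periodically while it waits. The following is a simplified user-space simulation of that behavior, not the kernel code. It assumes the 10-second warning interval reported above; `struct mock_netdev` and `wait_allrefs_sim` are hypothetical names, and the simulated wait is bounded by `max_seconds` so that it terminates, whereas the situation described in this report never does.

```c
#include <stdio.h>

/* Hypothetical stand-in for a network device: a name and a reference count. */
struct mock_netdev {
    const char *name;
    int refcnt;
};

/*
 * Simulate waiting up to max_seconds (one loop iteration per simulated
 * second) for the reference count to reach zero. A warning is printed
 * after every 10 simulated seconds of waiting. Returns the number of
 * warnings printed.
 */
static int wait_allrefs_sim(const struct mock_netdev *dev, int max_seconds)
{
    int warnings = 0;
    int since_warning = 0;

    for (int t = 0; t < max_seconds && dev->refcnt != 0; t++) {
        since_warning++;
        if (since_warning >= 10) {
            printf("unregister_netdevice: waiting for %s to become free. "
                   "Usage count = %d\n", dev->name, dev->refcnt);
            warnings++;
            since_warning = 0;
        }
    }
    return warnings;
}
```

With a device whose count is stuck at 1, the warning repeats for as long as the wait runs; with a count of 0 the loop exits immediately and nothing is printed.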

Comment 1 Jarod Wilson 2007-12-13 18:37:44 UTC

Actually, two slightly different errors... On restarting the box, it's virbr0 in
the message, as opposed to tap*. That's the same thing Chris saw. Shutting down
hvm domains is what leads to the tap* version for me.

Also of interest on the console during shutdown is the fact that libvirtd failed
to shut down for some reason...

Comment 2 Chris Lalancette 2007-12-13 19:06:26 UTC

OK, testing locally on i686 reveals:

kernel 2.6.18-58 - reboots fine
kernel 2.6.18-59 - fails to reboot with similar message
kernel 2.6.18-58 w/ 2.6.18-59 HV - reboots fine

So the problem is clearly in the kernel, not the HV.

Chris Lalancette

Comment 3 Jarod Wilson 2007-12-13 23:50:02 UTC

We have a winner:

linux-2.6-net-ipv6-backport-optimistic-dad.patch
- [net] ipv6: backport optimistic DAD (Neil Horman) [246723]

Kernel build w/everything in -59 minus that patch eliminates the libvirtd shutdown failures for me. Off to 
flag that bug and get Neil's attention... :)

Comment 4 Neil Horman 2007-12-14 03:41:00 UTC

Can't anything ever be easy....

Best guess is that something about the tap driver sends us through a path that
takes a reference to the interface but doesn't release it.  Visual inspection
says that the most likely candidate is in ndisc_send_rs in the 'if (send_sllao)'
clause.  It's been a while since I wrote this (months, in fact).

Jarod, can you modify the kernel such that the clause in question looks like this:
======================================================
if (send_sllao) {
        ifp = ipv6_get_ifaddr(saddr, dev, 1);
        if (ifp) {
                if (ifp->flags & IFA_F_OPTIMISTIC) {
                        send_sllao = 0;
                }
                in6_ifa_put(ifp);
        } else {
                send_sllao = 0;
        }
}
======================================================

That should force the reference count on the IPv6 address to be decremented,
which seems like it should be the case anyway.

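======================================================
if (send_sllao) {
        ifp = ipv6_get_ifaddr(saddr, dev, 1);
        if (ifp) {
                if (ifp->flags & IFA_F_OPTIMISTIC) {
                        send_sllao = 0;
                        in6_ifa_put(ifp);
                }
        } else {
                send_sllao = 0;
        }
}
======================================================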

That should ensure that the refcount on the interface always gets decremented.

In fact, I'm sure that's it.  I remember that had to be fixed several weeks ago
upstream, and I never backported the fix.  Please confirm that, and I'll post
the fix against this bug.  Thanks!
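
The two clauses quoted above differ only in where in6_ifa_put() is called: in one it runs for every address that ipv6_get_ifaddr() returns, in the other only when the address is optimistic. The following is a minimal user-space sketch of that difference, not kernel code. `struct mock_ifaddr`, `mock_get`, `mock_put`, `MOCK_F_OPTIMISTIC` and both `check_sllao_*` functions are hypothetical stand-ins invented for illustration; the sketch assumes only that the lookup takes a reference that the caller must release.

```c
#include <stddef.h>

/* Hypothetical stand-in for an interface address: flags plus a refcount. */
struct mock_ifaddr {
    unsigned int flags;
    int refcnt;
};

#define MOCK_F_OPTIMISTIC 0x1u  /* stand-in for IFA_F_OPTIMISTIC */

/* Stand-in for the lookup: returns the address with one extra reference held. */
static struct mock_ifaddr *mock_get(struct mock_ifaddr *ifp)
{
    if (ifp)
        ifp->refcnt++;
    return ifp;
}

/* Stand-in for the release: drops one reference. */
static void mock_put(struct mock_ifaddr *ifp)
{
    ifp->refcnt--;
}

/* Put only inside the optimistic branch: a non-optimistic address keeps
 * the reference taken by the lookup. Returns the resulting send_sllao. */
static int check_sllao_leaky(struct mock_ifaddr *addr)
{
    int send_sllao = 1;
    struct mock_ifaddr *ifp = mock_get(addr);

    if (ifp) {
        if (ifp->flags & MOCK_F_OPTIMISTIC) {
            send_sllao = 0;
            mock_put(ifp);
        }
    } else {
        send_sllao = 0;
    }
    return send_sllao;
}

/* Put after every successful lookup: the refcount always returns to its
 * starting value. Returns the resulting send_sllao. */
static int check_sllao_balanced(struct mock_ifaddr *addr)
{
    int send_sllao = 1;
    struct mock_ifaddr *ifp = mock_get(addr);

    if (ifp) {
        if (ifp->flags & MOCK_F_OPTIMISTIC)
            send_sllao = 0;
        mock_put(ifp);
    } else {
        send_sllao = 0;
    }
    return send_sllao;
}
```

In this sketch, calling `check_sllao_leaky` on a non-optimistic address leaves `refcnt` one higher than before the call, while `check_sllao_balanced` leaves it unchanged in every case. A reference that is never released is the kind of leftover count that the "Usage count = 1" message reports.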

Comment 5 Jarod Wilson 2007-12-14 14:20:22 UTC

Building a test kernel right now, should be able to verify the fix within the hour...

Comment 6 Jarod Wilson 2007-12-14 15:16:12 UTC

Fix confirmed, thanks Neil!

Comment 8 Don Zickus 2007-12-21 20:18:44 UTC

in 2.6.18-62.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 10 Ralf Ertzinger 2008-01-26 16:56:08 UTC

I am not sure if this is the same bug, but I see something similar in the latest
rawhide kernels (kernel-PAE-2.6.24-0.167.rc8.git4.fc9 does it, not sure where it
began, but definitely within the last two weeks). Could that be the same bug or
something different?

Comment 11 Jarod Wilson 2008-02-12 05:28:11 UTC

Could have been the same breakage in the same netpoll code, but Neil has got
that fixed upstream already... If it's still happening, I'd file a new bug.

Comment 13 errata-xmlrpc 2008-05-21 15:03:55 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0314.html
