Bug 219897 - Unmatched decrementing of net device reference count
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.0
Hardware: All Linux
Priority: medium Severity: medium
Assigned To: Glauber Costa
Keywords: OtherQA
Depends On:
Blocks: 222399 227613
Reported: 2006-12-15 20:33 EST by Glauber Costa
Modified: 2010-10-22 03:27 EDT
Fixed In Version: RHBA-2007-0959
Doc Type: Bug Fix
Last Closed: 2007-11-07 14:17:27 EST

Attachments
upstream fix (3.71 KB, patch)
2006-12-21 08:57 EST, Glauber Costa

Description Glauber Costa 2006-12-15 20:33:06 EST
Description of problem:

During a series of alternating xm network-attach and xm network-detach runs,
the kernel reports:

   unregister_netdevice: waiting for eth1 to become free. Usage count = -1

How reproducible:

Sometimes.

Steps to Reproduce:
1. Attach and detach network interfaces in quick succession. The same script
used in #219563 can be used:
   for i in $(seq 1000); 
   do 
     xm network-attach <domid>;  
     xm network-detach <domid> $i;
   done
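
To notice the failure while the loop above runs, the kernel log can be scanned
for the message quoted in the description. The following is a minimal sketch,
assuming the log text has already been captured into a string (for example from
dmesg output). The function name negative_usage_counts is hypothetical and only
the message format shown in this report is matched.

```python
import re

# Matches the message quoted in this report, e.g.
# "unregister_netdevice: waiting for eth1 to become free. Usage count = -1"
USAGE_RE = re.compile(
    r"unregister_netdevice: waiting for (\S+) to become free\. "
    r"Usage count = (-?\d+)"
)


def negative_usage_counts(log_text):
    """Return (device, count) pairs for every report with a negative count."""
    hits = []
    for match in USAGE_RE.finditer(log_text):
        count = int(match.group(2))
        if count < 0:
            hits.append((match.group(1), count))
    return hits
```

A positive usage count in this message can simply mean the device is still in
use, so the sketch only reports negative values, which indicate an unmatched
put.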
  
Actual results:
The network is rendered unusable, and the kernel waits forever for a negative
reference count to be released.

Expected results:

Everything works.
Additional info:
Comment 1 Glauber Costa 2006-12-15 20:47:33 EST
I played with this a bit, hoping it would be an easy fix, but right now it
seems to be very tricky, so it may be delayed, as it's not exactly a priority.

For tracking purposes, here's what I've found so far:

Everything goes fine with attaching and detaching. But asynchronously, there
are calls happening to linkwatch_run_queue(). It puts a reference, so devices
that are on the work queue for this are expected to hold one. However,
(probably) due to cache issues, alloc_netdev may return an address previously
used for a netdevice. It may happen that the old device was still waiting
in that work queue to have its reference put...

The disaster scenario is that the references sum to zero (all matched), but
putting the reference for the old structure allocated in the same memory area
makes it -1.
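
The scenario above can be modelled in user space. This is only a sketch of the
hypothesis described in this comment, not kernel code: the NetDevice class, the
slot dictionary standing in for a reused memory address, and the pending list
standing in for the linkwatch work queue are all hypothetical names, and the
address reuse is simply assumed rather than explained.

```python
class NetDevice:
    """Toy stand-in for a net device; only the reference count is modelled."""

    def __init__(self, name):
        self.name = name
        self.refcnt = 0

    def hold(self):
        self.refcnt += 1

    def put(self):
        self.refcnt -= 1


def simulate_stale_put():
    slot = {}     # one "memory address" that the allocator hands out twice
    pending = []  # deferred work: addresses whose reference is put later

    # Old device: a link event takes a reference and queues deferred work.
    slot["addr"] = NetDevice("eth1")
    slot["addr"].hold()
    pending.append("addr")

    # A new device is allocated at the same address before the deferred
    # work has run. Its own holds and puts are balanced.
    slot["addr"] = NetDevice("eth1")
    slot["addr"].hold()
    slot["addr"].put()

    # The deferred work finally runs and puts the old device's reference
    # through the stale address, hitting the new structure instead.
    for addr in pending:
        slot[addr].put()

    return slot["addr"].refcnt
```

In this model the new device's count ends at -1 even though every hold it
received was matched by a put, which is the symptom reported by
unregister_netdevice.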

The question is: how does it happen? AFAIK, there are no calls to kfree in this
call path, so how can kzalloc return the same address twice?

Still a mystery ;-)
Comment 2 Glauber Costa 2006-12-21 08:57:03 EST
Created attachment 144179 [details]
upstream fix

This is the upstream fix for the problem. As it gets rid of any processing
besides a state change when handling a XenbusStateClosing transition, it also
fixes #219563.
Comment 4 RHEL Product and Program Management 2007-03-14 22:42:40 EDT
This request was evaluated by Red Hat Kernel Team for inclusion in a Red
Hat Enterprise Linux maintenance release, and has moved to bugzilla 
status POST.
Comment 7 John Poelstra 2007-08-27 14:20:34 EDT
A fix for this issue should have been included in the packages contained in the
RHEL5.1-Snapshot3 on partners.redhat.com.  

Requested action: Please verify that your issue is fixed as soon as possible to
ensure that it is included in this update release.

After you (Red Hat Partner) have verified that this issue has been addressed,
please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)

If this issue is not fixed, please add a comment describing the most recent
symptoms of the problem you are having and change the status of the bug to FAILS_QA.

More assistance: If you cannot access bugzilla, please reply with a message to
Issue Tracker and I will change the status for you.  If you need assistance
accessing ftp://partners.redhat.com, please contact your Partner Manager.
Comment 8 John Poelstra 2007-08-30 20:27:30 EDT
A fix for this issue should have been included in the packages contained in the
RHEL5.1-Snapshot4 on partners.redhat.com.  

Requested action: Please verify that your issue is fixed *as soon as possible*
to ensure that it is included in this update release.

After you (Red Hat Partner) have verified that this issue has been addressed,
please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)

If this issue is not fixed, please add a comment describing the most recent
symptoms of the problem you are having and change the status of the bug to FAILS_QA.

If you cannot access bugzilla, please reply with a message to Issue Tracker and
I will change the status for you.  If you need assistance accessing
ftp://partners.redhat.com, please contact your Partner Manager.
Comment 10 errata-xmlrpc 2007-11-07 14:17:27 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0959.html
