Bug 219897

Summary: Unmatched decrementing of net device reference count
Product: Red Hat Enterprise Linux 5
Reporter: Glauber Costa <gcosta>
Component: kernel-xen
Assignee: Glauber Costa <gcosta>
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Version: 5.0
CC: dzickus, poelstra, xen-maint
Target Milestone: ---
Keywords: OtherQA
Target Release: ---
Hardware: All
OS: Linux
Fixed In Version: RHBA-2007-0959
Doc Type: Bug Fix
Last Closed: 2007-11-07 19:17:27 UTC
Bug Blocks: 222399, 227613
Attachments: upstream fix (flags: none)

Description Glauber Costa 2006-12-16 01:33:06 UTC
Description of problem:

During a series of runs of xm network-attach and xm network-detach, the kernel
reports:

   unregister_netdevice: waiting for eth1 to become free. Usage count = -1

How reproducible:

Sometimes.

Steps to Reproduce:
1. Attach and detach network interfaces in rapid succession. The same script
used in #219563 can be used:
   for i in $(seq 1000); 
   do 
     xm network-attach <domid>;  
     xm network-detach <domid> $i;
   done
  
Actual results:
The network is rendered unusable, and the kernel waits forever for a negative
reference count to be released.

Expected results:

Everything works.

Additional info:

Comment 1 Glauber Costa 2006-12-16 01:47:33 UTC
I played with this a bit, hoping it would be an easy fix, but right now it
seems to be very tricky, so it may be delayed, as it's not exactly a priority.

For tracking purposes, here's what I've found so far:

Everything goes fine with attaching and detaching. But asynchronously, there
are calls to linkwatch_run_queue(). It puts a reference, so devices that are
on its work queue are expected to hold one. However, (probably) due to cache
issues, alloc_netdev may return an address previously used for a netdevice.
It may happen that the old device was still waiting in that work queue to have
its reference put...

The disaster scenario is that the references sum to zero (all matched), but
putting the reference for the old structure allocated in the same memory area
makes the count -1.
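
To make the suspected sequence concrete, here is a toy model in Python. It is
only a sketch of the hypothesis above, not kernel code: the dictionary stands
in for memory, the address 0x1000 and the function name are made up for
illustration, and the list stands in for the linkwatch work queue.

```python
def simulate_stale_put():
    """Toy model of the suspected race; not kernel code."""
    memory = {}            # address -> reference count of the device stored there
    addr = 0x1000          # hypothetical address that the allocator hands out twice
    linkwatch_queue = []   # addresses with a deferred reference put still pending

    # Old device: allocated zeroed, then a reference is taken for the
    # queued link-watch event.
    memory[addr] = 0
    memory[addr] += 1
    linkwatch_queue.append(addr)

    # Assumption: the old device goes away before the queue runs, and a new
    # device is allocated (zeroed) at the very same address.
    memory[addr] = 0

    # New device: every hold is matched by a put, so the count is back at 0.
    memory[addr] += 1
    memory[addr] -= 1

    # The deferred work finally runs and puts the reference that belonged
    # to the old device, but the count it touches is the new device's.
    for queued_addr in linkwatch_queue:
        memory[queued_addr] -= 1

    return memory[addr]


if __name__ == "__main__":
    print("Usage count =", simulate_stale_put())
```

In this model the final count is -1, matching the value in the kernel message,
but it says nothing about how the address comes to be reused in the first
place.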

Question is: how does it happen? AFAIK, there are no calls to kfree in this
call path, so how can kzalloc return the same address twice?

Still a mystery ;-)

Comment 2 Glauber Costa 2006-12-21 13:57:03 UTC
Created attachment 144179 [details]
upstream fix

This is the upstream fix for the problem. Because it gets rid of any processing
besides a state change on a XenbusStateClosing transition, it also fixes
#219563.

Comment 4 RHEL Program Management 2007-03-15 02:42:40 UTC
This request was evaluated by Red Hat Kernel Team for inclusion in a Red
Hat Enterprise Linux maintenance release, and has moved to bugzilla 
status POST.

Comment 7 John Poelstra 2007-08-27 18:20:34 UTC
A fix for this issue should have been included in the packages contained in the
RHEL5.1-Snapshot3 on partners.redhat.com.  

Requested action: Please verify that your issue is fixed as soon as possible to
ensure that it is included in this update release.

After you (Red Hat Partner) have verified that this issue has been addressed,
please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)

If this issue is not fixed, please add a comment describing the most recent
symptoms of the problem you are having and change the status of the bug to FAILS_QA.

More assistance: If you cannot access bugzilla, please reply with a message to
Issue Tracker and I will change the status for you.  If you need assistance
accessing ftp://partners.redhat.com, please contact your Partner Manager.

Comment 8 John Poelstra 2007-08-31 00:27:30 UTC
A fix for this issue should have been included in the packages contained in the
RHEL5.1-Snapshot4 on partners.redhat.com.  

Requested action: Please verify that your issue is fixed *as soon as possible*
to ensure that it is included in this update release.

After you (Red Hat Partner) have verified that this issue has been addressed,
please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)

If this issue is not fixed, please add a comment describing the most recent
symptoms of the problem you are having and change the status of the bug to FAILS_QA.

If you cannot access bugzilla, please reply with a message to Issue Tracker and
I will change the status for you.  If you need assistance accessing
ftp://partners.redhat.com, please contact your Partner Manager.


Comment 10 errata-xmlrpc 2007-11-07 19:17:27 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0959.html