Bug 219897
| Field | Value | Field | Value |
|---|---|---|---|
| Summary: | Unmatched decrementing of net device reference count | | |
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Glauber Costa <gcosta> |
| Component: | kernel-xen | Assignee: | Glauber Costa <gcosta> |
| Status: | CLOSED ERRATA | QA Contact: | |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 5.0 | CC: | dzickus, poelstra, xen-maint |
| Target Milestone: | --- | Keywords: | OtherQA |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | RHBA-2007-0959 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2007-11-07 19:17:27 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 222399, 227613 | | |
| Attachments: | | | |

I played with this a bit, hoping it would be an easy fix, but right now it looks very tricky, so it may be delayed, as it's not exactly a priority. For tracking purposes, here's what I've found so far:

Everything goes fine with attaching and detaching. But asynchronously, there are calls happening to linkwatch_run_queue(). It puts a reference, so devices that are on the work queue for this are expected to hold one. However, (probably) due to cache issues, alloc_netdev may return an address previously used for a netdevice. It may happen that the old device was still waiting in that work queue to have its reference put...

The disaster scenario is that the references on the new device sum to zero (all matched), but putting the reference for the old structure allocated in the same memory area makes the count -1.

The question is: how does it happen? AFAIK, there are no calls to kfree in this call path, so how can kzalloc return the same address twice? Still a mystery ;-)

Created attachment 144179 [details]
upstream fix

This is the upstream fix for the problem. As it gets rid of any processing besides a state change when handling a XenbusStateClosing transition, it also fixes #219563.
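
The stale-put scenario from the analysis above can be sketched as a toy model. This is only an illustration in Python, not kernel code: the class `ToyNetStack`, its address pool, and the early `free_netdev` call are assumptions made for the sketch (the report itself leaves open how the address comes to be reused). The method names merely mirror the kernel functions they stand in for.

```python
class ToyNetStack:
    """Toy model: usage counts are keyed by the address of the device structure."""

    def __init__(self):
        self.refcnt = {}         # address -> usage count
        self.free_addrs = []     # addresses available for reuse
        self.next_addr = 0x1000  # next never-used address
        self.linkwatch = []      # addresses with a deferred put pending

    def alloc_netdev(self):
        # Prefer a recycled address, as an allocator cache might.
        if self.free_addrs:
            addr = self.free_addrs.pop()
        else:
            addr = self.next_addr
            self.next_addr += 0x100
        self.refcnt[addr] = 0    # zeroed allocation: the count starts at 0
        return addr

    def free_netdev(self, addr):
        # Return the address to the pool; queued work is NOT cancelled.
        self.free_addrs.append(addr)

    def dev_hold(self, addr):
        self.refcnt[addr] += 1

    def dev_put(self, addr):
        self.refcnt[addr] -= 1

    def linkwatch_fire_event(self, addr):
        # A queued device holds one reference until the queue runs.
        self.dev_hold(addr)
        self.linkwatch.append(addr)

    def linkwatch_run_queue(self):
        # Drop the reference of every queued device, by address.
        pending, self.linkwatch = self.linkwatch, []
        for addr in pending:
            self.dev_put(addr)


def simulate_stale_put():
    k = ToyNetStack()
    old = k.alloc_netdev()
    k.linkwatch_fire_event(old)  # old device waits in the queue, count = 1
    k.free_netdev(old)           # HYPOTHETICAL: memory released while still queued
    new = k.alloc_netdev()       # same address handed out again, count reset to 0
    k.dev_hold(new)
    k.dev_put(new)               # the new device's own references are matched
    k.linkwatch_run_queue()      # the old device's put lands on the new device
    return old, new, k.refcnt[new]


if __name__ == "__main__":
    old, new, count = simulate_stale_put()
    print(hex(old), hex(new), count)
```

In this model the new device ends with a usage count of -1 even though its own hold and put calls are balanced, which matches the symptom reported by unregister_netdevice.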
This request was evaluated by Red Hat Kernel Team for inclusion in a Red Hat Enterprise Linux maintenance release, and has moved to bugzilla status POST.

A fix for this issue should have been included in the packages contained in the RHEL5.1-Snapshot3 on partners.redhat.com.

Requested action: Please verify that your issue is fixed as soon as possible to ensure that it is included in this update release.

After you (Red Hat Partner) have verified that this issue has been addressed, please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)

If this issue is not fixed, please add a comment describing the most recent symptoms of the problem you are having and change the status of the bug to FAILS_QA.

More assistance: If you cannot access bugzilla, please reply with a message to Issue Tracker and I will change the status for you. If you need assistance accessing ftp://partners.redhat.com, please contact your Partner Manager.

A fix for this issue should have been included in the packages contained in the RHEL5.1-Snapshot4 on partners.redhat.com.

Requested action: Please verify that your issue is fixed *as soon as possible* to ensure that it is included in this update release.

After you (Red Hat Partner) have verified that this issue has been addressed, please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)

If this issue is not fixed, please add a comment describing the most recent symptoms of the problem you are having and change the status of the bug to FAILS_QA.

If you cannot access bugzilla, please reply with a message to Issue Tracker and I will change the status for you. If you need assistance accessing ftp://partners.redhat.com, please contact your Partner Manager.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0959.html

Description of problem:
Within a series of runs of xm network-attach and xm network-detach, the kernel reports:
unregister_netdevice: waiting for eth1 to become free. Usage count = -1

How reproducible:
Sometimes.

Steps to Reproduce:
1. Attach and detach network interfaces very quickly. The same script used in #219563 can be used:
for i in $(seq 1000); do xm network-attach <domid>; xm network-detach <domid> $i; done

Actual results:
The network is rendered unusable, and the kernel waits forever for a device whose reference count is negative to become free.

Expected results:
Everything works.

Additional info:
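
When running the reproduction loop, it can help to scan a captured kernel log for the message quoted in the description. The following is a minimal sketch that assumes the log text has already been saved to a string (for example from dmesg output); the function name `negative_usage_counts` and the regular expression are illustrative and are based only on the single message shown in this report.

```python
import re

# Pattern modelled on the message quoted in this report (an assumption:
# other kernel versions may word it differently).
USAGE_RE = re.compile(
    r"unregister_netdevice: waiting for (\S+) to become free\. "
    r"Usage count = (-?\d+)"
)


def negative_usage_counts(log_text):
    """Return (device, count) pairs for every reported negative usage count."""
    hits = []
    for match in USAGE_RE.finditer(log_text):
        count = int(match.group(2))
        if count < 0:
            hits.append((match.group(1), count))
    return hits


if __name__ == "__main__":
    sample = (
        "unregister_netdevice: waiting for eth1 to become free. "
        "Usage count = -1\n"
    )
    print(negative_usage_counts(sample))
```

A positive usage count in this message only means the device is still referenced; a negative one is the unmatched decrement this bug is about, so only those entries are returned.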