Bug 542593
| Field | Value |
|---|---|
| Summary | recursive lock of devlist_mtx |
| Product | Red Hat Enterprise Linux 5 |
| Component | kernel |
| Version | 5.5 |
| Hardware | All |
| OS | Linux |
| Status | CLOSED ERRATA |
| Severity | medium |
| Priority | low |
| Reporter | Stanislaw Gruszka <sgruszka> |
| Assignee | John W. Linville <linville> |
| QA Contact | Red Hat Kernel QE team <kernel-qe> |
| CC | cmeadors, linville, shillman |
| Target Milestone | rc |
| Target Release | 5.5 |
| Keywords | Reopened |
| Doc Type | Bug Fix |
| Last Closed | 2010-03-30 07:44:30 UTC |
| Bug Blocks | 526948 |

Description
Stanislaw Gruszka
2009-11-30 09:55:25 UTC
OK, I think I caused this in the backport...see the references to cleanup_work in net/wireless/core.c?

I commented out the original code there because I was getting a hang on the call to cancel_work_sync when the device was first brought up. IIRC I determined that this was due to the work having never been scheduled, and I wasn't sure how to determine that. I'll poke at that...maybe if I can avoid that cancel_work_sync on the first NETDEV_UP then the old code can be used.

(In reply to comment #1)
> OK, I think I caused this in the backport...see the references to cleanup_work
> in net/wireless/core.c?

Yes. I have no clean solution for that, as RHEL has no equivalent of a cancel_work_sync() with a return value.

> I commented-out the original code there because I was getting a hang on the
> call to cancel_work_sync when the device was first brought-up. IIRC I
> determined that this was due to the work having never been scheduled, and I
> wasn't sure how to determine that. I'll poke at that...maybe if I can avoid
> that cancel_work_sync on the first NETDEV_UP then the old code can be used.

If this would take you a lot of time, I can work on it, since I'm able to reproduce.

Created attachment 374806 [details]
jwltest-wireless-cleanup_work.patch
Does this avoid the deadlock?
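
For readers following the cancel_work_sync discussion above: the difficulty is knowing whether cleanup_work was ever scheduled before trying to cancel it. The following is only a userspace sketch of that idea, not the contents of the attached patch and not kernel code. The names fake_work, fake_queue_work, fake_run_work and fake_cancel_work are hypothetical stand-ins invented for illustration; the point is simply that a cancel which reports whether anything was pending lets the caller tell "never scheduled" apart from "scheduled and cancelled".

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for a kernel work item (assumption,
 * not the real struct work_struct). */
struct fake_work {
    bool pending;   /* true while the work is queued and has not run yet */
    int  runs;      /* number of times the handler has executed */
};

/* Queue the work. Returns false if it was already pending. */
static bool fake_queue_work(struct fake_work *w)
{
    if (w->pending)
        return false;
    w->pending = true;
    return true;
}

/* Model the worker thread: run the handler once if the work is queued. */
static void fake_run_work(struct fake_work *w)
{
    if (w->pending) {
        w->pending = false;
        w->runs++;
    }
}

/* Cancel the work and report whether it was pending. A caller can use
 * the return value to skip follow-up steps (for example, dropping a
 * reference) when the work was never scheduled in the first place. */
static bool fake_cancel_work(struct fake_work *w)
{
    bool was_pending = w->pending;

    w->pending = false;
    return was_pending;
}
```

On a freshly initialised item (the "first NETDEV_UP" case), fake_cancel_work() returns false because nothing was ever queued; after fake_queue_work() it returns true. Whether the real fix takes this shape is not stated in this report.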
Yes, the patch fixes the deadlock.

Fixed in kernel-2.6.18-175.el5.jwltest.95.3.i686.rpm

Yeah, but it seems to leak a reference to the netdevice, making it impossible to successfully remove the module... I'm going to pull this out of the patches I post today...I'm sure I can find a fix, but may need to work the exception process... :-(

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

Created attachment 375286 [details]
jwltest-wireless-cleanup_work.patch
This seems to avoid the rfkill lockup _and_ still allow for the netdev to close... :-)
The patch works for me as well. I tested using:

    for ((i = 0; i < 10; i++)); do
        ifconfig wlan0 down
        ifconfig wlan0 up
    done
    rmmod iwl3945

Without the patch, rmmod fails.

QA_ACK 5.5 RHEL

Looks like this was caused by enabling other priority hardware. Can't regress functionality. Reproducer is in initial description and we have the hardware.

in kernel-2.6.18-181.el5

You can download this test kernel from http://people.redhat.com/dzickus/el5

Please update the appropriate value in the Verified field (cf_verified) to indicate this fix has been successfully verified. Include a comment with verification details.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html