Bug 903881
Summary: | [abrt]: BUG: scheduling while atomic: kworker/u:0/3171/0x10000100
---|---
Product: | [Fedora] Fedora
Component: | kernel
Version: | 17
Hardware: | i686
OS: | Unspecified
Status: | CLOSED CURRENTRELEASE
Severity: | unspecified
Priority: | unspecified
Reporter: | nathaniel <ntdoherty>
Assignee: | John W. Linville <linville>
QA Contact: | Fedora Extras Quality Assurance <extras-qa>
CC: | fernandodarochafernandesjunior, gansalmon, itamar, jonathan, jwboyer, kernel-maint, larry.finger, madhu.chinakonda, schaiba, sgruszka
Whiteboard: | abrt_hash:2aa13bb567bca0c5e977160d50dc0bf406811dee
Doc Type: | Bug Fix
Last Closed: | 2013-02-08 16:55:14 UTC
Description
nathaniel
2013-01-25 00:44:21 UTC
It looks like the mutex activity in rtl_lps_leave is getting hit from the tasklet handler. Can the rtl_lps_leave be queued to a different context?

I just checked the vendor driver and they use a spinlock_irq_save() call rather than a mutex_lock() call. The part that bothers me about their code is that they enable device interrupts in the middle of the routine with a "FIX_ME" comment. Is it legal to do that? While my question is being answered, I'll prepare a patch to use a spinlock rather than the mutex.

I don't have access to an i686 system here. Could someone use objdump to tell me which line in the source file corresponds to rtl_lps_leave+0x23? Thanks.

No i686 here either right now, hopefully nathaniel can do it? Probably easiest is to install the debuginfo:

    yum install kernel-debuginfo

And then the following:

    cd /lib/modules/3.6.11-5.fc17.i686/kernel/drivers/net/wireless/rtlwifi/
    gdb rtlwifi.ko
    list *(rtl_lps_leave+0x23)

What is the output?

The spinlock was changed to a mutex with

commit 6539306b2c3ceafbc4094cf68c58094c282da053
Author: Stanislaw Gruszka <sgruszka>
Date: Mon Dec 12 12:43:24 2011 +0100

The change was made to reduce the time that interrupts were disabled. I added Stanislaw to the Cc list.

Created attachment 688180 [details]
Test patch for this oops
With kernel commits 41affd5286fb91176eb99b34ecd8eb522ba22369 and 6539306b2c3ceafbc4094cf68c58094c282da053, the locking in rtl_lps_leave() was changed from a spinlock to a mutex. This oops indicates that routine rtl_is_special(), which calls rtl_lps_leave() in two places, was entered in atomic mode, where sleeping on a mutex is not allowed. These two calls are replaced by putting a request on the appropriate work queue.
As I do not see this bug, please test.
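
For readers following the fix: the idea in the patch description (the atomic path only queues a request, and a worker that is allowed to sleep takes the mutex later) can be modeled in user space. This is a minimal Python sketch under stated assumptions, not the kernel patch. Every name in it (lps_mutex, work_queue, lps_leave, atomic_handler, worker) is hypothetical and chosen for illustration; none is a kernel or rtlwifi API.

```python
import queue
import threading

# Hypothetical user-space model; none of these names are kernel APIs.
lps_mutex = threading.Lock()   # stands in for the driver's LPS mutex
work_queue = queue.Queue()     # stands in for the driver's work queue
state = {"in_lps": True, "leave_calls": 0}


def lps_leave():
    """Sleepable path: may block on the mutex, so only the worker runs it."""
    with lps_mutex:
        state["in_lps"] = False
        state["leave_calls"] += 1


def worker():
    """Models process context: run queued requests until a None sentinel."""
    while True:
        job = work_queue.get()
        if job is None:
            break
        job()


def atomic_handler():
    """Models the tasklet path: it must not block, so it only queues work."""
    work_queue.put_nowait(lps_leave)


t = threading.Thread(target=worker)
t.start()
atomic_handler()      # two requests, mirroring the two call sites in the patch note
atomic_handler()
work_queue.put(None)  # tell the worker to stop after draining the queue
t.join()
print(state)
```

The handler never touches the lock itself, which is the property the patch is after: the blocking acquire happens only in a context where blocking is permitted.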
I launched a kernel build with the patch from comment 6 here: http://koji.fedoraproject.org/koji/taskinfo?taskID=4908318

nathaniel, please test it when it finishes compiling.

would running:

cd /lib/modules/3.6.11-5.fc17.i686/kernel/drivers/net/wireless/rtlwifi/
gdb rtlwifi.ko
list *(rtl_lps_leave+0x23)

still be helpful? Stanislaw, are you asking me to run the test I have in this message, or do you want me to install from your last link and test then?

(In reply to comment #8)
> would running:
>
> cd /lib/modules/3.6.11-5.fc17.i686/kernel/drivers/net/wireless/rtlwifi/
> gdb rtlwifi.ko
> list *(rtl_lps_leave+0x23)
>
> still be helpful?

No, it's not needed.

> Stanislaw, are you asking me to run the test I have in
> this message, or do you want me to install from your last link and test then?

Please install and boot the kernel from the link. Then test the wireless network (for example using yum, but it can be anything else actually) to check whether the problem is gone.

So ... is the problem reproducible with the test kernel?

Sorry for delays. This is my primary computer, so I was waiting for a weekend work break to do tests. The problem isn't happening right now, hasn't happened since an update came through resetting the kernel (I think). Anyway, will run backup and then text the kernel from the link in comment 7.

(In reply to comment #11)
> Sorry for delays. This is my primary computer, so I was waiting for a
> weekend work break to do tests. The problem isn't happening right now,
> hasn't happened since an update came through resetting the kernel (I think).
>
> Anyway, will run backup and then text the kernel from the link in comment 7.

Will run backup and then TEST the kernel from, etc...

Nathaniel: What kernel have you been running since the failures stopped?

Stanislaw: What might have been updated recently in Fedora 17 that could have fixed this? I think this change is correct, and worth pushing as a bug fix, but I certainly would like to reference this bug. Do you agree with my analysis and the fix?

Larry: I am running kernel-3.6.11-6.bz903881.fc16.i686.rpm (that's the output from uname -a, anyway). It seems that I'm already running what Stanislaw asked me to run... So then your patch worked, Stanislaw, because I'm using my wireless right now and there are no problems. If I'm misinterpreting the above, I shall:

1- Follow link from comment 7
2- Follow link to i686 version next to "Descendants" and under "build"
3- and then install from link "kernel-3.6.11-6.bz903881.fc16.i686.rpm" next to "Output"

Correct? Sorry, I am not very experienced in these matters.

I'm no expert on Fedora naming conventions; however, as the bz903881 points to this bugzilla entry, I think that is the kernel with the patch added. Thanks for testing. Is it OK to give you credit for reporting the bug, and testing the fix? From your E-mail address, I think your full name is Nathaniel Doherty.

That is the correct full name. It's okay to give me credit, since apparently I tested without being aware of it :D Anyway, many thanks to Stanislaw and you, Larry, for the fix. Everything is working swimmingly now.

(In reply to comment #13)
> Stanislaw: What might have been updated recently in Fedora 17 that could
> have fixed this? I think this change is correct, and worth pushing as a bug
> fix, but I certainly would like to reference this bug. Do you agree with my
> analysis and the fix?

Yes, the patch looks correct to me and it has now been tested by nathaniel, so please push the patch upstream. Thanks.

Larry posted the patch here: http://marc.info/?l=linux-wireless&m=135984210626235&w=2

Josh, please apply it as the fix for this bug.

Applied to all branches. Thanks!

kernel-3.7.6-201.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/kernel-3.7.6-201.fc18

kernel-3.7.6-102.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/kernel-3.7.6-102.fc17

Package kernel-3.7.6-201.fc18:
* should fix your issue,
* was pushed to the Fedora 18 testing repository,
* should be available at your local mirror within two days.

Update it with:

# su -c 'yum update --enablerepo=updates-testing kernel-3.7.6-201.fc18'

as soon as you are able to, then reboot. Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-1961/kernel-3.7.6-201.fc18
then log in and leave karma (feedback).

kernel-3.7.6-201.fc18 has been pushed to the Fedora 18 stable repository. If problems still persist, please make note of it in this bug report.

kernel-3.7.6-102.fc17 has been pushed to the Fedora 17 stable repository. If problems still persist, please make note of it in this bug report.