Bug 903881 - [abrt]: BUG: scheduling while atomic: kworker/u:0/3171/0x10000100
Summary: [abrt]: BUG: scheduling while atomic: kworker/u:0/3171/0x10000100
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 17
Hardware: i686
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: John W. Linville
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: abrt_hash:2aa13bb567bca0c5e977160d50d...
Depends On:
Blocks:
Reported: 2013-01-25 00:44 UTC by nathaniel
Modified: 2013-02-16 01:19 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2013-02-08 16:55:14 UTC
Type: ---
Embargoed:


Attachments
Test patch for this oops (1.03 KB, patch)
2013-01-26 22:36 UTC, Larry Finger

Description nathaniel 2013-01-25 00:44:21 UTC
Description of problem:
I had just installed package xorg-x11-server-1.12.4-3.fc17 from the Fedora 17 testing repository.

Xorg crashed completely a few minutes later. A hard reboot resulted in a kernel panic; a further hard reboot started normally, but abrt reported kernel tainting on boot and Xorg crashed again for an unknown reason. I will be removing the package and the 'testing' repository. I am not sure whether the crashes are caused by xorg-x11-server-1.12.4-3.fc17 or by one of the several other packages downloaded when I ran a yum update after adding the 'testing' repository.


Additional info:
libreport version: 2.0.18
abrt_version:   2.0.18
cmdline:        BOOT_IMAGE=/vmlinuz-3.6.11-5.fc17.i686 root=/dev/mapper/vg_toshibabeefymiracle2-lv_root ro rd.md=0 rd.dm=0 SYSFONT=True rd.lvm.lv=vg_toshibabeefymiracle2/lv_root KEYTABLE=us rd.lvm.lv=vg_toshibabeefymiracle2/lv_swap rd.luks=0 LANG=en_US.UTF-8 rhgb quiet
kernel:         3.6.11-5.fc17.i686

backtrace:
:BUG: scheduling while atomic: kworker/u:0/3171/0x10000100
:Modules linked in: fuse bnep bluetooth ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev media arc4 snd_hda_codec_realtek rtl8192ce rtlwifi rtl8192c_common toshiba_acpi sparse_keymap wmi snd_hda_intel snd_hda_codec snd_hwdep iTCO_wdt iTCO_vendor_support r8169 mii mac80211 snd_seq cfg80211 snd_seq_device snd_pcm rfkill lpc_ich i2c_i801 snd_page_alloc snd_timer snd soundcore coretemp microcode binfmt_misc uinput ums_realtek usb_storage i915 video i2c_algo_bit drm_kms_helper drm i2c_core
:Pid: 3171, comm: kworker/u:0 Not tainted 3.6.11-5.fc17.i686 #1
:Call Trace:
: [<c094d803>] __schedule_bug+0x52/0x5e
: [<c095477e>] __schedule+0x75e/0x770
: [<c0471ccd>] ? update_curr+0x13d/0x1f0
: [<c046fef0>] ? __enqueue_entity+0x70/0x80
: [<c0472c2b>] ? enqueue_entity+0xcb/0x4f0
: [<c0469ebb>] ? update_rq_clock+0x3b/0x290
: [<c046aacb>] __cond_resched+0x1b/0x30
: [<c0954806>] _cond_resched+0x26/0x30
: [<c0953608>] mutex_lock+0x18/0x40
: [<f7df0723>] rtl_lps_leave+0x23/0xb0 [rtlwifi]
: [<f7dec21f>] rtl_is_special_data+0xef/0x120 [rtlwifi]
: [<f7df0bdb>] rtl_tx_status+0x5b/0x140 [rtlwifi]
: [<f810bcd8>] ieee80211_tx_status+0x1f8/0xb10 [mac80211]
: [<c066c4ea>] ? radix_tree_lookup+0xa/0x10
: [<c04bad34>] ? irq_to_desc+0x14/0x20
: [<c085842e>] ? skb_dequeue+0x4e/0x70
: [<f810a928>] ieee80211_tasklet_handler+0x158/0x180 [mac80211]
: [<c0444ff3>] tasklet_action+0x53/0xb0
: [<c0444b1f>] __do_softirq+0x9f/0x1b0
: [<c0956050>] ? nmi_stack_correct+0x2f/0x34
: [<c0954806>] _cond_resched+0x26/0x30
: [<c0953608>] mutex_lock+0x18/0x40
: [<f7df0723>] rtl_lps_leave+0x23/0xb0 [rtlwifi]
: [<f7dec21f>] rtl_is_special_data+0xef/0x120 [rtlwifi]
: [<f7df0bdb>] rtl_tx_status+0x5b/0x140 [rtlwifi]
: [<f810bcd8>] ieee80211_tx_status+0x1f8/0xb10 [mac80211]
: [<c066c4ea>] ? radix_tree_lookup+0xa/0x10
: [<c04bad34>] ? irq_to_desc+0x14/0x20
: [<c085842e>] ? skb_dequeue+0x4e/0x70
: [<f810a928>] ieee80211_tasklet_handler+0x158/0x180 [mac80211]
: [<c0444ff3>] tasklet_action+0x53/0xb0
: [<c0444b1f>] __do_softirq+0x9f/0x1b0
: [<c0956050>] ? nmi_stack_correct+0x2f/0x34
: [<c0444a80>] ? local_bh_enable_ip+0x90/0x90
: [<c0444a80>] ? local_bh_enable_ip+0x90/0x90
: <IRQ>  [<c0444e8d>] ? irq_exit+0x9d/0xb0
: [<c0404b3b>] ? do_IRQ+0x4b/0xc0
: [<c095c970>] ? common_interrupt+0x30/0x38
: [<c046007b>] ? hrtimer_try_to_cancel+0x9b/0xd0
: [<c04400e0>] ? cpu_maps_update_begin+0x20/0x20
: [<c08644e9>] ? __netif_schedule+0x59/0x70
: [<f8112664>] ? ieee80211_offchannel_return+0xa4/0x1c0 [mac80211]
: [<f8110c68>] ? __ieee80211_scan_completed+0x198/0x290 [mac80211]
: [<c0954390>] ? __schedule+0x370/0x770
: [<f81117e2>] ? ieee80211_scan_work+0x62/0x4e0 [mac80211]
: [<c0455a60>] ? process_one_work+0x120/0x3e0
: [<c095c970>] ? common_interrupt+0x30/0x38
: [<f8111780>] ? ieee80211_run_deferred_scan+0x70/0x70 [mac80211]
: [<c04573b1>] ? worker_thread+0x111/0x3b0
: [<c04677fe>] ? complete+0x4e/0x60
: [<c04572a0>] ? manage_workers+0x2a0/0x2a0
: [<c045bd92>] ? kthread+0x72/0x80
: [<c045bd20>] ? kthread_freezable_should_stop+0x60/0x60
: [<c095c97e>] ? kernel_thread_helper+0x6/0x10

Comment 1 John W. Linville 2013-01-25 16:12:29 UTC
It looks like the mutex activity in rtl_lps_leave is getting hit from the tasklet handler.  Can the rtl_lps_leave be queued to a different context?
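
For readers decoding the value at the end of the BUG line, the sketch below splits 0x10000100 into preempt_count fields. It assumes the bit layout used by 3.6-era x86 kernels (8 preemption bits, 8 softirq bits, 10 hardirq bits, 1 NMI bit, and PREEMPT_ACTIVE at 0x10000000); other kernel versions and architectures may differ. The struct and function names are hypothetical and are not kernel API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* ASSUMPTION: preempt_count layout of 3.6-era x86 kernels. */
#define PREEMPT_MASK    0x000000ffu  /* preempt_disable() nesting depth */
#define SOFTIRQ_MASK    0x0000ff00u  /* softirq count, bits 8-15 */
#define HARDIRQ_MASK    0x03ff0000u  /* hardirq nesting, bits 16-25 */
#define NMI_MASK        0x04000000u  /* NMI flag, bit 26 */
#define PREEMPT_ACTIVE  0x10000000u  /* x86 value in that era */

/* Hypothetical container for the decoded fields. */
struct preempt_fields {
    unsigned preempt_depth;
    unsigned softirq_count;
    unsigned hardirq_count;
    int in_nmi;
    int preempt_active;
};

/* Split a raw preempt_count value into its fields. */
struct preempt_fields decode_preempt_count(uint32_t value)
{
    struct preempt_fields f;

    f.preempt_depth  = value & PREEMPT_MASK;
    f.softirq_count  = (value & SOFTIRQ_MASK) >> 8;
    f.hardirq_count  = (value & HARDIRQ_MASK) >> 16;
    f.in_nmi         = (value & NMI_MASK) != 0;
    f.preempt_active = (value & PREEMPT_ACTIVE) != 0;
    return f;
}

/* Print the decoded fields, one per line. */
void print_preempt_fields(uint32_t value)
{
    struct preempt_fields f = decode_preempt_count(value);

    assert(f.softirq_count <= 0xffu);  /* guaranteed by the mask */
    printf("preempt depth:  %u\n", f.preempt_depth);
    printf("softirq count:  %u\n", f.softirq_count);
    printf("hardirq count:  %u\n", f.hardirq_count);
    printf("in NMI:         %d\n", f.in_nmi);
    printf("PREEMPT_ACTIVE: %d\n", f.preempt_active);
}
```

Under those assumptions, 0x10000100 decodes to a softirq count of 1 with PREEMPT_ACTIVE set. That is consistent with the tasklet (softirq) frames in the call trace; the PREEMPT_ACTIVE bit is probably set on the __cond_resched path just before __schedule is entered.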

Comment 2 Larry Finger 2013-01-25 16:39:06 UTC
I just checked the vendor driver, and it uses a spin_lock_irqsave() call rather than a mutex_lock() call. The part that bothers me about their code is that they enable device interrupts in the middle of the routine with a "FIX_ME" comment. Is it legal to do that?

While my question is being answered, I'll prepare a patch to use a spinlock rather than the mutex.

I don't have access to an i686 system here. Could someone use objdump to tell me which line in the source file corresponds to rtl_lps_leave+0x23?

Thanks.

Comment 3 John W. Linville 2013-01-25 17:35:40 UTC
No i686 here either right now; hopefully nathaniel can do it?

Probably easiest is to install the debuginfo:

  yum install kernel-debuginfo

And then run the following (the last line is typed at the gdb prompt):

  cd /lib/modules/3.6.11-5.fc17.i686/kernel/drivers/net/wireless/rtlwifi/
  gdb rtlwifi.ko
  list *(rtl_lps_leave+0x23)

What is the output?

Comment 4 Larry Finger 2013-01-25 18:09:33 UTC
The spinlock was changed to a mutex with

commit 6539306b2c3ceafbc4094cf68c58094c282da053
Author: Stanislaw Gruszka <sgruszka>
Date:   Mon Dec 12 12:43:24 2011 +0100

The change was made to reduce the time that interrupts were disabled.

Comment 5 Larry Finger 2013-01-25 18:10:19 UTC
I added Stanislaw to the Cc list.

Comment 6 Larry Finger 2013-01-26 22:36:16 UTC
Created attachment 688180
Test patch for this oops

With kernel commits 41affd5286fb91176eb99b34ecd8eb522ba22369 and 6539306b2c3ceafbc4094cf68c58094c282da053, the locking in rtl_lps_leave() was changed from a spinlock to a mutex. This oops indicates that routine rtl_is_special_data(), which calls rtl_lps_leave() in two places, was entered in atomic context. These two calls are replaced by putting a request on the appropriate work queue.

As I do not see this bug, please test.
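
The idea behind the patch can be illustrated with a small user-space model. This is only a sketch, not the attached patch and not rtlwifi code: every name in it (work_queue, queue_work_item, run_pending, lps_leave_work, tx_status_buggy, tx_status_fixed) is hypothetical, and a global flag stands in for the kernel's atomic-context state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAP 8

typedef void (*work_fn)(void *ctx);

struct work_item {
    work_fn fn;
    void *ctx;
};

/* Fixed-size FIFO of deferred work (hypothetical, not the kernel workqueue). */
struct work_queue {
    struct work_item items[QUEUE_CAP];
    size_t head;
    size_t count;
};

bool in_atomic_context = false;  /* models softirq/tasklet context */
int sleeps_in_atomic = 0;        /* counts "scheduling while atomic" misuse */

/* Stands in for an operation that may sleep, such as taking a mutex. */
void might_sleep_model(void)
{
    if (in_atomic_context)
        sleeps_in_atomic++;
}

/* Models the power-save exit, which needs a sleeping lock. */
void lps_leave_work(void *ctx)
{
    int *leave_count = ctx;

    might_sleep_model();
    (*leave_count)++;
}

/* Append a work item; returns false if the queue is full. */
bool queue_work_item(struct work_queue *q, work_fn fn, void *ctx)
{
    assert(q != NULL && fn != NULL);
    if (q->count == QUEUE_CAP)
        return false;
    size_t tail = (q->head + q->count) % QUEUE_CAP;
    q->items[tail].fn = fn;
    q->items[tail].ctx = ctx;
    q->count++;
    return true;
}

/* Worker side: runs in a context where sleeping is allowed. */
void run_pending(struct work_queue *q)
{
    assert(q != NULL);
    while (q->count > 0) {
        struct work_item item = q->items[q->head];

        q->head = (q->head + 1) % QUEUE_CAP;
        q->count--;
        item.fn(item.ctx);
    }
}

/* Before: the sleeping call is made directly from atomic context. */
void tx_status_buggy(int *leave_count)
{
    in_atomic_context = true;
    lps_leave_work(leave_count);
    in_atomic_context = false;
}

/* After: atomic context only queues a request; the worker runs it later. */
void tx_status_fixed(struct work_queue *q, int *leave_count)
{
    in_atomic_context = true;
    queue_work_item(q, lps_leave_work, leave_count);
    in_atomic_context = false;
}
```

In this model, tx_status_buggy() increments sleeps_in_atomic, while tx_status_fixed() followed by run_pending() performs the same work without doing so. The trade-off is that the work runs slightly later than the event that requested it.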

Comment 7 Stanislaw Gruszka 2013-01-28 11:20:42 UTC
I launched a kernel build with the patch from comment 6 here:
http://koji.fedoraproject.org/koji/taskinfo?taskID=4908318

nathaniel, please test it when it finishes compiling.

Comment 8 nathaniel 2013-01-28 15:01:22 UTC
would running:

cd /lib/modules/3.6.11-5.fc17.i686/kernel/drivers/net/wireless/rtlwifi/
  gdb rtlwifi.ko
  list *(rtl_lps_leave+0x23)

still be helpful? Stanislaw, are you asking me to run the test I have in this message, or do you want me to install from your last link and test then?

Comment 9 Stanislaw Gruszka 2013-01-28 15:31:02 UTC
(In reply to comment #8)
> would running:
> 
> cd /lib/modules/3.6.11-5.fc17.i686/kernel/drivers/net/wireless/rtlwifi/
>   gdb rtlwifi.ko
>   list *(rtl_lps_leave+0x23)
> 
> still be helpful? 
No, it's not needed.

> Stanislaw, are you asking me to run the test I have in
> this message, or do you want me to install from your last link and test then?
Please install and boot the kernel from the link. Then test the wireless network (for example using yum, but it can be anything else) to check whether the problem is gone.

Comment 10 Stanislaw Gruszka 2013-02-01 08:09:59 UTC
So ... is the problem reproducible with the test kernel?

Comment 11 nathaniel 2013-02-02 19:43:33 UTC
Sorry for delays. This is my primary computer, so I was waiting for a weekend work break to do tests. The problem isn't happening right now, hasn't happened since an update came through resetting the kernel (I think).

Anyway, will run backup and then text the kernel from the link in comment 7.

Comment 12 nathaniel 2013-02-02 19:44:41 UTC
(In reply to comment #11)
> Sorry for delays. This is my primary computer, so I was waiting for a
> weekend work break to do tests. The problem isn't happening right now,
> hasn't happened since an update came through resetting the kernel (I think).
> 
> Anyway, will run backup and then text the kernel from the link in comment 7.

Will run backup and then TEST the kernel from, etc...

Comment 13 Larry Finger 2013-02-02 19:51:26 UTC
Nathaniel: What kernel have you been running since the failures stopped?

Stanislaw: What might have been updated recently in Fedora 17 that could have fixed this? I think this change is correct, and worth pushing as a bug fix, but I certainly would like to reference this bug. Do you agree with my analysis and the fix?

Comment 14 nathaniel 2013-02-02 20:02:54 UTC
Larry: I am running kernel-3.6.11-6.bz903881.fc16.i686.rpm 
(that's the output from uname -a, anyway)

It seems that I'm already running what Stanislaw asked me to run. So then your patch worked, Stanislaw, because I'm using my wireless right now and there are no problems.


If I'm misinterpreting the above, I shall:
1- Follow link from comment 7
2- Follow link to i686 version next to "Descendants" and under "build"
3- and then install from link "kernel-3.6.11-6.bz903881.fc16.i686.rpm" next to "Output"

Correct? 

Sorry, I am not very experienced in these matters.

Comment 15 Larry Finger 2013-02-02 20:27:36 UTC
I'm no expert on Fedora naming conventions; however, as the bz903881 points to this bugzilla entry, I think that is the kernel with the patch added.

Thanks for testing. Is it OK to give you credit for reporting the bug, and testing the fix? From your E-mail address, I think your full name is Nathaniel Doherty.

Comment 16 nathaniel 2013-02-02 20:37:59 UTC
That is the correct full name. It's okay to give me credit, since apparently I tested without being aware of it :D

Anyway, many thanks to Stanislaw and you, Larry, for the fix. Everything is working swimmingly now.

Comment 17 Stanislaw Gruszka 2013-02-04 09:00:52 UTC
(In reply to comment #13)
> Stanislaw: What might have been updated recently in Fedora 17 that could
> have fixed this? I think this change is correct, and worth pushing as a bug
> fix, but I certainly would like to reference this bug. Do you agree with my
> analysis and the fix?

Yes, the patch looks correct to me, and it has now been tested by nathaniel, so please push the patch upstream. Thanks.

Comment 18 Stanislaw Gruszka 2013-02-04 09:52:25 UTC
Larry posted the patch here:
http://marc.info/?l=linux-wireless&m=135984210626235&w=2
Josh, please apply it as the fix for this bug.

Comment 19 Josh Boyer 2013-02-04 15:01:47 UTC
Applied to all branches.  Thanks!

Comment 20 Fedora Update System 2013-02-04 21:51:11 UTC
kernel-3.7.6-201.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/kernel-3.7.6-201.fc18

Comment 21 Fedora Update System 2013-02-04 21:56:51 UTC
kernel-3.7.6-102.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/kernel-3.7.6-102.fc17

Comment 22 Fedora Update System 2013-02-05 16:56:04 UTC
Package kernel-3.7.6-201.fc18:
* should fix your issue,
* was pushed to the Fedora 18 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.7.6-201.fc18'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-1961/kernel-3.7.6-201.fc18
then log in and leave karma (feedback).

Comment 23 Fedora Update System 2013-02-08 16:55:17 UTC
kernel-3.7.6-201.fc18 has been pushed to the Fedora 18 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 24 Fedora Update System 2013-02-16 01:19:54 UTC
kernel-3.7.6-102.fc17 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.

