Created attachment 369619 [details]
dmesg output for kernel-2.6.32-0.33.rc5.git1.fc13.x86_64

Description of problem:

With a kernel from rawhide I see

[ INFO: inconsistent lock state ]
2.6.32-0.33.rc5.git1.fc13.x86_64 #1
---------------------------------
inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
scsi_eh_1/192 [HC1[1]:SC0[0]:HE0:SE1] takes:
 (&host->lock){?.+...}, at: [<ffffffff813e71d3>] ata_sff_interrupt+0x36/0xbe
{HARDIRQ-ON-W} state was registered at:
  [<ffffffff810ae1d0>] __lock_acquire+0x370/0xd5d
  [<ffffffff810aecb0>] lock_acquire+0xf3/0x12d
  [<ffffffff81564d8f>] _spin_lock+0x45/0x8e
  [<ffffffffa004cc23>] pdc_sata_hardreset+0x70/0xf4 [sata_promise]
  [<ffffffff813df8f5>] ata_do_reset+0x6c/0x8b
  [<ffffffff813e1195>] ata_eh_reset+0x511/0xcb9
  [<ffffffff813e2af7>] ata_eh_recover+0x2d9/0xbd7
  [<ffffffff813e3665>] ata_do_eh+0x5a/0xbb
  [<ffffffff813e3738>] ata_std_error_handler+0x72/0x91
  [<ffffffffa004c71e>] pdc_error_handler+0x33/0x4e [sata_promise]
  [<ffffffff813e40d1>] ata_scsi_error+0x337/0x699
  [<ffffffff813bb9f3>] scsi_error_handler+0x104/0x5bd
  [<ffffffff8109656d>] kthread+0x9a/0xa2
  [<ffffffff810130ea>] child_rip+0xa/0x20

That is followed by bug 520186 which is still there. Full dmesg attached.

Version-Release number of selected component (if applicable):
kernel-2.6.32-0.33.rc5.git1.fc13.x86_64

How reproducible:
on every boot so far
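For readers unfamiliar with the `{?.+...}` notation in the report above, here is a minimal sketch of how the usage mask can be read. It assumes the six-slot layout used by lockdep in kernels of this era (hardirq, softirq and reclaim-fs, each followed by its read variant); the function and table names are made up for illustration and are not part of any kernel tool.

```python
# Meaning of each usage character, as I understand lockdep's report format.
USAGE_CHARS = {
    ".": "no irq-context or irqs-enabled acquisition recorded",
    "-": "acquired in irq context",
    "+": "acquired with irqs enabled",
    "?": "acquired in irq context and also with irqs enabled",
}

# Assumed slot order for the six-character masks seen in this report.
SLOTS = (
    "hardirq", "hardirq-read",
    "softirq", "softirq-read",
    "reclaim-fs", "reclaim-fs-read",
)


def decode_usage(mask):
    """Map a lockdep usage mask such as '{?.+...}' to per-slot descriptions."""
    chars = mask.strip("{}")
    if len(chars) != len(SLOTS):
        raise ValueError("expected %d usage characters in %r" % (len(SLOTS), mask))
    return {slot: USAGE_CHARS[c] for slot, c in zip(SLOTS, chars)}


def has_mixed_usage(mask):
    """True when some slot is '?', the combination lockdep warns about."""
    return "?" in mask


if __name__ == "__main__":
    for slot, meaning in decode_usage("{?.+...}").items():
        print("%-16s %s" % (slot, meaning))
```

Read this way, the leading `?` says that host->lock was taken both from hardirq context and with hardirqs enabled, which matches the "inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage" line.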
This bug appears to have been reported against 'rawhide' during the Fedora 12 development cycle. Changing version to '12'. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
An update to 2.6.32-0.48.rc7.git1.fc13.x86_64 replaced
  scsi_eh_1/192 ... at ... ata_sff_interrupt+0x36/0xbe
with
  swapper/0 ... at ... pdc_interrupt+0x3e/0x38d [sata_promise]
Otherwise this looks pretty similar, although the "other info that might help us debug this" section is not really the same. A full dmesg output is attached.
Created attachment 370217 [details] dmesg for 2.6.32-0.48.rc7.git1.fc13.x86_64 kernel
Created attachment 378109 [details]
additional information from kernel 2.6.32-7.fc13.x86_64 ("no locks held by work_for_cpu/210")

Once, with the current rawhide kernel, I got "no locks held by work_for_cpu/210" and a backtrace like the one attached. More typically this will be "no locks held by swapper/0" and something like what is in the next attachment.
Created attachment 378110 [details] additional information from kernel 2.6.32-7.fc13.x86_64 ("no locks held by swapper/0")
This complaint seems to be gone at last in 2.6.34-...fc14.x86_64 kernels, or at least it is not displayed. The last time I tried, this was not the case with the 2.6.33-... kernels currently used for fc13.
This bug appears to have been reported against 'rawhide' during the Fedora 13 development cycle. Changing version to '13'. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
[ INFO: inconsistent lock state ]
2.6.33-1.fc13.i686.PAE #1
---------------------------------
inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
scsi_eh_5/262 [HC0[0]:SC0[0]:HE1:SE1] takes:
 (&(&host->lock)->rlock){?.-...}, at: [<f7e89845>] pdc_sata_hardreset+0x82/0xdc [sata_promise]
{IN-HARDIRQ-W} state was registered at:
  [<c046a2ca>] __lock_acquire+0x266/0xb89
  [<c046ac80>] lock_acquire+0x93/0xb1
  [<c07c56d1>] _raw_spin_lock_irqsave+0x37/0x6a
  [<c06a3d16>] ata_sff_interrupt+0x19/0xad
  [<c048fdad>] handle_IRQ_event+0x4b/0xf6
  [<c0491664>] handle_edge_irq+0xb1/0xf4
  [<c040a79c>] handle_irq+0x40/0x4c
  [<c040a00b>] do_IRQ+0x46/0x9f
  [<c0409175>] common_interrupt+0x35/0x3c
  [<c040ee82>] default_idle+0x44/0x6e
  [<c0407dbe>] cpu_idle+0x9b/0xb5
  [<c07c0752>] start_secondary+0x204/0x242
irq event stamp: 35
hardirqs last enabled at (35): [<c07c5d44>] _raw_spin_unlock_irqrestore+0x41/0x4d
hardirqs last disabled at (34): [<c07c56ba>] _raw_spin_lock_irqsave+0x20/0x6a
softirqs last enabled at (0): [<c0440a86>] copy_process+0x319/0x10b1
softirqs last disabled at (0): [<(null)>] (null)

other info that might help us debug this:
no locks held by scsi_eh_5/262.
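The "takes:" header differs between the reports in this bug (`[HC1[1]:SC0[0]:HE0:SE1]` in the first one, `[HC0[0]:SC0[0]:HE1:SE1]` here), which is how one can tell from which side of the conflict lockdep fired. Below is a small sketch for pulling those fields out of a log line. The field meanings (HC/SC = in hardirq/softirq context with a nesting count, HE/SE = hardirqs/softirqs enabled) are my reading of the lockdep output, and `parse_context` is a hypothetical helper, not an existing tool.

```python
import re

# Matches e.g. "scsi_eh_5/262 [HC0[0]:SC0[0]:HE1:SE1] takes:",
# with or without a leading dmesg timestamp.
HEADER = re.compile(
    r"(?P<comm>\S+)/(?P<pid>\d+) "
    r"\[HC(?P<hc>\d+)\[(?P<hc_count>\d+)\]:"
    r"SC(?P<sc>\d+)\[(?P<sc_count>\d+)\]:"
    r"HE(?P<he>\d+):SE(?P<se>\d+)\] takes:"
)


def parse_context(line):
    """Return the task and irq state from a lockdep 'takes:' line, or None."""
    m = HEADER.search(line)
    if m is None:
        return None
    return {
        "task": m.group("comm"),
        "pid": int(m.group("pid")),
        "in_hardirq": m.group("hc") != "0",
        "in_softirq": m.group("sc") != "0",
        "hardirqs_enabled": m.group("he") != "0",
        "softirqs_enabled": m.group("se") != "0",
    }


if __name__ == "__main__":
    print(parse_context("scsi_eh_5/262 [HC0[0]:SC0[0]:HE1:SE1] takes:"))
    print(parse_context("scsi_eh_1/192 [HC1[1]:SC0[0]:HE0:SE1] takes:"))
```

For the i686 report above, this gives a task in process context with hardirqs enabled (the pdc_sata_hardreset side); for the x86_64 reports it gives a task interrupted in hardirq context (the interrupt handler side).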
Created attachment 447565 [details]
trace from kernel-2.6.36-0.21.rc4.git1.fc15.x86_64 ("no locks held by udevd/85" this time)

I did not see this for quite a while, but with 2.6.36-0.21.rc4.git1.fc15.x86_64 it looks like it has returned (and it is quite easy to see):

inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
udevd/85 [HC1[1]:SC0[0]:HE0:SE1] takes:
 (&(&host->lock)->rlock){?.+...}, at: [<ffffffffa0047cf1>] pdc_interrupt+0x40/0x38e [sata_promise]

udevd is from udev-161-2.fc15 and was installed on 2010/09/03, but the complaints started to show up only with the new kernel. More data is in the attachment.
Created attachment 449985 [details] somewhat different trace from 2.6.36-0.26.rc5.git4.fc15.x86_64 ("no locks held by swapper/0")
This issue was apparently absent in recent rawhide kernels. In particular, I did not notice it with 2.6.36-5.fc15.x86_64 or 2.6.36.2-12.rc1.fc15.x86_64. After updating to 2.6.37-0.rc5.git2.1.fc15.x86_64 I see on every boot:

[    4.144783] =================================
[    4.145006] [ INFO: inconsistent lock state ]
[    4.145006] 2.6.37-0.rc5.git2.1.fc15.x86_64 #1
[    4.145006] ---------------------------------
[    4.145006] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
[    4.145006] work_for_cpu/226 [HC1[1]:SC0[0]:HE0:SE1] takes:
[    4.145006]  (&(&host->lock)->rlock){?.+...}, at: [<ffffffffa0047cf4>] pdc_interrupt+0x40/0x38e [sata_promise]
.....
[    4.145006] no locks held by work_for_cpu/226.
.....

(and so on). Is anybody interested in a full trace? It does not seem to be much different from what was recorded before.
This was not showing up with 2.6.38 kernels. With 2.6.39 (and the microcode.ko module renamed so that the machine boots at all) I see it again:

[    3.740008] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
[    3.740008] scsi_eh_1/196 [HC1[1]:SC0[0]:HE0:SE1] takes:
[    3.740008]  (&(&host->lock)->rlock){?.+...}, at: [<ffffffff81354afd>] __ata_sff_interrupt+0x2b/0x17f
[    3.740008] {HARDIRQ-ON-W} state was registered at:
[    3.740008]   [<ffffffff81085f37>] __lock_acquire+0x337/0xc81
[    3.740008]   [<ffffffff81086d11>] lock_acquire+0xd0/0xfb
[    3.740008]   [<ffffffff814bc710>] _raw_spin_lock+0x36/0x6a
[    3.740008]   [<ffffffffa000bb04>] pdc_sata_hardreset+0x5e/0xd5 [sata_promise]
[    3.740008]   [<ffffffff8134e5fd>] ata_do_reset+0x59/0x62
[    3.740008]   [<ffffffff81351400>] ata_eh_reset+0x4e9/0xc49
[    3.740008]   [<ffffffff81351f3d>] ata_eh_recover+0x2ef/0x10d1
[    3.740008]   [<ffffffff8135342e>] ata_do_eh+0x4b/0x98
[    3.740008]   [<ffffffff81355469>] ata_sff_error_handler+0xe3/0xef
[    3.740008]   [<ffffffffa000baa2>] pdc_error_handler+0x21/0x25 [sata_promise]
[    3.740008]   [<ffffffff8135300e>] ata_scsi_port_error_handler+0x244/0x553
[    3.740008]   [<ffffffff813533b8>] ata_scsi_error+0x9b/0xc6
[    3.740008]   [<ffffffff8132f8b4>] scsi_error_handler+0x114/0x5a0
[    3.740008]   [<ffffffff81072d55>] kthread+0xa8/0xb0
[    3.740008]   [<ffffffff814c4ca4>] kernel_thread_helper+0x4/0x10

Full dmesg attached.
Created attachment 493676 [details]
dmesg for 2.6.39-0.rc3.git2.0.fc16.x86_64 ("1 lock held by scsi_eh_1/196")

This time there is:
 #0:  (&host->eh_mutex){+.+...}, at: [<ffffffff8134ef93>] ata_eh_acquire+0x20/0x6e
Created attachment 513491 [details]
locking error as showing up in 3.0-0.rc7.git0.1.fc16

This error is clearly inherited by the 3.0-... kernels as well. This time the error message is somewhat more scary:

[    5.093008]  Possible unsafe locking scenario:
[    5.093008]
[    5.093008]        CPU0
[    5.093008]        ----
[    5.093008]   lock(&(&host->lock)->rlock);
[    5.093008]   <Interrupt>
[    5.093008]     lock(&(&host->lock)->rlock);
[    5.093008]
[    5.093008]  *** DEADLOCK ***
[    5.093008]
[    5.093008] 1 lock held by udevd/145:

The remainder of the problem report is attached.
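To illustrate why lockdep calls the scenario above a deadlock, here is a toy user-space model. It is only a sketch: a non-reentrant `threading.Lock` stands in for host->lock, a plain function call stands in for an interrupt arriving on the same CPU, and all function names are invented. It says nothing about how the driver was eventually changed; it only mirrors the usual kernel idiom of holding such a lock with interrupts masked (spin_lock_irqsave), which is what the ata_sff_interrupt path in the earlier traces already does.

```python
import threading

# Stand-in for host->lock. Like a spinlock, it cannot be taken twice
# by the same execution context.
host_lock = threading.Lock()


def interrupt_handler():
    """Hypothetical handler that needs the lock; reports whether it got it."""
    got_it = host_lock.acquire(blocking=False)
    if got_it:
        host_lock.release()
    return got_it


def reset_with_irqs_enabled():
    """Hold the lock and let the 'interrupt' fire on the same CPU meanwhile."""
    with host_lock:
        # A real spinlock would spin here forever; the model just reports failure.
        return interrupt_handler()


def reset_with_irqs_masked():
    """Hold the lock with the 'interrupt' deferred until after release."""
    with host_lock:
        pass  # critical section runs undisturbed
    return interrupt_handler()


if __name__ == "__main__":
    print("irqs enabled while locked:", reset_with_irqs_enabled())
    print("irqs masked while locked: ", reset_with_irqs_masked())
```

In the first case the handler cannot get the lock because the code it interrupted still holds it, which is the CPU0 column of the report; in the second the handler only runs once the lock is free.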
I was just hit by this too.
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle. Changing version to '19'. (As we did not run this process for some time, it could affect also pre-Fedora 19 development cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.) More information and reason for this action is here: https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19
Is this still a problem with 3.9 based F19 kernels?
(In reply to comment #17)
> Is this still a problem with 3.9 based F19 kernels?

No. I do not remember when I last saw it, but it was quite a while ago.
*********** MASS BUG UPDATE **************

We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs.

Fedora 19 has now been rebased to 3.11.1-200.fc19. Please test this kernel update and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you experience different issues, please open a new bug report for those.
This seems to have been fixed a while ago.