Bug 194349 - BUG: spinlock recursion on CPU#2, swapper/0 (Not tainted)
Summary: BUG: spinlock recursion on CPU#2, swapper/0 (Not tainted)
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 5
Hardware: All
OS: Linux
Target Milestone: ---
Assignee: Dave Jones
QA Contact: Brian Brock
Depends On:
Reported: 2006-06-07 14:05 UTC by Konstantin Antselovich
Modified: 2015-01-04 22:27 UTC
CC List: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2006-10-16 23:40:39 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments
hardware descriptions -- dmesg (101.95 KB, text/plain)
2006-06-07 14:05 UTC, Konstantin Antselovich
no flags

Description Konstantin Antselovich 2006-06-07 14:05:06 UTC
Description of problem: The kernel crashes on SCSI drive hot-swap on an Adaptec
aic79xx controller.

Version-Release number of selected component (if applicable): 2.6.16-1.2122_FC5smp

How reproducible: Always

Steps to Reproduce:
1. Unlock the SCSI drive.
2. Take it out.
3. Push it back in; the kernel crashes.
Actual results:

The kernel crashes when you try to hot-swap a drive.

Expected results:

The kernel should continue to run.
Additional info:
Jun  5 05:19:15  end_request: I/O error, dev sdc, sector 143556687
Jun  5 05:19:15 kernel: sd 0:0:2:0: SCSI error: return code =
Jun  5 05:19:15  raid5: Disk failure on sdc3, disabling device. Operation continuing on 2 devices
Jun  5 05:19:15  RAID5 conf printout:
Jun  5 05:19:15   --- rd:3 wd:2 fd:1
Jun  5 05:19:15   disk 0, o:1, dev:sda3
Jun  5 05:19:15   disk 1, o:1, dev:sdb3
Jun  5 05:19:15   disk 2, o:0, dev:sdc3
Jun  5 05:19:15  RAID5 conf printout:
Jun  5 05:19:15   --- rd:3 wd:2 fd:1
Jun  5 05:19:15   disk 0, o:1, dev:sda3
Jun  5 05:19:15   disk 1, o:1, dev:sdb3
Jun  5 05:19:15 kernel: end_request: I/O error, dev sdc, sector 143556687
Jun  5 05:19:15 kernel: raid5: Disk failure on sdc3, disabling device. Operation continuing on 2 devices
Jun  5 05:19:15 kernel: RAID5 conf printout:
Jun  5 05:19:15 kernel:  --- rd:3 wd:2 fd:1
Jun  5 05:19:16 kernel:  disk 0, o:1, dev:sda3
Jun  5 05:19:16 kernel:  disk 1, o:1, dev:sdb3
Jun  5 05:19:16 kernel:  disk 2, o:0, dev:sdc3
Jun  5 05:19:16 kernel: RAID5 conf printout:
Jun  5 05:19:16 kernel:  --- rd:3 wd:2 fd:1
Jun  5 05:19:16 kernel:  disk 0, o:1, dev:sda3
Jun  5 05:19:16 kernel:  disk 1, o:1, dev:sdb3
Jun  5 05:19:23  scsi0: Someone reset channel A
Jun  5 05:19:23  BUG: spinlock recursion on CPU#2, swapper/0 (Not tainted)
Jun  5 05:19:23   lock: c268c9c0, .magic: dead4ead, .owner: swapper/0, .owner_cpu: 2
Jun  5 05:19:23   [<c01d62ed>] spin_bug+0x87/0xe9     [<c01d6482>] _raw_spin_lock+0x32/0xcd
Jun  5 05:19:23   [<c02f209f>] _spin_lock_irqsave+0x9/0xd     [<f88b9ba9>] ahd_freeze_simq+0x12/0x43 [aic79xx]
Jun  5 05:19:23   [<f88ae68b>] ahd_reset_channel+0x471/0x4b5 [aic79xx]     [<c02f0000>] klist_add_tail+0x9/0x33
Jun  5 05:19:23   [<f88af27a>] ahd_handle_scsiint+0x349/0x15c7 [aic79xx]     [<c011e847>] __wake_up+0x2a/0x3d
Jun  5 05:19:23   [<f88b906a>] ahd_linux_isr+0x160/0x17b [aic79xx]     [<c01455f2>] handle_IRQ_event+0x23/0x4c
Jun  5 05:19:23   [<c01456a8>] __do_IRQ+0x8d/0xdd     [<c0105eee>] do_IRQ+0x60/0x7b
Jun  5 05:19:23   =======================
Jun  5 05:19:23   [<c0104786>] common_interrupt+0x1a/0x20     [<c0102e49>] default_idle+0x0/0x55
Jun  5 05:19:23   [<c0102e75>] default_idle+0x2c/0x55     [<c0102f2d>] cpu_idle+0x8f/0xa8
Jun  5 05:19:23  Kernel panic - not syncing: bad locking
Jun  5 05:19:23   [<c01233de>] panic+0x3e/0x174     [<c01d6310>] spin_bug+0xaa/0xe9
Jun  5 05:19:23   [<c01d6482>] _raw_spin_lock+0x32/0xcd     [<c02f209f>] _spin_lock_irqsave+0x9/0xd
Jun  5 05:19:23   [<f88b9ba9>] ahd_freeze_simq+0x12/0x43 [aic79xx]     [<f88ae68b>] ahd_reset_channel+0x471/0x4b5 [aic79xx]
Jun  5 05:19:23   [<c02f0000>] klist_add_tail+0x9/0x33     [<f88af27a>] ahd_handle_scsiint+0x349/0x15c7 [aic79xx]
Jun  5 05:19:23   [<c011e847>] __wake_up+0x2a/0x3d     [<f88b906a>] ahd_linux_isr+0x160/0x17b [aic79xx]
Jun  5 05:19:23   [<c01455f2>] handle_IRQ_event+0x23/0x4c     [<c01456a8>] __do_IRQ+0x8d/0xdd
Jun  5 05:19:23   [<c0105eee>] do_IRQ+0x60/0x7b
Jun  5 05:19:23   [<c0104786>] common_interrupt+0x1a/0x20
Jun  5 05:19:23   [<c0102e49>] default_idle+0x0/0x55     [<c0102e75>] default_idle+0x2c/0x55
Jun  5 05:19:23   [<c0102f2d>] cpu_idle+0x8f/0xa8     Badness in smp_call_function at arch/i386/kernel/smp.c:588 (Not tainted)
Jun  5 05:19:23   [<c0115b17>] stop_this_cpu+0x0/0x2d
Jun  5 05:19:23   [<c011584c>] smp_call_function+0x5c/0xca     [<c0132a98>] __kernel_text_address+0x18/0x23
Jun  5 05:19:23   [<c01158cd>] smp_send_stop+0x13/0x1c     [<c01233f1>] panic+0x51/0x174
Jun  5 05:19:23   [<c01d6310>] spin_bug+0xaa/0xe9     [<c01d6482>] _raw_spin_lock+0x32/0xcd
Jun  5 05:19:23   [<c02f209f>] _spin_lock_irqsave+0x9/0xd     [<f88b9ba9>] ahd_freeze_simq+0x12/0x43 [aic79xx]
Jun  5 05:19:23   [<f88ae68b>] ahd_reset_channel+0x471/0x4b5 [aic79xx]     [<c02f0000>] klist_add_tail+0x9/0x33
Jun  5 05:19:23   [<f88af27a>] ahd_handle_scsiint+0x349/0x15c7 [aic79xx]     [<c011e847>] __wake_up+0x2a/0x3d
Jun  5 05:19:23   [<f88b906a>] ahd_linux_isr+0x160/0x17b [aic79xx]     [<c01455f2>] handle_IRQ_event+0x23/0x4c
Jun  5 05:19:23   [<c01456a8>] __do_IRQ+0x8d/0xdd     [<c0105eee>] do_IRQ+0x60/0x7b
Jun  5 05:19:23   =======================
Jun  5 05:19:23   [<c0104786>] common_interrupt+0x1a/0x20     [<c0102e49>] default_idle+0x0/0x55
Jun  5 05:19:23   [<c0102e75>] default_idle+0x2c/0x55     [<c0102f2d>] cpu_idle+0x8f/0xa8
Jun  5 05:19:23  Badness in panic at kernel/panic.c:140 (Not
Jun  5 05:19:23   [<c0123501>] panic+0x161/0x174     [<c01d6310>] spin_bug+0xaa/0xe9
Jun  5 05:19:23   [<c01d6482>] _raw_spin_lock+0x32/0xcd     [<c02f209f>] _spin_lock_irqsave+0x9/0xd
Jun  5 05:19:23   [<f88b9ba9>] ahd_freeze_simq+0x12/0x43 [aic79xx]     [<f88ae68b>] ahd_reset_channel+0x471/0x4b5 [aic79xx]
Jun  5 05:19:23   [<c02f0000>] klist_add_tail+0x9/0x33     [<f88af27a>] ahd_handle_scsiint+0x349/0x15c7 [aic79xx]
Jun  5 05:19:23   [<c011e847>] __wake_up+0x2a/0x3d     [<f88b906a>] ahd_linux_isr+0x160/0x17b [aic79xx]
Jun  5 05:19:23   [<c01455f2>] handle_IRQ_event+0x23/0x4c     [<c01456a8>] __do_IRQ+0x8d/0xdd
Jun  5 05:19:23   [<c0105eee>] do_IRQ+0x60/0x7b
Jun  5 05:19:23   [<c0104786>] common_interrupt+0x1a/0x20
Jun  5 05:19:23   [<c0102e49>] default_idle+0x0/0x55     [<c0102e75>] default_idle+0x2c/0x55

Comment 1 Konstantin Antselovich 2006-06-07 14:05:06 UTC
Created attachment 130677
hardware descriptions -- dmesg

Comment 2 Konstantin Antselovich 2006-06-20 06:35:11 UTC
Confirmed: this is fixed in 2.6.17; see http://lkml.org/lkml/2006/6/19/369


Comment 3 Dave Jones 2006-10-16 18:58:46 UTC
A new kernel update has been released (Version: 2.6.18-1.2200.fc5)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO state.
Due to the large volume of inactive bugs in Bugzilla, if this bug is
still in this state in two weeks' time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

In the last few updates, some users upgrading from FC4->FC5
have reported that installing a kernel update has left their
systems unbootable. If you have been affected by this problem
please check that you have only one version of device-mapper & lvm2
installed. See bug 207474 for further details.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

If this bug has been fixed, but you are now experiencing a different
problem, please file a separate bug for the new problem.

Thank you.

Comment 4 Konstantin Antselovich 2006-10-16 23:33:50 UTC
It's been about 4 months since the bug was reported, and I can no longer use the
hardware in question to test 2.6.18 et al. The server went into production
running RHEL4 (where this bug did not occur).

Some time back I reported that it was fixed in 2.6.17, so I think this bug can
be closed.

Thanks for the follow-up,
