Bug 194349

Summary: BUG: spinlock recursion on CPU#2, swapper/0 (Not tainted)
Product: [Fedora] Fedora
Reporter: Konstantin Antselovich <konstantin>
Component: kernel
Assignee: Dave Jones <davej>
Status: CLOSED ERRATA
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 5
CC: pfrields, wtogami
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2006-10-16 23:40:39 UTC
Attachments:
hardware descriptions -- dmesg (flags: none)

Description Konstantin Antselovich 2006-06-07 14:05:06 UTC
Description of problem: The kernel crashes when a SCSI drive is hot-swapped on
an Adaptec aic79xx controller.


Version-Release number of selected component (if applicable): 2.6.16-1.2122_FC5smp


How reproducible: Always


Steps to Reproduce:
1. Unlock the SCSI drive.
2. Pull it out.
3. Push it back in -- the kernel crashes.
  
Actual results:

The kernel crashes when you try to hot-swap a drive.

Expected results:

The kernel should continue to work.

Additional info:
388:Jun  5 05:19:15 192.168.0.201  end_request: I/O error, dev sdc, sector 143556687
389:Jun  5 05:19:15 192.168.0.201 kernel: sd 0:0:2:0: SCSI error: return code =
0x10000
390:Jun  5 05:19:15 192.168.0.201  raid5: Disk failure on sdc3, disabling
device. Operation continuing on 2 devices
391:Jun  5 05:19:15 192.168.0.201  RAID5 conf printout:
392:Jun  5 05:19:15 192.168.0.201   --- rd:3 wd:2 fd:1
393:Jun  5 05:19:15 192.168.0.201   disk 0, o:1, dev:sda3
394:Jun  5 05:19:15 192.168.0.201   disk 1, o:1, dev:sdb3
395:Jun  5 05:19:15 192.168.0.201   disk 2, o:0, dev:sdc3
396:Jun  5 05:19:15 192.168.0.201  RAID5 conf printout:
397:Jun  5 05:19:15 192.168.0.201   --- rd:3 wd:2 fd:1
398:Jun  5 05:19:15 192.168.0.201   disk 0, o:1, dev:sda3
399:Jun  5 05:19:15 192.168.0.201   disk 1, o:1, dev:sdb3
400:Jun  5 05:19:15 192.168.0.201 kernel: end_request: I/O error, dev sdc,
sector 143556687
401:Jun  5 05:19:15 192.168.0.201 kernel: raid5: Disk failure on sdc3, disabling
device. Operation continuing on 2 devices
402:Jun  5 05:19:15 192.168.0.201 kernel: RAID5 conf printout:
403:Jun  5 05:19:15 192.168.0.201 kernel:  --- rd:3 wd:2 fd:1
404:Jun  5 05:19:16 192.168.0.201 kernel:  disk 0, o:1, dev:sda3
405:Jun  5 05:19:16 192.168.0.201 kernel:  disk 1, o:1, dev:sdb3
406:Jun  5 05:19:16 192.168.0.201 kernel:  disk 2, o:0, dev:sdc3
407:Jun  5 05:19:16 192.168.0.201 kernel: RAID5 conf printout:
408:Jun  5 05:19:16 192.168.0.201 kernel:  --- rd:3 wd:2 fd:1
409:Jun  5 05:19:16 192.168.0.201 kernel:  disk 0, o:1, dev:sda3
410:Jun  5 05:19:16 192.168.0.201 kernel:  disk 1, o:1, dev:sdb3
411:Jun  5 05:19:23 192.168.0.201  scsi0: Someone reset channel A
412:Jun  5 05:19:23 192.168.0.201  BUG: spinlock recursion on CPU#2, swapper/0
(Not tainted)
413:Jun  5 05:19:23 192.168.0.201   lock: c268c9c0, .magic: dead4ead, .owner:
swapper/0, .owner_cpu: 2
414:Jun  5 05:19:23 192.168.0.201   [<c01d62ed>] spin_bug+0x87/0xe9    
[<c01d6482>] _raw_spin_lock+0x32/0xcd
415:Jun  5 05:19:23 192.168.0.201   [<c02f209f>] _spin_lock_irqsave+0x9/0xd    
[<f88b9ba9>] ahd_freeze_simq+0x12/0x43 [aic79xx]
416:Jun  5 05:19:23 192.168.0.201   [<f88ae68b>] ahd_reset_channel+0x471/0x4b5
[aic79xx]     [<c02f0000>] klist_add_tail+0x9/0x33
417:Jun  5 05:19:23 192.168.0.201   [<f88af27a>] ahd_handle_scsiint+0x349/0x15c7
[aic79xx]     [<c011e847>] __wake_up+0x2a/0x3d
418:Jun  5 05:19:23 192.168.0.201   [<f88b906a>] ahd_linux_isr+0x160/0x17b
[aic79xx]     [<c01455f2>] handle_IRQ_event+0x23/0x4c
419:Jun  5 05:19:23 192.168.0.201   [<c01456a8>] __do_IRQ+0x8d/0xdd    
[<c0105eee>] do_IRQ+0x60/0x7b
420:Jun  5 05:19:23 192.168.0.201   =======================
421:Jun  5 05:19:23 192.168.0.201   [<c0104786>] common_interrupt+0x1a/0x20    
[<c0102e49>] default_idle+0x0/0x55
422:Jun  5 05:19:23 192.168.0.201   [<c0102e75>] default_idle+0x2c/0x55    
[<c0102f2d>] cpu_idle+0x8f/0xa8
423:Jun  5 05:19:23 192.168.0.201  Kernel panic - not syncing: bad locking
424:Jun  5 05:19:23 192.168.0.201   [<c01233de>] panic+0x3e/0x174    
[<c01d6310>] spin_bug+0xaa/0xe9
425:Jun  5 05:19:23 192.168.0.201   [<c01d6482>] _raw_spin_lock+0x32/0xcd    
[<c02f209f>] _spin_lock_irqsave+0x9/0xd
426:Jun  5 05:19:23 192.168.0.201   [<f88b9ba9>] ahd_freeze_simq+0x12/0x43
[aic79xx]     [<f88ae68b>] ahd_reset_channel+0x471/0x4b5 [aic79xx]
427:Jun  5 05:19:23 192.168.0.201   [<c02f0000>] klist_add_tail+0x9/0x33    
[<f88af27a>] ahd_handle_scsiint+0x349/0x15c7 [aic79xx]
428:Jun  5 05:19:23 192.168.0.201   [<c011e847>] __wake_up+0x2a/0x3d    
[<f88b906a>] ahd_linux_isr+0x160/0x17b [aic79xx]
429:Jun  5 05:19:23 192.168.0.201   [<c01455f2>] handle_IRQ_event+0x23/0x4c    
[<c01456a8>] __do_IRQ+0x8d/0xdd
430:Jun  5 05:19:23 192.168.0.201   [<c0105eee>] do_IRQ+0x60/0x7b    
=======================
431:Jun  5 05:19:23 192.168.0.201   [<c0104786>] common_interrupt+0x1a/0x20
432:Jun  5 05:19:23 192.168.0.201   [<c0102e49>] default_idle+0x0/0x55    
[<c0102e75>] default_idle+0x2c/0x55
433:Jun  5 05:19:23 192.168.0.201   [<c0102f2d>] cpu_idle+0x8f/0xa8     Badness
in smp_call_function at arch/i386/kernel/smp.c:588 (Not tainted)
434:Jun  5 05:19:23 192.168.0.201   [<c0115b17>] stop_this_cpu+0x0/0x2d
435:Jun  5 05:19:23 192.168.0.201   [<c011584c>] smp_call_function+0x5c/0xca   
 [<c0132a98>] __kernel_text_address+0x18/0x23
436:Jun  5 05:19:23 192.168.0.201   [<c01158cd>] smp_send_stop+0x13/0x1c    
[<c01233f1>] panic+0x51/0x174
437:Jun  5 05:19:23 192.168.0.201   [<c01d6310>] spin_bug+0xaa/0xe9    
[<c01d6482>] _raw_spin_lock+0x32/0xcd
438:Jun  5 05:19:23 192.168.0.201   [<c02f209f>] _spin_lock_irqsave+0x9/0xd    
[<f88b9ba9>] ahd_freeze_simq+0x12/0x43 [aic79xx]
439:Jun  5 05:19:23 192.168.0.201   [<f88ae68b>] ahd_reset_channel+0x471/0x4b5
[aic79xx]     [<c02f0000>] klist_add_tail+0x9/0x33
440:Jun  5 05:19:23 192.168.0.201   [<f88af27a>] ahd_handle_scsiint+0x349/0x15c7
[aic79xx]     [<c011e847>] __wake_up+0x2a/0x3d
441:Jun  5 05:19:23 192.168.0.201   [<f88b906a>] ahd_linux_isr+0x160/0x17b
[aic79xx]     [<c01455f2>] handle_IRQ_event+0x23/0x4c
442:Jun  5 05:19:23 192.168.0.201   [<c01456a8>] __do_IRQ+0x8d/0xdd    
[<c0105eee>] do_IRQ+0x60/0x7b
443:Jun  5 05:19:23 192.168.0.201   =======================
444:Jun  5 05:19:23 192.168.0.201   [<c0104786>] common_interrupt+0x1a/0x20    
[<c0102e49>] default_idle+0x0/0x55
445:Jun  5 05:19:23 192.168.0.201   [<c0102e75>] default_idle+0x2c/0x55    
[<c0102f2d>] cpu_idle+0x8f/0xa8
446:Jun  5 05:19:23 192.168.0.201  Badness in panic at kernel/panic.c:140 (Not
tainted)
447:Jun  5 05:19:23 192.168.0.201   [<c0123501>] panic+0x161/0x174    
[<c01d6310>] spin_bug+0xaa/0xe9
448:Jun  5 05:19:23 192.168.0.201   [<c01d6482>] _raw_spin_lock+0x32/0xcd    
[<c02f209f>] _spin_lock_irqsave+0x9/0xd
449:Jun  5 05:19:23 192.168.0.201   [<f88b9ba9>] ahd_freeze_simq+0x12/0x43
[aic79xx]     [<f88ae68b>] ahd_reset_channel+0x471/0x4b5 [aic79xx]
450:Jun  5 05:19:23 192.168.0.201   [<c02f0000>] klist_add_tail+0x9/0x33    
[<f88af27a>] ahd_handle_scsiint+0x349/0x15c7 [aic79xx]
451:Jun  5 05:19:23 192.168.0.201   [<c011e847>] __wake_up+0x2a/0x3d    
[<f88b906a>] ahd_linux_isr+0x160/0x17b [aic79xx]
452:Jun  5 05:19:23 192.168.0.201   [<c01455f2>] handle_IRQ_event+0x23/0x4c    
[<c01456a8>] __do_IRQ+0x8d/0xdd
453:Jun  5 05:19:23 192.168.0.201   [<c0105eee>] do_IRQ+0x60/0x7b    
=======================
454:Jun  5 05:19:23 192.168.0.201   [<c0104786>] common_interrupt+0x1a/0x20
455:Jun  5 05:19:23 192.168.0.201   [<c0102e49>] default_idle+0x0/0x55    
[<c0102e75>] default_idle+0x2c/0x55
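
For readers unfamiliar with the "spinlock recursion" message: the trace above shows ahd_linux_isr reaching ahd_freeze_simq, which calls _spin_lock_irqsave on a lock whose recorded owner (swapper/0 on CPU 2) is the task already running. The following is a minimal user-space sketch of that pattern in Python, not the kernel or aic79xx code. DebugLock, LockRecursionError, freeze_queue and interrupt_handler are hypothetical names made up for illustration, and the idea that the interrupt handler was already holding the lock is a reading of the trace, not something confirmed in this report.

```python
import threading


class LockRecursionError(RuntimeError):
    """Raised when a thread re-acquires a non-recursive lock it already holds."""


class DebugLock:
    """Non-recursive lock that records its owner (hypothetical, for illustration)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owner = None  # thread ident of the current holder, or None

    def acquire(self):
        me = threading.get_ident()
        # Without this check, a second acquire by the owner would block forever.
        if self._owner == me:
            raise LockRecursionError(f"lock recursion: already owned by thread {me}")
        self._lock.acquire()
        self._owner = me

    def release(self):
        self._owner = None
        self._lock.release()


def freeze_queue(lock):
    """Hypothetical helper that takes the lock itself (innermost frame)."""
    lock.acquire()
    try:
        return "frozen"
    finally:
        lock.release()


def interrupt_handler(lock):
    """Hypothetical handler that holds the lock, then calls a helper that takes it again."""
    lock.acquire()
    try:
        return freeze_queue(lock)  # second acquire on the same thread
    finally:
        lock.release()
```

Calling freeze_queue(lock) on its own succeeds, while interrupt_handler(lock) raises LockRecursionError, because the helper tries to take a lock its caller still holds. The usual remedies for this pattern are to drop the lock before calling the helper or to provide a variant of the helper that expects the lock to be held already.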

Comment 1 Konstantin Antselovich 2006-06-07 14:05:06 UTC
Created attachment 130677 [details]
hardware descriptions -- dmesg

Comment 2 Konstantin Antselovich 2006-06-20 06:35:11 UTC
I can confirm this is fixed in 2.6.17; see http://lkml.org/lkml/2006/6/19/369

Rgds,
Konstantin

Comment 3 Dave Jones 2006-10-16 18:58:46 UTC
A new kernel update has been released (Version: 2.6.18-1.2200.fc5)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO state.
Due to the large volume of inactive bugs in Bugzilla, if this bug is
still in this state in two weeks' time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

In the last few updates, some users upgrading from FC4->FC5
have reported that installing a kernel update has left their
systems unbootable. If you have been affected by this problem
please check you only have one version of device-mapper & lvm2
installed.  See bug 207474 for further details.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

If this bug has been fixed, but you are now experiencing a different
problem, please file a separate bug for the new problem.

Thank you.

Comment 4 Konstantin Antselovich 2006-10-16 23:33:50 UTC
It's been ~4 months since the bug was reported, and I can no longer use the
hardware in question to test 2.6.18 et al. The server went into production
running RHEL4 (where this bug did not occur).

Some time back I reported that it was fixed in 2.6.17, so I think this bug can
be closed.

Thanks for the follow-up,
Konstantin