Description of problem:
Kernel crashes on SCSI drive hot-swap on an Adaptec aic79xx controller.

Version-Release number of selected component (if applicable):
2.6.16-1.2122_FC5smp

How reproducible:
Always

Steps to Reproduce:
1. Unlock the SCSI drive.
2. Take it out.
3. Push it back in. The kernel crashes.

Actual results:
The kernel crashes when you try to hot-swap a drive.

Expected results:
The kernel should continue to work.

Additional info:
Jun 5 05:19:15 192.168.0.201 end_request: I/O error, dev sdc, sector 143556687
Jun 5 05:19:15 192.168.0.201 kernel: sd 0:0:2:0: SCSI error: return code = 0x10000
Jun 5 05:19:15 192.168.0.201 raid5: Disk failure on sdc3, disabling device. Operation continuing on 2 devices
Jun 5 05:19:15 192.168.0.201 RAID5 conf printout:
Jun 5 05:19:15 192.168.0.201 --- rd:3 wd:2 fd:1
Jun 5 05:19:15 192.168.0.201 disk 0, o:1, dev:sda3
Jun 5 05:19:15 192.168.0.201 disk 1, o:1, dev:sdb3
Jun 5 05:19:15 192.168.0.201 disk 2, o:0, dev:sdc3
Jun 5 05:19:15 192.168.0.201 RAID5 conf printout:
Jun 5 05:19:15 192.168.0.201 --- rd:3 wd:2 fd:1
Jun 5 05:19:15 192.168.0.201 disk 0, o:1, dev:sda3
Jun 5 05:19:15 192.168.0.201 disk 1, o:1, dev:sdb3
Jun 5 05:19:15 192.168.0.201 kernel: end_request: I/O error, dev sdc, sector 143556687
Jun 5 05:19:15 192.168.0.201 kernel: raid5: Disk failure on sdc3, disabling device. Operation continuing on 2 devices
Jun 5 05:19:15 192.168.0.201 kernel: RAID5 conf printout:
Jun 5 05:19:15 192.168.0.201 kernel: --- rd:3 wd:2 fd:1
Jun 5 05:19:16 192.168.0.201 kernel: disk 0, o:1, dev:sda3
Jun 5 05:19:16 192.168.0.201 kernel: disk 1, o:1, dev:sdb3
Jun 5 05:19:16 192.168.0.201 kernel: disk 2, o:0, dev:sdc3
Jun 5 05:19:16 192.168.0.201 kernel: RAID5 conf printout:
Jun 5 05:19:16 192.168.0.201 kernel: --- rd:3 wd:2 fd:1
Jun 5 05:19:16 192.168.0.201 kernel: disk 0, o:1, dev:sda3
Jun 5 05:19:16 192.168.0.201 kernel: disk 1, o:1, dev:sdb3
Jun 5 05:19:23 192.168.0.201 scsi0: Someone reset channel A
Jun 5 05:19:23 192.168.0.201 BUG: spinlock recursion on CPU#2, swapper/0 (Not tainted)
Jun 5 05:19:23 192.168.0.201 lock: c268c9c0, .magic: dead4ead, .owner: swapper/0, .owner_cpu: 2
Jun 5 05:19:23 192.168.0.201 [<c01d62ed>] spin_bug+0x87/0xe9 [<c01d6482>] _raw_spin_lock+0x32/0xcd
Jun 5 05:19:23 192.168.0.201 [<c02f209f>] _spin_lock_irqsave+0x9/0xd [<f88b9ba9>] ahd_freeze_simq+0x12/0x43 [aic79xx]
Jun 5 05:19:23 192.168.0.201 [<f88ae68b>] ahd_reset_channel+0x471/0x4b5 [aic79xx] [<c02f0000>] klist_add_tail+0x9/0x33
Jun 5 05:19:23 192.168.0.201 [<f88af27a>] ahd_handle_scsiint+0x349/0x15c7 [aic79xx] [<c011e847>] __wake_up+0x2a/0x3d
Jun 5 05:19:23 192.168.0.201 [<f88b906a>] ahd_linux_isr+0x160/0x17b [aic79xx] [<c01455f2>] handle_IRQ_event+0x23/0x4c
Jun 5 05:19:23 192.168.0.201 [<c01456a8>] __do_IRQ+0x8d/0xdd [<c0105eee>] do_IRQ+0x60/0x7b
Jun 5 05:19:23 192.168.0.201 =======================
Jun 5 05:19:23 192.168.0.201 [<c0104786>] common_interrupt+0x1a/0x20 [<c0102e49>] default_idle+0x0/0x55
Jun 5 05:19:23 192.168.0.201 [<c0102e75>] default_idle+0x2c/0x55 [<c0102f2d>] cpu_idle+0x8f/0xa8
Jun 5 05:19:23 192.168.0.201 Kernel panic - not syncing: bad locking
Jun 5 05:19:23 192.168.0.201 [<c01233de>] panic+0x3e/0x174 [<c01d6310>] spin_bug+0xaa/0xe9
Jun 5 05:19:23 192.168.0.201 [<c01d6482>] _raw_spin_lock+0x32/0xcd [<c02f209f>] _spin_lock_irqsave+0x9/0xd
Jun 5 05:19:23 192.168.0.201 [<f88b9ba9>] ahd_freeze_simq+0x12/0x43 [aic79xx] [<f88ae68b>] ahd_reset_channel+0x471/0x4b5 [aic79xx]
Jun 5 05:19:23 192.168.0.201 [<c02f0000>] klist_add_tail+0x9/0x33 [<f88af27a>] ahd_handle_scsiint+0x349/0x15c7 [aic79xx]
Jun 5 05:19:23 192.168.0.201 [<c011e847>] __wake_up+0x2a/0x3d [<f88b906a>] ahd_linux_isr+0x160/0x17b [aic79xx]
Jun 5 05:19:23 192.168.0.201 [<c01455f2>] handle_IRQ_event+0x23/0x4c [<c01456a8>] __do_IRQ+0x8d/0xdd
Jun 5 05:19:23 192.168.0.201 [<c0105eee>] do_IRQ+0x60/0x7b =======================
Jun 5 05:19:23 192.168.0.201 [<c0104786>] common_interrupt+0x1a/0x20
Jun 5 05:19:23 192.168.0.201 [<c0102e49>] default_idle+0x0/0x55 [<c0102e75>] default_idle+0x2c/0x55
Jun 5 05:19:23 192.168.0.201 [<c0102f2d>] cpu_idle+0x8f/0xa8 Badness in smp_call_function at arch/i386/kernel/smp.c:588 (Not tainted)
Jun 5 05:19:23 192.168.0.201 [<c0115b17>] stop_this_cpu+0x0/0x2d
Jun 5 05:19:23 192.168.0.201 [<c011584c>] smp_call_function+0x5c/0xca [<c0132a98>] __kernel_text_address+0x18/0x23
Jun 5 05:19:23 192.168.0.201 [<c01158cd>] smp_send_stop+0x13/0x1c [<c01233f1>] panic+0x51/0x174
Jun 5 05:19:23 192.168.0.201 [<c01d6310>] spin_bug+0xaa/0xe9 [<c01d6482>] _raw_spin_lock+0x32/0xcd
Jun 5 05:19:23 192.168.0.201 [<c02f209f>] _spin_lock_irqsave+0x9/0xd [<f88b9ba9>] ahd_freeze_simq+0x12/0x43 [aic79xx]
Jun 5 05:19:23 192.168.0.201 [<f88ae68b>] ahd_reset_channel+0x471/0x4b5 [aic79xx] [<c02f0000>] klist_add_tail+0x9/0x33
Jun 5 05:19:23 192.168.0.201 [<f88af27a>] ahd_handle_scsiint+0x349/0x15c7 [aic79xx] [<c011e847>] __wake_up+0x2a/0x3d
Jun 5 05:19:23 192.168.0.201 [<f88b906a>] ahd_linux_isr+0x160/0x17b [aic79xx] [<c01455f2>] handle_IRQ_event+0x23/0x4c
Jun 5 05:19:23 192.168.0.201 [<c01456a8>] __do_IRQ+0x8d/0xdd [<c0105eee>] do_IRQ+0x60/0x7b
Jun 5 05:19:23 192.168.0.201 =======================
Jun 5 05:19:23 192.168.0.201 [<c0104786>] common_interrupt+0x1a/0x20 [<c0102e49>] default_idle+0x0/0x55
Jun 5 05:19:23 192.168.0.201 [<c0102e75>] default_idle+0x2c/0x55 [<c0102f2d>] cpu_idle+0x8f/0xa8
Jun 5 05:19:23 192.168.0.201 Badness in panic at kernel/panic.c:140 (Not tainted)
Jun 5 05:19:23 192.168.0.201 [<c0123501>] panic+0x161/0x174 [<c01d6310>] spin_bug+0xaa/0xe9
Jun 5 05:19:23 192.168.0.201 [<c01d6482>] _raw_spin_lock+0x32/0xcd [<c02f209f>] _spin_lock_irqsave+0x9/0xd
Jun 5 05:19:23 192.168.0.201 [<f88b9ba9>] ahd_freeze_simq+0x12/0x43 [aic79xx] [<f88ae68b>] ahd_reset_channel+0x471/0x4b5 [aic79xx]
Jun 5 05:19:23 192.168.0.201 [<c02f0000>] klist_add_tail+0x9/0x33 [<f88af27a>] ahd_handle_scsiint+0x349/0x15c7 [aic79xx]
Jun 5 05:19:23 192.168.0.201 [<c011e847>] __wake_up+0x2a/0x3d [<f88b906a>] ahd_linux_isr+0x160/0x17b [aic79xx]
Jun 5 05:19:23 192.168.0.201 [<c01455f2>] handle_IRQ_event+0x23/0x4c [<c01456a8>] __do_IRQ+0x8d/0xdd
Jun 5 05:19:23 192.168.0.201 [<c0105eee>] do_IRQ+0x60/0x7b =======================
Jun 5 05:19:23 192.168.0.201 [<c0104786>] common_interrupt+0x1a/0x20
Jun 5 05:19:23 192.168.0.201 [<c0102e49>] default_idle+0x0/0x55 [<c0102e75>] default_idle+0x2c/0x55
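The backtrace in this log points to a lock being taken twice on the same CPU: the interrupt handler ahd_linux_isr calls ahd_handle_scsiint, which (around the "Someone reset channel A" event) goes through ahd_reset_channel into ahd_freeze_simq, and ahd_freeze_simq then tries to take a spinlock that the debug check reports as already owned by the same context (".owner: swapper/0, .owner_cpu: 2"). The Python sketch below is a toy model of that pattern, not driver or kernel code. The names DebugSpinlock, isr, freeze_simq_locking, freeze_simq_caller_locked and run are hypothetical, and the assumption that the outer acquisition happens in the interrupt handler is inferred from the backtrace rather than shown in it. It contrasts a helper that always takes the lock itself with one that requires its caller to hold it already; whether the upstream 2.6.17 change takes the second approach is not claimed here.

```python
import threading


class DebugSpinlock:
    """Toy model of a non-recursive lock with owner tracking (hypothetical).

    Loosely mirrors the fields the kernel's spinlock debugging prints
    (.magic, .owner); it is not the kernel implementation.
    """

    MAGIC = 0xDEAD4EAD  # same value as ".magic: dead4ead" in the log

    def __init__(self):
        self.magic = self.MAGIC
        self.owner = None  # who currently holds the lock, or None
        self._lock = threading.Lock()

    def acquire(self, who):
        # A plain spinlock re-taken by its own holder would spin forever;
        # this check reports the recursion instead of hanging.
        if self.owner == who:
            raise RuntimeError(f"BUG: spinlock recursion on {who}")
        self._lock.acquire()
        self.owner = who

    def release(self, who):
        assert self.owner == who, "released by a non-owner"
        self.owner = None
        self._lock.release()


def freeze_simq_locking(lock, who):
    """Problematic pattern (hypothetical): the helper always takes the lock."""
    lock.acquire(who)
    try:
        pass  # ... freeze the device queues ...
    finally:
        lock.release(who)


def freeze_simq_caller_locked(lock, who):
    """Alternative pattern (hypothetical): the caller must hold the lock."""
    assert lock.owner == who, "caller must already hold the lock"
    # ... freeze the device queues ...


def isr(lock, who, freeze):
    """Interrupt path: take the lock, then reach the freeze helper.

    The single call to freeze() stands in for the
    ahd_handle_scsiint -> ahd_reset_channel -> ahd_freeze_simq chain.
    """
    lock.acquire(who)
    try:
        freeze(lock, who)
        return "ok"
    finally:
        lock.release(who)


def run(freeze):
    """Simulate one interrupt on CPU#2 and return the outcome as a string."""
    lock = DebugSpinlock()
    try:
        return isr(lock, "CPU#2", freeze)
    except RuntimeError as err:
        return str(err)


if __name__ == "__main__":
    print(run(freeze_simq_locking))        # BUG: spinlock recursion on CPU#2
    print(run(freeze_simq_caller_locked))  # ok
```

The trade-off of the caller-locked style is that every caller has to take the lock first; asserting on the owner, as the sketch does, makes that contract checkable instead of implicit.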
Created attachment 130677: hardware descriptions -- dmesg
Confirmed: this is fixed in 2.6.17; see http://lkml.org/lkml/2006/6/19/369.

Rgds,
Konstantin
A new kernel update has been released (Version: 2.6.18-1.2200.fc5) based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches go into each upstream release, possibly including changes that may address this problem.

This bug has been placed in NEEDINFO state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks' time, it will be closed.

Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug.

In the last few updates, some users upgrading from FC4->FC5 have reported that installing a kernel update has left their systems unbootable. If you have been affected by this problem, please check that you only have one version of device-mapper & lvm2 installed. See bug 207474 for further details.

If this bug is a problem preventing you from installing the release this version is filed against, please see bug 169613.

If this bug has been fixed, but you are now experiencing a different problem, please file a separate bug for the new problem.

Thank you.
It's been about 4 months since the bug was reported, and I can no longer use the hardware in question to test 2.6.18 et al.: the server went into production running RHEL4 (where this bug did not occur). Some time back I reported that this was fixed in 2.6.17, so I think this bug can be closed.

Thanks for the follow-up,
Konstantin