From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7

Description of problem:
During our testing of the RHEL3U6 kernel, we found an issue with the floppy disk driver: it can go to sleep with io_request_lock held. Since most of the SCSI HBA drivers in the RHEL3 kernel (including the lpfc 7.x driver) acquire io_request_lock in hardware interrupt context, this can deadlock the system.

System hang while running I/O through the lpfc driver and running fsck on a floppy:
====================================================================
The floppy disk driver can go to sleep with io_request_lock held in the RHEL3U6 kernel. The following stack trace shows this:

fsck S F07B6000 0 1392 1391 (NOTLB)
Call Trace:
[<f8be7d2a>] _lock_fdc [floppy] 0xca (0xe3157df8)
[<f8bf3648>] fdc_wait [floppy] 0x4 (0xe3157e30)
[<f8bf3648>] fdc_wait [floppy] 0x4 (0xe3157e34)
[<f8bec30f>] do_fd_request [floppy] 0x4f (0xe3157e58)
[<c01d2673>] generic_unplug_device [kernel] 0x43 (0xe3157e68)
[<c013011a>] __run_task_queue [kernel] 0x6a (0xe3157e78)
[<c016a4ff>] block_sync_page [kernel] 0x1f (0xe3157e90)
[<c0148aae>] ___wait_on_page [kernel] 0xde (0xe3157e98)
[<c01498b0>] do_generic_file_read [kernel] 0x480 (0xe3157ef0)
[<c014a1b5>] generic_file_new_read [kernel] 0xc5 (0xe3157f30)
[<c0149ff0>] file_read_actor [kernel] 0x0 (0xe3157f40)
[<c0163950>] dentry_open [kernel] 0x110 (0xe3157f4c)
[<c014a2df>] generic_file_read [kernel] 0x2f (0xe3157f7c)
[<c01649e7>] sys_read [kernel] 0x97 (0xe3157f94)

void generic_unplug_device(void *data)
{
    request_queue_t *q = (request_queue_t *) data;
    unsigned long flags;

    spin_lock_irqsave(q->queue_lock, flags);
    /* calls q->request_fn(), i.e. do_fd_request() for the floppy
     * queue, while queue_lock is still held */
    __generic_unplug_device(q);
    spin_unlock_irqrestore(q->queue_lock, flags);
}
=================
For a floppy disk, q->queue_lock is initialized as follows:

void blk_init_queue(request_queue_t * q, request_fn_proc * rfn)
{
    ....
    q->queue_lock = &io_request_lock;    /* block/ll_rw_blk.c:544 */
    ...
}

Because the floppy queue shares the global io_request_lock, do_fd_request() runs with that lock held (taken with spin_lock_irqsave() in generic_unplug_device()). If fsck sleeps in _lock_fdc() on this path, the lock stays held, and when the lpfc driver tries to acquire io_request_lock from interrupt context it spins, causing the following NMI watchdog panic:

=========
NMI Watchdog detected LOCKUP on CPU1, eip f8a3ad41, registers:
netconsole ide-cd loop st sr_mod cdrom cpqci audit usbserial lp parport 8021q autofs4 nfs lockd sunrpc bcm5700 floppy sg microcode lpfcdfc keybdev mousedev hi
CPU: 1
EIP: 0060:[<f8a3ad41>] Tainted: P
EFLAGS: 00000082
EIP is at lpfc_scsi_done [lpfc] 0x261 (2.4.21-37.ELsmp/i686)
eax: f67cba00 ebx: f6df5200 ecx: f78ec000 edx: 00000202
esi: f78ec000 edi: 00000202 ebp: f0c21bf4 esp: f0c21bec
ds: 0068 es: 0068 ss: 0068
Process diskfs (pid: 13126, stackpage=f0c21000)
Stack: f78ec000 f0c21bf4 f6dd8000 f8a54160 00000000 f67cba80 f6df5200 f8a39f26
       f78ec000 f6df5200 00000000 c0013600 f6dd80d0 f6f6a818 0000007a 00000000
       00000001 0000007a 00000000 f6dd8000 0000007a 0000007a f8a01e0b f6dd8000
Call Trace:
[<f8a54160>] lpfc_iostat_tbl [lpfc] 0x0 (0xf0c21bf8)
[<f8a39f26>] lpfc_os_return_scsi_cmd [lpfc] 0x76 (0xf0c21c08)
[<f8a01e0b>] rw_intr [sd_mod] 0x7b (0xf0c21c44)
[<f8a3af85>] lpfc_sched_sli_done [lpfc] 0x85 (0xf0c21c74)
[<f8a1b180>] lpfc_sli_process_sol_iocb [lpfc] 0x70 (0xf0c21cb0)
[<f8a1b002>] lpfc_sli_handle_ring_event [lpfc] 0x562 (0xf0c21cec)
[<c0124140>] schedule [kernel] 0x340 (0xf0c21d14)
[<f8a1a501>] lpfc_sli_intr [lpfc] 0xe1 (0xf0c21d74)
[<f8a201b8>] lpfc_intr_handler [lpfc] 0x48 (0xf0c21d98)
[<c010dd49>] handle_IRQ_event [kernel] 0x69 (0xf0c21db0)
[<c010df89>] do_IRQ [kernel] 0xb9 (0xf0c21dd0)
[<c010ded0>] do_IRQ [kernel] 0x0 (0xf0c21df4)
[<c01d3835>] submit_bh_rsector [kernel] 0x35 (0xf0c21e24)
[<c0167c66>] create_empty_buffers [kernel] 0x26 (0xf0c21e30)
[<c0168616>] block_read_full_page [kernel] 0x266 (0xf0c21e48)
[<c0153cfa>] lru_cache_add [kernel] 0x28a (0xf0c21e80)
[<c01486aa>] add_to_page_cache_unique [kernel] 0x5a (0xf0c21e98)
[<c0148901>] page_cache_read [kernel] 0xe1 (0xf0c21eac)
[<c019fe00>] ext2_get_block [kernel] 0x0 (0xf0c21eb4)
[<c0149327>] generic_file_readahead [kernel] 0xd7 (0xf0c21ed4)
[<c014980d>] do_generic_file_read [kernel] 0x3dd (0xf0c21ef0)
[<c014a1b5>] generic_file_new_read [kernel] 0xc5 (0xf0c21f30)
[<c0149ff0>] file_read_actor [kernel] 0x0 (0xf0c21f40)
[<c0163950>] dentry_open [kernel] 0x110 (0xf0c21f4c)
[<c014a2df>] generic_file_read [kernel] 0x2f (0xf0c21f7c)
[<c01649e7>] sys_read [kernel] 0x97 (0xf0c21f94)

Code: f3 90 7e f5 e9 c4 fd ff ff 90 90 90 90 90 90 83 ec 18 89 5c
============

The test ran clean after removing the floppy module from the system. This issue exists only in RHEL3: SLES8 works correctly, and code inspection indicates that the issue does not exist in the upstream 2.4 code.

Version-Release number of selected component (if applicable):
kernel in RHEL3U6

How reproducible:
Always

Steps to Reproduce:
1. Start I/O on lpfc-attached storage.
2. Perform fsck on a floppy disk.

Additional info:
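The rule broken here (never sleep while holding a spinlock) can be illustrated with a small userspace model. This is a hypothetical sketch, not the RHEL3 floppy code or a proposed patch: every name in it (model_spinlock, model_sleep, unplug, request_fn_sleeps, request_fn_drops_lock) is invented for illustration, a single counter stands in for the per-CPU "atomic context" state, and "sleeping" only records whether a spinlock was held at that moment. unplug() mirrors generic_unplug_device() calling the request function with the shared queue lock held.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a spinlock: only tracks whether it is held. */
struct model_spinlock {
    bool locked;
};

/* Stand-in for the global io_request_lock that blk_init_queue() makes
 * q->queue_lock point at, so the floppy and SCSI queues share it. */
struct model_spinlock io_request_lock_model = { false };

/* Number of spinlocks currently held; > 0 means "atomic context",
 * where sleeping is not allowed. */
int atomic_depth = 0;

/* How many times something tried to sleep while in atomic context. */
int sleep_violations = 0;

void model_spin_lock(struct model_spinlock *l)
{
    /* A real spinlock would spin forever here (self-deadlock);
     * the model simply aborts via assert. */
    assert(!l->locked);
    l->locked = true;
    atomic_depth++;
}

void model_spin_unlock(struct model_spinlock *l)
{
    assert(l->locked);
    l->locked = false;
    atomic_depth--;
}

/* Stand-in for a blocking wait such as _lock_fdc(). While a task sleeps
 * with the lock held, an interrupt handler on another CPU that needs
 * the same lock (lpfc in this report) can only spin. */
void model_sleep(void)
{
    if (atomic_depth > 0)
        sleep_violations++;
}

typedef void (*model_request_fn)(struct model_spinlock *queue_lock);

/* Buggy pattern: blocks while the caller still holds queue_lock. */
void request_fn_sleeps(struct model_spinlock *queue_lock)
{
    (void)queue_lock;
    model_sleep();
}

/* Illustrative alternative: release queue_lock around the blocking call
 * and retake it, because the caller unlocks it after we return. */
void request_fn_drops_lock(struct model_spinlock *queue_lock)
{
    model_spin_unlock(queue_lock);
    model_sleep();
    model_spin_lock(queue_lock);
}

/* Mirrors generic_unplug_device(): take queue_lock, run the request
 * function, release queue_lock. */
void unplug(struct model_spinlock *queue_lock, model_request_fn request_fn)
{
    model_spin_lock(queue_lock);
    request_fn(queue_lock);
    model_spin_unlock(queue_lock);
}
```

Calling unplug(&io_request_lock_model, request_fn_sleeps) increments sleep_violations, while the same call with request_fn_drops_lock does not. Releasing the lock around the blocking call is only one illustrative pattern; a request function can also avoid blocking altogether, for example by returning early and letting the request be restarted later. The model does not show which approach, if either, the upstream 2.4 floppy driver uses.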