Bug 180638

Summary: System Hang: file i/o to lpfc driver while fsck floppy
Product: Red Hat Enterprise Linux 3 Reporter: James Smart <james.smart>
Component: kernel Assignee: Brian Maly <bmaly>
Status: CLOSED WONTFIX QA Contact: Brian Brock <bbrock>
Severity: high Docs Contact:
Priority: medium    
Version: 3.0 CC: coughlan, petrides
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2006-04-27 19:14:54 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description James Smart 2006-02-09 16:55:54 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7

Description of problem:
During our testing of the RHEL3U6 kernel, we found
an issue with the floppy disk driver. This driver
can go to sleep with io_request_lock held. Since most of the
SCSI HBA drivers (including the lpfc 7.x driver) in the RHEL3 kernel
take io_request_lock in hardware interrupt context, this can
cause a system deadlock.


System hang when running I/O through the lpfc driver while running fsck
on a floppy:
====================================================================
The floppy disk driver can go to sleep with io_request_lock held
in the RHEL3U6 kernel.
The following stack trace shows this happening:

fsck          S F07B6000     0  1392   1391                     (NOTLB)
Call Trace:   [<f8be7d2a>] _lock_fdc [floppy] 0xca (0xe3157df8)
[<f8bf3648>] fdc_wait [floppy] 0x4 (0xe3157e30)
[<f8bf3648>] fdc_wait [floppy] 0x4 (0xe3157e34)
[<f8bec30f>] do_fd_request [floppy] 0x4f (0xe3157e58)
[<c01d2673>] generic_unplug_device [kernel] 0x43 (0xe3157e68)
[<c013011a>] __run_task_queue [kernel] 0x6a (0xe3157e78)
[<c016a4ff>] block_sync_page [kernel] 0x1f (0xe3157e90)
[<c0148aae>] ___wait_on_page [kernel] 0xde (0xe3157e98)
[<c01498b0>] do_generic_file_read [kernel] 0x480 (0xe3157ef0)
[<c014a1b5>] generic_file_new_read [kernel] 0xc5 (0xe3157f30)
[<c0149ff0>] file_read_actor [kernel] 0x0 (0xe3157f40)
[<c0163950>] dentry_open [kernel] 0x110 (0xe3157f4c)
[<c014a2df>] generic_file_read [kernel] 0x2f (0xe3157f7c)
[<c01649e7>] sys_read [kernel] 0x97 (0xe3157f94)

void generic_unplug_device(void *data)
{
        request_queue_t *q = (request_queue_t *) data;
        unsigned long flags;

        spin_lock_irqsave(q->queue_lock, flags);
        __generic_unplug_device(q);
        spin_unlock_irqrestore(q->queue_lock, flags);
}
=================

For a floppy disk, q->queue_lock is initialized as follows:
void blk_init_queue(request_queue_t * q, request_fn_proc * rfn) {
....
block/ll_rw_blk.c:544:  q->queue_lock           = &io_request_lock;
...
}

If fsck sleeps in the above code path, the lpfc driver can trigger the
following NMI watchdog panic when it tries to acquire io_request_lock
from interrupt context.

=========
NMI Watchdog detected LOCKUP on CPU1, eip f8a3ad41, registers:
netconsole ide-cd loop st sr_mod cdrom cpqci audit usbserial lp parport 8021q autofs4 nfs lockd sunrpc
bcm5700 floppy sg microcode lpfcdfc keybdev mousedev hi
CPU:    1
EIP:    0060:[<f8a3ad41>]    Tainted: P
EFLAGS: 00000082

EIP is at lpfc_scsi_done [lpfc] 0x261 (2.4.21-37.ELsmp/i686)
eax: f67cba00   ebx: f6df5200   ecx: f78ec000   edx: 00000202
esi: f78ec000   edi: 00000202   ebp: f0c21bf4   esp: f0c21bec
ds: 0068   es: 0068   ss: 0068
Process diskfs (pid: 13126, stackpage=f0c21000)
Stack: f78ec000 f0c21bf4 f6dd8000 f8a54160 00000000 f67cba80 f6df5200 f8a39f26
       f78ec000 f6df5200 00000000 c0013600 f6dd80d0 f6f6a818 0000007a 00000000
       00000001 0000007a 00000000 f6dd8000 0000007a 0000007a f8a01e0b f6dd8000
Call Trace:   [<f8a54160>] lpfc_iostat_tbl [lpfc] 0x0 (0xf0c21bf8)
[<f8a39f26>] lpfc_os_return_scsi_cmd [lpfc] 0x76 (0xf0c21c08)
[<f8a01e0b>] rw_intr [sd_mod] 0x7b (0xf0c21c44)
[<f8a3af85>] lpfc_sched_sli_done [lpfc] 0x85 (0xf0c21c74)
[<f8a1b180>] lpfc_sli_process_sol_iocb [lpfc] 0x70 (0xf0c21cb0)
[<f8a1b002>] lpfc_sli_handle_ring_event [lpfc] 0x562 (0xf0c21cec)
[<c0124140>] schedule [kernel] 0x340 (0xf0c21d14)
[<f8a1a501>] lpfc_sli_intr [lpfc] 0xe1 (0xf0c21d74)
[<f8a201b8>] lpfc_intr_handler [lpfc] 0x48 (0xf0c21d98)
[<c010dd49>] handle_IRQ_event [kernel] 0x69 (0xf0c21db0)
[<c010df89>] do_IRQ [kernel] 0xb9 (0xf0c21dd0)
[<c010ded0>] do_IRQ [kernel] 0x0 (0xf0c21df4)
[<c01d3835>] submit_bh_rsector [kernel] 0x35 (0xf0c21e24)
[<c0167c66>] create_empty_buffers [kernel] 0x26 (0xf0c21e30)
[<c0168616>] block_read_full_page [kernel] 0x266 (0xf0c21e48)
[<c0153cfa>] lru_cache_add [kernel] 0x28a (0xf0c21e80)
[<c01486aa>] add_to_page_cache_unique [kernel] 0x5a (0xf0c21e98)
[<c0148901>] page_cache_read [kernel] 0xe1 (0xf0c21eac)
[<c019fe00>] ext2_get_block [kernel] 0x0 (0xf0c21eb4)
[<c0149327>] generic_file_readahead [kernel] 0xd7 (0xf0c21ed4)
[<c014980d>] do_generic_file_read [kernel] 0x3dd (0xf0c21ef0)
[<c014a1b5>] generic_file_new_read [kernel] 0xc5 (0xf0c21f30)
[<c0149ff0>] file_read_actor [kernel] 0x0 (0xf0c21f40)
[<c0163950>] dentry_open [kernel] 0x110 (0xf0c21f4c)
[<c014a2df>] generic_file_read [kernel] 0x2f (0xf0c21f7c)
[<c01649e7>] sys_read [kernel] 0x97 (0xf0c21f94)

Code: f3 90 7e f5 e9 c4 fd ff ff 90 90 90 90 90 90 83 ec 18 89 5c
============

The test ran clean after removing the floppy module from the system.

This issue exists only in RHEL3. SLES8 works correctly and code
inspection indicates that this issue does not exist in the
upstream 2.4 code.
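
The failure mode described above can be illustrated with a small
user-space model. This is only a sketch under stated assumptions: it is
not kernel code, and toy_spinlock, irq_try_acquire, simulate and the
spin limit are hypothetical names invented for illustration. It models
a task that takes a global lock and then sleeps, and an interrupt
handler that spins on the same lock until a bounded "watchdog" limit
is reached.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a global lock such as io_request_lock.
 * Hypothetical model only; this is not the kernel spinlock. */
struct toy_spinlock {
    bool held;
};

static void toy_lock(struct toy_spinlock *l)
{
    assert(!l->held);   /* the model never double-acquires */
    l->held = true;
}

static void toy_unlock(struct toy_spinlock *l)
{
    assert(l->held);    /* only a held lock may be released */
    l->held = false;
}

/* Models an interrupt handler spinning on the lock.
 * Returns true if the lock was acquired within max_spins attempts.
 * Returning false stands in for the NMI watchdog firing. */
static bool irq_try_acquire(struct toy_spinlock *l, int max_spins)
{
    for (int i = 0; i < max_spins; i++) {
        if (!l->held) {
            toy_lock(l);
            return true;
        }
        /* The holder is asleep, so nothing releases the lock here. */
    }
    return false;
}

/* Runs one scenario: a task takes the lock, optionally keeps it
 * while it sleeps, and an interrupt arrives during that sleep.
 * Returns true if the interrupt handler was able to complete. */
static bool simulate(bool sleep_with_lock_held)
{
    struct toy_spinlock lock = { false };
    bool irq_completed;

    toy_lock(&lock);                /* task enters the request path */
    if (!sleep_with_lock_held)
        toy_unlock(&lock);          /* lock dropped before blocking */

    /* The task is asleep at this point; the interrupt fires. */
    irq_completed = irq_try_acquire(&lock, 1000);
    if (irq_completed)
        toy_unlock(&lock);

    /* The task wakes up and releases the lock if it kept it. */
    if (sleep_with_lock_held)
        toy_unlock(&lock);

    return irq_completed;
}
```

In this model, simulate(true) returns false (the handler never gets the
lock, which corresponds to the lockup reported above), while
simulate(false) returns true.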


Version-Release number of selected component (if applicable):
kernel in RHEL3U6

How reproducible:
Always

Steps to Reproduce:
1. Start I/O on lpfc-attached storage
2. Run fsck on a floppy disk

Additional info: