Bug 235695

Summary:	System panics if a Fibre Channel disk disappears while SCSI is in the middle of a LUN scan
Product:	Red Hat Enterprise Linux 5	Reporter:	Bino J Sebastian <bino.sebastian>
Component:	kernel	Assignee:	David Milburn <dmilburn>
Status:	CLOSED DUPLICATE	QA Contact:	Martin Jenner <mjenner>
Severity:	high	Docs Contact:
Priority:	medium
Version:	5.0	CC:	bino.sebastian, dzickus, laurie.barry, petrides
Hardware:	x86_64
OS:	Linux
Doc Type:	Bug Fix
Last Closed:	2007-09-11 22:13:58 UTC

Description Bino J Sebastian 2007-04-09 17:38:38 UTC

Description of problem:
The system panics with the following message on the console if a Fibre Channel
disk disappears in the middle of a LUN scan.

rport-6:0-2: blocked FC remote port time out: saving binding
lpfc 0000:07:00.0: 0:0203 Devloss timeout on WWPN 50:0:1f:e1:50:6:54:88 NPort 
x610513 Data: x8 x7 x4
Unable to handle kernel NULL pointer dereference at 0000000000000060 RIP:
 [<ffffffff80061625>] mutex_lock+0x10/0x1d
PGD 11423f067 PUD 11415a067 PMD 0
Oops: 0002 [1] SMP
last sysfs file: /class/scsi_host/host6/scan
CPU 2
Modules linked in: lpfc(U) nfs lockd fscache nfs_acl autofs4 hidp rfcomm l2cap 
bluetooth sunrpc ipv6 dm_mirror dmd
Pid: 4451, comm: bash Not tainted 2.6.18-8.el5 #1
RIP: 0010:[<ffffffff80061625>]  [<ffffffff80061625>] mutex_lock+0x10/0x1d
RSP: 0018:ffff810115ed9dd8  EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000060 RCX: 00000000ffffffff
RDX: 0000000000000000 RSI: 000000002f9806d8 RDI: 0000000000000060
RBP: 0000000000000060 R08: 0000000000000001 R09: 000000000000003c
R10: 0000000000000000 R11: ffffffff8807c088 R12: ffff81012f9806f8
R13: 0000000000000001 R14: 00000000ffffffff R15: 0000000000000000
FS:  00002aaaaaabbdb0(0000) GS:ffff81012fcd7e40(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000060 CR3: 0000000118629000 CR4: 00000000000006e0
Process bash (pid: 4451, threadinfo ffff810115ed8000, task ffff81012f9ea080)
Stack:  0000000000000001 00000000ffffffff 0000000000000000 ffffffff8807bea2
 2f9806d8ffffffd8 ffff81012f980698 00000000ffffffff 00000000ffffffff
 ffff81010deea000 00000000ffffffff ffff81012592fd80 ffffffff880f6f75
Call Trace:
 [<ffffffff8807bea2>] :scsi_mod:scsi_scan_target+0x4e/0x83
 [<ffffffff880f6f75>] :scsi_transport_fc:fc_user_scan+0x55/0x85
 [<ffffffff8807c808>] :scsi_mod:store_scan+0x9b/0xc5
 [<ffffffff800fa3a4>] sysfs_write_file+0xb9/0xe8
 [<ffffffff80016121>] vfs_write+0xce/0x174
 [<ffffffff800169b2>] sys_write+0x45/0x6e
 [<ffffffff8005b2c1>] tracesys+0xd1/0xdc


Code: f0 ff 0f 0f 88 8d 01 00 00 59 5e 5b c3 41 54 55 48 89 fd 53
RIP  [<ffffffff80061625>] mutex_lock+0x10/0x1d
 RSP <ffff810115ed9dd8>
CR2: 0000000000000060
 <0>Kernel panic - not syncing: Fatal exception


Version-Release number of selected component (if applicable):
RHEL5 GA
2.6.18-8.el5

How reproducible:
100% reproducible.

Steps to Reproduce:
1. Connect an Emulex lpfc HBA to a SAN with at least one storage array
   visible to the HBA and at least one LUN presented to the HBA.
2. Make sure that the SCSI midlayer can see the SCSI LUN using the
   "cat /proc/scsi/scsi" command.
3. Unplug the Fibre Channel cable connected to the HBA.
4. Immediately after unplugging the cable, run the following command:
   "echo '- - -' > /sys/class/scsi_host/host<host_no>/scan"
   where <host_no> is the SCSI host number assigned to the lpfc HBA.
5. The LUN scan waits until the dev_loss timer expires.
6. Wait at least 30 seconds for the dev_loss timer to expire.

Actual results:
The system panicked with the same Oops and stack trace shown in the description
above (NULL pointer dereference at 0000000000000060 in mutex_lock, called from
scsi_scan_target via fc_user_scan).


Expected results:
The LUN scan completes without a panic.

Additional info:

Comment 1 Ernie Petrides 2007-09-11 22:13:58 UTC

This problem has been fixed in RHEL5.1 with the fix for bug 246023.

Don, the relevant patch tracking file is:

  scsi_tranport_fc-check-portstates-before-invoking-target-scan.patch

*** This bug has been marked as a duplicate of 246023 ***