Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
For bugs related to Red Hat Enterprise Linux 5 product line. The current stable release is 5.10. For Red Hat Enterprise Linux 6 and above, please visit Red Hat JIRA https://issues.redhat.com/secure/CreateIssue!default.jspa?pid=12332745 to report new issues.

Bug 624394

Summary: [NetApp 5.6 bug] Kernel panic hit on RHEL 5.5 Emulex FC host at lpfc_abort_handler
Product: Red Hat Enterprise Linux 5 Reporter: Martin George <marting>
Component: kernelAssignee: plyons
Status: CLOSED DUPLICATE QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 5.5.zCC: andriusb, coughlan, dick.kennedy, jkurik, jpirko, jwest, laurie.barry, msnitzer, revers, vaios.papadimitriou, xdl-redhat-bugzilla
Target Milestone: rcKeywords: OtherQA, Regression
Target Release: 5.6   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-11-15 19:02:07 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 639028    
Bug Blocks: 557597, 633231    
Attachments:
Description Flags
Kernel config
none
Mike Snitzer's ALUA handler debug patch
none
Patch for 624393 none

Description Martin George 2010-08-16 10:06:12 UTC
Description of problem:
Hit a kernel panic on a RHEL 5.5 host with an Emulex 8G FC adapter (LPe12002) while running IO with target controller faults:

Kernel BUG at drivers/scsi/lpfc/lpfc_scsi.c:2206
invalid opcode: 0000 [1] SMP 
last sysfs file: /block/dm-14/dev
CPU 3 
Modules linked in: nfs fscache nfs_acl autofs4 hidp rfcomm l2cap bluetooth
lockd sunrpc be2iscsi ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_addr
iscsi_tcp bnx2i cnic ipv6 xfrm_nalgo crypto_api uio cxgb3i iw_cxgb3 ib_core
cxgb3 8021q libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi
video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery
asus_acpi acpi_memhotplug ac parport_pc lp parport sg floppy tg3 pcspkr
i2c_i801 i2c_core e752x_edac edac_mc ide_cd serio_raw cdrom dm_raid45
dm_message dm_region_hash dm_mem_cache dm_round_robin dm_multipath scsi_dh_rdac
scsi_dh_emc scsi_dh_alua scsi_dh dm_snapshot dm_zero dm_mirror dm_log dm_mod
ata_piix libata shpchp lpfc scsi_transport_fc sd_mod scsi_mod ext3 jbd uhci_hcd
ohci_hcd ehci_hcd
Pid: 438, comm: scsi_eh_0 Not tainted 2.6.18-194.11.1.el5.alua_dbg #1
RIP: 0010:[<ffffffff880ff793>]  [<ffffffff880ff793>]
:lpfc:lpfc_abort_handler+0x58/0x33d
RSP: 0018:ffff81007e705dd0  EFLAGS: 00010246
RAX: ffff81003531a680 RBX: ffff81003531a680 RCX: ffff81007e705e90
RDX: ffff81007e705e90 RSI: ffff81003531a698 RDI: ffff81007e6eb050
RBP: ffff81007e624000 R08: ffff81007e704000 R09: 000000000000003c
R10: ffff810002390a90 R11: ffffffff880ff73b R12: 0000000000000000
R13: 0000000000000282 R14: ffff81007e401b58 R15: ffffffff800a07c0
FS:  0000000000000000(0000) GS:ffff8100026ca6c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00002b081fcca1d0 CR3: 0000000013601000 CR4: 00000000000006e0
Process scsi_eh_0 (pid: 438, threadinfo ffff81007e704000, task
ffff81007f949100)
Stack:  ffff81003531a680 ffff81007e6eb000 ffff81007e6eb4f8 000020023b9aca00
 ffff810000000000 0000000300000001 ffff81007e705e00 ffff81007e705e00
 0000958c9102002a 0000000000000018 ffff810000000001 ffff81007e705e28
Call Trace:
 [<ffffffff800a07c0>] keventd_create_kthread+0x0/0xc4
 [<ffffffff880791a4>] :scsi_mod:scsi_error_handler+0x290/0x4ac
 [<ffffffff88078f14>] :scsi_mod:scsi_error_handler+0x0/0x4ac
 [<ffffffff800a07c0>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8003287b>] kthread+0xfe/0x132
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff800a07c0>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8003277d>] kthread+0x0/0x132
 [<ffffffff8005dfa7>] child_rip+0x0/0x11


Code: 0f 0b 68 dd e0 11 88 c2 9e 08 4d 8b 7c 24 10 4c 3b 3c 24 0f 
RIP  [<ffffffff880ff793>] :lpfc:lpfc_abort_handler+0x58/0x33d
 RSP <ffff81007e705dd0>
 <0>Kernel panic - not syncing: Fatal exception


Version-Release number of selected component (if applicable):
2.6.18-194.11.1.el5
lpfc driver v8.2.0.63.3p

How reproducible:
Intermittent

Additional info:
The above RHEL 5.5 errata kernel (source package available at https://rhn.redhat.com/rhn/software/packages/details/Overview.do?pid=567665) was patched with Mike Snitzer's ALUA handler patch available at https://bugzilla.redhat.com/show_bug.cgi?id=619361#c1

Comment 1 Martin George 2010-08-16 10:08:18 UTC
Created attachment 438914 [details]
Kernel config

Comment 2 Martin George 2010-08-16 10:09:11 UTC
Created attachment 438915 [details]
Mike Snitzer's ALUA handler debug patch

Comment 3 Richard Kennedy 2010-08-17 19:27:13 UTC
I was able to download the source files and built lpfc from the kernel source tree. The driver is failing because it's abort_handler is being called from the mid-layer and the host_scribble area of the command has been nulled out.

 2191 lpfc_abort_handler(struct scsi_cmnd *cmnd)
   2192 {
   2193         struct Scsi_Host  *shost = cmnd->device->host;
   2194         struct lpfc_vport *vport = (struct lpfc_vport *) shost->hostdata        ;
   2195         struct lpfc_hba   *phba = vport->phba;
   2196         struct lpfc_iocbq *iocb;
   2197         struct lpfc_iocbq *abtsiocb;
   2198         struct lpfc_scsi_buf *lpfc_cmd;
   2199         IOCB_t *cmd, *icmd;
   2200         int ret = SUCCESS;
   2201         DECLARE_WAIT_QUEUE_HEAD_ONSTACK(waitq);
   2202
   2203
   2204         lpfc_block_error_handler(cmnd);
   2205         lpfc_cmd = (struct lpfc_scsi_buf *)cmnd->host_scribble;
   2206         BUG_ON(!lpfc_cmd);
   2207

Without host_scribble we cannot identify the IO to be aborted so we BUG out.
 
The driver only sets host_scribble when it successfully completes lpfc_queuecommand. If lpfc_queuecommand cannot issue the command it will return the command to the mid-layer with BUSY status. It does not set the host_scribble area in this case. 

We may need redhats help, could this be a race between the driver calling the done routine and the IO being aborted by the mid-layer? 

Does the ALUA code that is being debugged ever null out the host_scribble in the scsi_cmnd ?

Can we get a trace with the scsi debug level truned up?

Comment 4 Mike Snitzer 2010-08-19 01:14:19 UTC
(In reply to comment #3)

> We may need redhats help, could this be a race between the driver calling the
> done routine and the IO being aborted by the mid-layer? 

I'd be amazed if that were the case but I must defer to others to provide a more comprehensive answer.

From the trace, it is not clear to me which path was taken to call .eh_abort_handler

Was it scsi_eh_abort_cmds() or scsi_send_eh_cmnd()?

But scsi_error.c:scsi_try_to_abort_cmd states:
"scsi_done was called just after the command timed out and
we had a chance to process it."
Having documented that assertion it goes on to call .eh_abort_handler

> Does the ALUA code that is being debugged ever null out the host_scribble in
> the scsi_cmnd ?

No.  The patch from comment#2 traces state transitions in scsi_dh_alua and adds retry capability if TPGS_STATE_TRANSITIONING is seen in 2 code paths.  None of which would alter host_scribble -- which AFAIK is managed entirely by the LLD.

> Can we get a trace with the scsi debug level turned up?

That is a question for NetApp.  I've been doing this same NetApp controller failure testing with the 2.6.18-194.11.1.el5.alua_dbg kernel using lpfc and haven't had any issues.  Only NetApp can trigger these problems.

But chances are additional scsi debug would really alter the timing (given the flood of error handling and overall IO load performed during these NetApp controller failure tests)... is there some precise level you'd like used?

Comment 5 Rob Evers 2010-08-19 16:41:00 UTC
2 things:

1) Is it possible to attach the /var/log/messages file associated with this BUG_ON?

2) (In reply to comment #3)
> I was able to download the source files and built lpfc from the kernel source
> tree. The driver is failing because it's abort_handler is being called from the
> mid-layer and the host_scribble area of the command has been nulled out.
> 
>  2191 lpfc_abort_handler(struct scsi_cmnd *cmnd)
>    2192 {
>    2193         struct Scsi_Host  *shost = cmnd->device->host;
>    2194         struct lpfc_vport *vport = (struct lpfc_vport *)
> shost->hostdata        ;
>    2195         struct lpfc_hba   *phba = vport->phba;
>    2196         struct lpfc_iocbq *iocb;
>    2197         struct lpfc_iocbq *abtsiocb;
>    2198         struct lpfc_scsi_buf *lpfc_cmd;
>    2199         IOCB_t *cmd, *icmd;
>    2200         int ret = SUCCESS;
>    2201         DECLARE_WAIT_QUEUE_HEAD_ONSTACK(waitq);
>    2202
>    2203
>    2204         lpfc_block_error_handler(cmnd);
>    2205         lpfc_cmd = (struct lpfc_scsi_buf *)cmnd->host_scribble;
>    2206         BUG_ON(!lpfc_cmd);
>    2207
> 
> Without host_scribble we cannot identify the IO to be aborted so we BUG out.
> 
> The driver only sets host_scribble when it successfully completes
> lpfc_queuecommand. If lpfc_queuecommand cannot issue the command it will return

There are a few paths in the lpfc_queuecommand that can result in the function returning without setting the cmnd->host_scribble, see snippet of lpfc_queuecommand below.  Is there any chance that the command that ends up in the abort handler was returned from lpfc_queuecommand without the host_scribble field being set?  If so, should the abort handler be able to handle this condition?

	err = fc_remote_port_chkready(rport);
	if (err) {
		cmnd->result = err;
		sdev = cmnd->device;
		if ((phba->link_state == LPFC_HBA_ERROR) &&
			(cmnd->result == ScsiResult(DID_NO_CONNECT, 0)) &&
			(sdev->queue_depth != 1)) {

			/* If we reach this point, the HBA is not responding
			 * and has been taken offline. If alot of SCSI IO
			 * was active, this could cause a massive amount of
			 * SCSI layer initiated logging. On some systems,
			 * this massive amount of logging has been known to
			 * cause CPU soft lockups. In an attempt to throttle
			 * the amount of logging, set the sdev queue depth to 1.
			 */
			if (sdev->ordered_tags)
				scsi_adjust_queue_depth(sdev,
					MSG_ORDERED_TAG, 1);
			else
				scsi_adjust_queue_depth(sdev,
					MSG_SIMPLE_TAG, 1);
		}
		goto out_fail_command;
	}
	ndlp = rdata->pnode;

	/*
	 * Catch race where our node has transitioned, but the
	 * transport is still transitioning.
	 */
	if (!ndlp || !NLP_CHK_NODE_ACT(ndlp)) {
		cmnd->result = ScsiResult(DID_TRANSPORT_DISRUPTED, 0);
		goto out_fail_command;
	}
	if (vport->cfg_max_scsicmpl_time &&
		(atomic_read(&ndlp->cmd_pending) >= ndlp->cmd_qdepth))
		goto out_host_busy;

	lpfc_cmd = lpfc_get_scsi_buf(phba);
	if (lpfc_cmd == NULL) {
		lpfc_rampdown_queue_depth(phba);

		lpfc_printf_vlog(vport, KERN_INFO, LOG_FCP,
				 "0707 driver's buffer pool is empty, "
				 "IO busied\n");
		goto out_host_busy;
	}

	/*
	 * Store the midlayer's command structure for the completion phase
	 * and complete the command initialization.
	 */
	lpfc_cmd->pCmd  = cmnd;
	lpfc_cmd->rdata = rdata;
	lpfc_cmd->timeout = 0;
	lpfc_cmd->start_time = jiffies;
	cmnd->host_scribble = (unsigned char *)lpfc_cmd;

Comment 6 Martin George 2010-08-24 14:09:03 UTC
(In reply to comment #5)
> 1) Is it possible to attach the /var/log/messages file associated with this
> BUG_ON?
> 

I can try that. But this is highly intermittent - I've just hit this once so far in my tests.

Comment 7 Rob Evers 2010-09-01 15:32:26 UTC
(In reply to comment #6)
> (In reply to comment #5)
> > 1) Is it possible to attach the /var/log/messages file associated with this
> > BUG_ON?
> > 
> 
> I can try that. But this is highly intermittent - I've just hit this once so
> far in my tests.

Is it possible to configure kdump (and test the kdump configuration under load
esp. if boot from SAN) while running the scenario that induced this problem?

A dump file would provide much information valuable to diagnosing this problem.

Comment 8 Martin George 2010-09-13 11:05:49 UTC
Hit a similar bug on RHEL6 described at bug 633231

Comment 10 Richard Kennedy 2010-10-07 16:58:31 UTC
Created attachment 452159 [details]
Patch for 624393

Comment 11 Rob Evers 2010-10-07 19:06:59 UTC
(In reply to comment #10)
> Created attachment 452159 [details]
> Patch for 624393

Was this patch tested after the problem was reproduced reliably or were some other means used to confirm its legitimacy?

Comment 12 Martin George 2010-10-27 12:38:53 UTC
(In reply to comment #10)
> Created attachment 452159 [details]
> Patch for 624393

Not hit the panic so far in our tests with the above patch.

Comment 15 Martin George 2010-11-02 18:39:43 UTC
Is this patch being queued for the next 5.5.z release?

Comment 16 Rob Evers 2010-11-02 19:49:29 UTC
(In reply to comment #15)
> Is this patch being queued for the next 5.5.z release?

Yes, but still in progress.

Comment 17 Rob Evers 2010-11-03 13:56:17 UTC
(In reply to comment #11)
> (In reply to comment #10)
> > Created attachment 452159 [details] [details]
> > Patch for 624393
> 
> Was this patch tested after the problem was reproduced reliably or were some
> other means used to confirm its legitimacy?

(In reply to comment #12)
> (In reply to comment #10)
> > Created attachment 452159 [details] [details]
> > Patch for 624393
> 
> Not hit the panic so far in our tests with the above patch.

Clearing needinfo due to patch being part of rhel5.6 update, and testing results by Martin George.

Thanks, Rob

Comment 22 Andrius Benokraitis 2010-11-15 14:31:49 UTC
Miroslav, this is a 5.5.z targeted bugzilla that contains a discrete patch from a wholesale driver update in bug 639028.

Comment 24 Andrius Benokraitis 2010-11-15 19:02:07 UTC
Closing this bugzilla as a dupe of 639028, and then will request z-stream from there with the specific patch.

*** This bug has been marked as a duplicate of bug 639028 ***