Bug 197158 - mptscsi adds DID_BUS_BUSY host status when scsi status of BUSY is returned
Summary: mptscsi adds DID_BUS_BUSY host status when scsi status of BUSY is returned
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: All
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Chip Coldwell
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On: 228108
Blocks: 217097 246621
 
Reported: 2006-06-28 22:11 UTC by tom phelan
Modified: 2018-10-27 12:58 UTC (History)
17 users

Fixed In Version: RHBA-2007-0304
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-05-08 02:20:52 UTC
Target Upstream Version:
Embargoed:


Attachments
The hunk in question (811 bytes, patch)
2006-08-09 17:42 UTC, Chip Coldwell
Patch made against scsi-misc to (1) create new scsi host status DID_COND_REQUEUE and (2) return this status when the mptscsih.c iodone handler detects a scsi status of SAM_STAT_BUSY. (1.64 KB, patch)
2007-01-26 18:12 UTC, Ed Goggin
touchup of the previous patch to apply to the RHEL-4 kernel. (1.76 KB, patch)
2007-01-30 18:33 UTC, Chip Coldwell


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2007:0304 0 normal SHIPPED_LIVE Updated kernel packages available for Red Hat Enterprise Linux 4 Update 5 2007-04-28 18:58:50 UTC

Internal Links: 228108

Description tom phelan 2006-06-28 22:11:39 UTC
Description of problem:

The mptscsi driver in Linux version 2.6.9-34 converts a SCSI
return status of BUSY into a combined host status of busy and
SCSI (target) status of busy before returning the status to the Linux
I/O stack.


Version-Release number of selected component (if applicable):
The code change to the mptscsi driver that appends the DID_BUS_BUSY 
host status to the SCSI status was made between Linux versions 2.6.9-22 
and 2.6.9-34.


How reproducible:
100% reproducible


Steps to Reproduce:
1. Create a condition where the LSI hardware returns a scsi 
   status of MPI_SCSI_STATUS_BUSY to the driver. This is easy to
   do running RHEL 4 AS U3 as a virtual machine with the VMware ESX product.
2. The mptscsi driver converts the scsi status of MPI_SCSI_STATUS_BUSY into
   a host status of DID_BUS_BUSY and a scsi status of MPI_SCSI_STATUS_BUSY.
3. When the SCSI I/O command is returned to the Linux SCSI midlayer with a
   host status of DID_BUS_BUSY, the command will be retried five
   times; if the target continues to return a status of BUSY, the 
   I/O command will be marked as failed. If the SCSI I/O
   command originated within the ext3 file system, the I/O failure will cause
   the ext3 file system to be marked as read-only.

This behavior is different from the mptscsi driver in linux version 2.6.9-22.
In the *22 version, the mptscsi driver did not append the host status of 
DID_BUS_BUSY to a SCSI command failure of MPI_SCSI_STATUS_BUSY. Without the 
host status of DID_BUS_BUSY, the Linux SCSI midlayer will retry the failed 
I/O command an unlimited number of times. In the case of the ext3 file system,
a brief period of busy status from the SCSI target would not cause the
file system to be marked read-only.

  
Actual results:
  When an LSI SCSI device returns a status of target busy for a very 
  brief period of time, I/Os will fail after a few retries.


Expected results:
  When an LSI SCSI device returns a status of target busy for a very 
  brief period of time, I/Os will be retried.

Additional info:

The problematic change to the mptscsih.c file is in the
mptscsih_io_done() routine:

case MPI_IOCSTATUS_SUCCESS:			   /* 0x0000 */
   if (scsi_status == MPI_SCSI_STATUS_BUSY)
      sc->result = (DID_BUS_BUSY << 16) | scsi_status;  
   else
      sc->result = (DID_OK << 16) | scsi_status;
   	
This modification was clearly made for a reason, but it may be
out of compliance with the LSI device specification. It is unexpected that
a scsi status of BUSY would be treated the same as a host status of busy.

Comment 2 Chip Coldwell 2006-08-09 17:42:14 UTC
Created attachment 133859 [details]
The hunk in question

This shows the difference between the RHEL-4-U2 and RHEL-4-U3 versions that is
in question here.  It certainly looks like the change was intentional.

Chip

Comment 3 Chip Coldwell 2006-08-09 18:12:16 UTC
From coldwell Wed Aug  9 14:20:52 2006
Date: Wed, 9 Aug 2006 14:20:57 -0400 (EDT)
From: Chip Coldwell <coldwell>
To: Eric.Moore
Subject: DID_BUS_BUSY semantics

Hi Eric,

I'm a programmer in the kernel storage group at Red Hat, looking into
an issue that was raised by another one of our partners concerning a
change that was made between versions 3.02.18 (RHEL-4 update 2) and
3.02.62.01rh (RHEL-4 update 3) of the mptscsi driver.  The hunk in
question looks like this:

--- RHEL-4-U2/kernel-2.6.9/linux-2.6.9/drivers/message/fusion/mptscsih.c	2006-08-09 13:34:48.000000000 -0400
+++ RHEL-4-U3/kernel-2.6.9/linux-2.6.9/drivers/message/fusion/mptscsi.c	2006-08-09 13:32:48.000000000 -0400
@@ -800,10 +769,14 @@ mptscsih_io_done(MPT_ADAPTER *ioc, MPT_F

 			break;

+		case MPI_IOCSTATUS_SCSI_DATA_OVERRUN:		/* 0x0044 */
+			sc->resid=0;
 		case MPI_IOCSTATUS_SCSI_RECOVERED_ERROR:	/* 0x0040 */
 		case MPI_IOCSTATUS_SUCCESS:			/* 0x0000 */
-			scsi_status = pScsiReply->SCSIStatus;
-			sc->result = (DID_OK << 16) | scsi_status;
+			if (scsi_status == MPI_SCSI_STATUS_BUSY)
+				sc->result = (DID_BUS_BUSY << 16) | scsi_status;
+			else
+				sc->result = (DID_OK << 16) | scsi_status;
 			if (scsi_state == 0) {
 				;
 			} else if (scsi_state & MPI_SCSI_STATE_AUTOSENSE_VALID) {


The question here is whether it is correct to return DID_BUS_BUSY
instead of DID_OK when the target reports BUSY.  In the old version,
the driver returned (DID_OK << 16) | MPI_SCSI_STATUS_BUSY; in the new
version it returns (DID_BUS_BUSY << 16) | MPI_SCSI_STATUS_BUSY.  Can
you shed any light?

Thanks,

Chip

-- 
Charles M. "Chip" Coldwell
Senior Software Engineer
Red Hat, Inc
978-392-2426



Comment 4 Chip Coldwell 2006-09-05 18:59:42 UTC
From Eric.Moore Tue Sep  5 14:36:16 2006
Date: Tue, 5 Sep 2006 12:36:04 -0600
From: "Moore, Eric" <Eric.Moore>
To: Chip Coldwell <coldwell>
Subject: RE: DID_BUS_BUSY semantics

On Wednesday, August 09, 2006 12:21 PM, Chip Coldwell wrote: 

> Hi Eric,
> 
> I'm a programmer in the kernel storage group at Red Hat, looking into
> an issue that was raised by another one of our partners concerning a
> change that was made between versions 3.02.18 (RHEL-4 update 2) and
> 3.02.62.01rh (RHEL-4 update 3) of the mptscsi driver.  The hunk in
> question looks like this:
> 
> --- RHEL-4-U2/kernel-2.6.9/linux-2.6.9/drivers/message/fusion/mptscsih.c	2006-08-09 13:34:48.000000000 -0400
> +++ RHEL-4-U3/kernel-2.6.9/linux-2.6.9/drivers/message/fusion/mptscsi.c	2006-08-09 13:32:48.000000000 -0400
> @@ -800,10 +769,14 @@ mptscsih_io_done(MPT_ADAPTER *ioc, MPT_F
> 
>   			break;
> 
> +		case MPI_IOCSTATUS_SCSI_DATA_OVERRUN:		/* 0x0044 */
> +			sc->resid=0;
>   		case MPI_IOCSTATUS_SCSI_RECOVERED_ERROR:	/* 0x0040 */
>   		case MPI_IOCSTATUS_SUCCESS:			/* 0x0000 */
> -			scsi_status = pScsiReply->SCSIStatus;
> -			sc->result = (DID_OK << 16) | scsi_status;
> +			if (scsi_status == MPI_SCSI_STATUS_BUSY)
> +				sc->result = (DID_BUS_BUSY << 16) | scsi_status;
> +			else
> +				sc->result = (DID_OK << 16) | scsi_status;
>   			if (scsi_state == 0) {
>   				;
>   			} else if (scsi_state & MPI_SCSI_STATE_AUTOSENSE_VALID) {
> 
> 
> The question here is whether it is correct to return DID_BUS_BUSY
> instead of DID_OK when the target reports BUSY.  In the old version,
> the driver returned (DID_OK << 16) | MPI_SCSI_STATUS_BUSY; in the new
> version it returns (DID_BUS_BUSY << 16) | MPI_SCSI_STATUS_BUSY.  Can
> you shed any light?
> 
> Thanks,
> 

Sorry for the delay in responding.

Here is a snip of the release notes relevant to this change. It occurred in
the 3.02.18 release, and was requested by Engenio to fix a problem with the
RDAC/MPP failover driver.



*	When a target device responds with BUSY status, the MPT driver
was sending DID_OK to the SCSI mid layer, which caused the IO to be
retried indefinitely between the mid layer and the driver.  By changing
the driver return status to DID_BUS_BUSY, the target BUSY status can now
flow through the mid layer to an upper layer Failover driver, which will
manage the I/O timeout. 



Comment 5 Chip Coldwell 2006-09-05 19:06:00 UTC
(In reply to comment #4)
> From Eric.Moore Tue Sep  5 14:36:16 2006
> 
> Here is snip of release notes relevant to this change. It occurred in
> the 3.02.18 release,
> and was requested by Engenio, for fixing a problem with RDAC/MPP
> failover driver.  
> 
> *	When a target device responds with BUSY status, the MPT driver
> was sending DID_OK to the SCSI mid layer, which caused the IO to be
> retried indefinitely between the mid layer and the driver.  By changing
> the driver return status to DID_BUS_BUSY, the target BUSY status can now
> flow through the mid layer to an upper layer Failover driver, which will
> manage the I/O timeout. 

It seems that LSI Logic agrees with VMWare: this change prevents the 
mid-layer and driver from retrying commands indefinitely.  Unfortunately,
VMWare wants that, and Engenio does not.

Chip


Comment 6 Chip Coldwell 2006-10-02 14:19:58 UTC
(In reply to comment #5)
> (In reply to comment #4)
> > From Eric.Moore Tue Sep  5 14:36:16 2006
> > 
> > Here is snip of release notes relevant to this change. It occurred in
> > the 3.02.18 release,
> > and was requested by Engenio, for fixing a problem with RDAC/MPP
> > failover driver.  
> > 
> > *	When a target device responds with BUSY status, the MPT driver
> > was sending DID_OK to the SCSI mid layer, which caused the IO to be
> > retried indefinitely between the mid layer and the driver.  By changing
> > the driver return status to DID_BUS_BUSY, the target BUSY status can now
> > flow through the mid layer to an upper layer Failover driver, which will
> > manage the I/O timeout. 
> 
> It seems that LSI Logic agrees with VMWare: this change prevents the 
> mid-layer and driver from retrying commands indefinitely.  Unfortunately,
> VMWare wants that, and Engenio does not.

Under the circumstances, we feel it would be best if VMware and LSI Logic
find a mutually acceptable solution to this problem, and Red Hat will
adopt whatever goes in the upstream kernel.

Chip


> 
> Chip
> 



Comment 7 Ed Goggin 2006-12-07 13:22:48 UTC
Why not make DID_BUS_BUSY (or better yet, some new host status value like 
DID_NOMPIO_RETRY) retry commands indefinitely IFF REQ_FAILFAST is not set in the 
io request's cmd_flags?  These retries would not be impacted by the maximum 
command retry count.  This host status would also differ from DID_IMM_RETRY in 
that no retries would be done by scsi-ml if REQ_FAILFAST was set.  Using a new 
host status would also require changing mptscsih.c to use the new host status 
instead of DID_BUS_BUSY.

The approach also requires that io from a VMware ESX Server guest OS would not 
have REQ_FAILFAST set while io managed by the RDAC/MPP driver would have 
REQ_FAILFAST set.  The only Linux io which has REQ_FAILFAST set is MPIO and 
readahead requests.  Since there is no logical reason for a VMware ESX Server 
guest to run multi-pathing services (since MPIO is provided by ESX Server 
itself), this restriction should be fine for them.  This approach may require 
changing the RDAC/MPP driver to set REQ_FAILFAST for their io, similar to the 
dm-multipath driver dm-mpath.c, if they don't do so already.

Comment 8 Mike Christie 2006-12-13 20:56:28 UTC
(In reply to comment #7)
> Why not make DID_BUS_BUSY (or better yet, some new host status value like 
> DID_NOMPIO_RETRY) command retries indefinite IFF REQ_FAILFAST is not set in the 
> io request's cmd_flags?  These retries would not be impacted by the maximum 
> command retry count.  This host status would also differ from DID_IMM_RETRY in 
> that no retries would be done by scsi-ml if REQ_FAILFAST was set.  Using a new 
> host status would also require changing mptscsih.c to use the new host status 
> instead of DID_BUS_BUSY.
> 
> The approach also requires that io from a VMware ESX Server guest OS would not 
> have REQ_FAILFAST set while io managed by the RDAC/MPP driver would have 
> REQ_FAILFAST set.  The only Linux io which has REQ_FAILFAST set is MPIO and 
> readahead requests.  Since there is no logical reason for a VMware ESX Server 
> guest to run multi-pathing services (since MPIO is provided in by ESX Server 
> itself), this restriction should be fine for them.  This approach may require 
> changing the RDAC/MPP driver to set REQ_FAILFAST for their io, similar to the 
> dm-multipath driver dm-mpath.c, if they don't do so already.

I agree with Ed's comments. It should really be up to the LSI RDAC/MPP driver to
set the failfast bit here. Last I looked it was doing some weird hook in to the
scsi mid layer but it should still be able to set the failfast bit.

I do not think we can change the behavior of DID_BUS_BUSY at this point so the
new host byte value sounds better or it might be better to take the case to
linux-scsi and see if we can change the behavior of DID_IMM_RETRY. I think we
can change the behavior of DID_IMM_RETRY because it is used by ipr which is not
going to be used with something like multipath or raid so the failfast bit does
not really come into play in such a way that it changes what ipr wanted to use
it for. And for drivers like qla2xxx and drivers using the fc transport class
helpers, I think it would be beneficial if during periods where the rport is
transitioning to blocked or changing state that if failfast is set that we
returned the request to some layer like dm-multipath which can retry the request
on another path.

One question I had was if failfast is set and the status byte returns busy, does
it make sense to just check the failfast bit there? If it was set, then return
command to upper layer. If not then continue to retry the command in the scsi
layer. So here in scsi_error.c

        switch (status_byte(scmd->result)) {
        case QUEUE_FULL:
                /*
                 * the case of trying to send too many commands to a
                 * tagged queueing device.
                 */
        case BUSY:
                /*
                 * device can't talk to us at the moment.  Should only
                 * occur (SAM-3) when the task queue is empty, so will cause
                 * the empty queue handling to trigger a stall in the
                 * device.
                 */
                if (blk_noretry_request(scmd->request))
                      return SUCCESS; /* send to upper layers */
                else
                      return ADD_TO_MLQUEUE;


This is basically what we are doing with the new host byte or changing the
behavior of DID_IMM_RETRY if we use it when the status returns busy, but we do
not have to change every driver.

Comment 9 Mike Christie 2006-12-13 21:02:48 UTC
That last bit is not true. Some drivers return DID_BUS_BUSY when they get busy
status and others return DID_OK | status, so we have a mixed bag :) Some drivers
would need to be converted and others would not, but for once we could provide a
consistent behavior :) I think I have a bugzilla from NEC mentioning that they
do not like how we retry forever on busy status.

Comment 12 Andrius Benokraitis 2007-01-17 16:45:30 UTC
Question for Eric @ LSI:
Do you know if there was a patch submitted/approved upstream for this issue?
Although this bug was closed notabug, a fix is being requested from a mutual
customer.

Question for Chip/Mike:
Why was this closed as notabug?

Comment 13 Eric Moore 2007-01-17 17:04:40 UTC
Ed Goggin just pinged me this morning, and I've sent an email over to the 
Engenio wrt using the REQ_FAILFAST flag.   

Mike Christie - have you seen this thread on lsml?  I'm not clear where James 
Bottomley stands on this.  My take is he wants the driver merely to return the 
sam status as is, however that is not going to work for mpp rdac, as you know.
Here's the thread in case you've not seen it:

http://marc.theaimsgroup.com/?l=linux-scsi&w=2&r=1&s=vmware+bug+fix+prevent+inifinite+retries&q=b



Comment 15 Andrius Benokraitis 2007-01-18 16:17:33 UTC
Keeping this open until VMware and LSI Logic find a mutually acceptable solution
to this problem, and Red Hat will then adopt whatever goes in the upstream
kernel. This is looking more like a RHEL 4.6 issue given the discussions on
linux-scsi.

Comment 16 Eric Moore 2007-01-18 17:32:00 UTC
I'm awaiting feedback from Engenio.

Comment 17 Andrius Benokraitis 2007-01-25 19:52:59 UTC
Ed (VMWare) - Any progress between you and LSI on this matter? Any later and
this will have to be a RHEL 4.6 issue.

Comment 18 Ed Goggin 2007-01-25 20:11:38 UTC
Andrius,

Engenio and LSI Logic both seem OK with the general idea of the approach and its
implementation for the RHEL 4 Update 5 or 6 release.

They are concerned about upstream code since the RDAC/MPP code uses
scsi_execute_async in upstream kernels instead of using scsi_do_req. 
Unfortunately, scsi_execute_async currently provides no way to set the cmd_flags
(and thereby the REQ_FAILFAST bit) for an IO request.

I've been assuming that Red Hat will require a solution for the problem
submitted upstream before the solution is fixed in RHEL 4.  Is that the case, or
can we simply work out a simpler patch now for RHEL 4 and one later for upstream?

Please advise.

Ed

Comment 19 Eric Moore 2007-01-26 17:12:41 UTC
Andrius - Ed Goggin is going to post a patch in this bugzilla to address his 
and Engenio's MPP driver concerns for RHEL4.5.  We are not going to post 
upstream for this particular fix today; as you know, I'm trying to get patches 
together mainstream with regard to RHEL5.1 stuff which is due on Monday.  
We can handle upstream later next week, OK?

Comment 20 Ed Goggin 2007-01-26 18:12:30 UTC
Created attachment 146697 [details]
Patch made against scsi-misc to (1) create new scsi host status DID_COND_REQUEUE and (2) return this status when the mptscsih.c iodone handler detects a scsi status of SAM_STAT_BUSY.

Created an attachment containing a patch for this problem.  The patch is built
against scsi-misc git tree.  This patch provides the framework for solving the
problem described in this bugzilla by (1) introducing a new host status called
DID_COND_REQUEUE, (2) adding code to scsi_decide_disposition to service a
DID_COND_REQUEUE host status by indefinitely requeuing the scsi cmnd IFF
REQ_FAILFAST is not set on the scsi cmnd's i/o request, otherwise returning
SCSI_SUCCESS, and (3) changing the mptscsih_io_done handler to return
DID_COND_REQUEUE instead of DID_BUS_BUSY host status in the event of a
SAM_STAT_BUSY scsi status.

VMware guests, having no reason to run multi-pathing software in the guest OS,
will therefore requeue the i/o request indefinitely in this use case.  RDAC/MPP
multi-pathing software will be changed to set the REQ_FAILFAST bit of the i/o
requests it manages so that these i/o requests will not be requeued at all in
this use case.

Comment 22 Ed Goggin 2007-01-26 19:19:10 UTC
Andrius,

I've not yet tested the patch attached with comment #20.  While it appears
simple enough, nothing is beyond possible regression.  I was hoping that you
would be OK with us testing the patch in parallel with its submission.  I also
expect that both LSI Logic and Engenio have a vested interest in regression
testing this patch.

Comment 23 Andrius Benokraitis 2007-01-26 20:04:52 UTC
Ed, I will have to defer to Heath, who is a RH customer TAM that originally
experienced this problem. Maybe he or his customer can help test... Heath?

I'd also like Chip/Tom to comment with code review if possible...

Comment 25 Chip Coldwell 2007-01-26 20:15:56 UTC
(In reply to comment #23)
> Ed, I will have to defer to Heath, who is a RH customer TAM that originally
> experienced this problem. Maybe he or his customer can help test... Heath?
> 
> I'd also like Chip/Tom to comment with code review if possible...

I'm looking at the patch.  It seems that there are three cases that it is
important to test -- MPIO, no-MPIO, and VMWare guest.  Are we getting coverage
for all of them?

Chip

Comment 26 Andrius Benokraitis 2007-01-26 20:32:09 UTC
Chip - this BZ is only for RHEL 4, but I assume this patch would also have to be
proposed/backported for 5.1 as well, so yes, that may be a good idea to test.

Comment 27 Ed Goggin 2007-01-26 20:40:04 UTC
Chip,

Not sure why you think there are 3 test cases -- I think the last two overlap
since a VMware guest gets no value from running multipathing -- since the
underlying ESX micro-kernel provides multi-pathing and only exports a single
path to each logical unit.

If a VMware guest runs multi-pathing, this patch will actually make it worse for
the guest since the scsi cmnd will not be retransmitted at all whereas it is now
being retransmitted up to the scmd->allowed value.

Comment 28 Chip Coldwell 2007-01-26 21:06:50 UTC
(In reply to comment #27)
> 
> Not sure why you think there are 3 test cases -- I think the last two overlap
> since a VMware guest gets no value from running multipathing -- since the
> underlying ESX micro-kernel provides multi-pathing and only exports a single
> path to each logical unit.

Right, but I guess I don't want to close the bug until we confirm that the
VMware problem is fixed.  Otherwise, we're just making an arbitrary change for
no reason.

Chip


Comment 30 Ed Goggin 2007-01-29 20:03:44 UTC
Chip,

VMware will test the VMware guest use case you mention above in comment #25.  I
think this also covers the non-MPIO use case.

Could Engenio confirm that they will be testing the (RDAC/MPP) MPIO use case?

Ed

Comment 31 Chip Coldwell 2007-01-29 20:11:22 UTC
(In reply to comment #30)
> Chip,
> 
> VMware will test the VMware guest use case you mention above in comment #25.  I
> think this also covers the non-MPIO use case.
> 
> Could Engenio confirm that they will be testing the (RDAC/MPP) MPIO use case?

Actually, I think Engenio are an LSI customer; it might be up to LSI to test the
MPIO case.

Chip


Comment 32 Ed Goggin 2007-01-29 20:15:06 UTC
Chip,

Also, will Red Hat be providing patched binary versions of the MPT driver for
both RHEL Updates 3 and 4?  If so, will these be available on RHN?

Thanks,

Ed

Comment 33 Chip Coldwell 2007-01-29 20:18:32 UTC
(In reply to comment #32)
> Chip,
> 
> Also, will Red Hat be providing patched binary versions of the MPT driver for
> both RHEL Updates 3 and 4?  If so, will these be available on RHN?

If the testing goes well, the binary MPT driver that comes with RHEL4.5 (4.6?)
will include the patch.

I don't think RHEL3 has this problem, right?  It's been some time since we
touched the fusion driver in RHEL3.

Chip


Comment 34 Ed Goggin 2007-01-29 20:22:37 UTC
I'm referring to fixes for RHEL4 customers running (and wanting to stay on)
Update 3 or 4 (not wanting to upgrade to update 5 or 6).  Not even considering
RHEL3.

Is it looking like this will slide to RHEL4 U6?  Then having a more readily
available solution for these RHEL4 Update 3/4/5 customers will be even more
important.

Comment 35 Chip Coldwell 2007-01-29 20:28:47 UTC
(In reply to comment #34)
> I'm referring to fixes for RHEL4 customers running (and wanting to stay on)
> Update 3 or 4 (not wanting to upgrade to update 5 or 6).  Not even considering
> RHEL3.

I see.  That is a question above my pay grade -- Tom?

> Is it looking like this will slide to RHEL4 U6?

I don't know.  There are two issues with this patch: it isn't tested and it
isn't in the upstream kernel.  Since it touches the SCSI midlayer, that second
point is going to stir up some controversy in our internal engineering review
process.

Chip


Comment 36 Chip Coldwell 2007-01-29 20:51:06 UTC
Is something missing from this hunk?

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 2ecb6ff..59eb561 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1216,6 +1216,14 @@ int scsi_decide_disposition(struct scsi_
 	case DID_IMM_RETRY:
 		return NEEDS_RETRY;
 
+	case DID_COND_REQUEUE:
+		/*
+		 * Return immediately w/o requeue if the request 
+		 * indicates no retry.
+		 */
+		if (blk_noretry_request(scmd->request)) {
+			return SUCCESS;
+		}
 	case DID_REQUEUE:
 		return ADD_TO_MLQUEUE;
 
What if the host byte is DID_COND_REQUEUE and the REQ_FAILFAST bit is cleared? 
Shouldn't we return NEEDS_RETRY?  I.e.

if (blk_noretry_request(scmd->request))
        return SUCCESS;
else
        return NEEDS_RETRY;

Chip


Comment 37 Ed Goggin 2007-01-29 21:09:14 UTC
Isn't that case covered by falling into the return ADD_TO_MLQUEUE for the
DID_REQUEUE host status?  If that coding is too subtle or dependent on ordering
of the particular host status handlers, we can simply change to

if (blk_noretry_request(scmd->request))
        return SUCCESS;
else
        return ADD_TO_MLQUEUE;

Comment 38 RHEL Program Management 2007-01-29 21:25:29 UTC
This bugzilla has Keywords: Regression.  

Since no regressions are allowed between releases, 
it is also being proposed as a blocker for this release.  

Please resolve ASAP.

Comment 39 Eric Moore 2007-01-29 21:56:53 UTC
With regard to Comment #31.  Engenio is a division within LSI.  They are a 
separate division from the one I'm in, which currently is called SCG (Storage 
Components Group).  SCG will be renamed soon as a result of the merger 
with Agere.  At one time they considered spinning off Engenio from LSI, but 
that never happened after we got a new CEO a couple years ago.  This past 
year, the megaraid group was merged into Engenio.  I believe they are totally 
dropping the Engenio name, and that division is part of LSI.

With regard to Comment #35.  I asked Ed to hold off a day or two, because I 
posted a major mpt fusion driver release on lsml, which occurred this 
morning.  I think in a couple days, we can post that.

With regard to Comment #36.  I see no need to change the patch, because doesn't 
NEEDS_RETRY translate to ADD_TO_MLQUEUE?

Comment 40 Ed Goggin 2007-01-29 22:43:57 UTC
They both add the command's request back to the device queue via
scsi_queue_insert but ADD_TO_MLQUEUE also sets the device_busy count (defaults
to 3) in order to stall the device for a bit.

I thought returning ADD_TO_MLQUEUE instead of NEEDS_RETRY might make it easier
to get the patch accepted upstream, since I thought that James Bottomley was in
favor of treating this simply as SAM_STAT_BUSY -- and the SAM_STAT_BUSY handler
returns ADD_TO_MLQUEUE.

Comment 41 Tom Coughlan 2007-01-30 01:47:18 UTC
(In reply to comment #34)
> I'm referring to fixes for RHEL4 customers running (and wanting to stay on)
> Update 3 or 4 (not wanting to upgrade to update 5 or 6).  Not even considering
> RHEL3.

We generally do not release patches for older versions, once they have been
superseded. Customers may be able to work out an exception to this rule by
working with their Red Hat support representative. 

Comment 42 Chip Coldwell 2007-01-30 18:16:18 UTC
(In reply to comment #37)
> Isn't that case covered by falling into the return ADD_TO_MLQUEUE for the
> DID_REQUEUE host status?  If that coding is too subtle or dependent on ordering
> of the particular host status handlers, we can simply change to
> 
> if (blk_noretry_request(scmd->request))
>         return SUCCESS;
> else
>         return ADD_TO_MLQUEUE;

The issue is that the fall-through falls through to a different case in RHEL-4.
 The next case below DID_IMM_RETRY in RHEL-4 is DID_ERROR, not DID_REQUEUE. 
I'll touch up the patch so it applies to RHEL-4.

Chip





Comment 43 Chip Coldwell 2007-01-30 18:33:34 UTC
Created attachment 146944 [details]
touchup of the previous patch to apply to the RHEL-4 kernel.

Comment 44 Ed Goggin 2007-01-30 20:41:06 UTC
OK.  Updated patch looks good.  Thanks!

Comment 46 Jason Baron 2007-02-01 19:34:49 UTC
committed in stream U5 build 45. A test kernel with this patch is available from
http://people.redhat.com/~jbaron/rhel4/


Comment 47 Ed Goggin 2007-02-05 17:06:34 UTC
This whole approach (see patch from comment #20) has been rejected upstream
because, since 2.6.14, infinite retry for even BUSY and QUEUE_FULL scsi
status errors is prevented by retry limiting code in scsi_softirq_done (or
scsi_softirq).  As a result, a reasonable course of action for the RDAC/MPP
driver is for the fusion MPT driver to simply not interpret the scsi status when
ioc status is success (as was the case before the patch to return DID_BUS_BUSY
was applied) and for the RDAC/MPP driver to simply wait out the 180 second retry
limit before getting hold of the failed I/O.

Unfortunately, the effect on this use case for a VMware Linux guest OS is an I/O
failure after 180 seconds.

At the moment it is not clear what to do with the RHEL 4 Update 5 code.

Comment 48 Ed Goggin 2007-02-07 02:08:35 UTC
Is Red Hat open to porting the SCSI command retry limiting code in 
scsi_softirq_done from upstream kernels (2.6.14 at earliest) and changing the 
mpt fusion driver to no longer interpret the SCSI status in cases where ioc 
status is success in the RHEL4 U5 (or U6) release?  I'm suspecting that 
porting the retry limiting code will require a change to the SCSI command 
structure and thereby require changing the RHEL4 KBI.

Comment 50 Chip Coldwell 2007-02-07 18:53:12 UTC
(In reply to comment #48)
> Is Red Hat open to porting the SCSI command retry limiting code in 
> scsi_softirq_done from upstream kernels (2.6.14 at earliest)

This has kABI implications since it would change the size of the scsi_cmnd
struct (have to add a jiffies_at_alloc field to it).  I don't think it's an
option for 4.5 at this point.

Chip


Comment 53 Chip Coldwell 2007-02-08 20:16:56 UTC
We've decided to carry the patch attached to this bugzilla instead of trying to
backport the upstream code.  This seems like the safest way to fix the problem.

Chip


Comment 55 Tom Coughlan 2007-02-08 22:31:23 UTC
(In reply to comment #30)
> Chip,
> 
> VMware will test the VMware guest use case you mention above in comment #25.  I
> think this also covers the non-MPIO use case.
> 
> Could Engenio confirm that they will be testing the (RDAC/MPP) MPIO use case?

We are depending on Engenio (or EMC) to test the patch to this driver in RHEL 4.
This must include the problematic case in vmware, as well as regression testing
with and without vmware. Please post a summary of the testing that has been done
so far, as well as what you plan to do during Beta.  

Comment 57 Andrius Benokraitis 2007-02-22 20:25:02 UTC
RH and LSI will be leaning on VMWare to help test this, as we don't have the
right QA setups... Expect this to be added to a post-Beta snapshot of RHEL 4.5.

VMWare - can you lead the testing on this?

Comment 58 Ed Goggin 2007-02-27 22:19:56 UTC
VMware can commit to testing the original "problematic case in vmware" (see
comment #55) using the test kernel from comment #46 in order to verify that the
problem is fixed.  Is this sufficient for VMware's portion of the testing?

Comment 59 Chip Coldwell 2007-02-28 20:49:26 UTC
(In reply to comment #58)
> VMware can commit to testing the original "problematic case in vmware" (see
> comment #55) using the test kernel from comment #46 in order to verify that the
> problem is fixed.  Is this sufficient for VMware's portion of the testing?

Yes, thanks.

Chip


Comment 61 Andrius Benokraitis 2007-03-06 17:15:26 UTC
Needinfo from VMWARE: has this patch been tested with the RHEL 4.5 Beta?

Comment 62 Ed Goggin 2007-03-06 17:30:41 UTC
The RHEL 4.5 Beta test kernel is actively being tested by VMware SQA.  We hope 
to have feedback by the end of this week.  Do you need the feedback sooner?

Comment 63 Andrius Benokraitis 2007-03-06 17:33:35 UTC
Hi Ed, the sooner the better, we are already spinning post-Beta kernels and the
feedback period ends 13-Mar-07 for partners.

Comment 64 Ed Goggin 2007-03-08 13:11:02 UTC
Andrius,

VMware QA has acked this patch in the RHEL 4.5 Beta test kernel.

Thanks,

Ed

Comment 66 Mike Gahagan 2007-04-02 20:03:34 UTC
Patch is in the -52.EL kernel


Comment 68 Red Hat Bugzilla 2007-05-08 02:20:53 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0304.html

Comment 69 Robert Westendorp 2007-05-08 20:52:50 UTC
I installed a new kernel (2.6.9-55.ELsmp) and when doing a restore from TSM we
still get the file system changed to Read-Only.
This is in a VM (ESX 3.0.1).

Comment 70 Andrius Benokraitis 2007-05-09 00:54:00 UTC
Robert - we relied exclusively on VMWare to test the patch prior to inclusion
into RHEL 4.5.

Ed - can you follow up with Robert? If this is still an issue we'll need to
create a new bugzilla that continues where this bug left off.

Comment 71 Ed Goggin 2007-05-10 13:55:39 UTC
Andrius, sorry for the delay ... I've been out for 3 days due to a family 
emergency.

Robert, could you describe your host/SAN configuration in detail?  I'm 
particularly interested in the type of storage array, whether or not there is 
physical path redundancy, whether or not the problem is repeatable, and how 
quickly the problem occurs.

Comment 72 The Written Word 2007-05-17 23:45:41 UTC
How about forward-porting this fix to RHEL5? The latest RHEL5 kernel exhibits
the same problem.

Comment 73 Chip Coldwell 2007-05-18 12:37:48 UTC
(In reply to comment #72)
> How about forward-porting this fix to RHEL5? The latest RHEL5 kernel exhibits
> the same problem.

Bug 228108 has already been opened to address this issue.



Comment 74 Eric Moore 2007-05-23 17:08:35 UTC
According to the other bugzilla (228108), the fix from Ed Goggin has already 
been applied in the RHEL5 branch (e.g. don't return DID_BUS_BUSY for 
SAM_STAT_BUSY).
 

