Bug 431365 - SCSI IO errors do not propagate properly with certain SCSI devices
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.1
Hardware: All
OS: Linux
Priority: high
Severity: urgent
Target Milestone: rc
Target Release: ---
Assigned To: Mike Christie
QA Contact: Martin Jenner
Depends On:
Blocks: KernelPrio5.3
Reported: 2008-02-03 11:45 EST by Mike Snitzer
Modified: 2009-01-20 15:00 EST
CC List: 3 users

Doc Type: Bug Fix
Last Closed: 2009-01-20 15:00:21 EST


Description Mike Snitzer 2008-02-03 11:45:13 EST
Description of problem:
The Linux SCSI mid-layer, for kernels >= 2.6.18, does not correctly handle SCSI
disks that incorrectly report hardware or medium errors.


Version-Release number of selected component (if applicable):
2.6.18-53.el5


How reproducible:
Always

  
Actual results:
The specific scenario I recently chased involved using MD raid1 with 2 SCSI
disks that were connected via an Adaptec aacraid controller.  Physically pulling
one of the disks from the enclosure did _not_ result in the block layer (MD
raid1) receiving an IO error.  The reason was that the SCSI mid-layer was dropping the
error on the floor because it was calculating 'good_bytes' to be non-zero for
this HARDWARE_ERROR condition.
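To illustrate how a non-zero 'good_bytes' can come out of a failed request, here is a minimal Python model. It rests on assumptions not stated above: that the device returns a bad LBA in its sense data which falls outside the failed request, and that the completion path computes good_bytes as (bad_lba - start_lba) * sector_size with no range check. The function name naive_good_bytes and all LBA values are hypothetical; the bit masks emulate C's unsigned 64-bit and 32-bit arithmetic. This is a sketch, not the actual kernel code.

```python
# Hypothetical model of an unguarded good_bytes calculation (NOT the
# actual kernel code). Masks emulate C unsigned integer arithmetic.
U64_MASK = (1 << 64) - 1
U32_MASK = (1 << 32) - 1


def naive_good_bytes(start_lba: int, bad_lba: int, sector_size: int) -> int:
    """Bytes treated as successfully transferred, computed WITHOUT
    checking that bad_lba lies inside the failed request."""
    # u64 subtraction: wraps around when bad_lba < start_lba
    diff = (bad_lba - start_lba) & U64_MASK
    # u64 multiply, then the result is stored in a 32-bit unsigned int
    return ((diff * sector_size) & U64_MASK) & U32_MASK


# An 8-block READ of 512-byte sectors starting at LBA 1000 fails with
# HARDWARE_ERROR, but the device reports a bogus LBA of 0.
print(naive_good_bytes(start_lba=1000, bad_lba=0, sector_size=512))     # 4294455296
# A reported LBA past the end of the 4096-byte request is just as wrong:
print(naive_good_bytes(start_lba=1000, bad_lba=5000, sector_size=512))  # 2048000
```

In this model, any reported LBA outside the request produces an arbitrary non-zero byte count, so the failure looks like (at least partial) success instead of an IO error.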


Expected results:
Even with "out-of-spec" SCSI disks, the SCSI mid-layer should take steps to honor
the HARDWARE_ERROR and keep 'good_bytes' set to 0.  As a result,
scsi_end_request propagates uptodate=0 to the block layer.
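The expected behavior can be sketched as a minimal Python model. It assumes the remedy is a range check: the LBA reported in the sense data is trusted only if it lies inside the failed request, and otherwise good_bytes stays 0. The request is then split into completion chunks, each tagged with the uptodate value handed to the block layer. guarded_good_bytes and completion_chunks are hypothetical names rather than kernel functions, and this is not claimed to be the upstream patch.

```python
def guarded_good_bytes(start_lba: int, nr_blocks: int, bad_lba: int,
                       sector_size: int, info_valid: bool = True) -> int:
    """Hypothetical guarded calculation: trust bad_lba only if the sense
    data carried a valid LBA and it falls inside [start_lba, end_lba)."""
    end_lba = start_lba + nr_blocks
    if not info_valid or not (start_lba <= bad_lba < end_lba):
        # Error location is unknown: report no bytes as good.
        return 0
    return (bad_lba - start_lba) * sector_size


def completion_chunks(good_bytes: int, total_bytes: int) -> list[tuple[int, int]]:
    """Hypothetical split of a request into (nbytes, uptodate) pieces:
    the good prefix completes with uptodate=1, the rest with uptodate=0."""
    chunks = []
    if good_bytes > 0:
        chunks.append((good_bytes, 1))
    if total_bytes > good_bytes:
        chunks.append((total_bytes - good_bytes, 0))
    return chunks


total = 8 * 512  # 8-block request of 512-byte sectors starting at LBA 1000
print(completion_chunks(guarded_good_bytes(1000, 8, 0, 512), total))     # [(4096, 0)]
print(completion_chunks(guarded_good_bytes(1000, 8, 1003, 512), total))  # [(1536, 1), (2560, 0)]
```

With a bogus LBA, the whole request completes with uptodate=0, so an upper layer such as MD raid1 sees the IO error; a plausible in-range LBA still allows the good prefix to complete normally.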


Additional info:
This regression in the SCSI mid-layer affects all kernels >= 2.6.18; this
includes all RHEL5 kernels AFAIK (definitely RHEL5U1).  It is my hope that a fix
can be included in >= RHEL5U2.

To that end, upstream has identified an appropriate fix (which is also the
candidate fix for all "stable" kernels), available here:
http://marc.info/?l=linux-scsi&m=120199000703534&w=4
Comment 1 Mike Snitzer 2008-02-03 15:59:10 EST
The fix is queued in the scsi-misc-2.6 tree for Linus:
http://git.kernel.org/?p=linux/kernel/git/jejb/scsi-misc-2.6.git;a=commit;h=b42abb39ca8a5414f039839866f0357725a53618
Comment 6 RHEL Product and Program Management 2008-02-06 14:58:47 EST
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 9 RHEL Product and Program Management 2008-04-15 14:29:41 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 10 Tom Coughlan 2008-04-15 18:17:17 EDT
We considered putting this fix in 5.2, even though it arrived after beta
code-freeze. Code review brought up another nearby, but different, issue. This
resulted in some delay. In the end, we decided that the risk associated with
changing I/O completion code without a full beta test (and plenty of soak time
upstream) is not worth the benefit for this issue in 5.2.  We intend to fix
this in 5.3. 
Comment 11 RHEL Product and Program Management 2008-04-15 18:40:02 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 12 Don Zickus 2008-09-15 10:16:30 EDT
in kernel-2.6.18-115.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5
Comment 15 errata-xmlrpc 2009-01-20 15:00:21 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html
