Red Hat Bugzilla – Bug 431365
SCSI IO errors do not propagate properly with certain SCSI devices
Last modified: 2009-01-20 15:00:21 EST
Description of problem:
The Linux SCSI mid-layer, for kernels >= 2.6.18, does not handle SCSI disks that
incorrectly report hardware or medium errors.
Version-Release number of selected component (if applicable):
The specific scenario I recently chased involved using MD raid1 with 2 SCSI
disks that were connected via an Adaptec aacraid controller. Physically pulling
one of the disks from the enclosure did _not_ result in the block layer (MD
raid1) getting an IO error. The reason was the SCSI mid-layer was dropping the
error on the floor because it was calculating 'good_bytes' to be non-zero for
this HARDWARE_ERROR condition.
Even with "out-of-spec" SCSI disks, the SCSI mid-layer can take steps to
recognize the HARDWARE_ERROR and should keep 'good_bytes' set to 0. With that in
place, scsi_end_request propagates uptodate=0 to the block layer.
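To make the expected behaviour concrete, here is a minimal user-space sketch of
the check I have in mind. It is NOT the kernel code and not necessarily the
upstream patch: the struct, the function names (good_bytes_on_error,
request_uptodate) and the info_valid/bad_lba parameters are hypothetical and
only model the idea. The assumption is that a failing LBA reported in the sense
data is trusted only when it falls inside the range the request actually
covered; otherwise good_bytes stays 0 and the request is not marked uptodate.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space model; not the actual SCSI mid-layer code. */

#define SENSE_KEY_MEDIUM_ERROR   0x03  /* SCSI sense key values */
#define SENSE_KEY_HARDWARE_ERROR 0x04

struct io_request {
    uint64_t start_lba;    /* first logical block of the transfer */
    uint32_t num_sectors;  /* number of blocks requested */
    uint32_t sector_size;  /* bytes per block */
};

/*
 * Return how many bytes may be treated as successfully transferred
 * when the device reports an error. Anything suspicious yields 0.
 */
uint64_t good_bytes_on_error(const struct io_request *req, uint8_t sense_key,
                             int info_valid, uint64_t bad_lba)
{
    assert(req->sector_size > 0);

    uint64_t end_lba = req->start_lba + req->num_sectors;

    /* Only these two sense keys carry a meaningful failing LBA here. */
    if (sense_key != SENSE_KEY_MEDIUM_ERROR &&
        sense_key != SENSE_KEY_HARDWARE_ERROR)
        return 0;

    /* No valid information field: we do not know where the error is. */
    if (!info_valid)
        return 0;

    /*
     * An "out-of-spec" disk may report an LBA outside the request.
     * In that case the location is unknown, so nothing is good.
     */
    if (bad_lba < req->start_lba || bad_lba >= end_lba)
        return 0;

    /* Everything before the failing block was transferred. */
    return (bad_lba - req->start_lba) * req->sector_size;
}

/* Model of the uptodate flag: set only if the whole transfer was good. */
int request_uptodate(const struct io_request *req, uint64_t good_bytes)
{
    return good_bytes == (uint64_t)req->num_sectors * req->sector_size;
}
```

With a request for 8 sectors of 512 bytes starting at LBA 1000, a
HARDWARE_ERROR that reports LBA 0 as the failing block gives good_bytes == 0 in
this model, so the error would reach MD raid1 instead of being dropped.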
This regression in the SCSI mid-layer affects all kernels >= 2.6.18; this
includes all RHEL5 kernels AFAIK (definitely RHEL5U1). It is my hope that a fix
can be included in >= RHEL5U2.
To that end, upstream has identified an appropriate fix (this is also
the candidate fix for all "stable" kernels) available here:
The fix is queued in the scsi-misc-2.6 tree for Linus:
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release. Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products. This request is not yet committed for inclusion in an Update
release.
We considered putting this fix in 5.2, even though it arrived after beta
code-freeze. Code review brought up another nearby, but different, issue. This
resulted in some delay. In the end, we decided that the risk associated with
changing I/O completion code without a full beta test (and plenty of soak time
upstream) is not worth the benefit for this issue in 5.2. We intend to fix
this in 5.3.
You can download this test kernel from http://people.redhat.com/dzickus/el5
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.