431365 – SCSI IO errors do not propagate properly with certain SCSI devices


Summary: SCSI IO errors do not propagate properly with certain SCSI devices

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	5.1
Hardware:	All
OS:	Linux
Priority:	high
Severity:	urgent
Target Milestone:	rc
Target Release:	---
Assignee:	Mike Christie
QA Contact:	Martin Jenner
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	KernelPrio5.3

Reported:	2008-02-03 16:45 UTC by Mike Snitzer
Modified:	2009-01-20 20:00 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2009-01-20 20:00:21 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2009:0225	0	normal	SHIPPED_LIVE	Important: Red Hat Enterprise Linux 5.3 kernel security and bug fix update	2009-01-20 16:06:24 UTC

Description Mike Snitzer 2008-02-03 16:45:13 UTC

Description of problem:
The Linux SCSI mid-layer, for kernels >= 2.6.18, does not correctly handle SCSI
disks that report the location of hardware or medium errors incorrectly.


Version-Release number of selected component (if applicable):
2.6.18-53.el5


How reproducible:
Always

  
Actual results:
The specific scenario I recently chased involved using MD raid1 with 2 SCSI
disks that were connected via an Adaptec aacraid controller.  Physically pulling
one of the disks from the enclosure did _not_ result in the block layer (MD
raid1) getting an IO error.  The reason was that the SCSI mid-layer was dropping
the error on the floor because it calculated 'good_bytes' to be non-zero for
this HARDWARE_ERROR condition.
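To make the failure mode concrete, here is a hypothetical user-space model (not the kernel code) of an unguarded calculation of the form good_bytes = (bad_lba - start_lba) * sector_size, where bad_lba is the LBA the disk reports in its sense data. The function name and the assumption that all LBAs are in device-sector units are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of an unguarded good_bytes calculation.
 * Assumption: start_lba and bad_lba are both in device-sector units.
 */
static uint32_t good_bytes_unguarded(uint64_t start_lba, uint64_t bad_lba,
                                     uint32_t sector_size)
{
    assert(sector_size != 0);
    /* Nothing checks that bad_lba lies inside the request. A bogus LBA
     * from an out-of-spec disk yields an arbitrary byte count: the
     * unsigned subtraction wraps modulo 2^64 when bad_lba < start_lba,
     * and the result is then truncated to 32 bits. */
    return (uint32_t)((bad_lba - start_lba) * sector_size);
}
```

For a 4096-byte request starting at LBA 1000 with 512-byte sectors, a reported bad LBA of 1004 gives a plausible 2048 good bytes, but a bogus LBA of 2000 gives 512000 "good" bytes, far more than the request actually transferred.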


Expected results:
Even with "out-of-spec" SCSI disks, the SCSI mid-layer can take steps to detect
the HARDWARE_ERROR and should keep 'good_bytes' set to 0.  As a result,
scsi_end_request propagates uptodate=0 to the block layer.


Additional info:
This regression in the SCSI mid-layer affects all kernels >= 2.6.18; as far as
I know this includes all RHEL5 kernels (definitely RHEL5U1).  I hope that a fix
can be included in RHEL5U2 or later.

To that end, upstream has identified an appropriate fix (this is also the
candidate fix for all "stable" kernels), available here:
http://marc.info/?l=linux-scsi&m=120199000703534&w=4
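The linked patch is the authoritative fix. As a rough, hypothetical sketch of the idea, assuming the fix rejects any sense-reported LBA that falls outside the request's [start_lba, end_lba) range before trusting it, a guarded calculation could look like this (function name and device-sector units are assumptions):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sketch of a range-checked good_bytes calculation.
 * Assumption: start_lba and bad_lba are in device-sector units.
 */
static uint32_t good_bytes_guarded(uint64_t start_lba, uint32_t xfer_size,
                                   uint64_t bad_lba, uint32_t sector_size)
{
    assert(sector_size != 0);
    uint64_t end_lba = start_lba + xfer_size / sector_size;

    /* The disk reported an LBA outside this request, so we have no idea
     * where the error really is: count nothing as transferred. */
    if (bad_lba < start_lba || bad_lba >= end_lba)
        return 0;

    /* Everything before the (plausible) bad sector is good. */
    return (uint32_t)((bad_lba - start_lba) * sector_size);
}
```

With this guard, a bogus LBA from an out-of-spec disk leaves good_bytes at 0, so the whole request fails, while a correctly reported LBA inside the request still allows partial completion.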

Comment 1 Mike Snitzer 2008-02-03 20:59:10 UTC

The fix is queued in the scsi-misc-2.6 tree for Linus:
http://git.kernel.org/?p=linux/kernel/git/jejb/scsi-misc-2.6.git;a=commit;h=b42abb39ca8a5414f039839866f0357725a53618

Comment 6 RHEL Program Management 2008-02-06 19:58:47 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 9 RHEL Program Management 2008-04-15 18:29:41 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 10 Tom Coughlan 2008-04-15 22:17:17 UTC

We considered putting this fix in 5.2, even though it arrived after beta
code-freeze. Code review brought up another nearby, but different, issue. This
resulted in some delay. In the end, we decided that the risk of changing I/O
completion code without a full beta test (and plenty of soak time upstream) is
not worth the benefit for this issue in 5.2.  We intend to fix
this in 5.3.

Comment 11 RHEL Program Management 2008-04-15 22:40:02 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 12 Don Zickus 2008-09-15 14:16:30 UTC

in kernel-2.6.18-115.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 15 errata-xmlrpc 2009-01-20 20:00:21 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html
