Bug 741273 - Non-responsive scsi target leads to excessive scsi recovery and dm-mp failover time [rhel-5.7.z]
Summary: Non-responsive scsi target leads to excessive scsi recovery and dm-mp failove...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.5
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Assignee: Phillip Lougher
QA Contact: Gris Ge
URL:
Whiteboard:
Depends On: 694625
Blocks:
Reported: 2011-09-26 13:19 UTC by RHEL Program Management
Modified: 2013-01-11 04:03 UTC
CC List: 12 users

Fixed In Version: kernel-2.6.18-274.11.1.el5
Doc Type: Bug Fix
Doc Text:
During error recovery, most SCSI error recovery stages sent a TUR (Test Unit Ready) command for every bad command whenever a driver error handler reported success. When several bad commands pointed to the same device, that device was probed multiple times. If the device was in a state where it did not respond to commands even after a recovery function returned success, the error handler had to wait for each of these TUR commands to time out, which significantly slowed the recovery process. With this update, the SCSI mid-layer error routines that send test commands have been fixed to send them once per device instead of once per bad command, which considerably reduces error recovery time.
Clone Of:
Environment:
Last Closed: 2011-11-29 14:36:25 UTC
Target Upstream Version:
Embargoed:




Links
System: Red Hat Product Errata
ID: RHSA-2011:1479
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: kernel security, bug fix, and enhancement update
Last Updated: 2011-11-29 19:25:05 UTC

Description RHEL Program Management 2011-09-26 13:19:08 UTC
This bug has been copied from bug #694625 and has been proposed
to be backported to 5.7 z-stream (EUS).

Comment 4 Phillip Lougher 2011-11-08 16:38:25 UTC
in kernel-2.6.18-274.11.1.el5

linux-2.6-scsi-reduce-error-recovery-time-by-reducing-use-of-turs.patch

Comment 6 Gris Ge 2011-11-23 10:26:23 UTC
Cannot reproduce this problem on RHEL 5.7 GA kernel.

scsi_debug downloaded from http://lacrosse.corp.redhat.com/~fge/scsi_debug/
=======
[root@intel-canoepass-02 ~]# modprobe scsi_debug dev_size_mb=100 opts=4 
[root@intel-canoepass-02 ~]# date
Wed Nov 23 05:03:52 EST 2011
[root@intel-canoepass-02 ~]# echo -1 > /sys/module/scsi_debug/parameters/every_nth
[root@intel-canoepass-02 ~]# dd if=/dev/sdb of=/dev/null count=1 iflag=direct

dd: reading `/dev/sdb': Input/output error
0+0 records in
0+0 records out
0 bytes (0 B) copied, 120.01 seconds, 0.0 kB/s
[root@intel-canoepass-02 ~]# date
Wed Nov 23 05:05:52 EST 2011
========

Got same results on kernel-2.6.18-274.11.1.el5.


Code reviewed, patch found in kernel-2.6.18-274.11.1.el5

Comment 7 errata-xmlrpc 2011-11-29 14:36:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2011-1479.html

Comment 8 Martin Prpič 2011-11-29 17:58:49 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
During error recovery, most SCSI error recovery stages sent a TUR (Test Unit Ready) command for every bad command whenever a driver error handler reported success. When several bad commands pointed to the same device, that device was probed multiple times. If the device was in a state where it did not respond to commands even after a recovery function returned success, the error handler had to wait for each of these TUR commands to time out, which significantly slowed the recovery process. With this update, the SCSI mid-layer error routines that send test commands have been fixed to send them once per device instead of once per bad command, which considerably reduces error recovery time.

