| Summary: | Non-responsive scsi target leads to excessive scsi recovery and dm-mp failover time [rhel-5.7.z] | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | RHEL Program Management <pm-rhel> |
| Component: | kernel | Assignee: | Phillip Lougher <plougher> |
| Status: | CLOSED ERRATA | QA Contact: | Gris Ge <fge> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 5.5 | CC: | amark, anton, bubrown, ccui, dhoward, djeffery, dwysocha, fge, mchristi, mgoodwin, plyons, pm-eus |
| Target Milestone: | rc | Keywords: | Reopened, ZStream |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | kernel-2.6.18-274.11.1.el5 | Doc Type: | Bug Fix |
| Doc Text: | In error recovery, most SCSI error recovery stages sent a TUR (Test Unit Ready) command for every bad command when a driver error handler reported success. When several bad commands pointed to the same device, that device was probed multiple times. If the device still did not respond to commands even after a recovery function returned success, the error handler had to wait for each of these test commands to time out, which significantly lengthened the recovery process. With this update, the SCSI mid-layer error routines send test commands once per device instead of once per bad command, which considerably reduces error recovery time. | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2011-11-29 14:36:25 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Bug Depends On: | 694625 | ||
| Bug Blocks: | |||
Description
RHEL Program Management
2011-09-26 13:19:08 UTC
In kernel-2.6.18-274.11.1.el5: linux-2.6-scsi-reduce-error-recovery-time-by-reducing-use-of-turs.patch

Cannot reproduce this problem on the RHEL 5.7 GA kernel. The scsi_debug module was downloaded from http://lacrosse.corp.redhat.com/~fge/scsi_debug/

```
[root@intel-canoepass-02 ~]# modprobe scsi_debug dev_size_mb=100 opts=4
[root@intel-canoepass-02 ~]# date
Wed Nov 23 05:03:52 EST 2011
[root@intel-canoepass-02 ~]# echo -1 > /sys/module/scsi_debug/parameters/every_nth
[root@intel-canoepass-02 ~]# dd if=/dev/sdb of=/dev/null count=1 iflag=direct
dd: reading `/dev/sdb': Input/output error
0+0 records in
0+0 records out
0 bytes (0 B) copied, 120.01 seconds, 0.0 kB/s
[root@intel-canoepass-02 ~]# date
Wed Nov 23 05:05:52 EST 2011
```

Got the same results on kernel-2.6.18-274.11.1.el5. Code reviewed; the patch is present in kernel-2.6.18-274.11.1.el5.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2011-1479.html
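The change described in the Doc Text can be illustrated with a small userspace simulation. This is a sketch under stated assumptions, not the kernel patch: `struct bad_cmd`, `struct eh_stats`, the probe callbacks `tur_never_answers()` and `tur_always_ready()`, and both recovery functions are hypothetical names. A device that never answers is modeled as a probe that always fails, and each failed probe stands in for one full command timeout. The sketch contrasts the old one-TUR-per-bad-command loop with the fixed one-TUR-per-device loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a failed ("bad") command queued for error recovery. */
struct bad_cmd {
    int device_id;  /* SCSI device the command was addressed to */
    bool finished;  /* true once recovery has completed the command */
};

/* Counters for one recovery pass. */
struct eh_stats {
    int turs_sent;      /* TEST UNIT READY probes issued */
    int turs_timed_out; /* probes never answered; each models a full timeout */
};

/* Hypothetical probe: returns true if the device answered TEST UNIT READY. */
typedef bool (*tur_probe)(int device_id);

/* Simulated device that stays silent even after a "successful" reset. */
bool tur_never_answers(int device_id) { (void)device_id; return false; }

/* Simulated device that is ready again after recovery. */
bool tur_always_ready(int device_id) { (void)device_id; return true; }

/* Send one probe and record its outcome in *st. */
static bool send_tur(tur_probe tur, int device_id, struct eh_stats *st)
{
    bool ok = tur(device_id);
    st->turs_sent++;
    if (!ok)
        st->turs_timed_out++;
    return ok;
}

/* Old behavior (modeled): one TUR for every bad command, even when
 * several commands target the same device. */
struct eh_stats recover_per_command(struct bad_cmd *cmds, size_t n, tur_probe tur)
{
    struct eh_stats st = {0, 0};
    assert(tur != NULL);
    for (size_t i = 0; i < n; i++)
        cmds[i].finished = send_tur(tur, cmds[i].device_id, &st);
    return st;
}

/* Fixed behavior (modeled): one TUR per distinct device; the result is
 * applied to every bad command queued for that device. */
struct eh_stats recover_per_device(struct bad_cmd *cmds, size_t n, tur_probe tur)
{
    struct eh_stats st = {0, 0};
    assert(tur != NULL);
    for (size_t i = 0; i < n; i++) {
        bool probed = false;
        for (size_t j = 0; j < i; j++) {
            if (cmds[j].device_id == cmds[i].device_id) {
                probed = true; /* an earlier command already triggered the probe */
                break;
            }
        }
        if (probed)
            continue;
        bool ok = send_tur(tur, cmds[i].device_id, &st);
        /* Apply the single probe result to all commands on this device. */
        for (size_t k = i; k < n; k++)
            if (cmds[k].device_id == cmds[i].device_id)
                cmds[k].finished = ok;
    }
    return st;
}
```

For example, with four bad commands (three on device 0, one on device 1) and `tur_never_answers` as the probe, `recover_per_command()` reports 4 probes sent and 4 timed out, while `recover_per_device()` reports 2 and 2. If every unanswered probe waits out a full timeout, the fixed loop's wait scales with the number of affected devices rather than the number of bad commands. The quadratic duplicate check only keeps the sketch short; it is not meant to reflect how the kernel tracks devices.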