Bug 691945
Summary: | Non-responsive scsi target leads to excessive scsi recovery and dm-mp failover time | |
---|---|---|---
Product: | Red Hat Enterprise Linux 6 | Reporter: | Dave Wysochanski <dwysocha>
Component: | kernel | Assignee: | Mike Christie <mchristi>
Status: | CLOSED ERRATA | QA Contact: | Storage QE <storage-qe>
Severity: | high | Docs Contact: |
Priority: | medium | |
Version: | 6.0 | CC: | akarlsso, amark, bdonahue, bubrown, dhoward, djeffery, dwysocha, fge, kzhang, mgoodwin, plyons, rdassen, soft-linux-drv
Target Milestone: | rc | Keywords: | Reopened, ZStream
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | kernel-2.6.32-171.el6 | Doc Type: | Bug Fix
Doc Text: | In error recovery, most SCSI error recovery stages send a TUR (Test Unit Ready) command for every bad command when a driver error handler reports success. When several bad commands pointed to the same device, the device was probed multiple times. When the device was in a state where it did not respond to commands even after a recovery function returned success, the error handler had to wait for the commands to time out. This significantly impeded the recovery process. With this update, the SCSI mid-layer error routines that send test commands have been fixed to send them once per device instead of once per bad command, thus reducing error recovery time considerably. | Story Points: | ---
Clone Of: | | |
: | 694625 | Environment: |
Last Closed: | 2011-12-06 12:47:24 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 672437, 694625, 744811, 767187, 833603, 846704, 848463 | |
Attachments: | | |

Description
Dave Wysochanski
2011-03-29 22:05:59 UTC
Created attachment 488905
Reduce # of turs sent during scsi error recovery
Attached is a RHEL6 version of the patch which has been submitted (but not yet accepted) upstream.

*** This bug has been marked as a duplicate of bug 672437 ***

Re-opening to track this specific case separately, as BZ 672437 is more general.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

Thanks for your work on this, David. Patch was sent to rh-kernel for review and merging.

Patch(es) available on kernel-2.6.32-171.el6

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
In error recovery, most SCSI error recovery stages send a TUR (Test Unit Ready) command for every bad command when a driver error handler reports success. When several bad commands pointed to the same device, the device was probed multiple times. When the device was in a state where it did not respond to commands even after a recovery function returned success, the error handler had to wait for the commands to time out. This significantly impeded the recovery process. With this update, the SCSI mid-layer error routines that send test commands have been fixed to send them once per device instead of once per bad command, thus reducing error recovery time considerably.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2011-1530.html

*** Bug 631765 has been marked as a duplicate of this bug. ***
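
To illustrate why probing once per device rather than once per failed command shortens recovery, here is a minimal toy model in Python. It is not the kernel patch (the real change is in the C SCSI mid-layer), and everything in it is an assumption made for illustration: the device names, the command names, the 10-second TUR timeout, and the simplification that every probe to an unresponsive device waits out the full timeout.

```python
# Toy model only: names, commands, and the timeout value are hypothetical.

def probes_per_command(failed_cmds):
    """Old behavior: one TUR probe for every failed command."""
    return [device for device, _cmd in failed_cmds]

def probes_per_device(failed_cmds):
    """New behavior: one TUR probe per distinct device (first-seen order)."""
    return list(dict.fromkeys(device for device, _cmd in failed_cmds))

def worst_case_wait_s(probes, timeout_s):
    """Upper bound on waiting time if every probe runs into its timeout."""
    return len(probes) * timeout_s

# Six failed commands spread over two unresponsive devices (hypothetical).
failed_cmds = [
    ("sda", "READ_10"), ("sda", "WRITE_10"),
    ("sda", "READ_10"), ("sda", "WRITE_10"),
    ("sdb", "READ_10"), ("sdb", "WRITE_10"),
]

TUR_TIMEOUT_S = 10  # assumed value, not taken from the bug report

old_wait = worst_case_wait_s(probes_per_command(failed_cmds), TUR_TIMEOUT_S)
new_wait = worst_case_wait_s(probes_per_device(failed_cmds), TUR_TIMEOUT_S)
print(f"per-command probing: {old_wait}s, per-device probing: {new_wait}s")
```

In this model the waiting time scales with the number of distinct devices instead of the number of outstanding commands, which matters most when many commands are queued to a single unresponsive target.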