Red Hat Bugzilla – Bug 244968
Frequent path failures during I/O on DM multipath devices
Last modified: 2010-01-11 21:29:39 EST
Description of problem:
While running I/O on DM multipath devices, we are seeing frequent path
failures, which lead to unexpected I/O failover.
Snippet of syslog during a failure:
scsi(2:1:16) UNDERRUN status detected 0x15-0x0.
resid=0x7fff8fff fw_resid=0x7fff8fff cdb=0x28
scsi(2:0:1:16) Dropped frame(s) detected
(7fff8fff of f400 bytes)...retrying command.
scsi(2:1:16) qla2x00_done: did_error = 2,
comp-scsi= 0x15-0x0 pid=102056310.
SCSI error : <2 0 1 16> return code = 0x20000
end_request: I/O error, dev sdbm, sector 4192702
end_request: I/O error, dev sdbm, sector 4192708
device-mapper: dm-multipath: Failing path 68:0.
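For reference, the "Failing path 68:0" message identifies the path by block device major:minor number. The following is a minimal sketch, not taken from the original report, that maps such a pair to an sd name. It assumes the conventional Linux SCSI disk majors (8, 65-71, 128-135) with 16 minors per disk; the helper name sd_name is hypothetical. Under those assumptions, 68:0 resolves to sdbm, the same device that reports the I/O errors above.

```python
# Assumption: conventional Linux SCSI disk block majors. Each major covers
# 16 disks, and each disk owns 16 minors (whole disk plus 15 partitions).
SD_MAJORS = [8] + list(range(65, 72)) + list(range(128, 136))


def sd_name(major, minor):
    """Map a major:minor pair to an sd device name (hypothetical helper)."""
    index = SD_MAJORS.index(major) * 16 + minor // 16
    partition = minor % 16
    letters = ""
    # Disk names count sda..sdz, then sdaa..sdzz, and so on.
    while index >= 0:
        letters = chr(ord("a") + index % 26) + letters
        index = index // 26 - 1
    return "sd" + letters + (str(partition) if partition else "")


if __name__ == "__main__":
    print(sd_name(68, 0))
```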
As we understand it, the paths marked as failed are those for which the
driver returns the status DID_BUS_BUSY. Because I/Os on multipath devices
have BIO_RW_FAILFAST set (and hence REQ_FAILFAST), the SCSI mid layer does
not retry errors such as QUEUEFULL or UNDERRUN (as captured in the syslog
snippet above). Is there any way to override BIO_RW_FAILFAST so that retries
happen and unexpected path failures are avoided?
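To show how "return code = 0x20000" in the syslog snippet relates to DID_BUS_BUSY, here is a minimal sketch, not part of the original report, that splits a Linux SCSI result word into its four bytes. It assumes the usual layout (status in bits 0-7, message in bits 8-15, host in bits 16-23, driver in bits 24-31). The function name decode_scsi_result is hypothetical, and the table lists only a subset of host byte codes.

```python
# Subset of Linux SCSI host byte codes (assumed from the kernel's scsi.h).
HOST_BYTE_NAMES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
}


def decode_scsi_result(result):
    """Split a 32-bit SCSI result word into its four byte fields."""
    return {
        "status": result & 0xFF,          # raw SCSI status byte
        "msg": (result >> 8) & 0xFF,      # message byte
        "host": (result >> 16) & 0xFF,    # host (HBA driver) byte
        "driver": (result >> 24) & 0xFF,  # mid-layer driver byte
    }


if __name__ == "__main__":
    fields = decode_scsi_result(0x20000)
    print(HOST_BYTE_NAMES.get(fields["host"], "unknown"))
```

With the value from the log, only the host byte is non-zero (2), which matches "did_error = 2" in the qla2x00_done message.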
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Present at least 50 devices (logical units), each with 8 paths, to the host.
2. Start I/O on those 50 devices.
3. Syslog captures "SCSI error" and "dm-multipath: Failing path" messages.
Unexpected path failures are seen during the I/O.
getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
path_selector "round-robin 0"
prio_callout "/sbin/mpath_prio_alua %n"
*** This bug has been marked as a duplicate of 244967 ***