Bug 244968 - Frequent path failures during I/O on DM multipath devices
Status: CLOSED DUPLICATE of bug 244967
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: device-mapper-multipath
Version: 4.4
Hardware: All Linux
Priority: low
Severity: urgent
Assigned To: LVM and device-mapper development team
QA Contact: Corey Marthaler
Reported: 2007-06-20 03:15 EDT by vijay
Modified: 2010-01-11 21:29 EST
CC List: 14 users

Doc Type: Bug Fix
Last Closed: 2007-10-26 16:35:31 EDT


Attachments: None
Description vijay 2007-06-20 03:15:13 EDT
Description of problem:

During I/O on DM multipath devices, we are seeing frequent path failures, which
lead to unexpected I/O failover.

Snippet of syslog during failure :
****************************************
scsi(2:1:16) UNDERRUN status detected 0x15-0x0. 
resid=0x7fff8fff fw_resid=0x7fff8fff cdb=0x28 
os_underflow=0xf400 srb_flags=0x2
scsi(2:0:1:16) Dropped frame(s) detected 
(7fff8fff of f400 bytes)...retrying command.
scsi(2:1:16) qla2x00_done: did_error = 2,
comp-scsi= 0x15-0x0 pid=102056310.
SCSI error : <2 0 1 16> return code = 0x20000
end_request: I/O error, dev sdbm, sector 4192702
end_request: I/O error, dev sdbm, sector 4192708
device-mapper: dm-multipath: Failing path 68:0.


As we understand it, the paths are being marked as failed for commands that
complete with host status DID_BUS_BUSY (return code 0x20000, i.e. host byte
0x02). Since I/O on multipath devices has BIO_RW_FAILFAST set (and hence
REQ_FAILFAST on the request), the SCSI mid layer does not retry errors such as
QUEUE FULL and UNDERRUN (as captured in the syslog snippet above). Is there any
way to override BIO_RW_FAILFAST so that these commands are retried, in order to
avoid unexpected path failures?


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Present at least 50 devices (logical units), each with 8 paths, to the host.
2. Start I/O on those 50 devices.
3. Syslog records "SCSI error" and "dm-multipath: Failing path" messages.
  
Actual results:
Unexpected path failures are seen during I/O.

Expected results:

Additional info:

1. multipath.conf setting:
        device {
                vendor                  "HP"
                product                 "HSV210"
                path_grouping_policy    group_by_prio
                getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
                path_checker            tur
                path_selector           "round-robin 0"
                prio_callout            "/sbin/mpath_prio_alua %n"
                rr_weight               uniform
                failback                immediate
                hardware_handler        "0"
                no_path_retry           60
        }
Comment 1 Ben Marzinski 2007-10-26 16:35:31 EDT

*** This bug has been marked as a duplicate of 244967 ***
