Bug 481220

Summary: Timeout of directio checker too low
Product: Red Hat Enterprise Linux 5
Reporter: Kit Westneat <kwestneat>
Component: device-mapper-multipath
Assignee: Ben Marzinski <bmarzins>
Status: CLOSED WONTFIX
QA Contact: Barry Donahue <bdonahue>
Severity: medium
Priority: low
Version: 5.5
CC: agk, bdonahue, bmarzins, bmr, christophe.varoqui, dwysocha, egoggin, heinzm, iannis, junichi.nomura, kueda, lmb, prockai, tranlan, yanwang
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2014-04-09 00:53:04 UTC

Description Kit Westneat 2009-01-22 20:40:56 UTC
Description of problem:
The directio timeout is hardcoded to be 30 seconds, which can cause false failures on storage systems with longer timeouts. As it says in checker.h:

 * Overloaded storage response time can be very long.
 * SG_IO timouts after DEF_TIMEOUT milliseconds, and checkers interprets this
 * as a path failure. multipathd then proactively evicts the path from the DM
 * multipath table in this case.
 *
 * This generaly snow balls and ends up in full eviction and IO errors for end
 * users. Bad. This may also cause SCSI bus resets, causing disruption for all
 * local and external storage hardware users.
 *
 * Provision a long timeout. Longer than any real-world application would cope
 * with.

I propose redefining DIRECTIO_TIMEOUT to be DEF_TIMEOUT.
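
A minimal sketch of what this proposal amounts to, not the actual patch. The numeric value of DEF_TIMEOUT, the macro DIRECTIO_TIMEOUT_HARDCODED, and the helper timed_out() are assumptions made for illustration; only the 30-second figure and the millisecond unit of DEF_TIMEOUT come from the text above.

```c
/* Assumed value, for illustration only. The checker.h comment quoted
 * above says only that DEF_TIMEOUT is expressed in milliseconds. */
#define DEF_TIMEOUT 300000

/* Behaviour described in this report: a hardcoded 30 seconds, written
 * here in milliseconds so the two values are directly comparable.
 * (Hypothetical name, not taken from the multipath-tools source.) */
#define DIRECTIO_TIMEOUT_HARDCODED (30 * 1000)

/* Proposed: let the directio checker follow the shared default. */
#define DIRECTIO_TIMEOUT DEF_TIMEOUT

/* Hypothetical helper: returns 1 if an I/O that has been outstanding
 * for elapsed_ms would be treated as timed out, 0 otherwise. */
static int timed_out(long elapsed_ms, long timeout_ms)
{
    return elapsed_ms >= timeout_ms;
}
```

Under these assumed values, an I/O that an overloaded array answers after 45 seconds would be counted as a path failure with the hardcoded timeout but not with the proposed one. If the checker counts its timeout in seconds rather than milliseconds, the redefinition would also need a unit conversion.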

Version-Release number of selected component (if applicable):
device-mapper-multipath-0.4.7-17

Comment 2 Ben Marzinski 2010-07-27 19:41:12 UTC
Now that the synchronous checkers have a configurable timeout, directio could use that as well for the asynchronous timeout. On most machines, it defaults to 60 seconds, but it can be changed in multipath.conf.
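
One way this suggestion could look, as a rough sketch only. The function name, the macro, and the convention that a non-positive value means "not configured" are all hypothetical; the 60-second fallback is the default mentioned in this comment.

```c
/* Fallback when nothing is configured. Assumption based on the
 * "defaults to 60 seconds" remark above. */
#define CHECKER_TIMEOUT_FALLBACK_SEC 60

/* Hypothetical helper: pick the timeout the directio checker would use
 * for its asynchronous I/O. configured_sec <= 0 is taken to mean that
 * no timeout was set in multipath.conf. */
static unsigned int directio_timeout_sec(int configured_sec)
{
    if (configured_sec > 0)
        return (unsigned int)configured_sec;
    return CHECKER_TIMEOUT_FALLBACK_SEC;
}
```

With this shape, the directio checker and the synchronous checkers would share a single tunable instead of directio carrying its own hardcoded value.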

Comment 3 RHEL Program Management 2014-01-29 10:40:10 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux release.  Product Management has
requested further review of this request by Red Hat Engineering, for
potential inclusion in a Red Hat Enterprise Linux release for currently
deployed products.  This request is not yet committed for inclusion in
a release.

Comment 4 Ben Marzinski 2014-04-09 00:53:04 UTC
This bug hasn't had any activity since 2009. I don't like the idea of changing the default timeout for the directio checker on the last RHEL 5 release. If someone has a good reason why we should be doing this, let me know. The code change itself is simple.