Bug 481220 - Timeout of directio checker too low
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: device-mapper-multipath
Version: 5.5
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Ben Marzinski
QA Contact: Barry Donahue
Reported: 2009-01-22 15:40 EST by Kit Westneat
Modified: 2014-04-08 20:53 EDT

Doc Type: Bug Fix
Last Closed: 2014-04-08 20:53:04 EDT


Description Kit Westneat 2009-01-22 15:40:56 EST
Description of problem:
The directio checker's timeout is hardcoded to 30 seconds, which can cause false path failures on storage systems whose own timeouts are longer. As the comment in checker.h says:

 * Overloaded storage response time can be very long.
 * SG_IO timouts after DEF_TIMEOUT milliseconds, and checkers interprets this
 * as a path failure. multipathd then proactively evicts the path from the DM
 * multipath table in this case.
 *
 * This generaly snow balls and ends up in full eviction and IO errors for end
 * users. Bad. This may also cause SCSI bus resets, causing disruption for all
 * local and external storage hardware users.
 *
 * Provision a long timeout. Longer than any real-world application would cope
 * with.

I propose redefining DIRECTIO_TIMEOUT to be DEF_TIMEOUT.
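
A minimal sketch of what this proposal amounts to, not the actual multipath-tools source. The value given to DEF_TIMEOUT here is an assumption for illustration (the quoted comment only states that it is in milliseconds), the conversion assumes the directio checker counts in seconds, and DIRECTIO_TIMEOUT_HARDCODED and would_fail_path() are hypothetical names.

```c
/* ASSUMPTION: illustrative value only. The quoted checker.h comment says
 * DEF_TIMEOUT is in milliseconds but does not give its value. */
#define DEF_TIMEOUT 300000

/* Behaviour described in this report: a fixed 30 seconds.
 * (Hypothetical name, used only for comparison.) */
#define DIRECTIO_TIMEOUT_HARDCODED 30

/* Proposed: derive the directio timeout from DEF_TIMEOUT.
 * ASSUMPTION: the directio checker counts in seconds, hence the /1000. */
#define DIRECTIO_TIMEOUT (DEF_TIMEOUT / 1000)

/* Hypothetical helper: returns 1 if a path whose check I/O takes
 * response_sec to complete would be failed under timeout_sec, else 0. */
static int would_fail_path(long response_sec, long timeout_sec)
{
    return response_sec > timeout_sec;
}
```

Under these assumptions, a check I/O that takes 45 seconds on an overloaded array fails the path with the hardcoded value but not with the value derived from DEF_TIMEOUT.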

Version-Release number of selected component (if applicable):
device-mapper-multipath-0.4.7-17
Comment 2 Ben Marzinski 2010-07-27 15:41:12 EDT
Now that the synchronous checkers have a configurable timeout, directio could use that as well for the asynchronous timeout. On most machines, it defaults to 60 seconds, but it can be changed in multipath.conf.
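
One way the suggestion above could look, as a rough sketch only. All names here are hypothetical and not taken from multipath-tools, and the convention that 0 means "no timeout configured" is an assumption made for the example.

```c
/* ASSUMPTION: hypothetical names throughout; this is not multipath-tools code. */

/* The hardcoded value mentioned in the report, kept as a fallback. */
#define ASYNC_TIMEOUT_FALLBACK_SEC 30

enum check_result { CHECK_PENDING, CHECK_TIMED_OUT };

/* configured_sec: the checker timeout read from multipath.conf, or 0 if
 * none was set (assumed convention for this sketch). */
static unsigned int effective_async_timeout(unsigned int configured_sec)
{
    return configured_sec ? configured_sec : ASYNC_TIMEOUT_FALLBACK_SEC;
}

/* Decide whether an outstanding asynchronous check that has been waiting
 * waited_sec should still be treated as pending or as timed out. */
static enum check_result classify_wait(unsigned int waited_sec,
                                       unsigned int configured_sec)
{
    if (waited_sec > effective_async_timeout(configured_sec))
        return CHECK_TIMED_OUT;
    return CHECK_PENDING;
}
```

With a configured timeout of 60 seconds, a check that has waited 45 seconds is still pending; with nothing configured, the same wait exceeds the 30 second fallback and is treated as timed out.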
Comment 3 RHEL Product and Program Management 2014-01-29 05:40:10 EST
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux release.  Product Management has
requested further review of this request by Red Hat Engineering, for
potential inclusion in a Red Hat Enterprise Linux release for currently
deployed products.  This request is not yet committed for inclusion in
a release.
Comment 4 Ben Marzinski 2014-04-08 20:53:04 EDT
This bug hasn't had any activity since 2009. I don't like the idea of changing the default timeout for the directio checker on the last RHEL 5 release. If someone has a good reason why we should do this, let me know. The code change itself is simple.
