481220 – Timeout of directio checker too low

Summary: Timeout of directio checker too low

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	device-mapper-multipath
Sub Component:
Version:	5.5
Hardware:	All
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Ben Marzinski
QA Contact:	Barry Donahue
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2009-01-22 20:40 UTC by Kit Westneat
Modified:	2014-04-09 00:53 UTC
CC List:	15 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2014-04-09 00:53:04 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Description Kit Westneat 2009-01-22 20:40:56 UTC

Description of problem:
The directio timeout is hardcoded to be 30 seconds, which can cause false failures on storage systems with longer timeouts. As it says in checker.h:

 * Overloaded storage response time can be very long.
 * SG_IO timouts after DEF_TIMEOUT milliseconds, and checkers interprets this
 * as a path failure. multipathd then proactively evicts the path from the DM
 * multipath table in this case.
 *
 * This generaly snow balls and ends up in full eviction and IO errors for end
 * users. Bad. This may also cause SCSI bus resets, causing disruption for all
 * local and external storage hardware users.
 *
 * Provision a long timeout. Longer than any real-world application would cope
 * with.

I propose redefining DIRECTIO_TIMEOUT to be DEF_TIMEOUT.
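
A minimal sketch of what this proposal could look like, not the actual patch. It assumes DIRECTIO_TIMEOUT is counted in seconds (the description says 30 seconds) while DEF_TIMEOUT is in milliseconds (as the quoted comment states), so a unit conversion is needed. The value 300000 is a placeholder for illustration, and `timeout_ms_to_sec` is a hypothetical helper, not a function in the multipath sources.

```c
#include <assert.h>

/* Placeholder value for illustration only; the report does not state
 * what DEF_TIMEOUT is, only that it is expressed in milliseconds. */
#define DEF_TIMEOUT 300000

/* Hypothetical helper: convert a millisecond timeout to whole seconds,
 * rounding up so a nonzero timeout never collapses to zero. */
static long timeout_ms_to_sec(long ms)
{
	assert(ms >= 0);
	return (ms + 999) / 1000;
}

/* Proposed definition: follow DEF_TIMEOUT instead of a hardcoded 30.
 * Assumes the directio checker counts its timeout in seconds. */
#define DIRECTIO_TIMEOUT timeout_ms_to_sec(DEF_TIMEOUT)
```

If both constants turn out to use the same unit, the conversion helper is unnecessary and the change reduces to a single `#define`.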

Version-Release number of selected component (if applicable):
device-mapper-multipath-0.4.7-17

Comment 2 Ben Marzinski 2010-07-27 19:41:12 UTC

Now that the synchronous checkers have a configurable timeout, directio could use that as well for the asynchronous timeout. On most machines, it defaults to 60 seconds, but it can be changed in multipath.conf.
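
One way to picture this suggestion is the rough sketch below. It assumes the configured checker timeout reaches the checker as a number of seconds, with 0 meaning "not set"; the function name and parameters are hypothetical and do not come from the multipath sources.

```c
#include <assert.h>

/* Hypothetical helper: choose the timeout for the asynchronous directio
 * check. A configured value of 0 is treated as "not set" (an assumption),
 * in which case the checker's built-in fallback is used. */
static unsigned int directio_effective_timeout(unsigned int configured_sec,
					       unsigned int fallback_sec)
{
	assert(fallback_sec > 0);
	return configured_sec ? configured_sec : fallback_sec;
}
```

With this shape, a site with slow storage would raise the configured value once and both the synchronous and asynchronous checkers would honor it.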

Comment 3 RHEL Program Management 2014-01-29 10:40:10 UTC

This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux release.  Product Management has
requested further review of this request by Red Hat Engineering, for
potential inclusion in a Red Hat Enterprise Linux release for currently
deployed products.  This request is not yet committed for inclusion in
a release.

Comment 4 Ben Marzinski 2014-04-09 00:53:04 UTC

This bug hasn't had any activity since 2009. I don't like the idea of changing the default timeout for the directio checker on the last RHEL 5 release. If someone has a good reason why we should be doing this, let me know. The code change itself is simple.
