Red Hat Bugzilla – Bug 500450
qdiskd I/O hang reporting
Last modified: 2009-09-02 07:09:39 EDT
Description of problem:
In some situations, qdiskd can hang on I/O to shared storage. Currently, when this happens, the only evidence appears on the other nodes, which report (at the debug log level):
debug: Node 1 has missed an update 6/10
This is only noticeable if the administrator has configured qdiskd to use the DEBUG log level, and it is a poor way to indicate errors. The purpose of this feature request is to allow qdiskd to report I/O hangs on the node where they occur, at the WARNING log level instead of DEBUG:
warning: qdiskd: write (system call) has hung for 5 seconds
warning: In 5 more seconds, we will be evicted
warning: qdisk cycle took more than 1 second to complete (6.020000)
Presence of such a warning indicates that qdiskd is not at fault for a given failure to write, and gives administrators the ability to chase down or tune around I/O performance problems within their SAN environment.
The patch as designed implements a very simple state-checker thread since it was less invasive/destabilizing than switching qdiskd's syscalls to AIO (which is the other possible implementation).
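For illustration only, a state-checker thread of the kind described above could look roughly like the sketch below. This is not the attached patch (qdiskd itself is written in C). It is a minimal Python sketch, and the names IoWatchdog, enter, leave, check and run_checker, as well as the evict_after parameter, are hypothetical. The 10-second eviction window used in the usage comment is an assumption inferred from the sample messages above.

```python
import threading
import time


class IoWatchdog:
    """Records which blocking call is in flight and when it started (hypothetical helper)."""

    def __init__(self, evict_after, clock=time.monotonic):
        self.evict_after = evict_after  # seconds until other nodes evict us (assumed value)
        self._clock = clock             # injectable so the logic can be checked without sleeping
        self._lock = threading.Lock()
        self._call = None               # name of the in-flight call, or None when idle
        self._started = 0.0

    def enter(self, call):
        """Called by the I/O thread just before a blocking call."""
        with self._lock:
            self._call = call
            self._started = self._clock()

    def leave(self):
        """Called by the I/O thread right after the blocking call returns."""
        with self._lock:
            self._call = None

    def check(self):
        """Return warning lines if the in-flight call has been pending for at least 1 second."""
        with self._lock:
            if self._call is None:
                return []
            call = self._call
            hung = int(self._clock() - self._started)
        if hung < 1:
            return []
        lines = [f"qdiskd: {call} (system call) has hung for {hung} seconds"]
        remaining = self.evict_after - hung
        if remaining > 0:
            lines.append(f"In {remaining} more seconds, we will be evicted")
        return lines


def run_checker(dog, stop, log=print, period=1.0):
    """Body of the checker thread: poll the shared state once per period until stop is set."""
    while not stop.wait(period):
        for line in dog.check():
            log("warning: " + line)


# Usage sketch (not executed here):
#   dog = IoWatchdog(evict_after=10)
#   stop = threading.Event()
#   threading.Thread(target=run_checker, args=(dog, stop), daemon=True).start()
#   dog.enter("write"); os.write(fd, buf); dog.leave()
```

The I/O path only records state around each call, and the separate thread does the reporting, so the existing synchronous calls stay untouched. That is the property that makes this approach less invasive than converting the I/O to AIO.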
Created attachment 343640
Cause / Consequence: This is a new feature.
Fix: Add I/O hang reporting to qdiskd
Result: Administrators can see I/O hang messages in logs on systems where they occur rather than on other systems in the cluster.
An advisory has been issued which should resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.