Bug 500450 - qdiskd I/O hang reporting
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: cman
All Linux
Priority: low
Severity: medium
: rc
: ---
Assigned To: Lon Hohberger
Cluster QE
Depends On:
Blocks: 496130
Reported: 2009-05-12 14:25 EDT by Lon Hohberger
Modified: 2009-09-02 07:09 EDT (History)

See Also:
Fixed In Version: cman-2.0.101-1.el5
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 500452
Last Closed: 2009-09-02 07:09:39 EDT

Attachments
Implementation (7.81 KB, patch)
2009-05-12 14:26 EDT, Lon Hohberger
Description Lon Hohberger 2009-05-12 14:25:32 EDT
Description of problem:

In some situations, qdiskd can hang on I/O to shared storage.  Currently, when this happens, the only evidence appears on the other nodes, which report (at the debug log level):

  debug: Node 1 has missed an update 6/10

This is only noticeable if the administrator has configured qdiskd to use the DEBUG log level, and it is a poor way to indicate errors.  The purpose of this feature request is to allow qdiskd to report I/O hangs on the node where they occur, at the WARNING log level instead of DEBUG:

  warning: qdiskd: write (system call) has hung for 5 seconds
  warning: In 5 more seconds, we will be evicted
  warning: qdisk cycle took more than 1 second to complete (6.020000)

Presence of such a warning indicates that qdiskd is not at fault for a given failure to write, and gives administrators the ability to chase down or tune around I/O performance problems within their SAN environment.

The patch as designed implements a very simple state-checker thread since it was less invasive/destabilizing than switching qdiskd's syscalls to AIO (which is the other possible implementation).
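
For illustration only, the state-checker idea can be sketched as follows. This is not the attached patch (qdiskd is written in C); it is a minimal Python sketch under stated assumptions. The class name HangMonitor, the methods enter/leave/check/run, and the 5-second warning and 10-second eviction thresholds are all hypothetical; only the wording of the two warning lines comes from the example output above. The main loop records which blocking call it is about to make, and a separate thread periodically checks how long that state has been unchanged.

```python
import threading
import time


class HangMonitor:
    """Tracks which blocking call the main loop is in, and since when.

    All names and thresholds here are hypothetical, chosen for illustration.
    """

    def __init__(self, warn_after=5.0, evict_after=10.0, clock=time.monotonic):
        self.warn_after = warn_after      # assumed warning threshold (seconds)
        self.evict_after = evict_after    # assumed eviction deadline (seconds)
        self._clock = clock
        self._lock = threading.Lock()
        self._state = None                # name of the call in progress, or None
        self._since = 0.0

    def enter(self, name):
        """Main loop calls this just before a blocking call such as write()."""
        with self._lock:
            self._state = name
            self._since = self._clock()

    def leave(self):
        """Main loop calls this as soon as the blocking call returns."""
        with self._lock:
            self._state = None

    def check(self):
        """Return warning lines if the current call has been stuck too long."""
        with self._lock:
            state, since = self._state, self._since
        if state is None:
            return []
        stuck = self._clock() - since
        if stuck < self.warn_after:
            return []
        lines = ["warning: qdiskd: %s (system call) has hung for %d seconds"
                 % (state, stuck)]
        remaining = self.evict_after - stuck
        if remaining > 0:
            lines.append("warning: In %d more seconds, we will be evicted"
                         % remaining)
        return lines

    def run(self, stop_event, interval=1.0, log=print):
        """Checker thread body: poll the recorded state until asked to stop."""
        while not stop_event.wait(interval):
            for line in self.check():
                log(line)
```

In such a design the main loop would wrap each blocking call in enter("write") / leave(), and run() would be started in its own thread with a threading.Event used to stop it. The checker only reads a small piece of shared state, which is why this approach touches the existing I/O path far less than converting the syscalls to AIO would.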
Comment 1 Lon Hohberger 2009-05-12 14:26:20 EDT
Created attachment 343640
Comment 5 Lon Hohberger 2009-07-22 15:15:25 EDT
Cause / Consequence: This is a new feature.

Fix: Add I/O hang reporting to qdiskd

Result: Administrators can see I/O hang messages in logs on systems where they occur rather than on other systems in the cluster.
Comment 7 errata-xmlrpc 2009-09-02 07:09:39 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

