Bug 500450 - qdiskd I/O hang reporting
Summary: qdiskd I/O hang reporting
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: cman
Version: 5.3
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Lon Hohberger
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks: 496130
Reported: 2009-05-12 18:25 UTC by Lon Hohberger
Modified: 2009-09-02 11:09 UTC

Fixed In Version: cman-2.0.101-1.el5
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 500452 (view as bug list)
Environment:
Last Closed: 2009-09-02 11:09:39 UTC
Target Upstream Version:
Embargoed:


Attachments
Implementation (7.81 KB, patch)
2009-05-12 18:26 UTC, Lon Hohberger


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHSA-2009:1341 | 0 | normal | SHIPPED_LIVE | Low: cman security, bug fix, and enhancement update | 2009-09-01 10:43:16 UTC

Description Lon Hohberger 2009-05-12 18:25:32 UTC
Description of problem:

In some situations, qdiskd can hang on I/O to shared storage.  Currently, when this happens, the only breadcrumbs are on the other nodes, which report (at the DEBUG log level):

  debug: Node 1 has missed an update 6/10

This is only noticeable if the administrator has configured qdiskd to use the DEBUG log level, and it is a poor way to indicate errors.  The purpose of this feature request is to allow qdiskd to report I/O hangs on the node where they occur, at the WARNING log level instead of DEBUG:

  warning: qdiskd: write (system call) has hung for 5 seconds
  warning: In 5 more seconds, we will be evicted
  warning: qdisk cycle took more than 1 second to complete (6.020000)

Presence of such a warning indicates that qdiskd is not at fault for a given failure to write, and gives administrators the ability to chase down or tune around I/O performance problems within their SAN environment.

The patch as designed implements a very simple state-checker thread since it was less invasive/destabilizing than switching qdiskd's syscalls to AIO (which is the other possible implementation).
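
For readers who want a picture of the state-checker idea, here is a minimal sketch. It is not the attached patch: qdiskd is written in C, and the class name HangMonitor, the function run_checker, and the warn_after/evict_after parameters are all hypothetical names chosen for illustration. The main loop records which call it is about to make, and a separate thread polls that state and logs a warning once the call has been outstanding for too long.

```python
import threading
import time


class HangMonitor:
    """Hypothetical helper: records the call the main loop is blocked in."""

    def __init__(self, warn_after, evict_after):
        self.warn_after = warn_after      # seconds before the first warning
        self.evict_after = evict_after    # seconds until peers would evict us
        self._lock = threading.Lock()
        self._op = None                   # name of the call in progress, or None
        self._started = 0.0

    def enter(self, op, now=None):
        """Main loop calls this right before a potentially blocking call."""
        with self._lock:
            self._op = op
            self._started = time.monotonic() if now is None else now

    def leave(self):
        """Main loop calls this as soon as the call returns."""
        with self._lock:
            self._op = None

    def check(self, now=None):
        """Return warning lines for the current state; empty if nothing is hung."""
        now = time.monotonic() if now is None else now
        with self._lock:
            if self._op is None:
                return []
            op = self._op
            elapsed = now - self._started
        if elapsed < self.warn_after:
            return []
        lines = [f"warning: qdiskd: {op} (system call) has hung for {elapsed:.0f} seconds"]
        remaining = self.evict_after - elapsed
        if remaining > 0:
            lines.append(f"warning: In {remaining:.0f} more seconds, we will be evicted")
        return lines


def run_checker(monitor, stop, interval=1.0, log=print):
    """Checker thread body: poll the monitor until the stop event is set."""
    while not stop.wait(interval):
        for line in monitor.check():
            log(line)
```

In this sketch the main loop would wrap each read or write in enter()/leave(), and run_checker would run in its own thread. Because the checker never touches the storage device itself, it keeps running and logging even while the main loop is stuck in the kernel, which is the property the AIO alternative would otherwise have to provide.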

Comment 1 Lon Hohberger 2009-05-12 18:26:20 UTC
Created attachment 343640
Implementation

Comment 5 Lon Hohberger 2009-07-22 19:15:25 UTC
Cause / Consequence: This is a new feature.

Fix: Add I/O hang reporting to qdiskd

Result: Administrators can see I/O hang messages in logs on systems where they occur rather than on other systems in the cluster.

Comment 7 errata-xmlrpc 2009-09-02 11:09:39 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1341.html

