Bug 500452 - qdiskd I/O hang reporting
Status: CLOSED ERRATA
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: cman
Version: 4
Hardware: All Linux
Priority: low
Severity: medium
Assigned To: Lon Hohberger
QA Contact: Cluster QE
Depends On:
Blocks: 490147
Reported: 2009-05-12 14:29 EDT by Lon Hohberger
Modified: 2011-02-16 11:21 EST
Fixed In Version: cman-1.0.28-1.el4
Doc Type: Bug Fix
Doc Text:
Previously, qdiskd on RHEL 4 did not check whether input/output (I/O) had been failing for tko intervals, relying only on cman kill to evict a node. With this update, qdiskd logs a warning when it becomes suspended on I/O.
Clone Of: 500450
Last Closed: 2011-02-16 11:21:56 EST

Attachments
Implementation (rhel4) (7.67 KB, patch)
2009-05-12 14:29 EDT, Lon Hohberger

Description Lon Hohberger 2009-05-12 14:29:19 EDT
+++ This bug was initially created as a clone of Bug #500450 +++

Description of problem:

In some situations, qdiskd can hang on I/O to shared storage.  Currently, when this happens, the only breadcrumbs are on the other nodes, which report (at the debug log level):

  debug: Node 1 has missed an update 6/10

This is only noticeable if the administrator has configured qdiskd to use the DEBUG log level, and it is a poor way to indicate errors.  The purpose of this feature request is to allow qdiskd to report I/O hangs on the node where they occur, at the WARNING log level instead of DEBUG:

  warning: qdiskd: write (system call) has hung for 5 seconds
  warning: In 5 more seconds, we will be evicted
  warning: qdisk cycle took more than 1 second to complete (6.020000)

Presence of such a warning indicates that qdiskd is not at fault for a given failure to write, and gives administrators the ability to chase down or tune around I/O performance problems within their SAN environment.

The patch as designed implements a very simple state-checker thread since it was less invasive/destabilizing than switching qdiskd's syscalls to AIO (which is the other possible implementation).
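
For readers without the attachment, the state-checker idea can be sketched roughly as follows. This is not the attached patch (which is C code inside qdiskd); it is a minimal Python illustration. The class name IoWatchdog, the warn_after threshold, and the assumption that eviction happens after interval * tko seconds are all hypothetical and chosen only to reproduce the example warnings above.

```python
import threading
import time


class IoWatchdog:
    """Hypothetical state checker (not the real qdiskd code).

    The I/O thread records which system call it is about to make; a
    separate checker thread warns when that state has not changed for
    too long.
    """

    def __init__(self, interval=1.0, tko=10, warn_after=5.0,
                 log=print, clock=time.monotonic):
        self.interval = interval      # seconds per qdisk cycle
        self.tko = tko                # missed cycles before eviction (assumed)
        self.warn_after = warn_after  # hypothetical warning threshold, seconds
        self.log = log
        self.clock = clock
        self._lock = threading.Lock()
        self._state = None            # syscall in progress, or None when idle
        self._since = 0.0

    def enter(self, syscall):
        """Called by the I/O thread just before a blocking system call."""
        with self._lock:
            self._state, self._since = syscall, self.clock()

    def leave(self):
        """Called by the I/O thread once the system call has returned."""
        with self._lock:
            self._state = None

    def check(self):
        """One pass of the checker; returns the warnings it logged."""
        with self._lock:
            state, since = self._state, self._since
        if state is None:
            return []
        hung = self.clock() - since
        if hung < self.warn_after:
            return []
        # Assumption: the other nodes evict us after interval * tko seconds.
        remaining = max(0.0, self.interval * self.tko - hung)
        messages = [
            f"warning: qdiskd: {state} (system call) has hung for {hung:.0f} seconds",
            f"warning: In {remaining:.0f} more seconds, we will be evicted",
        ]
        for message in messages:
            self.log(message)
        return messages

    def run(self, stop_event):
        """Checker thread body: poll once per interval until stopped."""
        while not stop_event.wait(self.interval):
            self.check()
```

In this sketch the I/O loop would wrap each blocking call in enter("write") / leave(), and run() would be started in a daemon thread. The checker only reads shared state, so the I/O path itself stays synchronous, which is the "less invasive" property described above.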

--- Additional comment from lhh@redhat.com on 2009-05-12 14:26:20 EDT ---

Created an attachment (id=343640)
Implementation
Comment 1 Lon Hohberger 2009-05-12 14:29:52 EDT
Created attachment 343641
Implementation (rhel4)
Comment 5 Florian Nadge 2011-01-03 09:30:22 EST
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Previously, Qdiskd on RHEL4 did not check if input/output (I/O) failed for tko interval times,relying only on cman kill to evict a node. With this update, Qdisk logs better when it becomes suspended on Input/Output.
Comment 6 Florian Nadge 2011-01-03 09:30:42 EST
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1 +1 @@
-Previously, Qdiskd on RHEL4 did not check if input/output (I/O) failed for tko interval times,relying only on cman kill to evict a node. With this update, Qdisk logs better when it becomes suspended on Input/Output.
+Previously, Qdiskd on RHEL4 did not check if input/output (I/O) failed for tko interval times,relying only on cman kill to evict a node. With this update, Qdisk logs better when it becomes suspended on I/O.
Comment 7 errata-xmlrpc 2011-02-16 11:21:56 EST
An advisory has been issued which should resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0271.html
