Bug 500450

Summary:

qdiskd I/O hang reporting

Product:

Red Hat Enterprise Linux 5

Reporter:

Lon Hohberger <lhh>

Component:

cman

Assignee:

Lon Hohberger <lhh>

Status:

CLOSED ERRATA

QA Contact:

Cluster QE <mspqa-list>

Severity:

medium

Docs Contact:

Priority:

low

Version:

5.3

CC:

cfeist, cluster-maint, edamato, rlerch, samuel.kielek

Target Milestone:

Target Release:

---

Hardware:

All

OS:

Linux

Whiteboard:

Fixed In Version:

cman-2.0.101-1.el5

Doc Type:

Bug Fix

Clones:

500452

Environment:

Last Closed:

2009-09-02 11:09:39 UTC

Bug Blocks:

496130

Attachments:

Description	Flags
Implementation	none

Description Lon Hohberger 2009-05-12 18:25:32 UTC

Description of problem:

In some situations, qdiskd can hang on I/O to shared storage.  Currently, when this happens, the only bread crumbs visible are on the other nodes, where they report (at debug log level):

  debug: Node 1 has missed an update 6/10

This is only noticeable if the administrator has configured qdiskd to use the DEBUG log level, and is a poor method to indicate errors.  The purpose of this feature request is to allow qdiskd to report I/O hangs on the node where they occur, at the WARNING log level instead of DEBUG:

  warning: qdiskd: write (system call) has hung for 5 seconds
  warning: In 5 more seconds, we will be evicted
  warning: qdisk cycle took more than 1 second to complete (6.020000)

Presence of such a warning indicates that qdiskd is not at fault for a given failure to write, and gives administrators the ability to chase down or tune around I/O performance problems within their SAN environment.

The patch as designed implements a very simple state-checker thread since it was less invasive/destabilizing than switching qdiskd's syscalls to AIO (which is the other possible implementation).

Comment 1 Lon Hohberger 2009-05-12 18:26:20 UTC

Created attachment 343640
Implementation

Comment 2 Lon Hohberger 2009-05-13 15:28:23 UTC

http://git.fedorahosted.org/git/?p=cluster.git;a=commit;h=83a61282601bff7dd26e8bcf4ebd4b1f38d6e25c

Comment 5 Lon Hohberger 2009-07-22 19:15:25 UTC

Cause / Consequence: This is a new feature.

Fix: Add I/O hang reporting to qdiskd

Result: Administrators can see I/O hang messages in logs on systems where they occur rather than on other systems in the cluster.

Comment 7 errata-xmlrpc 2009-09-02 11:09:39 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1341.html