Bug 856214 - fenced loops reading from /dev/zero
Summary: fenced loops reading from /dev/zero
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: cman
Version: 5.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Christine Caulfield
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks: 857952 866622 928849 951049 953794
Reported: 2012-09-11 13:10 UTC by Christine Caulfield
Modified: 2018-11-30 21:29 UTC
CC List: 11 users

Fixed In Version: cman-2.0.115-111.el5
Doc Type: Bug Fix
Doc Text:
Under some circumstances, cman can return the file descriptor for the /dev/zero file, which is always active. If the client application stores this descriptor instead of the one it expects, it loops forever. This update addresses the problem by making sure that the fence daemon refreshes the file descriptor on each operation.
Clone Of:
: 857952 953794
Environment:
Last Closed: 2013-09-30 22:06:08 UTC
Target Upstream Version:
Embargoed:


Attachments
Proposed patch (1.54 KB, patch)
2012-09-13 09:39 UTC, Christine Caulfield
Proposed patch backported to RHEL6 (1.52 KB, patch)
2012-10-08 09:06 UTC, Pavel Moravec


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Knowledge Base (Solution) | 207403 | 0 | None | None | None | Never
Red Hat Product Errata | RHBA-2013:1304 | 0 | normal | SHIPPED_LIVE | cman bug fix update | 2013-09-30 21:13:37 UTC

Description Christine Caulfield 2012-09-11 13:10:27 UTC
Description of problem:
fenced gets stuck in a tight loop similar to the following strace:

9029  11:14:00.137856 recvfrom(5, 0x7fffa7f1e1b0, 20, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.000007>
9029  11:14:00.137939 poll([{fd=4, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=-1}], 4, -1) = 1 ([{fd=6, revents=POLLIN}]) <0.000007>

The FD returned from cman is not meant to be stored anywhere - this is documented in the header file (which is the official source of documentation for the API). Under some circumstances cman can return the FD for /dev/zero (which is always active) and if the client application stores this instead of the one it expects then it will loop forever.
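
To illustrate why a stored /dev/zero FD produces a tight loop, here is a minimal sketch, assuming a Linux system. The helper names (count_ready_polls, dev_zero_ready_polls) are made up for this example and are not part of cman or fenced. It only shows that poll() reports /dev/zero as readable on every call, so a loop waiting on that FD never sleeps.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: run `rounds` zero-timeout poll() calls on `fd`
 * and return how many of them reported the descriptor as readable. */
int count_ready_polls(int fd, int rounds)
{
    int ready = 0;

    assert(rounds >= 0);
    for (int i = 0; i < rounds; i++) {
        struct pollfd p = { .fd = fd, .events = POLLIN };

        if (poll(&p, 1, 0) == 1 && (p.revents & POLLIN))
            ready++;
    }
    return ready;
}

/* Hypothetical helper: open /dev/zero and count how often it polls ready.
 * Returns -1 if /dev/zero cannot be opened. */
int dev_zero_ready_polls(int rounds)
{
    int fd = open("/dev/zero", O_RDONLY);
    int n;

    if (fd < 0)
        return -1;
    n = count_ready_polls(fd, rounds);
    close(fd);
    return n;
}
```

Calling dev_zero_ready_polls(5) should report all five polls as ready, whereas count_ready_polls() on the read end of an idle pipe should report none. A daemon that polls the wrong FD with an infinite timeout therefore wakes up immediately every time, which matches the pattern in the strace above.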

This is probably VERY hard to reproduce: there would need to be some data coming from cman quite soon after startup for it to happen.

Version-Release number of selected component (if applicable):
Seen at customer site on RHEL5.6, but could affect any RHEL5 and RHEL6 versions

How reproducible:
Probably very hard.

Other information:
Daemons other than fenced could easily be affected; I haven't checked them yet.

Comment 1 Christine Caulfield 2012-09-13 09:39:08 UTC
Created attachment 612391 [details]
Proposed patch

I can't reproduce the problem, but this patch should cure the symptoms. I'm going to see how reproducible it is with the customer.

Comment 3 RHEL Program Management 2012-09-17 15:18:33 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux release.  Product Management has
requested further review of this request by Red Hat Engineering, for
potential inclusion in a Red Hat Enterprise Linux release for currently
deployed products.  This request is not yet committed for inclusion in
a release.

Comment 9 Pavel Moravec 2012-10-08 09:06:05 UTC
Created attachment 623347 [details]
Proposed patch backported to RHEL6

Chrissie's patch "front-ported" to RHEL6.

Comment 11 RHEL Program Management 2012-10-09 13:35:56 UTC
Quality Engineering Management has reviewed and declined this request.
You may appeal this decision by reopening this request.

Comment 25 Christine Caulfield 2013-04-08 15:14:31 UTC
commit 60dd70f06444939ea14bb6a40cfb61ab1eea9616
Author: Christine Caulfield <ccaulfie>
Date:   Mon Apr 8 16:11:51 2013 +0100

    fenced: get the cman fd before each poll
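
The idea behind that commit message can be sketched as follows. This is not the actual patch: struct fake_handle and fake_get_fd() are hypothetical stand-ins for a cman handle and for whichever library call returns the current cman FD, and pipes stand in for the real sockets. The sketch only contrasts polling a cached FD with fetching the FD again before each poll.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical stand-in for a cman handle: the library is free to
 * change `fd` between calls, so callers must not cache it. */
struct fake_handle {
    int fd;
};

/* Hypothetical accessor playing the role of the call that returns
 * the current cman FD. */
int fake_get_fd(const struct fake_handle *h)
{
    assert(h != NULL);
    return h->fd;
}

/* Poll one descriptor with a zero timeout.
 * Returns 1 if readable, 0 if not, -1 on poll() failure. */
int poll_fd_once(int fd)
{
    struct pollfd p = { .fd = fd, .events = POLLIN };
    int rc = poll(&p, 1, 0);

    if (rc < 0)
        return -1;
    return (rc == 1 && (p.revents & POLLIN)) ? 1 : 0;
}

/* One loop iteration done the safe way: ask the handle for its FD
 * immediately before polling instead of reusing a stored value. */
int poll_once_fresh(const struct fake_handle *h)
{
    return poll_fd_once(fake_get_fd(h));
}
```

If the handle's FD changes after a caller has saved it, poll_fd_once() on the saved value watches the wrong descriptor, while poll_once_fresh() follows the handle to the current one.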

Comment 30 errata-xmlrpc 2013-09-30 22:06:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1304.html

