Description of problem:
fenced gets stuck in a tight loop, as in the following strace:

    9029 11:14:00.137856 recvfrom(5, 0x7fffa7f1e1b0, 20, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.000007>
    9029 11:14:00.137939 poll([{fd=4, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=-1}], 4, -1) = 1 ([{fd=6, revents=POLLIN}]) <0.000007>

The FD returned from cman is not meant to be stored anywhere; this is documented in the header file (which is the official source of documentation for the API). Under some circumstances cman can return the FD for /dev/zero (which always polls as readable), and if the client application stores this instead of the one it expects, it will loop forever.

It's probably VERY hard to reproduce this: some data would need to arrive from cman quite soon after startup for it to happen.

Version-Release number of selected component (if applicable):
Seen at a customer site on RHEL5.6, but could affect any RHEL5 and RHEL6 versions.

How reproducible:
Probably very hard.

Other information:
Daemons other than fenced could easily be affected; I haven't checked them yet.
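A minimal sketch, assuming Linux, of why a stale descriptor that points at /dev/zero makes a poll() loop spin: poll() on /dev/zero reports POLLIN every time, while an idle descriptor never does. Here an empty pipe stands in for a quiet cman socket (an assumption for illustration), and the helper names are hypothetical, not taken from fenced or libcman.

```c
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Polls fd `rounds` times with a zero timeout (never blocks) and counts
 * how often poll() reported it readable. */
static int count_ready(int fd, int rounds)
{
    int ready = 0;
    for (int i = 0; i < rounds; i++) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        if (poll(&pfd, 1, 0) == 1 && (pfd.revents & POLLIN))
            ready++;
    }
    return ready;
}

/* /dev/zero: every poll reports POLLIN, so a main loop waiting on it
 * wakes up immediately, every time. Returns -1 if it cannot be opened. */
static int dev_zero_ready_count(int rounds)
{
    int fd = open("/dev/zero", O_RDONLY);
    if (fd < 0)
        return -1;
    int n = count_ready(fd, rounds);
    close(fd);
    return n;
}

/* HYPOTHETICAL stand-in for a cman socket with no pending data:
 * an empty pipe, which never polls as readable. */
static int idle_pipe_ready_count(int rounds)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;
    int n = count_ready(p[0], rounds);
    close(p[0]);
    close(p[1]);
    return n;
}
```

Calling `dev_zero_ready_count(5)` should give 5 and `idle_pipe_ready_count(5)` should give 0. A loop whose stored cman slot holds the /dev/zero FD therefore never sleeps: it wakes, tries to read from cman (presumably the real socket, which has nothing and returns EAGAIN), and polls again, which is consistent with the recvfrom/poll pair in the strace.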
Created attachment 612391: Proposed patch

I can't reproduce the problem, but this patch should cure the symptoms. I'm going to see how reproducible it is with the customer.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux release for currently deployed products. This request is not yet committed for inclusion in a release.
Created attachment 623347: Proposed patch backported to RHEL6

Chrissie's patch "front-ported" to RHEL6.
Quality Engineering Management has reviewed and declined this request. You may appeal this decision by reopening this request.
commit 60dd70f06444939ea14bb6a40cfb61ab1eea9616
Author: Christine Caulfield <ccaulfie>
Date:   Mon Apr 8 16:11:51 2013 +0100

    fenced: get the cman fd before each poll
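The commit title describes the fix pattern: ask cman for its FD immediately before every poll() instead of caching it at startup. The sketch below illustrates that pattern only; it is not the code from the commit. `struct fake_handle` and `fake_get_fd()` are hypothetical stand-ins for the cman handle and libcman's FD getter (believed to be `cman_get_fd()`; check libcman.h), and pipes stand in for the descriptors the library might hand out.

```c
#include <poll.h>
#include <unistd.h>

/* HYPOTHETICAL stand-in for a cman handle: the library owns the
 * descriptor and may replace it between calls. */
struct fake_handle {
    int current_fd;
};

/* HYPOTHETICAL stand-in for the libcman FD getter: returns the
 * descriptor to poll *right now*; callers must not store it. */
static int fake_get_fd(const struct fake_handle *h)
{
    return h->current_fd;
}

/* One pass of a daemon main loop: the cman slot is refreshed from the
 * handle immediately before poll(), never cached across iterations.
 * Returns 1 if that fd is readable (the caller would then dispatch),
 * 0 if not, -1 if poll() failed. */
static int poll_once(const struct fake_handle *h, struct pollfd *pfds,
                     nfds_t nfds, nfds_t cman_slot, int timeout_ms)
{
    pfds[cman_slot].fd = fake_get_fd(h);   /* re-fetch before each poll */
    pfds[cman_slot].events = POLLIN;
    for (nfds_t i = 0; i < nfds; i++)
        pfds[i].revents = 0;

    if (poll(pfds, nfds, timeout_ms) < 0)
        return -1;
    return (pfds[cman_slot].revents & POLLIN) ? 1 : 0;
}

/* Usage demo: the handle switches from pipe a (idle) to pipe b (one byte
 * pending); poll_once() follows the switch because it never caches the
 * fd. Returns 1 if both polls behave as expected, 0 if not, -1 on setup
 * failure. */
static int demo_fd_switch(void)
{
    int a[2], b[2];
    if (pipe(a) < 0)
        return -1;
    if (pipe(b) < 0) {
        close(a[0]);
        close(a[1]);
        return -1;
    }
    struct fake_handle h = { .current_fd = a[0] };
    struct pollfd pfds[1];

    int idle = poll_once(&h, pfds, 1, 0, 0);   /* a is empty: expect 0 */
    int wrote = write(b[1], "x", 1) == 1;
    h.current_fd = b[0];                        /* library switched fds */
    int ready = poll_once(&h, pfds, 1, 0, 0);  /* b has data: expect 1 */
    int ok = wrote && idle == 0 && ready == 1 && pfds[0].fd == b[0];

    close(a[0]);
    close(a[1]);
    close(b[0]);
    close(b[1]);
    return ok;
}
```

Because the slot is overwritten on every pass, a descriptor that was valid (or merely plausible) at startup cannot linger in the poll set after the library has moved on, which is what allowed the /dev/zero FD to keep the loop spinning.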
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1304.html