Bug 210641 - Race condition hang/failure between cman daemons and groupd
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: cman
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Robert Peterson
QA Contact: Cluster QE
Reported: 2006-10-13 10:35 EDT by Robert Peterson
Modified: 2009-04-16 18:45 EDT (History)
Fixed In Version: 5.0.0
Doc Type: Bug Fix
Last Closed: 2006-11-28 16:12:39 EST

Attachments: None
Description Robert Peterson 2006-10-13 10:35:34 EDT
Description of problem:
The cman group daemons (fenced, gfs_controld, dlm_controld, etc.)
have a race condition at startup that causes them to fail.
Any of these daemons, such as fenced, can try to join a group
before the groupd daemon is ready to accept client connections.
We therefore need a new timeout parameter for the group_init function
used by these daemons, so that they can wait for groupd to become ready.
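
To illustrate the kind of wait described above, here is a minimal sketch
of a client that keeps retrying a Unix domain socket connection until a
timeout expires. It is not the actual group_init change: the function
name connect_with_retry, the one-second retry interval, and the use of a
filesystem socket path are all assumptions made for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper, NOT the real group_init(): try to connect to the
 * Unix domain socket at `path`, retrying once per second until
 * `timeout_secs` seconds have elapsed.
 * Returns a connected fd on success, or -1 with errno set on failure. */
int connect_with_retry(const char *path, int timeout_secs)
{
    struct sockaddr_un addr;
    time_t deadline;

    assert(path != NULL);
    if (strlen(path) >= sizeof(addr.sun_path)) {
        errno = ENAMETOOLONG;
        return -1;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    deadline = time(NULL) + timeout_secs;
    for (;;) {
        int saved;
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);

        if (fd < 0)
            return -1;
        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0)
            return fd;          /* the server is accepting connections */

        saved = errno;
        close(fd);

        /* Retry only the errors that mean "server not listening yet". */
        if (saved != ENOENT && saved != ECONNREFUSED) {
            errno = saved;
            return -1;
        }
        if (time(NULL) >= deadline) {
            errno = ETIMEDOUT;
            return -1;
        }
        sleep(1);               /* assumed retry interval */
    }
}
```

With a helper along these lines, a daemon started before the server would
block for up to the timeout rather than failing on the first refused
connection, and would still return an error if the server never appears.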

Version-Release number of selected component (if applicable):
RHEL5 beta 1 plus latest cluster code from 13 Oct 2006

How reproducible:
Intermittent.  Roughly one out of four attempts to start all
cluster nodes simultaneously.

Steps to Reproduce:
Reboot all nodes in a cluster simultaneously.

Actual results:
Some of the nodes hang while starting daemons: the cman init
script hangs on "Starting Fencing..." or on one of the other
cman daemons.  Alternatively, the daemons may simply exit, leaving
you with group membership errors.

Expected results:
No hangs.  All daemons come up normally.  Cluster comes up normally.

Additional info:
Seen on the smoke cluster during reboot tests.
Comment 1 RHEL Product and Program Management 2006-10-13 10:46:36 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux release.  Product Management has requested further review
of this request by Red Hat Engineering.  This request is not yet committed for
inclusion in release.
Comment 4 Nate Straz 2007-12-13 12:22:32 EST
Moving all RHCS ver 5 bugs to RHEL 5 so we can remove RHCS v5 which never existed.
