Red Hat Bugzilla – Bug 210641
Race condition hang/failure between cman daemons and groupd
Last modified: 2009-04-16 18:45:23 EDT
Description of problem:
The cman group daemons (fenced, gfs_controld, dlm_controld, etc.)
have a race condition at startup that causes them to fail.
Basically, any of the daemons such as fenced can try to join
a group before the groupd daemon is ready to accept client connections.
We therefore need a new timeout parameter to the group_init function
used by these daemons.
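A minimal sketch of the retry-until-timeout idea, for illustration only.
This is not the libgroup API: connect_with_timeout, its parameters, and
the use of a filesystem-path AF_UNIX socket are all assumptions, and the
real group_init signature may differ. It retries the connection once per
second until the daemon accepts or the timeout expires.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of libgroup): try to connect to a
 * Unix domain socket at `path`, retrying once per second until
 * `timeout_secs` seconds have elapsed.
 * Returns a connected fd on success, or -1 with errno set.
 */
int connect_with_timeout(const char *path, int timeout_secs)
{
    struct sockaddr_un addr;
    time_t deadline;

    assert(path != NULL);

    if (strlen(path) >= sizeof(addr.sun_path)) {
        errno = ENAMETOOLONG;
        return -1;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    deadline = time(NULL) + timeout_secs;

    for (;;) {
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;

        /* Success: the server side is up and accepting clients. */
        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0)
            return fd;

        close(fd);

        /* Give up once the deadline has passed. */
        if (time(NULL) >= deadline) {
            errno = ETIMEDOUT;
            return -1;
        }

        /* Server not ready yet: wait and try again. */
        sleep(1);
    }
}
```

With a timeout of 0 the helper makes a single attempt, which matches the
current fail-immediately behavior; a client daemon started before groupd
is ready would pass a larger value so that it waits instead of exiting.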
Version-Release number of selected component (if applicable):
RHEL5 beta 1 plus latest cluster code from 13 Oct 2006
How reproducible:
Intermittent. Maybe once out of four attempts to start a
Steps to Reproduce:
Reboot all nodes in a cluster simultaneously.
Actual results:
Some of the nodes will hang starting daemons, and the cman init
script will hang on "Starting Fencing..." or one of the other
cman daemons. Also, the daemons may just exit leaving you with
group membership errors.
Expected results:
No hangs. All daemons come up normally. Cluster comes up normally.
Additional info:
Seen on the smoke cluster doing reboot tests.
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux release. Product Management has requested further review
of this request by Red Hat Engineering. This request is not yet committed for
inclusion in release.
Moving all RHCS ver 5 bugs to RHEL 5 so we can remove RHCS v5 which never existed.