Description of problem:
During system startup with the cman init script enabled, cman_tool reports the following error:

[kanderso@dhcp83-120 tmp]$ cat cman_tool-aisexec.txt
Starting cluster:
Loading modules... DLM (built Dec 4 2006 15:58:12) installed
GFS2 (built Dec 4 2006 15:58:52) installed
done
Mounting configfs... done
Starting ccsd... done
Starting cman... failed
/usr/sbin/cman_tool: aisexec daemon didn't start

Once the node is up, aisexec does in fact appear to have started correctly: cman_tool status returns the current state and the node is participating in the cluster. Because of the reported failure, however, the rest of the startup script is not executed and the remaining cluster services are not started. Running "service cman start" manually brings up the cluster successfully.

Version-Release number of selected component (if applicable):

How reproducible:
This has happened twice so far on the Xen cluster while running revolver.

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
As a short-term workaround, I raised the loop count in cman_tool/join.c from 20 to 100 to see whether that makes the problem go away. I have restarted my tests and will let you know if I see it again.
There is actually a small bug in cman_tool join: it does not detect whether aisexec has started correctly or has crashed. With this bug fixed, the loop counter can go much higher, because an aisexec failure will be detected directly rather than only when the loop runs out.

Checking in join.c;
/cvs/cluster/cluster/cman/cman_tool/join.c,v  <--  join.c
new revision: 1.48; previous revision: 1.47
done
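The fix described above, detecting a crashed aisexec inside the wait loop so that the retry count can be raised safely, can be illustrated with a minimal C sketch. It is not the actual join.c change: wait_for_daemon(), the daemon_state codes, the ready_fn probe and the timing parameters are all hypothetical. It also assumes that the monitored child is the daemon itself, rather than a parent process that forks into the background and exits.

```c
/* Hypothetical sketch; not the code in cman_tool/join.c.
 * Assumes `pid` is the daemon itself (it does not daemonize and exit). */
#define _POSIX_C_SOURCE 200809L /* expose waitpid(), nanosleep() under strict C99 */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical outcome codes. */
enum daemon_state { DAEMON_READY, DAEMON_EXITED, DAEMON_TIMEOUT };

/* Hypothetical readiness probe: a real tool would ask the daemon itself
 * whether it is up. Returns true once the daemon is serving requests. */
typedef bool (*ready_fn)(void *ctx);

/* Sleep for roughly ms milliseconds, resuming if a signal interrupts it. */
static void sleep_ms(long ms)
{
    struct timespec ts = { .tv_sec = ms / 1000,
                           .tv_nsec = (ms % 1000) * 1000000L };
    while (nanosleep(&ts, &ts) == -1 && errno == EINTR)
        ;
}

/* Poll up to max_tries times for child `pid` to become ready.
 * Each iteration also reaps the child without blocking, so a crash
 * ends the wait at once instead of only when the loop count runs out. */
enum daemon_state wait_for_daemon(pid_t pid, ready_fn is_ready, void *ctx,
                                  int max_tries, long interval_ms)
{
    assert(max_tries > 0);
    for (int i = 0; i < max_tries; i++) {
        pid_t r = waitpid(pid, NULL, WNOHANG); /* 0 => still running */
        if (r == pid || (r == -1 && errno == ECHILD))
            return DAEMON_EXITED;              /* died: stop waiting */
        if (is_ready(ctx))
            return DAEMON_READY;
        sleep_ms(interval_ms);
    }
    return DAEMON_TIMEOUT;                     /* alive but never ready */
}

/* Toy probes for demonstration only. */
static bool never_ready(void *ctx)
{
    (void)ctx;
    return false;
}

static bool ready_after_three(void *ctx)
{
    int *calls = ctx;
    return ++*calls >= 3; /* report ready on the third poll */
}
```

In this design a crash ends the wait immediately, so max_tries only bounds how long a slow but healthy daemon is given to come up, which is why it can be raised without delaying failure reports.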
Checked into the RHEL5 and RHEL50 branches:

Checking in join.c;
/cvs/cluster/cluster/cman/cman_tool/join.c,v  <--  join.c
new revision: 1.47.2.1; previous revision: 1.47
done

Checking in join.c;
/cvs/cluster/cluster/cman/cman_tool/join.c,v  <--  join.c
new revision: 1.47.4.1; previous revision: 1.47
done
A package has been built which should help with the problem described in this bug report. This report is therefore being closed with a resolution of CURRENTRELEASE. You may reopen this bug report if the solution does not work for you.
Moving all RHCS ver 5 bugs to RHEL 5 so we can remove RHCS v5, which never existed.