Description of problem:
This is somewhat related to other init script bugs (168698 and 181817). If clvmd can talk to cman but there's no quorate cluster, then the clvmd init script will hang indefinitely when starting OR stopping.

[root@link-08 ~]# cat /proc/cluster/nodes
Node  Votes Exp Sts  Name
   1    1    3   X   link-01
   2    1    3   M   link-08
[root@link-08 ~]# service clvmd start
[HANG]

Version-Release number of selected component (if applicable):
[root@link-08 ~]# rpm -q lvm2-cluster
lvm2-cluster-2.02.01-1.2.RHEL4
cfeist: is there any chance you could do some magic in the init scripts for this? Actually fixing it in clvmd is going to require rather more code than would be healthy in an update release. The need to support cman & gulm makes it rather awkward. Added to which, it's actually impossible to shut down clvmd on an inquorate cluster (and why would you want to?) because it needs to talk to the lock manager.
More thoughts on this: failing the script because of an inquorate cluster is really the wrong thing to do. If you start clvmd on an inquorate cluster then it stays started; returning a failure is incorrect because the daemon /is/ started (surely failure of the start script implies the daemon has not started). Similarly with shutdown: initiating shutdown of clvmd has succeeded even if the daemon has not actually stopped yet, because it /will/ stop when the cluster regains quorum. IMHO it would be more appropriate to return success in both situations, because the script has succeeded in doing what was asked of it.
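The argument above amounts to a simple exit-status convention, sketched here in shell. The function names and the CLVMD_PID check are stand-ins for a real pidof/status test, not the actual init script:

```shell
#!/bin/sh
# Sketch of the exit-status convention argued above: "start" succeeds
# whenever the daemon is running, quorate cluster or not. Names here
# are illustrative.

clvmd_is_running() {
    # Stand-in for a real pidof/status check.
    [ -n "$CLVMD_PID" ]
}

start_status() {
    if clvmd_is_running; then
        echo 0    # daemon started: success, even on an inquorate cluster
    else
        echo 1    # daemon genuinely failed to start
    fi
}

CLVMD_PID=1234 start_status    # daemon running -> 0
CLVMD_PID=""   start_status    # daemon not running -> 1
```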
As long as it doesn't hang forever, I'm on board. :)
Oh good grief, the init script also enables & disables volumes! There's /no/ way that's going to work on an inquorate cluster. This is going to need some serious init-script mucking about if it's even possible to do anything sensible at all. What should happen? Do we not activate/deactivate the volumes? If not, when do they get activated/deactivated? IMHO the best thing to do with this is just to background the whole damn script - if that's possible with an init script.
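One possible shape for the "background it" idea, purely as a sketch: run the activation step in a detached subshell so the init script itself returns immediately instead of hanging. The function and log path here are hypothetical; the real step would be vgchange on the clustered volume groups:

```shell
#!/bin/sh
# Hypothetical sketch: background the volume activation so the init
# script returns at once instead of blocking on an inquorate cluster.

activate_when_quorate() {
    # Stand-in for "wait until the cluster is quorate, then activate";
    # the real command would be something like vgchange -aly once clvmd
    # reports it is operational.
    echo "activating clustered logical volumes"
}

# Run activation in a detached subshell; the script continues immediately.
activate_when_quorate >/tmp/clvmd-activate.log 2>&1 &
echo "clvmd start initiated (activation running in background)"
wait    # only here so this sketch's background job finishes before exit
```

The obvious downside, as noted above, is that "service clvmd start" then returns before anyone knows whether the volumes will ever actually be activated.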
I had a brainwave. clvmd now has an optional startup timeout (switch -T; see the man page for details), which the script sets to 20 seconds. If clvmd doesn't get started within this time, the script will exit without attempting to activate any logical volumes. It's important to note that if this does happen, clvmd HAS been started and will start operations as soon as the cluster is quorate...but this does NOT include activating the logical volumes that got missed by this script.

Checking in daemons/clvmd/clvmd.c;
/cvs/lvm2/LVM2/daemons/clvmd/clvmd.c,v  <--  clvmd.c
new revision: 1.30; previous revision: 1.29
done
Checking in scripts/clvmd_init_rhel4;
/cvs/lvm2/LVM2/scripts/clvmd_init_rhel4,v  <--  clvmd_init_rhel4
new revision: 1.11; previous revision: 1.10
done
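The logic described above looks roughly like this in shell. The -T switch is the documented clvmd startup timeout, but the function, paths and the stubbed commands in the demonstration are illustrative, not the real RHEL4 init script:

```shell
#!/bin/sh
# Sketch of the timeout behaviour described above. CLVMD defaults to a
# plausible path; the demonstration at the bottom substitutes stub
# commands so the sketch is runnable without a cluster.

CLVMD_TIMEOUT=20
CLVMD=${CLVMD:-/usr/sbin/clvmd}

start_clvmd() {
    # Ask clvmd to give up waiting for quorum after $CLVMD_TIMEOUT seconds.
    "$CLVMD" -T "$CLVMD_TIMEOUT" || {
        # clvmd IS running at this point; we only skip LV activation.
        echo "clvmd startup timed out"
        return 1
    }
    # Only reached on a quorate cluster.
    echo "activating clustered logical volumes"
    # vgchange -aly
}

# Demonstration with stubs in place of the real daemon:
CLVMD="true"  start_clvmd         # quorate: prints the activation message
CLVMD="false" start_clvmd || :    # timed out: prints the timeout message
```

This keeps the "return success, the daemon is started" semantics for clvmd itself while making the volume-activation step the only thing that gets skipped.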
Fix verified in lvm2-cluster-2.02.16-1.

[root@link-08 bin]# service ccsd start
Starting ccsd:                                             [  OK  ]
[root@link-08 bin]# service cman start
Starting cman:                                             [FAILED]
[root@link-08 bin]# cman_tool nodes
Node  Votes Exp Sts  Name
   1    1    4   M   link-08
[root@link-08 bin]# service clvmd start
Starting clvmd: clvmd startup timed out
                                                           [FAILED]
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0046.html