/etc/init.d/clvmd doesn't wait for clvmd to finish completely, which causes a problem when the machine then tries to leave the cluster.

To reproduce: use a very fast clustered machine with LVM and GFS on a SAN, and reboot it or shut it down. cman won't be able to leave the cluster. I adjusted cman to produce extra debug output, which shows this:

Dec 11 19:27:40 localhost cman: Stopping cman:
Dec 11 19:27:40 localhost cman: DEBUG, SERVICES BEFORE LEAVE:
Dec 11 19:27:40 localhost cman: Service Name GID LID State Code
Dec 11 19:27:40 localhost cman: DLM Lock Space: "clvmd" 3 3 run S-15,200,2
Dec 11 19:27:40 localhost cman: [2 1]
Dec 11 19:27:40 localhost cman:
Dec 11 19:27:40 localhost cman: DEBUG LEAVE OUTPUT:
Dec 11 19:27:40 localhost cman: cman_tool: Can't leave cluster while there are 2 active subsystems
Dec 11 19:27:44 localhost rc: Stopping cman: failed

The cman_tool services output shows that clvmd is still running (its "clvmd" DLM lock space is still active), even though /etc/init.d/clvmd has already been stopped. Further investigation shows that /etc/init.d/clvmd doesn't wait for clvmd to exit completely, which produces a race condition that results in a stop error from cman.

This simple patch to the clvmd initscript solves it:

--- clvmd.orig	2008-12-11 20:10:09.000000000 +0100
+++ clvmd	2008-12-11 19:36:43.000000000 +0100
@@ -114,6 +114,7 @@
 	stop
 	rtrn=$?
 	[ $rtrn = 0 ] && rm -f $LOCK_FILE
+	wait_for_finish
 	;;
 
   restart)

Edwin Eefting (DatuX)
Fabian Schneider (NSI-BV)
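The patch calls wait_for_finish, but its body is not shown in this report. As a rough illustration of what such a helper has to do, here is a minimal POSIX sh sketch that polls until a given PID has exited or a timeout expires. The name wait_for_pid, the PID-based check with `kill -0`, and the 10-second default timeout are assumptions for illustration only, not the upstream clvmd initscript code.

```shell
#!/bin/sh
# Hypothetical helper (not the upstream wait_for_finish): poll until the
# process with the given PID has exited, giving up after a timeout.
# Usage: wait_for_pid PID [TIMEOUT_SECONDS]
# Returns 0 once the process is gone, 1 if it is still alive at the timeout.
wait_for_pid() {
    pid=$1
    timeout=${2:-10}   # assumed default, in seconds
    count=0

    # No PID means the daemon was not running, so there is nothing to wait for.
    [ -n "$pid" ] || return 0

    # kill -0 sends no signal; it only checks that the PID still exists
    # (initscripts run as root, so permission errors are not a concern here).
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$count" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        count=$((count + 1))
    done
    return 0
}
```

In the stop) branch one could capture the PID before stopping, e.g. `pid=$(pidof -s clvmd)`, and call `wait_for_pid "$pid"` after `stop`, so the script only returns once clvmd has actually exited and, presumably, released its DLM lock space before cman tries to leave. Polling is used instead of the shell's `wait` builtin because clvmd is a daemon, not a child of the initscript's shell.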
Sorry for the long delay in responding. The problem is fixed upstream along with other problems in that script. I'll add this script to the list for consideration for RHEL 4.9.
It's about time, thanks ;)
Yes, this initscript problem should be fixed in the next update (4.9).
Fixed in lvm2-cluster-2.02.42-7.el4.
Fix verified in lvm2-cluster-2.02.42-9.el4.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2011-0274.html