Bug 476069

Summary: /etc/init.d/clvmd doesn't wait for finish
Product: Red Hat Enterprise Linux 4
Reporter: Edwin Eefting <edwin>
Component: lvm2-cluster
Assignee: Milan Broz <mbroz>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: medium
Docs Contact:
Priority: low
Version: 4.7
CC: agk, ccaulfie, cmarthal, coughlan, dwysocha, edwin, heinzm, iannis, jbrassow, mbroz, prockai, pvrabec
Target Milestone: rc
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version: lvm2-cluster-2.02.42-7.el4
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-02-16 16:36:07 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Edwin Eefting 2008-12-11 19:35:04 UTC
/etc/init.d/clvmd doesn't wait for clvmd to finish shutting down completely, which causes a problem when the machine then tries to leave the cluster.

Reproduce: Use a very fast clustered machine with LVM and GFS on a SAN, and reboot it or shut it down. cman won't be able to leave the cluster. I adjusted cman to produce extra debug output, which shows this:

Dec 11 19:27:40 localhost cman: Stopping cman:
Dec 11 19:27:40 localhost cman: DEBUG, SERVICES BEFORE LEAVE:
Dec 11 19:27:40 localhost cman: Service          Name                              GID LID State     Code
Dec 11 19:27:40 localhost cman: DLM Lock Space:  "clvmd"                             3   3 run       S-15,200,2
Dec 11 19:27:40 localhost cman: [2 1]
Dec 11 19:27:40 localhost cman:
Dec 11 19:27:40 localhost cman: DEBUG LEAVE OUTPUT:
Dec 11 19:27:40 localhost cman: cman_tool: Can't leave cluster while there are 2 active subsystems
Dec 11 19:27:44 localhost rc: Stopping cman:  failed

The cman_tool services output shows that clvmd is still running, even though /etc/init.d/clvmd has already been stopped.
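
The same check can be scripted. This is only a sketch: `has_lockspace` is a hypothetical helper name, and the pattern assumes the services listing has the layout shown in the log above.

```shell
# Hypothetical helper: succeeds if the services listing read from stdin
# still contains a DLM lock space with the given name.
# Assumes lines of the form:  DLM Lock Space:  "clvmd"  ...
has_lockspace() {
    grep -q "DLM Lock Space:[[:space:]]*\"$1\""
}
```

It would be used as `cman_tool services | has_lockspace clvmd`, which succeeds for as long as the clvmd lock space is still listed.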

Further investigation shows that /etc/init.d/clvmd doesn't wait for clvmd to finish completely, producing a race condition that results in a stop error in cman.

This simple patch to clvmd solves it:

--- clvmd.orig  2008-12-11 20:10:09.000000000 +0100
+++ clvmd       2008-12-11 19:36:43.000000000 +0100
@@ -114,6 +114,7 @@
        stop
        rtrn=$?
        [ $rtrn = 0 ] && rm -f $LOCK_FILE
+       wait_for_finish
        ;;

   restart)

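The body of wait_for_finish is not shown in this report. As a rough illustration of the idea only, a polling wait could look like the sketch below. `wait_for_pid_exit`, its arguments, and the 10-second default are assumptions, not taken from the actual init script.

```shell
# Hypothetical helper, NOT the wait_for_finish from the real init script.
# Polls once per second until the given PID is gone or the timeout expires.
# kill -0 sends no signal; it only tests whether the process can be signalled,
# so this assumes the caller has permission to signal it (e.g. runs as root).
wait_for_pid_exit() {
    pid=$1
    timeout=${2:-10}    # seconds to wait; the default is an assumption
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$timeout" -le 0 ]; then
            return 1    # still running after the timeout
        fi
        sleep 1
        timeout=$((timeout - 1))
    done
    return 0            # process has exited
}
```

In a stop branch this might be called as `wait_for_pid_exit "$clvmd_pid" 10` after the daemon has been signalled, where `clvmd_pid` is a variable captured beforehand (for example with `pidof clvmd`).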


Edwin Eefting (DatuX)
Fabian Schneider (NSI-BV)

Comment 1 Alasdair Kergon 2010-05-14 20:16:32 UTC
Sorry for the long delay in responding.  The problem is fixed upstream along with other problems in that script.  I'll add this script to the list for consideration for RHEL 4.9.

Comment 2 Edwin Eefting 2010-05-14 21:59:36 UTC
It's about time, thanks ;)

Comment 3 Milan Broz 2010-10-04 10:54:01 UTC
Yes, this initscript problem should be fixed in the next update (4.9).

Comment 4 Milan Broz 2010-10-21 17:05:22 UTC
Fixed in lvm2-cluster-2.02.42-7.el4.

Comment 6 Corey Marthaler 2011-01-14 00:23:20 UTC
Fix verified in lvm2-cluster-2.02.42-9.el4.

Comment 7 errata-xmlrpc 2011-02-16 16:36:07 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0274.html