Bug 187812 - clvmd init script should time out and fail if there's no quorate cluster
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: lvm2-cluster
Version: 4
Hardware: All
OS: Linux
Priority: medium
Severity: low
Target Milestone: ---
Assignee: Christine Caulfield
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2006-04-03 18:39 UTC by Corey Marthaler
Modified: 2010-01-12 04:04 UTC (History)

Fixed In Version: RHBA-2007-0046
Clone Of:
Environment:
Last Closed: 2007-05-10 21:06:44 UTC
Embargoed:




Links:
System: Red Hat Product Errata
ID: RHBA-2007:0046
Priority: normal
Status: SHIPPED_LIVE
Summary: lvm2-cluster bug fix update
Last Updated: 2007-05-10 21:06:38 UTC

Description Corey Marthaler 2006-04-03 18:39:58 UTC
Description of problem:
This is somewhat related to other init script bugs (168698 and 181817).

If clvmd can talk to cman but there's no quorate cluster, then the clvmd init
script will hang indefinitely when starting OR stopping.


[root@link-08 ~]# cat /proc/cluster/nodes
Node  Votes Exp Sts  Name
   1    1    3   X   link-01
   2    1    3   M   link-08
[root@link-08 ~]# service clvmd start
[HANG]
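
The node table above shows why the cluster is inquorate: only link-08 is a member (Sts M), contributing 1 of 3 expected votes. A minimal sketch of that check follows, assuming a simple-majority rule (quorum = expected votes / 2 + 1) and the column layout shown above. The helper name `is_quorate` is hypothetical and is not part of cman or the init script.

```shell
# Hypothetical helper: reads a /proc/cluster/nodes-style table on stdin and
# returns 0 if the member votes reach an assumed simple-majority quorum.
is_quorate() {
    awk '
        NR == 1 { next }                  # skip the header row
        { expected = $3 }                 # Exp column (expected votes)
        $4 == "M" { votes += $2 }         # count votes of current members only
        END {
            quorum = int(expected / 2) + 1    # assumption: simple majority
            if (votes >= quorum) exit 0
            exit 1
        }
    '
}
```

On a RHEL4-era node this could be called as `is_quorate < /proc/cluster/nodes || echo "cluster is not quorate"`.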

Version-Release number of selected component (if applicable):
[root@link-08 ~]# rpm -q lvm2-cluster
lvm2-cluster-2.02.01-1.2.RHEL4

Comment 1 Christine Caulfield 2006-04-04 13:23:09 UTC
cfeist: is there any chance you could do some magic in the init scripts for this?

Actually fixing it in clvmd is going to require rather more code than would be
healthy in an update release. The need to support cman & gulm makes it rather
awkward.

Added to which, it's actually impossible to shut down clvmd on an inquorate
cluster (and why would you want to?) because it needs to talk to the lock manager.

Comment 2 Christine Caulfield 2006-04-18 14:43:14 UTC
More thoughts on this:

Failing the script because of an inquorate cluster is really the wrong thing to
do. If you start clvmd on an inquorate cluster then it stays started. Returning
a failure is incorrect because the daemon /is/ started. (surely failure of the
start script implies the daemon has not started).

Similarly with shutdown. Initiating shutdown of clvmd has succeeded even if the
daemon has not actually stopped yet, because it /will/ stop when the cluster
regains quorum.

IMHO it would be more appropriate to return success for both these situations
because they have succeeded in doing what was requested of them.


Comment 3 Corey Marthaler 2006-04-18 14:55:22 UTC
As long as it doesn't hang forever, I'm on board. :)

Comment 4 Christine Caulfield 2006-04-19 10:11:43 UTC
Oh good grief, the init script also enables & disables volumes! There's /no/ way
that's going to work on an inquorate cluster.

This is going to need some serious init-script mucking about if it's even
possible to do anything sensible at all. What should happen? Do we not
activate/deactivate the volumes? If not, then when do they get
activated/deactivated?

IMHO the best thing to do with this is just to background the whole damn script
- if that's possible with an init script.

Comment 6 Christine Caulfield 2006-11-30 10:20:39 UTC
I had a brainwave.

clvmd now has an optional startup timeout (switch -T, see the man page for
details). In the script I have set this to 20 seconds. If clvmd doesn't get
started within this time then the script will exit without attempting to
activate any logical volumes.

It's important to note that if this does happen clvmd HAS been started and will
start operations as soon as the cluster is quorate...but this does NOT include
activating the logical volumes that got missed by this script. 

Checking in daemons/clvmd/clvmd.c;
/cvs/lvm2/LVM2/daemons/clvmd/clvmd.c,v  <--  clvmd.c
new revision: 1.30; previous revision: 1.29
done
Checking in scripts/clvmd_init_rhel4;
/cvs/lvm2/LVM2/scripts/clvmd_init_rhel4,v  <--  clvmd_init_rhel4
new revision: 1.11; previous revision: 1.10
done
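
The behaviour described in this comment can be sketched roughly as below. This is an illustrative outline only, not the contents of the checked-in clvmd_init_rhel4 script. It assumes that `clvmd -T<seconds>` returns a non-zero exit status to its caller when the startup timeout expires, and it uses a plain `vgchange -ay` for activation. The function name `start_clvmd` and the overridable `CLVMD`, `VGCHANGE` and `CLVMD_TIMEOUT` variables are hypothetical.

```shell
# Illustrative sketch only; names and structure are assumptions.
CLVMD=${CLVMD:-clvmd}               # daemon binary
VGCHANGE=${VGCHANGE:-vgchange}      # LVM activation tool
CLVMD_TIMEOUT=${CLVMD_TIMEOUT:-20}  # seconds, as set in the script per this comment

start_clvmd() {
    printf 'Starting clvmd: '
    # -T gives clvmd a startup timeout. Assumption: a non-zero status here
    # means the timeout expired; the daemon itself may still be running and
    # will begin operating once the cluster is quorate.
    if ! "$CLVMD" -T"$CLVMD_TIMEOUT"; then
        echo 'FAILED'
        # Skip volume activation entirely; it cannot work without quorum.
        return 1
    fi
    echo 'OK'
    # Activate logical volumes only after clvmd reports a successful start.
    "$VGCHANGE" -ay
}
```

Because activation is skipped on timeout, any volumes missed this way would have to be activated by hand (for example with `vgchange -ay`) once the cluster regains quorum.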

Comment 7 Corey Marthaler 2006-12-11 22:03:37 UTC
fix verified in lvm2-cluster-2.02.16-1


[root@link-08 bin]# service ccsd start start
Starting ccsd:                                             [  OK  ]
[root@link-08 bin]# service cman start
Starting cman:                                             [FAILED]
[root@link-08 bin]# cman_tool nodes
Node  Votes Exp Sts  Name
   1    1    4   M   link-08
[root@link-08 bin]# service clvmd start
Starting clvmd: clvmd startup timed out
                                                           [FAILED]


Comment 9 Red Hat Bugzilla 2007-05-10 21:06:44 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0046.html


