+++ This bug was initially created as a clone of Bug #470417 +++

rhel4 clone

Description of problem:
Running revolver with ckpt-fixed openais results in an apparent hang of clvmd
during startup.

Version-Release number of selected component (if applicable):
lvm2-cluster-2.02.40-6.el5

Nov  6 19:51:48 bench-02 kernel: Lock_DLM (built Oct 14 2008 15:12:40) installed

How reproducible:
Just run revolver until it locks.

Steps to Reproduce:
1. Set up a 3-node revolver run with plock load.
2. Wait until 5.X+ iterations, when revolver fails with a "deadlock on node X"
   message.
3. Make sure you're connected to the terminal server output of the three nodes
   so you can see the startup process. You will find the following:

Starting cluster:
   Loading modules...
DLM (built Oct 27 2008 22:03:27) installed
GFS2 (built Oct 27 2008 22:04:01) installed
 done
   Mounting configfs... done
   Starting ccsd... done
   Starting cman... done
   Starting daemons... done
   Starting fencing... done
                                                           [  OK  ]
Starting system message bus:                               [  OK  ]
Starting clvmd: dlm: Using TCP for communications
dlm: connecting to 2
dlm: got connection from 2
dlm: got connection from 3
                                                           [  OK  ]
<DEADLOCKS HERE>

Notice the last step is clvmd starting, after which I would expect to see
something about the VG being activated.

Actual results:
Deadlocks cause revolver to fail, and the node never comes up (or is fenced as
a result of its failure to start).

Expected results:
The node will continue and operate normally.

Additional info:

[root@bench-02 ~]# cman_tool nodes
Node  Sts   Inc   Joined               Name
   1   M    836   2008-11-06 19:58:43  bench-01
   2   M    820   2008-11-06 19:51:31  bench-02
   3   M    840   2008-11-06 19:58:43  bench-03

[root@bench-02 ~]# cman_tool status
Version: 6.1.0
Config Version: 1
Cluster Name: bench-123
Cluster Id: 50595
Cluster Member: Yes
Cluster Generation: 840
Membership state: Cluster-Member
Nodes: 3
Expected votes: 3
Total votes: 3
Quorum: 2
Active subsystems: 8
Flags: Dirty
Ports Bound: 0 11
Node name: bench-02
Node ID: 2
Multicast addresses: 239.192.197.105
Node addresses: 10.15.84.22

[root@bench-02 ~]# group_tool info
type   level name       id       state
fence  0     default    00010001 none
[1 2 3]
dlm    1     clvmd      00020001 none
[1 2 3]
dlm    1     bench-1230 00040001 none
[2]
dlm    1     bench-1231 00060001 none
[2]
dlm    1     bench-1232 00080001 none
[2]
gfs    2     bench-1230 00030001 none
[2]
gfs    2     bench-1231 00050001 none
[2]
gfs    2     bench-1232 00070001 none
[2]

(The above is from node 2 in the cluster.)

--- Additional comment from ccaulfie on 2008-11-21 08:58:33 EDT ---

I've checked in the fix I have. It doesn't seem to fully fix the problem, but
it does make it MUCH harder to reproduce!

Checking in WHATS_NEW;
/cvs/lvm2/LVM2/WHATS_NEW,v  <--  WHATS_NEW
new revision: 1.999; previous revision: 1.998
done
Checking in daemons/clvmd/clvmd.c;
/cvs/lvm2/LVM2/daemons/clvmd/clvmd.c,v  <--  clvmd.c
new revision: 1.52; previous revision: 1.51
done
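Since the fix only makes the hang harder to hit, it is worth capturing where
clvmd is blocked whenever it does reproduce, before the node gets fenced. A
rough sketch of what to grab (assuming gdb/gstack and the stock RHEL5 cluster
tools are installed; nothing here is specific to this fix):

gstack $(pidof clvmd) > /tmp/clvmd.stack   # user-space stack of the hung clvmd
group_tool dump > /tmp/groupd.dump         # groupd debug buffer for the groups above
echo t > /proc/sysrq-trigger               # kernel task stacks, in case it is stuck in the dlm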
In CVS - lvm2-cluster-2.02.42-1.el4
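For anyone retesting with the updated package, a minimal post-boot check might
look like the following (a sketch only; vg_bench is a hypothetical volume group
name, substitute your own):

service clvmd status      # clvmd should be running, not hung in startup
cman_tool nodes           # all three nodes should be in state M
vgs -o vg_name,vg_attr    # clustered VGs carry 'c' in the attr field
lvs vg_bench              # LVs in the clustered VG should be listed and active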
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2009-1047.html