Bug 155729 - clvmd can deadlock when starting up again
Status: CLOSED NEXTRELEASE
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: dlm
Version: 4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Christine Caulfield
QA Contact: Cluster QE
Blocks: 144795
Reported: 2005-04-22 12:07 EDT by Corey Marthaler
Modified: 2009-04-16 15:59 EDT
CC: 4 users

Doc Type: Bug Fix
Last Closed: 2005-06-07 10:57:05 EDT

Description Corey Marthaler 2005-04-22 12:07:14 EDT
Description of problem:
Both Dean and I have seen this multiple times on our respective clusters, but
since it happens during init we have limited debugging ability and are unable
to pinpoint the exact state that is causing this. We added a 'set -x' to the
clvmd init script and got this output up to the time of the hang. We then tried
to add an strace to the vgchange, but have yet to reproduce the hang since then.

Apr 20 11:52:39 morph-02 clvmd: + start
Apr 20 11:52:39 morph-02 clvmd: + for rtrn in 0
Apr 20 11:52:39 morph-02 clvmd: + pidof clvmd
Apr 20 11:52:39 morph-02 clvmd: + echo -n 'Starting clvmd:'
Apr 20 11:52:39 morph-02 clvmd: + clvmd
Apr 20 11:52:41 morph-02 clvmd: Cluster LVM daemon started - connected to CMAN
Apr 20 11:52:41 morph-02 kernel: CMAN: WARNING no listener for port 11 on node morph-05
Apr 20 11:52:41 morph-02 kernel: CMAN: WARNING no listener for port 11 on node morph-01
Apr 20 11:52:41 morph-02 clvmd: + rtrn=0
Apr 20 11:52:41 morph-02 clvmd: + '[' 0 -eq 0 ']'
Apr 20 11:52:41 morph-02 clvmd: + success
Apr 20 11:52:41 morph-02 clvmd: + '[' -z '' ']'
Apr 20 11:52:41 morph-02 clvmd: + initlog -q -n /etc/rc3.d/S24clvmd -s '' -e 1
Apr 20 11:52:41 morph-02 clvmd:  succeeded
Apr 20 11:52:41 morph-02 clvmd: + '[' serial '!=' verbose -a -z '' ']'
Apr 20 11:52:41 morph-02 clvmd: + echo_success
Apr 20 11:52:41 morph-02 clvmd: + '[' serial = color ']'
Apr 20 11:52:41 morph-02 clvmd: + echo -n '[  '
Apr 20 11:52:41 morph-02 clvmd: + '[' serial = color ']'
Apr 20 11:52:41 morph-02 clvmd: + echo -n OK
Apr 20 11:52:41 morph-02 clvmd: + '[' serial = color ']'
Apr 20 11:52:41 morph-02 clvmd: + echo -n '  ]'
Apr 20 11:52:41 morph-02 clvmd: + echo -ne '\r'
Apr 20 11:52:41 morph-02 clvmd: + return 0
Apr 20 11:52:41 morph-02 clvmd: + return 0
Apr 20 11:52:41 morph-02 clvmd:
Apr 20 11:52:41 morph-02 clvmd: + echo
Apr 20 11:52:41 morph-02 clvmd: + /usr/sbin/vgscan
Apr 20 11:52:41 morph-02 clvmd: + '[' -n '' ']'
Apr 20 11:52:41 morph-02 clvmd: Activating lvms:
Apr 20 11:52:41 morph-02 clvmd: + echo -n 'Activating lvms:'
Apr 20 11:52:41 morph-02 clvmd: + /usr/sbin/vgchange -ayl
[HANG]

This can cause quite a problem, since hanging during init keeps the machine
from finishing its boot. The only way we've found to get out of it once we hit
it is to delete all the volumes on the other nodes in the cluster and attempt to
boot once again. We have seen this both when rebooting all nodes at once and
when keeping one node up and rebooting all the others, although neither way
reproduces this reliably. :(
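
If the hang reproduces again, an strace wrapper along these lines could show
where vgchange blocks (the flags and output path here are only an illustration,
not something taken from the original init script):

# Hypothetical strace wrapper for the hanging vgchange call; flags and
# log path are illustrative only.
strace -f -tt -o /var/log/clvmd-vgchange.strace /usr/sbin/vgchange -ayl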

Version-Release number of selected component (if applicable):
Cluster LVM daemon version: 2.01.09 (2005-04-04)

How reproducible:
every so often
Comment 1 Alasdair Kergon 2005-04-22 12:12:15 EDT
More config info:
  What is your lv configuration?  (e.g. output of lvs, vgs, pvs, fstab, etc.)
  Are you using the intended RHEL4 U1 mkinitrd package?
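
One quick way to capture all of that in one go (the exact command list below is
just our illustration of the request above, not an official procedure):

# Gather the requested LVM configuration and package info in one pass.
{ rpm -q mkinitrd; lvs; vgs; pvs; cat /etc/fstab; } > lvconfig.txt 2>&1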
Comment 2 Corey Marthaler 2005-04-22 14:43:06 EDT
I'm not sure the lv config matters, as we've seen it with a multitude of them,
but the current one I have is:

[root@tank-02 ~]# rpm -qa | grep mkinitrd
mkinitrd-4.2.1.2-1

[root@tank-02 ~]# lvdisplay
  --- Logical volume ---
  LV Name                /dev/gfs/gfs0
  VG Name                gfs
  LV UUID                Ftjr1N-qmsn-fvoo-3ZDY-K70C-3cX0-CYVwqC
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                460.14 GB
  Current LE             117796
  Segments               1
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:2

  --- Logical volume ---
  LV Name                /dev/gfs/gfs1
  VG Name                gfs
  LV UUID                zXITiV-n9hm-Iy2u-K97Q-zAE9-OZKU-TOlLMH
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                460.14 GB
  Current LE             117796
  Segments               2
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:3

  --- Logical volume ---
  LV Name                /dev/gfs/gfs2
  VG Name                gfs
  LV UUID                9XlzeF-g3HY-b6Y7-k7xD-QHqF-sz7a-IHGuWG
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                460.15 GB
  Current LE             117798
  Segments               1
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:4

[root@tank-02 ~]# vgdisplay
  --- Volume group ---
  VG Name               gfs
  System ID
  Format                lvm2
  Metadata Areas        2
  Metadata Sequence No  4
  VG Access             read/write
  VG Status             resizable
  Clustered             yes
  Shared                no
  MAX LV                0
  Cur LV                3
  Open LV               3
  Max PV                0
  Cur PV                2
  Act PV                2
  VG Size               1.35 TB
  PE Size               4.00 MB
  Total PE              353390
  Alloc PE / Size       353390 / 1.35 TB
  Free  PE / Size       0 / 0
  VG UUID               nVr361-5iqu-lDqq-RSXF-XHfe-jfql-1Ne7lV

[root@tank-02 ~]# pvdisplay
  --- Physical volume ---
  PV Name               /dev/sda1
  VG Name               gfs
  PV Size               690.21 GB / not usable 0
  Allocatable           yes (but full)
  PE Size (KByte)       4096
  Total PE              176695
  Free PE               0
  Allocated PE          176695
  PV UUID               mcecjV-f1Hi-31a9-5hN1-CZpI-5Ks8-ipjvEr

  --- Physical volume ---
  PV Name               /dev/sda2
  VG Name               gfs
  PV Size               690.21 GB / not usable 0
  Allocatable           yes (but full)
  PE Size (KByte)       4096
  Total PE              176695
  Free PE               0
  Allocated PE          176695
  PV UUID               qNLOfT-gdNy-Zrto-9dbx-981f-Zh5f-c6BlUu


[root@tank-02 ~]# cat /etc/fstab
# This file is edited by fstab-sync - see 'man fstab-sync' for details
/dev/VolGroup00/LogVol00        /       ext3    defaults        1       1
LABEL=/boot     /boot   ext3    defaults        1       2
none    /dev/pts        devpts  gid=5,mode=620  0       0
none    /dev/shm        tmpfs   defaults        0       0
none    /proc   proc    defaults        0       0
none    /sys    sysfs   defaults        0       0
/dev/VolGroup00/LogVol01        swap    swap    defaults        0       0
/dev/gfs/gfs0           /mnt/gfs0               gfs     debug   0 0
/dev/gfs/gfs1           /mnt/gfs1               gfs     debug   0 0
/dev/gfs/gfs2           /mnt/gfs2               gfs     debug   0 0
/dev/gfs/gfs3           /mnt/gfs3               gfs     debug   0 0
/dev/hdc                /media/cdrom            auto    pamconsole,exec,noauto,managed 0 0
/dev/fd0                /media/floppy           auto    pamconsole,exec,noauto,managed 0 0
Comment 3 Christine Caulfield 2005-04-26 10:57:07 EDT
An strace of, or debugging output from, clvmd might also be handy if you can get them.
Comment 5 Christine Caulfield 2005-05-10 04:21:44 EDT
I'm now fairly sure that this is a DLM bug.
Comment 6 David Teigland 2005-05-24 10:20:10 EDT
Here's the test script Patrick and I have been running.  "sleeprand"
sleeps for a random number of seconds between 1 and N.  The killall -9
on one of the nodes leaves the dlm lockspace running on that one node
so restarting clvmd will reuse that lockspace.

#!/bin/bash

set -x
while [ 1 ]
do
  ssh root@bench-21 killall -9 clvmd
  ./doall killall clvmd
  ~/sleeprand 40
  ./doall clvmd 
  ./doall vgchange -aly
done
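
The sleeprand helper itself isn't attached to this bug; a minimal sketch of what
it presumably does (sleep a random number of seconds between 1 and $1) would be:

#!/bin/bash
# Assumed sketch of the "sleeprand" helper described above: sleep for a
# random number of seconds between 1 and $1 (default 10).
max=${1:-10}
sleep $(( (RANDOM % max) + 1 ))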
Comment 7 David Teigland 2005-05-25 03:09:52 EDT
Another test I'm running on 8 nodes at once is a loop around:
  service clvmd start
  sleeprand 10
  service clvmd stop
  sleeprand 10
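
Spelled out as a script (our own wrapping of the steps above, assuming the
sleeprand helper from comment 6 is on the PATH), that loop would be:

#!/bin/bash
# Loop form of the start/stop test described above.
while true
do
  service clvmd start
  sleeprand 10
  service clvmd stop
  sleeprand 10
done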
Comment 8 Corey Marthaler 2005-05-26 12:00:12 EDT
I finally saw this again last night after 46 iterations of running revolver with
the initscripts flag. This was on a five node cluster (tank-01 - 05) with one gfs
filesystem. On the 46th iteration, tank-02, 03, and 05 were shot, and when they
came back up all three were stuck at the clvmd init startup. After being stuck
overnight, I power cycled tank-05, and that allowed the other two nodes to
become free and continue the startup process. tank-05 (after being power
cycled) was then also able to complete its startup process.

So this time it wasn't a case of having to boot single-user, clean up volumes,
or boot a kernel without the init scripts once the hang is hit.
Comment 9 Corey Marthaler 2005-05-26 12:04:30 EDT
I saw these messages (on the two nodes which were not shot) once the deadlock was
resolved:

May 26 08:41:57 tank-01 kernel: dlm: clvmd: nodes_reconfig failed -1
May 26 08:21:31 tank-04 kernel: dlm: clvmd: nodes_reconfig failed 1
Comment 10 Corey Marthaler 2005-05-26 15:47:12 EDT
Hit this again today; this time, however, power cycling the hung nodes
individually didn't result in the others un-hanging. Not until I power cycled
every node in the whole cluster did they all come up just fine.
Comment 11 David Teigland 2005-05-27 11:52:13 EDT
I can reproduce this on my bench machines using the test in comment 7.
We have collected a lot of information and have narrowed down the
problem (it appears) to missing dlm socket connections in the dlm
communications layer.  We're hoping the latest tcpdump logs will
explain why the connections are being lost or are not being reestablished.
Comment 12 Christine Caulfield 2005-05-31 11:58:27 EDT
The connect retry count was not getting cleared... ever. So a node only had 3
chances of a failed connect, and after that it would refuse to talk to a
particular node.

When two nodes get into this state with each other there is a rather obvious
deadlock.

Because lowcomms connections can be initiated by either node, if this happens on
only one node then the DLM will run quite happily. That explains (I think) why
this is fairly rare but can be provoked by lots of connects/disconnects.
Comment 13 Corey Marthaler 2005-06-01 14:51:31 EDT
There is still a hang in the clvmd init scripts, but it's no longer in
'vgchange'. Should I open a new bug, or is this the same problem moved to a
different place?


Starting clvmd: + daemon clvmd
+ local gotbase= force=
+ local base= user= nice= bg= pid=
+ nicelevel=0
+ '[' clvmd '!=' clvmd ']'
+ '[' -z '' ']'
+ base=clvmd
+ '[' -f /var/run/clvmd.pid ']'
+ '[' -n '' -a -z '' ']'
+ ulimit -S -c 0
+ '[' -n '' ']'
+ '[' serial = verbose -a -z '' ']'
+ '[' -z '' ']'
+ initlog -q -c clvmd

[HANG]                        
Comment 14 Christine Caulfield 2005-06-02 03:57:24 EDT
Impossible to tell from that. Can you get the full debug output again, please?

/proc/cluster/services
/proc/cluster/dlm_debug
netstat -ta

from all nodes.
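
One way to collect that from every node in one go (the node names here are the
tank cluster from this bug and only an example; adjust as needed):

#!/bin/bash
# Gather the requested debug output from each node over ssh.
for node in tank-01 tank-02 tank-03 tank-04 tank-05
do
  echo "===== $node ====="
  ssh root@$node 'cat /proc/cluster/services;
                  cat /proc/cluster/dlm_debug;
                  netstat -ta'
done > clvmd-debug.$(date +%Y%m%d-%H%M%S)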
Comment 15 Christine Caulfield 2005-06-02 06:09:25 EDT
Is this on SMP? I've committed an SMP race fix to FC4 & CVS head that might be
relevant.
Comment 16 Christine Caulfield 2005-06-02 12:10:16 EDT
OK, I've checked in that fix.
See you again tomorrow ...
Comment 17 Adam "mantis" Manthei 2005-06-06 16:19:50 EDT
The summary of this bug is a little misleading.  The bug is not in the init
scripts but in clvmd itself, unless the implication is that clvmd is misbehaving
with initlog().  Some rather basic cases ought to be able to narrow down where
the bug is (e.g. remove initlog from the init script, or write a controlled test
to identify the problem).

Please update the summary to something that more accurately reflects the nature
of the bug.
Comment 18 Corey Marthaler 2005-06-06 17:05:02 EDT
I ran revolver all weekend on the 4 node tank cluster (tank-01, 03, 04, 05)
without allowing quorum to be lost (only shooting one node at a time) and never
saw any issues. 

This morning I restarted revolver so that quorum gets lost (three nodes shot
each time) and after 40 iterations, I saw the hang on all three nodes shot. Two
of the nodes were stuck starting clvmd and one was stuck doing a vgchange. CMAN
on the node left up reported that everyone was part of the cluster. I then
killed one of the hung nodes and that allowed the other two hung nodes to get
past the deadlock and continue. The killed node then also came back up without
problems.
Comment 19 Corey Marthaler 2005-06-07 10:00:47 EDT
Hit this last night on a three node gulm cluster. One slave was shot and got
stuck coming back up while doing a vgchange. Apparently you do not need to lose
quorum in order for this to happen.
Comment 20 Christine Caulfield 2005-06-07 10:03:43 EDT
gulm! When did gulm appear in this bug report?
Comment 21 Corey Marthaler 2005-06-07 10:57:05 EDT
Closing this bug and opening a new one for comments #18 and #19.
