Bug 145683 - stuck gfs/lock_dlm recovery
Summary: stuck gfs/lock_dlm recovery
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: dlm
Version: 4
Hardware: i686
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: David Teigland
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks: 144795
Reported: 2005-01-20 16:54 UTC by Corey Marthaler
Modified: 2009-04-16 20:30 UTC
CC List: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-04-20 14:55:39 UTC
Embargoed:


Description Corey Marthaler 2005-01-20 16:54:21 UTC
Description of problem:
From Dave:

The problem here appears to be stuck gfs/lock_dlm recovery on
morph-02, which is in recover state 2 (morph-03 is in recover state 4,
which is complete). To classify this further we'd need the output
of /proc/cluster/lock_dlm/debug (especially on morph-02) and possibly
info on the dlm/lock_dlm kernel threads.

Version-Release number of selected component (if applicable):
DLM <CVS> (built Jan 18 2005 13:36:03) installed

How reproducible:
Sometimes

Comment 1 Corey Marthaler 2005-01-20 16:57:08 UTC
I hit this while running revolver on the morph cluster. Three
of the five nodes were taken down (so quorum was lost) and then
brought back up. This caused recovery to get stuck and, as a side
effect, the filesystem mounts hung.


morph-02 and morph-03 were the nodes left up:

[root@morph-02 root]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[2 5 1 4 3]

DLM Lock Space:  "clvmd"                             3   3 run       -
[2 5 1 4 3]

DLM Lock Space:  "gfs0"                              4   4 update   
U-4,1,1
[2 5 1]

DLM Lock Space:  "gfs1"                              6   6 run       -
[2 5]

GFS Mount Group: "gfs0"                              5   5 recover 2 -
[2 5]

GFS Mount Group: "gfs1"                              7   7 run       -
[2 5]


[root@morph-03 root]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[2 5 1 4 3]

DLM Lock Space:  "clvmd"                             3   3 run       -
[2 5 1 4 3]

DLM Lock Space:  "gfs0"                              4   4 update   
U-4,1,1
[2 5 1]

DLM Lock Space:  "gfs1"                              6   6 run       -
[2 5]

GFS Mount Group: "gfs0"                              5   5 recover 4 -
[2 5]

GFS Mount Group: "gfs1"                              7   7 run       -
[2 5]



morph-01, morph-04, and morph-05 were the nodes shot:

[root@morph-01 root]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[5 2 1 4 3]

DLM Lock Space:  "clvmd"                             3   3 run       -
[5 2 1 4 3]

DLM Lock Space:  "gfs0"                              4   4 join     
S-6,20,3
[5 2 1]


[root@morph-04 root]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[2 5 1 4 3]

DLM Lock Space:  "clvmd"                             3   3 run       -
[2 5 1 4 3]

DLM Lock Space:  "gfs1"                              6   4 join     
S-6,20,3
[2 5 4]


[root@morph-05 root]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[5 1 4 2 3]

DLM Lock Space:  "clvmd"                             3   3 run       -
[5 1 4 2 3]

Comment 2 Corey Marthaler 2005-01-20 16:57:26 UTC
morph-02:

CMAN: quorum lost, blocking activity
CMAN: quorum regained, resuming activity
GFS: fsid=morph-cluster:gfs1.2: jid=4: Trying to acquire journal lock...
GFS: fsid=morph-cluster:gfs0.2: jid=4: Trying to acquire journal lock...
GFS: fsid=morph-cluster:gfs0.2: jid=4: Looking at journal...
GFS: fsid=morph-cluster:gfs0.2: jid=4: Acquiring the transaction lock...
GFS: fsid=morph-cluster:gfs1.2: jid=4: Looking at journal...
GFS: fsid=morph-cluster:gfs1.2: jid=4: Acquiring the transaction lock...
GFS: fsid=morph-cluster:gfs0.2: jid=4: Replaying journal...
GFS: fsid=morph-cluster:gfs1.2: jid=4: Replaying journal...
GFS: fsid=morph-cluster:gfs1.2: jid=4: Replayed 182 of 182 blocks
GFS: fsid=morph-cluster:gfs1.2: jid=4: replays = 182, skips = 0, sames = 0
GFS: fsid=morph-cluster:gfs1.2: jid=4: Journal replayed in 3s
GFS: fsid=morph-cluster:gfs1.2: jid=4: Done
GFS: fsid=morph-cluster:gfs1.2: jid=3: Trying to acquire journal lock...
GFS: fsid=morph-cluster:gfs1.2: jid=3: Looking at journal...
GFS: fsid=morph-cluster:gfs1.2: jid=3: Done
GFS: fsid=morph-cluster:gfs1.2: jid=1: Trying to acquire journal lock...
GFS: fsid=morph-cluster:gfs1.2: jid=1: Busy
GFS: fsid=morph-cluster:gfs0.2: jid=4: Replayed 2727 of 2870 blocks
GFS: fsid=morph-cluster:gfs0.2: jid=4: replays = 2727, skips = 65, sames = 78
GFS: fsid=morph-cluster:gfs0.2: jid=4: Journal replayed in 17s
GFS: fsid=morph-cluster:gfs0.2: jid=4: Done
GFS: fsid=morph-cluster:gfs0.2: jid=3: Trying to acquire journal lock...
GFS: fsid=morph-cluster:gfs0.2: jid=3: Busy
GFS: fsid=morph-cluster:gfs0.2: jid=1: Trying to acquire journal lock...
GFS: fsid=morph-cluster:gfs0.2: jid=1: Looking at journal...
GFS: fsid=morph-cluster:gfs0.2: jid=1: Acquiring the transaction lock...
lock_dlm: cancel 1,2 flags 400
lock_dlm: cancel 1,2 complete
GFS: fsid=morph-cluster:gfs0.2: jid=1: Replaying journal...
GFS: fsid=morph-cluster:gfs0.2: jid=1: Replayed 1024 of 1025 blocks
GFS: fsid=morph-cluster:gfs0.2: jid=1: replays = 1024, skips = 0, sames = 1



Comment 3 Corey Marthaler 2005-01-20 16:58:05 UTC
I'll try to reproduce this and gather more info.

Comment 4 Kiersten (Kerri) Anderson 2005-02-23 17:41:34 UTC
Removing from the blocker list; if it is reproducible, it will go back on the
list.

Comment 5 Corey Marthaler 2005-02-23 20:27:31 UTC
What do you know, I reproduced it. Back on the blocker list you go.

5-node cluster (morph-01 through morph-05), all running I/O to 5 GFS
filesystems. I shot morph-05, and recovery ended up stuck.

All have the same view of the nodes in the cluster:
[root@morph-01 ~]# cat /proc/cluster/nodes
Node  Votes Exp Sts  Name
   1    1    5   M   morph-01.lab.msp.redhat.com
   2    1    5   M   morph-05.lab.msp.redhat.com
   3    1    5   M   morph-04.lab.msp.redhat.com
   4    1    5   M   morph-03.lab.msp.redhat.com
   5    1    5   M   morph-02.lab.msp.redhat.com


Services:

[root@morph-01 ~]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[1 5 4 3 2]

DLM Lock Space:  "clvmd"                             3   4 recover 2 -
[1 5 4 3]

DLM Lock Space:  "gfs0"                              4   5 recover 2 -
[1 5 4 3]

DLM Lock Space:  "gfs1"                              6   7 recover 2 -
[1 5 4 3]

DLM Lock Space:  "gfs2"                              8   9 recover 2 -
[1 5 4 3]

DLM Lock Space:  "gfs3"                             10  11 recover 2 -
[1 5 4 3]

DLM Lock Space:  "gfs4"                             12  13 recover 2 -
[1 5 4 3]

GFS Mount Group: "gfs0"                              5   6 recover 0 -
[1 5 4 3]

GFS Mount Group: "gfs1"                              7   8 recover 0 -
[1 5 4 3]

GFS Mount Group: "gfs2"                              9  10 recover 0 -
[1 5 4 3]

GFS Mount Group: "gfs3"                             11  12 recover 0 -
[1 5 4 3]

GFS Mount Group: "gfs4"                             13  14 recover 0 -
[1 5 4 3]


[root@morph-02 ~]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[1 5 4 3 2]

DLM Lock Space:  "clvmd"                             3   4 recover 2 -
[1 5 4 3]

DLM Lock Space:  "gfs0"                              4   5 recover 2 -
[1 5 4 3]

DLM Lock Space:  "gfs1"                              6   7 recover 2 -
[1 5 4 3]

DLM Lock Space:  "gfs2"                              8   9 recover 2 -
[1 5 4 3]

DLM Lock Space:  "gfs3"                             10  11 recover 2 -
[1 5 4 3]

DLM Lock Space:  "gfs4"                             12  13 recover 2 -
[1 5 4 3]

GFS Mount Group: "gfs0"                              5   6 recover 0 -
[1 5 4 3]

GFS Mount Group: "gfs1"                              7   8 recover 0 -
[1 5 4 3]

GFS Mount Group: "gfs2"                              9  10 recover 0 -
[1 5 4 3]

GFS Mount Group: "gfs3"                             11  12 recover 0 -
[1 5 4 3]

GFS Mount Group: "gfs4"                             13  14 recover 0 -
[1 5 4 3]


[root@morph-03 ~]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[1 4 5 3 2]

DLM Lock Space:  "clvmd"                             3   4 recover 2 -
[1 4 5 3]

DLM Lock Space:  "gfs0"                              4   5 recover 2 -
[1 4 5 3]

DLM Lock Space:  "gfs1"                              6   7 recover 2 -
[1 4 5 3]

DLM Lock Space:  "gfs2"                              8   9 recover 2 -
[1 4 5 3]

DLM Lock Space:  "gfs3"                             10  11 recover 2 -
[1 4 5 3]

DLM Lock Space:  "gfs4"                             12  13 recover 2 -
[1 4 5 3]

GFS Mount Group: "gfs0"                              5   6 recover 0 -
[1 4 5 3]

GFS Mount Group: "gfs1"                              7   8 recover 0 -
[1 4 5 3]

GFS Mount Group: "gfs2"                              9  10 recover 0 -
[1 4 5 3]

GFS Mount Group: "gfs3"                             11  12 recover 0 -
[1 4 5 3]

GFS Mount Group: "gfs4"                             13  14 recover 0 -
[1 4 5 3]


[root@morph-04 ~]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[1 3 4 5 2]

DLM Lock Space:  "clvmd"                             3   4 recover 2 -
[1 3 4 5]

DLM Lock Space:  "gfs0"                              4   5 recover 2 -
[1 3 4 5]

DLM Lock Space:  "gfs1"                              6   7 recover 2 -
[1 3 4 5]

DLM Lock Space:  "gfs2"                              8   9 recover 2 -
[1 3 4 5]

DLM Lock Space:  "gfs3"                             10  11 recover 2 -
[1 3 4 5]

DLM Lock Space:  "gfs4"                             12  13 recover 2 -
[1 3 4 5]

GFS Mount Group: "gfs0"                              5   6 recover 0 -
[1 3 4 5]

GFS Mount Group: "gfs1"                              7   8 recover 0 -
[1 3 4 5]

GFS Mount Group: "gfs2"                              9  10 recover 0 -
[1 3 4 5]

GFS Mount Group: "gfs3"                             11  12 recover 0 -
[1 3 4 5]

GFS Mount Group: "gfs4"                             13  14 recover 0 -
[1 3 4 5]


[root@morph-05 ~]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[5 4 1 3 2]





Comment 6 Corey Marthaler 2005-02-23 20:30:19 UTC
Here's the debug info:

[root@morph-01 ~]# cat /proc/cluster/lock_dlm/debug
30023
8225 lk 11,30023 id d037c 0,5 4
6982 qc 11,30023 0,5 id d037c sts 0 0
8225 req 7,30023 ex 0-7fffffffffffffff lkf 2000 wait 1
8225 lk 7,30023 id 0 -1,5 2000
8225 lk 11,30023 id d037c 5,0 4
6982 qc 11,30023 5,0 id d037c sts 0 0
6982 qc 7,30023 -1,5 id 17002a sts 0 0
8225 ex plock 0
8108 en punlock 7,46
8108 lk 11,46 id 8003c 0,5 4
6982 qc 11,46 0,5 id 8003c sts 0 0
8108 remove 7,46
8108 un 7,46 1d01a2 5 0
6982 qc 7,46 5,5 id 1d01a2 sts -65538 0
8108 lk 11,46 id 8003c 5,0 4
6982 qc 11,46 5,0 id 8003c sts 0 0
8108 ex punlock 0
8108 en plock 7,46
8108 lk 11,46 id 8003c 0,5 4
6982 qc 11,46 0,5 id 8003c sts 0 0
8108 req 7,46 ex 0-7fffffffffffffff lkf 2000 wait 1
8108 lk 7,46 id 0 -1,5 2000
8108 lk 11,46 id 8003c 5,0 4
6982 qc 11,46 5,0 id 8003c sts 0 0
6982 qc 7,46 -1,5 id 2a02a8 sts 0 0
8108 ex plock 0
8173 en punlock 7,4b
8173 lk 11,4b id 300ee 0,5 4
6982 qc 11,4b 0,5 id 300ee sts 0 0
8173 remove 7,4b
8173 un 7,4b 1301af 5 0
6982 qc 7,4b 5,5 id 1301af sts -65538 0
8173 lk 11,4b id 300ee 5,0 4
6982 qc 11,4b 5,0 id 300ee sts 0 0
8173 ex punlock 0
8173 en plock 7,4d
8222 en punlock 7,20028
8222 lk 11,20028 id a0144 0,5 4
6982 qc 11,20028 0,5 id a0144 sts 0 0
8222 remove 7,20028
8222 un 7,20028 1b0010 5 0
6982 qc 7,20028 5,5 id 1b0010 sts -65538 0
8222 lk 11,20028 id a0144 5,0 4
6982 qc 11,20028 5,0 id a0144 sts 0 0
8221 lk 11,20028 id a0144 0,5 4
6982 qc 11,20028 0,5 id a0144 sts 0 0
8221 req 7,20028 ex 0-7fffffffffffffff lkf 2000 wait 1
8221 lk 7,20028 id 0 -1,5 2000
8176 en plock 7,2002d
8176 lk 11,2002d id 403c0 0,5 4
8221 lk 11,20028 id a0144 5,0 4
6982 qc 11,2002d 0,5 id 403c0 sts 0 0
8222 ex punlock 0
6982 qc 11,20028 5,0 id a0144 sts 0 0
8176 req 7,2002d ex 2c1f80-2db511 lkf 2000 wait 1
8176 lk 7,2002d id 0 -1,5 2000
8222 en plock 7,1002d
8176 lk 11,2002d id 403c0 5,0 4
6982 qc 11,2002d 5,0 id 403c0 sts 0 0
6982 qc 7,2002d -1,5 id 1003f9 sts 0 0
6982 qc 7,20028 -1,5 id 10011e sts 0 0
8221 ex plock 0
8176 ex plock 0
8102 en punlock 7,20029
8102 lk 11,20029 id b0349 0,5 4
6982 qc 11,20029 0,5 id b0349 sts 0 0
8104 en punlock 7,49
8104 lk 11,49 id 703fb 0,5 4
6982 qc 11,49 0,5 id 703fb sts 0 0
8102 remove 7,20029
8102 un 7,20029 1b026c 5 0
8104 remove 7,49
8104 un 7,49 250200 5 0
6982 qc 7,20029 5,5 id 1b026c sts -65538 0
6982 qc 7,49 5,5 id 250200 sts -65538 0
8102 lk 11,20029 id b0349 5,0 4
8104 lk 11,49 id 703fb 5,0 4
6982 qc 11,20029 5,0 id b0349 sts 0 0
6982 qc 11,49 5,0 id 703fb sts 0 0
8102 ex punlock 0
8172 en punlock 7,20029
8172 lk 11,20029 id 503dd 0,5 4
6982 qc 11,20029 0,5 id 503dd sts 0 0
8102 en plock 7,20029
8172 remove 7,20029
8172 un 7,20029 1200f5 5 0
6982 qc 7,20029 5,5 id 1200f5 sts -65538 0
8102 lk 11,20029 id b0349 0,5 4
8172 lk 11,20029 id 503dd 5,0 4
6982 qc 11,20029 5,0 id 503dd sts 0 0
8104 ex punlock 0
6982 qc 11,20029 0,5 id b0349 sts 0 0
8172 ex punlock 0
8172 en plock 7,20027
8172 lk 11,20027 id 70137 0,5 4
6982 qc 11,20027 0,5 id 70137 sts 0 0
8172 req 7,20027 ex 0-7fffffffffffffff lkf 2000 wait 1
8172 lk 7,20027 id 0 -1,5 2000
8172 lk 11,20027 id 70137 5,0 4
6982 qc 11,20027 5,0 id 70137 sts 0 0
6982 qc 7,20027 -1,5 id 1101da sts 0 0
8172 ex plock 0
8102 req 7,20029 ex 6255-6a01 lkf 2000 wait 1
8102 lk 7,20029 id 0 -1,5 2000
8102 lk 11,20029 id b0349 5,0 4
6982 qc 7,20029 -1,5 id 1e0339 sts 0 0
6982 qc 11,20029 5,0 id b0349 sts 0 0
8102 ex plock 0
8140 en punlock 7,20028
8140 lk 11,20028 id a01a0 0,5 4
6982 qc 11,20028 0,5 id a01a0 sts 0 0
8140 remove 7,20028
8140 un 7,20028 1a0255 5 0
6982 qc 7,20028 5,5 id 1a0255 sts -65538 0
8140 lk 11,20028 id a01a0 5,0 4
6982 qc 11,20028 5,0 id a01a0 sts 0 0
8140 ex punlock 0
8217 en punlock 7,1002d
8174 en punlock 7,4d
8140 en plock 7,20028
8176 en punlock 7,2002d
8105 en punlock 7,50
8190 en punlock 7,4f
8104 en plock 7,49
8225 en punlock 7,30023
8106 en punlock 7,44
8163 en punlock 7,4c
8107 en punlock 7,4f
8221 en punlock 7,20028
8171 en punlock 7,20028
8108 en punlock 7,46
8172 en punlock 7,20027
8102 en punlock 7,20029
7791 un 2,5002a b0128 5 0
7628 un 2,1002c 40360 5 0
7714 un 2,59 c0183 5 0
7560 un 2,300a3 502b6 5 0
7483 un 2,373 d01ed 5 0


[root@morph-02 ~]# cat /proc/cluster/lock_dlm/debug
b id 4035b 5,0 4
6536 qc 11,14af9bb 5,0 id 4035b sts 0 0
7372 en punlock 7,14bf9bb
7372 lk 11,14bf9bb id 40162 0,5 4
6536 qc 11,14bf9bb 0,5 id 40162 sts 0 0
7372 remove 7,14bf9bb
7372 un 7,14bf9bb d039b 5 0
6536 qc 7,14bf9bb 5,5 id d039b sts -65538 0
7372 lk 11,14bf9bb id 40162 5,0 4
6536 qc 11,14bf9bb 5,0 id 40162 sts 0 0
7372 ex punlock 0
6536 qc 7,14af9bb -1,5 id e00cb sts 0 0
7306 en punlock 7,14af9d0
7306 lk 11,14af9d0 id 30071 0,5 4
6536 qc 11,14af9d0 0,5 id 30071 sts 0 0
7306 remove 7,14af9d0
7306 un 7,14af9d0 110045 5 0
6536 qc 7,14af9d0 5,5 id 110045 sts -65538 0
7306 lk 11,14af9d0 id 30071 5,0 4
6536 qc 11,14af9d0 5,0 id 30071 sts 0 0
7306 ex punlock 0
7306 en plock 7,14af9d0
7306 lk 11,14af9d0 id 30071 0,5 4
6536 qc 11,14af9d0 0,5 id 30071 sts 0 0
7306 req 7,14af9d0 ex 0-7fffffffffffffff lkf 2000 wait 1
7306 lk 7,14af9d0 id 0 -1,5 2000
7306 lk 11,14af9d0 id 30071 5,0 4
6536 qc 11,14af9d0 5,0 id 30071 sts 0 0
7268 en plock 7,14af9c2
7268 lk 11,14af9c2 id 502c2 0,5 4
6536 qc 11,14af9c2 0,5 id 502c2 sts 0 0
7268 req 7,14af9c2 ex 0-2e4ca1 lkf 2000 wait 1
7268 lk 7,14af9c2 id 0 -1,5 2000
7268 lk 11,14af9c2 id 502c2 5,0 4
6536 qc 7,14af9c2 -1,5 id 11026e sts 0 0
6536 qc 11,14af9c2 5,0 id 502c2 sts 0 0
7268 ex plock 0
7398 en punlock 7,14cf9b0
6536 qc 7,14af9d0 -1,5 id 140347 sts 0 0
7398 lk 11,14cf9b0 id 10135 0,5 4
7306 ex plock 0
6536 qc 11,14cf9b0 0,5 id 10135 sts 0 0
7398 remove 7,14cf9b0
7398 un 7,14cf9b0 90289 5 0
6536 qc 7,14cf9b0 5,5 id 90289 sts -65538 0
7398 lk 11,14cf9b0 id 10135 5,0 4
6536 qc 11,14cf9b0 5,0 id 10135 sts 0 0
7398 ex punlock 0
7398 en plock 7,14cf9b1
7400 lk 11,14cf9b0 id 10135 0,5 4
6536 qc 11,14cf9b0 0,5 id 10135 sts 0 0
7400 req 7,14cf9b0 ex 0-7fffffffffffffff lkf 2000 wait 1
7400 lk 7,14cf9b0 id 0 -1,5 2000
7400 lk 11,14cf9b0 id 10135 5,0 4
6536 qc 11,14cf9b0 5,0 id 10135 sts 0 0
7332 en punlock 7,14af9c6
7332 lk 11,14af9c6 id 102a8 0,5 4
6536 qc 11,14af9c6 0,5 id 102a8 sts 0 0
7332 remove 7,14af9c6
7332 un 7,14af9c6 c02dd 5 0
6536 qc 7,14af9c6 5,5 id c02dd sts -65538 0
7332 lk 11,14af9c6 id 102a8 5,0 4
6536 qc 11,14af9c6 5,0 id 102a8 sts 0 0
7332 ex punlock 0
7332 en plock 7,14af9c5
7332 lk 11,14af9c5 id 30124 0,5 4
6536 qc 11,14af9c5 0,5 id 30124 sts 0 0
7332 req 7,14af9c5 ex 0-7fffffffffffffff lkf 2000 wait 1
7332 lk 7,14af9c5 id 0 -1,5 2000
7332 lk 11,14af9c5 id 30124 5,0 4
6536 qc 11,14af9c5 5,0 id 30124 sts 0 0
6536 qc 7,14af9c5 -1,5 id d0232 sts 0 0
7332 ex plock 0
7305 en punlock 7,14af9d9
7305 lk 11,14af9d9 id 50025 0,5 4
6536 qc 11,14af9d9 0,5 id 50025 sts 0 0
7305 remove 7,14af9d9
7305 un 7,14af9d9 a01e3 5 0
6536 qc 7,14af9d9 5,5 id a01e3 sts -65538 0
7305 lk 11,14af9d9 id 50025 5,0 4
6536 qc 11,14af9d9 5,0 id 50025 sts 0 0
7303 lk 11,14af9d9 id 50025 0,5 4
6536 qc 11,14af9d9 0,5 id 50025 sts 0 0
7303 req 7,14af9d9 ex 0-7fffffffffffffff lkf 2000 wait 1
7303 lk 7,14af9d9 id 0 -1,5 2000
7303 lk 11,14af9d9 id 50025 5,0 4
6536 qc 11,14af9d9 5,0 id 50025 sts 0 0
7305 ex punlock 0
7305 en plock 7,14af9cb
7305 lk 11,14af9cb id 20245 0,5 4
6536 qc 11,14af9cb 0,5 id 20245 sts 0 0
7305 req 7,14af9cb ex 0-7fffffffffffffff lkf 2000 wait 1
7305 lk 7,14af9cb id 0 -1,5 2000
7305 lk 11,14af9cb id 20245 5,0 4
6536 qc 11,14af9cb 5,0 id 20245 sts 0 0
6536 qc 7,14af9d9 -1,5 id f00ef sts 0 0
7303 ex plock 0
7372 en plock 7,14bf9bb
7372 lk 11,14bf9bb id 40162 0,5 4
6536 qc 11,14bf9bb 0,5 id 40162 sts 0 0
7359 ex plock 0
7372 req 7,14bf9bb ex 1ee13b-2be9a5 lkf 2000 wait 1
7372 lk 7,14bf9bb id 0 -1,5 2000
7372 lk 11,14bf9bb id 40162 5,0 4
6536 qc 11,14bf9bb 5,0 id 40162 sts 0 0
7339 en punlock 7,14cf9b6
7266 en punlock 7,14af9c1
7310 en punlock 7,14af9d4
7338 en punlock 7,14cf9b5
7359 en punlock 7,14af9bb
7329 en punlock 7,14af9bb
7403 en punlock 7,14cf9b9
7394 en punlock 7,14af9bf
7268 en punlock 7,14af9c2
7332 en punlock 7,14af9c5
7304 en punlock 7,14af9c7
7399 en punlock 7,14cf9b1
7341 en punlock 7,14cf9b0
7306 en punlock 7,14af9d0
7303 en punlock 7,14af9d9
6781 un 2,14af9e1 10194 5 0
6713 un 2,14af9e1 40334 5 0
6858 un 2,14af9bf 3012c 5 0
7012 un 2,14af9bf 10391 5 0
6935 un 2,14af9bd 40190 5 0


[root@morph-03 ~]# cat /proc/cluster/lock_dlm/debug
req 7,296f33e ex 6d77-7056 lkf 2000 wait 1
7262 lk 7,296f33e id 0 -1,5 2000
7262 lk 11,296f33e id 200e4 5,0 4
6527 qc 7,296f33e -1,5 id 1000fe sts 0 0
6527 qc 11,296f33e 5,0 id 200e4 sts 0 0
7262 ex plock 0
7309 en punlock 7,296f347
7309 lk 11,296f347 id 20190 0,5 4
7296 en punlock 7,295f344
6527 qc 11,296f347 0,5 id 20190 sts 0 0
7296 lk 11,295f344 id 202f6 0,5 4
6527 qc 11,295f344 0,5 id 202f6 sts 0 0
7309 remove 7,296f347
7309 un 7,296f347 120379 5 0
7296 remove 7,295f344
7296 un 7,295f344 13003f 5 0
6527 qc 7,296f347 5,5 id 120379 sts -65538 0
6527 qc 7,295f344 5,5 id 13003f sts -65538 0
7296 lk 11,295f344 id 202f6 5,0 4
7309 lk 11,296f347 id 20190 5,0 4
6527 qc 11,295f344 5,0 id 202f6 sts 0 0
6527 qc 11,296f347 5,0 id 20190 sts 0 0
7296 ex punlock 0
7309 ex punlock 0
7296 en plock 7,295f344
7296 lk 11,295f344 id 202f6 0,5 4
6527 qc 11,295f344 0,5 id 202f6 sts 0 0
7296 req 7,295f344 ex 2ea870-2ecd2e lkf 2000 wait 1
7296 lk 7,295f344 id 0 -1,5 2000
7296 lk 11,295f344 id 202f6 5,0 4
6527 qc 11,295f344 5,0 id 202f6 sts 0 0
7309 en plock 7,296f347
7309 lk 11,296f347 id 20190 0,5 4
6527 qc 11,296f347 0,5 id 20190 sts 0 0
7309 req 7,296f347 ex 68c7-7522 lkf 2000 wait 1
7309 lk 7,296f347 id 0 -1,5 2000
7309 lk 11,296f347 id 20190 5,0 4
6527 qc 11,296f347 5,0 id 20190 sts 0 0
7332 en punlock 7,297f33d
7332 lk 11,297f33d id 900be 0,5 4
6527 qc 11,297f33d 0,5 id 900be sts 0 0
7332 remove 7,297f33d
7332 un 7,297f33d 1002a3 5 0
6527 qc 7,297f33d 5,5 id 1002a3 sts -65538 0
7332 lk 11,297f33d id 900be 5,0 4
6527 qc 11,297f33d 5,0 id 900be sts 0 0
7332 ex punlock 0
7332 en plock 7,297f33d
7332 lk 11,297f33d id 900be 0,5 4
6527 qc 11,297f33d 0,5 id 900be sts 0 0
7332 req 7,297f33d ex 4f22-634a lkf 2000 wait 1
7332 lk 7,297f33d id 0 -1,5 2000
7332 lk 11,297f33d id 900be 5,0 4
6527 qc 7,297f33d -1,5 id 601a7 sts 0 0
6527 qc 11,297f33d 5,0 id 900be sts 0 0
7332 ex plock 0
7383 en punlock 7,295f356
7383 lk 11,295f356 id 203f0 0,5 4
6527 qc 11,295f356 0,5 id 203f0 sts 0 0
7383 remove 7,295f356
7383 un 7,295f356 b02cb 5 0
6527 qc 7,295f356 5,5 id b02cb sts -65538 0
7383 lk 11,295f356 id 203f0 5,0 4
6527 qc 11,295f356 5,0 id 203f0 sts 0 0
7383 ex punlock 0
7383 en plock 7,295f356
7383 lk 11,295f356 id 203f0 0,5 4
6527 qc 11,295f356 0,5 id 203f0 sts 0 0
7383 req 7,295f356 ex 0-4909 lkf 2000 wait 1
7383 lk 7,295f356 id 0 -1,5 2000
7383 lk 11,295f356 id 203f0 5,0 4
6527 qc 11,295f356 5,0 id 203f0 sts 0 0
7358 en punlock 7,295f345
7358 lk 11,295f345 id 40098 0,5 4
6527 qc 11,295f345 0,5 id 40098 sts 0 0
7358 remove 7,295f345
7358 un 7,295f345 100135 5 0
6527 qc 7,295f345 5,5 id 100135 sts -65538 0
7358 lk 11,295f345 id 40098 5,0 4
6527 qc 11,295f345 5,0 id 40098 sts 0 0
7358 ex punlock 0
7359 lk 11,295f345 id 40098 0,5 4
7358 en plock 7,296f351
6527 qc 11,295f345 0,5 id 40098 sts 0 0
7359 req 7,295f345 ex 0-7fffffffffffffff lkf 2000 wait 1
7359 lk 7,295f345 id 0 -1,5 2000
7359 lk 11,295f345 id 40098 5,0 4
6527 qc 11,295f345 5,0 id 40098 sts 0 0
6527 qc 7,295f356 -1,5 id 150113 sts 0 0
7383 ex plock 0
6527 qc 7,295f344 -1,5 id 120036 sts 0 0
6527 qc 7,296f347 -1,5 id 1201b2 sts 0 0
6527 qc 7,295f345 -1,5 id f0262 sts 0 0
7296 ex plock 0
7359 ex plock 0
7309 ex plock 0
7296 en punlock 7,295f344
7296 lk 11,295f344 id 202f6 0,5 4
6527 qc 11,295f344 0,5 id 202f6 sts 0 0
7296 remove 7,295f344
7296 un 7,295f344 120036 5 0
6527 qc 7,295f344 5,5 id 120036 sts -65538 0
7296 lk 11,295f344 id 202f6 5,0 4
6527 qc 11,295f344 5,0 id 202f6 sts 0 0
7296 ex punlock 0
7296 en plock 7,295f344
7296 lk 11,295f344 id 202f6 0,5 4
6527 qc 11,295f344 0,5 id 202f6 sts 0 0
7296 req 7,295f344 ex 2ed993-2edf4b lkf 2000 wait 1
7296 lk 7,295f344 id 0 -1,5 2000
7296 lk 11,295f344 id 202f6 5,0 4
6527 qc 11,295f344 5,0 id 202f6 sts 0 0
7257 en punlock 7,295f344
7332 en punlock 7,297f33d
7383 en punlock 7,295f356
7361 en punlock 7,296f351
7309 en punlock 7,296f347
7262 en punlock 7,296f33e
7350 en punlock 7,295f344
7359 en punlock 7,295f345
6781 un 2,295f34a 4024f 5 0
6704 un 2,297f349 20075 5 0
6858 un 2,295f34d 8037a 5 0
7012 un 2,295f347 200c8 5 0
6935 un 2,295f348 302df 5 0


[root@morph-04 ~]# cat /proc/cluster/lock_dlm/debug
cfa 0,5 id 100ba sts 0 0
7252 lk 11,3e18cf9 id 4011c 0,5 4
7251 remove 7,3e18cfa
7251 un 7,3e18cfa f03b6 5 0
6520 qc 11,3e18cf9 0,5 id 4011c sts 0 0
6520 qc 7,3e18cfa 5,5 id f03b6 sts -65538 0
7252 req 7,3e18cf9 ex 0-5c94 lkf 2000 wait 1
7252 lk 7,3e18cf9 id 0 -1,5 2000
7251 lk 11,3e18cfa id 100ba 5,0 4
6520 qc 11,3e18cfa 5,0 id 100ba sts 0 0
7251 ex punlock 0
7252 lk 11,3e18cf9 id 4011c 5,0 4
6520 qc 7,3e18cf9 -1,5 id 1000b9 sts 0 0
6520 qc 11,3e18cf9 5,0 id 4011c sts 0 0
7252 ex plock 0
7251 en plock 7,3e18cfa
7251 lk 11,3e18cfa id 100ba 0,5 4
6520 qc 11,3e18cfa 0,5 id 100ba sts 0 0
7251 req 7,3e18cfa ex 2ed19c-2ed6a4 lkf 2000 wait 1
7251 lk 7,3e18cfa id 0 -1,5 2000
7251 lk 11,3e18cfa id 100ba 5,0 4
6520 qc 11,3e18cfa 5,0 id 100ba sts 0 0
7354 en punlock 7,3e38cf1
7356 en punlock 7,3e18d05
7354 lk 11,3e38cf1 id 300bf 0,5 4
7356 lk 11,3e18d05 id 303c1 0,5 4
6520 qc 11,3e38cf1 0,5 id 300bf sts 0 0
6520 qc 11,3e18d05 0,5 id 303c1 sts 0 0
7354 remove 7,3e38cf1
7354 un 7,3e38cf1 d020a 5 0
7356 remove 7,3e18d05
7356 un 7,3e18d05 e02c4 5 0
6520 qc 7,3e38cf1 5,5 id d020a sts -65538 0
6520 qc 7,3e18d05 5,5 id e02c4 sts -65538 0
7354 lk 11,3e38cf1 id 300bf 5,0 4
7356 lk 11,3e18d05 id 303c1 5,0 4
6520 qc 11,3e38cf1 5,0 id 300bf sts 0 0
6520 qc 11,3e18d05 5,0 id 303c1 sts 0 0
7356 ex punlock 0
7354 ex punlock 0
7356 en plock 7,3e18d05
7356 lk 11,3e18d05 id 303c1 0,5 4
6520 qc 11,3e18d05 0,5 id 303c1 sts 0 0
7356 req 7,3e18d05 ex 6251-71e5 lkf 2000 wait 1
7354 en plock 7,3e38cf1
7356 lk 7,3e18d05 id 0 -1,5 2000
7354 lk 11,3e38cf1 id 300bf 0,5 4
6520 qc 11,3e38cf1 0,5 id 300bf sts 0 0
7354 req 7,3e38cf1 ex 2ed260-2edb1d lkf 2000 wait 1
7356 lk 11,3e18d05 id 303c1 5,0 4
7354 lk 7,3e38cf1 id 0 -1,5 2000
6520 qc 11,3e18d05 5,0 id 303c1 sts 0 0
7354 lk 11,3e38cf1 id 300bf 5,0 4
6520 qc 11,3e38cf1 5,0 id 300bf sts 0 0
6520 qc 7,3e18cfa -1,5 id 1402e6 sts 0 0
7251 ex plock 0
6520 qc 7,3e38cf1 -1,5 id c0012 sts 0 0
6520 qc 7,3e18d05 -1,5 id c022c sts 0 0
7354 ex plock 0
7356 ex plock 0
7354 en punlock 7,3e38cf1
7354 lk 11,3e38cf1 id 300bf 0,5 4
6520 qc 11,3e38cf1 0,5 id 300bf sts 0 0
7354 remove 7,3e38cf1
7354 un 7,3e38cf1 c0012 5 0
7297 en punlock 7,3e18cfa
6520 qc 7,3e38cf1 5,5 id c0012 sts -65538 0
7297 lk 11,3e18cfa id 100fd 0,5 4
6520 qc 11,3e18cfa 0,5 id 100fd sts 0 0
7287 en punlock 7,3e18cf9
7354 lk 11,3e38cf1 id 300bf 5,0 4
6520 qc 11,3e38cf1 5,0 id 300bf sts 0 0
7354 ex punlock 0
7287 lk 11,3e18cf9 id 3006b 0,5 4
7297 remove 7,3e18cfa
6520 qc 11,3e18cf9 0,5 id 3006b sts 0 0
7297 un 7,3e18cfa 15033d 5 0
7287 remove 7,3e18cf9
7287 un 7,3e18cf9 e01d8 5 0
6520 qc 7,3e18cfa 5,5 id 15033d sts -65538 0
6520 qc 7,3e18cf9 5,5 id e01d8 sts -65538 0
7297 lk 11,3e18cfa id 100fd 5,0 4
6520 qc 11,3e18cfa 5,0 id 100fd sts 0 0
7287 lk 11,3e18cf9 id 3006b 5,0 4
7354 en plock 7,3e38cf1
7297 ex punlock 0
7354 lk 11,3e38cf1 id 300bf 0,5 4
6520 qc 11,3e18cf9 5,0 id 3006b sts 0 0
6520 qc 11,3e38cf1 0,5 id 300bf sts 0 0
7297 en plock 7,3e18cfa
7354 req 7,3e38cf1 ex 2edb1e-2edf69 lkf 2000 wait 1
7297 lk 11,3e18cfa id 100fd 0,5 4
7354 lk 7,3e38cf1 id 0 -1,5 2000
7287 ex punlock 0
6520 qc 11,3e18cfa 0,5 id 100fd sts 0 0
7354 lk 11,3e38cf1 id 300bf 5,0 4
6520 qc 11,3e38cf1 5,0 id 300bf sts 0 0
7297 req 7,3e18cfa ex 2edd49-2edfa1 lkf 2000 wait 1
7287 en plock 7,3e18cf9
7297 lk 7,3e18cfa id 0 -1,5 2000
7287 lk 11,3e18cf9 id 3006b 0,5 4
6520 qc 11,3e18cf9 0,5 id 3006b sts 0 0
7297 lk 11,3e18cfa id 100fd 5,0 4
7287 req 7,3e18cf9 ex 72a5-74cc lkf 2000 wait 1
7287 lk 7,3e18cf9 id 0 -1,5 2000
6520 qc 11,3e18cfa 5,0 id 100fd sts 0 0
7287 lk 11,3e18cf9 id 3006b 5,0 4
6520 qc 7,3e18cf9 -1,5 id 130312 sts 0 0
6520 qc 11,3e18cf9 5,0 id 3006b sts 0 0
7287 ex plock 0
6520 qc 7,3e38cf1 -1,5 id 110217 sts 0 0
7354 ex plock 0
6520 qc 7,3e18cfa -1,5 id 1c01dd sts 0 0
7297 ex plock 0
7252 en punlock 7,3e18cf9
7251 en punlock 7,3e18cfa
7356 en punlock 7,3e18d05
7354 en punlock 7,3e38cf1
7287 en punlock 7,3e18cf9
7297 en punlock 7,3e18cfa
6774 un 2,3e18d1a 1030a 5 0
6697 un 2,3e18d0e 40283 5 0
7005 un 2,3e18cfb 301f3 5 0
6928 un 2,3e18cfe 40281 5 0
6851 un 2,3e18d0a 50194 5 0
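The tail of each node's lock_dlm debug output above ends in a run of "en punlock" / "en plock" lines with no later "ex" line from the same pid. Assuming, as the pairing elsewhere in these logs suggests, that "en" and "ex" mark a pid entering and exiting a plock or punlock call, the following minimal Python sketch lists the calls that were entered but never exited when the file was read. open_calls and SAMPLE are hypothetical names; the sample is a handful of lines excerpted from the morph-01 log above.

```python
import re

# Matches lines such as "8173 en plock 7,4d" and "8173 ex plock 0".
# Assumption: "en"/"ex" mark a pid entering/exiting a plock or punlock call.
CALL_RE = re.compile(r'^(?P<pid>\d+)\s+(?P<dir>en|ex)\s+(?P<op>\w+)\s+(?P<arg>\S+)$')


def open_calls(log_text):
    """Return {pid: (op, lock)} for calls that were entered but never exited."""
    pending = {}
    for raw in log_text.splitlines():
        m = CALL_RE.match(raw.strip())
        if not m:
            continue  # lk/qc/un/req/remove lines and truncated lines are skipped
        pid = int(m["pid"])
        if m["dir"] == "en":
            pending[pid] = (m["op"], m["arg"])
        elif pid in pending and pending[pid][0] == m["op"]:
            del pending[pid]  # matching exit: this call completed
        # An "ex" with no recorded "en" (log started mid-call) is ignored.
    return pending


# Lines excerpted from the morph-01 log in Comment 6.
SAMPLE = """\
8173 ex punlock 0
8173 en plock 7,4d
8222 en punlock 7,20028
8222 lk 11,20028 id a0144 0,5 4
8222 ex punlock 0
8222 en plock 7,1002d
8102 en plock 7,20029
8102 ex plock 0
8217 en punlock 7,1002d
8174 en punlock 7,4d
"""

if __name__ == "__main__":
    for pid, (op, lock) in sorted(open_calls(SAMPLE).items()):
        print(f"pid {pid}: {op} {lock} has no matching exit")
```

Keying on pid rather than lock name is deliberate: the "ex" lines carry only a return value, not the lock, so the pid is the only field that ties an exit back to its entry.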


Comment 7 David Teigland 2005-02-24 02:43:29 UTC
The original report shows stuck gfs/lock_dlm recovery (state recover 2).
Comment 5 shows stuck dlm recovery, for which /proc/cluster/dlm_debug
might help. The output of "ps -e -o pid,wchan=WIDE-WCHAN-COLUMN -o cmd"
might also be useful.
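To gather everything requested in this thread from one node in a single pass, a minimal Python sketch could read the three /proc files named in this report and run the ps command suggested above. collect_diagnostics, PROC_FILES and PS_CMD are hypothetical names for illustration; a file or command that is unavailable (for example on a node without the cluster modules loaded) is recorded as None rather than raising.

```python
import os
import subprocess

# Files named in this report; absent on nodes without the cluster modules.
PROC_FILES = [
    "/proc/cluster/services",
    "/proc/cluster/lock_dlm/debug",
    "/proc/cluster/dlm_debug",
]

# The ps invocation suggested above: each process's wait channel in a wide column.
PS_CMD = ["ps", "-e", "-o", "pid,wchan=WIDE-WCHAN-COLUMN", "-o", "cmd"]


def collect_diagnostics(paths=PROC_FILES, ps_cmd=PS_CMD):
    """Return {source: text or None}; None means the file or command was unavailable."""
    report = {}
    for path in paths:
        try:
            with open(path) as f:
                report[path] = f.read()
        except OSError:
            report[path] = None
    try:
        out = subprocess.run(ps_cmd, capture_output=True, text=True, check=True)
        report["ps"] = out.stdout
    except (OSError, subprocess.CalledProcessError):
        report["ps"] = None
    return report


if __name__ == "__main__":
    for source, text in collect_diagnostics().items():
        print(f"===== {source} (on {os.uname().nodename}) =====")
        print(text if text is not None else "(unavailable)")
```

Running it on every node right after recovery hangs would produce a matching set of snapshots like the ones posted in this bug.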

Comment 8 Corey Marthaler 2005-03-01 23:19:42 UTC
Hit this again on a 5-node cluster with 7 GFS filesystems. Shot 4 nodes;
the only one left up was morph-02.

Here's all the info you asked for:

[root@morph-02 ~]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[3 2 1 4 5]

DLM Lock Space:  "clvmd"                             3   4 run       -
[3]

DLM Lock Space:  "gfs0"                              4   5 run       -
[3]

DLM Lock Space:  "gfs1"                              6   7 run       -
[3]

DLM Lock Space:  "gfs2"                              8   9 run       -
[3]

DLM Lock Space:  "gfs3"                             10  11 run       -
[3]

DLM Lock Space:  "gfs4"                             12  13 run       -
[3]

DLM Lock Space:  "gfs5"                             14  15 run       -
[3]

DLM Lock Space:  "gfs6"                             16  17 run       -
[3]

GFS Mount Group: "gfs0"                              5   6 recover 2 -
[3]

GFS Mount Group: "gfs1"                              7   8 recover 2 -
[3]

GFS Mount Group: "gfs2"                              9  10 run       -
[3]

GFS Mount Group: "gfs3"                             11  12 run       -
[3]

GFS Mount Group: "gfs5"                             15  16 run       -
[3]

GFS Mount Group: "gfs6"                             17  18 run       -
[3]



[root@morph-02 ~]# cat /proc/cluster/lock_dlm/debug
b7e7 0,5 id 10225 sts 0 0
7794 req 7,ebb7e7 ex 0-7fffffffffffffff lkf 2000 wait 1
7794 lk 7,ebb7e7 id 0 -1,5 2000
7794 lk 11,ebb7e7 id 10225 5,0 4
6619 qc 7,ebb7e7 -1,5 id e102c8 sts 0 0
6619 qc 11,ebb7e7 5,0 id 10225 sts 0 0
7794 ex plock 0
7620 en punlock 7,ebb7e4
7811 en punlock 7,ebb7ee
7620 lk 11,ebb7e4 id 101af 0,5 4
7811 lk 11,ebb7ee id 201d7 0,5 4
6619 qc 11,ebb7e4 0,5 id 101af sts 0 0
6619 qc 11,ebb7ee 0,5 id 201d7 sts 0 0
7620 remove 7,ebb7e4
7620 un 7,ebb7e4 113013c 5 0
6619 qc 7,ebb7e4 5,5 id 113013c sts -65538 0
7811 remove 7,ebb7ee
7811 un 7,ebb7ee f603cf 5 0
7620 lk 11,ebb7e4 id 101af 5,0 4
6619 qc 7,ebb7ee 5,5 id f603cf sts -65538 0
6619 qc 11,ebb7e4 5,0 id 101af sts 0 0
7620 ex punlock 0
7620 en plock 7,ebb7e4
7620 lk 11,ebb7e4 id 101af 0,5 4
6619 qc 11,ebb7e4 0,5 id 101af sts 0 0
7620 req 7,ebb7e4 ex 0-7fffffffffffffff lkf 2000 wait 1
7620 lk 7,ebb7e4 id 0 -1,5 2000
7620 lk 11,ebb7e4 id 101af 5,0 4
6619 qc 7,ebb7e4 -1,5 id 125019e sts 0 0
6619 qc 11,ebb7e4 5,0 id 101af sts 0 0
7620 ex plock 0
7811 lk 11,ebb7ee id 201d7 5,0 4
6619 qc 11,ebb7ee 5,0 id 201d7 sts 0 0
7811 ex punlock 0
7811 en plock 7,ebb7ee
7811 lk 11,ebb7ee id 201d7 0,5 4
6619 qc 11,ebb7ee 0,5 id 201d7 sts 0 0
7811 req 7,ebb7ee ex 2e1138-2e230a lkf 2000 wait 1
7811 lk 7,ebb7ee id 0 -1,5 2000
7811 lk 11,ebb7ee id 201d7 5,0 4
6619 qc 7,ebb7ee -1,5 id f702f2 sts 0 0
6619 qc 11,ebb7ee 5,0 id 201d7 sts 0 0
7811 ex plock 0
7641 en punlock 7,ebbc52
7641 lk 11,ebbc52 id 103c1 0,5 4
6619 qc 11,ebbc52 0,5 id 103c1 sts 0 0
7641 remove 7,ebbc52
7641 un 7,ebbc52 f20135 5 0
6619 qc 7,ebbc52 5,5 id f20135 sts -65538 0
7641 lk 11,ebbc52 id 103c1 5,0 4
6619 qc 11,ebbc52 5,0 id 103c1 sts 0 0
7641 ex punlock 0
7641 en plock 7,ebbc5f
7641 lk 11,ebbc5f id 102a4 0,5 4
6619 qc 11,ebbc5f 0,5 id 102a4 sts 0 0
7641 req 7,ebbc5f ex 0-7fffffffffffffff lkf 2000 wait 1
7641 lk 7,ebbc5f id 0 -1,5 2000
7794 en punlock 7,ebb7e7
7794 lk 11,ebb7e7 id 10225 0,5 4
7641 lk 11,ebbc5f id 102a4 5,0 4
6619 qc 7,ebbc5f -1,5 id f401f1 sts 0 0
6619 qc 11,ebb7e7 0,5 id 10225 sk 7,edb746
7746 lk 11,edb746 id 2006d 0,5 4
6619 qc 11,edb746 0,5 id 2006d sts 0 0
7621 en punlock 7,ebb7e2
7746 remove 7,edb746
7746 un 7,edb746 112032b 5 0
6619 qc 7,edb746 5,5 id 112032b sts -65538 0
7621 lk 11,ebb7e2 id 2003a 0,5 4
7746 lk 11,edb746 id 2006d 5,0 4
6619 qc 11,ebb7e2 0,5 id 2003a sts 0 0
6619 qc 11,edb746 5,0 id 2006d sts 0 0
7746 ex punlock 0
7746 en plock 7,edb74e
7621 remove 7,ebb7e2
7621 un 7,ebb7e2 11202bd 5 0
6619 qc 7,ebb7e2 5,5 id 11202bd sts -65538 0
7621 lk 11,ebb7e2 id 2003a 5,0 4
6619 qc 11,ebb7e2 5,0 id 2003a sts 0 0
7621 ex punlock 0
7621 en plock 7,ebb7e5
7621 lk 11,ebb7e5 id 10097 0,5 4
6619 qc 11,ebb7e5 0,5 id 10097 sts 0 0
7621 req 7,ebb7e5 ex 0-7fffffffffffffff lkf 2000 wait 1
7621 lk 7,ebb7e5 id 0 -1,5 2000
7621 lk 11,ebb7e5 id 10097 5,0 4
6619 qc 7,ebb7e5 -1,5 id 11e00d6 sts 0 0
6619 qc 11,ebb7e5 5,0 id 10097 sts 0 0
7621 ex plock 0
7622 lk 11,ebb7e2 id 2003a 0,5 4
6619 qc 11,ebb7e2 0,5 id 2003a sts 0 0
7622 req 7,ebb7e2 ex 0-7fffffffffffffff lkf 2000 wait 1
7622 lk 7,e22 lk 11,ebb7e2 id 2003a 5,0 4
6619 qc 7,ebb7e2 -1,5 id 11202dc sts 0 0
6619 qc 11,ebb7e2 5,0 id 2003a sts 0 0
7622 ex plock 0
7639 en punlock 7,ebbc5e
7639 lk 11,ebbc5e id 302a7 0,5 4
6619 qc 11,ebbc5e 0,5 id 302a7 sts 0 0
7639 remove 7,ebbc5e
7639 un 7,ebbc5e e70074 5 0
6619 qc 7,ebbc5e 5,5 id e70074 sts -65538 0
7639 lk 11,ebbc5e id 302a7 5,0 4
6619 qc 11,ebbc5e 5,0 id 302a7 sts 0 0
7639 ex punlock 0
7639 en plock 7,ebbc5f
7644 lk 11,ebbc5e id 302a7 0,5 4
6619 qc 11,ebbc5e 0,5 id 302a7 sts 0 0
7644 req 7,ebbc5e ex 0-7fffffffffffffff lkf 2000 wait 1
7644 lk 7,ebbc5e id 0 -1,5 2000
7644 lk 11,ebbc5e id 302a7 5,0 4
6619 qc 7,ebbc5e -1,5 id 1000185 sts 0 0
6619 qc 11,ebbc5e 5,0 id 302a7 sts 0 0
7644 ex plock 0
7621 en punlock 7,ebb7e5
7621 lk 11,ebb7e5 id 10097 0,5 4
6619 qc 11,ebb7e5 0,5 id 10097 sts 0 0
7621 remove 7,ebb7e5
7621 un 7,ebb7e5 11e00d6 5 0
6619 qc 7,ebb7e5 5,5 id 11e00d6 sts -65538 0
7621 lk 11,ebb7e5 id 10097 5,0 4
6619 qc 11,ebb7e5 5,0 id 10097 sts 0 0
7621 ex punlock 0
7621 en plock 7,ebb7e4

[root@morph-02 ~]# cat /proc/cluster/dlm_debug
1 "      11
gfs2 resent 7 requests
gfs1 recover event 71 finished
gfs2 recover event 71 finished
gfs4 move flags 0,0,1 ids 65,71,71
gfs0 processed 0 requests
gfs0 resend marked requests
gfs0 resend e0222 lq 3 flg 1200008 node 0/0 "       8
gfs0 resend 9034e lq 1 flg 200000 node -1/-1 "       7
gfs0 resend 702e0 lq 1 flg 200000 node -1/-1 "       7
gfs0 resend 703bd lq 1 flg 200000 node -1/-1 "       5
gfs0 resend b02fa lq 1 flg 200000 node -1/-1 "       7
gfs0 resent 5 requests
gfs0 recover event 71 finished
gfs4 process held requests
gfs4 processed 0 requests
gfs4 resend marked requests
gfs4 resend d01ab lq 3 flg 1200008 node 0/0 "       8
gfs4 resend 60116 lq 1 flg 200000 node -1/-1 "       7
gfs4 resend c03ec lq 1 flg 200000 node -1/-1 "       7
gfs4 resend a018d lq 1 flg 200000 node -1/-1 "       7
gfs4 resend 8011f lq 1 flg 200000 node -1/-1 "       5
gfs4 resent 5 requests
gfs4 recover event 71 finished


[root@morph-02 ~]# ps -e -o pid,wchan=WIDE-WCHAN-COLUMN -o cmd
  PID WIDE-WCHAN-COLUMN CMD
    1 -                 init [3]
    2 migration_thread  [migration/0]
    3 ksoftirqd         [ksoftirqd/0]
    4 migration_thread  [migration/1]
    5 ksoftirqd         [ksoftirqd/1]
    6 worker_thread     [events/0]
    7 worker_thread     [events/1]
    8 worker_thread     [khelper]
    9 worker_thread     [kacpid]
   41 worker_thread     [kblockd/0]
   42 worker_thread     [kblockd/1]
   43 hub_thread        [khubd]
   52 pdflush           [pdflush]
   53 -                 [pdflush]
   55 worker_thread     [aio/0]
   56 worker_thread     [aio/1]
   54 kswapd            [kswapd0]
  129 serio_thread      [kseriod]
  198 -                 [scsi_eh_0]
  199 -                 [qla2300_0_dpc]
  224 worker_thread     [kmirrord/0]
  225 worker_thread     [kmirrord/1]
  241 kjournald         [kjournald]
 1104 -                 udevd
 1464 kjournald         [kjournald]
 1810 -                 syslogd -m 0
 1814 syslog            klogd -x
 1824 -                 irqbalance
 1834 -                 portmap
 1853 -                 rpc.statd
 1883 -                 rpc.idmapd
 1985 -                 /usr/sbin/smartd
 1994 -                 /usr/sbin/acpid
 2005 -                 cupsd
 2040 -                 /usr/sbin/sshd
 2053 -                 xinetd -stayalive -pidfile /var/run/xinetd.pid
 2072 -                 sendmail: rejecting connections on daemon MTA:
load average: 93
 2080 pause             sendmail: Queue runner@01:00:00 for
/var/spool/clientmqueue
 2135 -                 gpm -m /dev/input/mice -t imps2
 2188 -                 crond
 2213 -                 xfs -droppriv -daemon
 2230 -                 /usr/sbin/atd
 2239 -                 dbus-daemon-1 --system
 2250 -                 cups-config-daemon
 2260 -                 hald
 2267 -                 /sbin/agetty ttyS0 115200 vt100
 2268 -                 /sbin/mingetty tty1
 2269 -                 /sbin/mingetty tty2
 2270 -                 /sbin/mingetty tty3
 2271 -                 /sbin/mingetty tty4
 2272 -                 /sbin/mingetty tty5
 2273 -                 /sbin/mingetty tty6
 5871 -                 ccsd
 5893 cluster_kthread   [cman_comms]
 5895 serviced          [cman_serviced]
 5894 membership_kthrea [cman_memb]
 5972 hello_kthread     [cman_hbeat]
 6031 rt_sigsuspend     fenced
 6618 -                 clvmd
 6619 dlm_astd          [dlm_astd]
 6620 dlm_recvd         [dlm_recvd]
 6621 dlm_sendd         [dlm_sendd]
 6622 dlm_recoverd      [dlm_recoverd]
 6814 dlm_recoverd      [dlm_recoverd]
 6815 dlm_async         [lock_dlm1]
 6816 dlm_async         [lock_dlm2]
 6817 -                 [gfs_scand]
 6818 gfs_glockd        [gfs_glockd]
 6819 wait_on_buffer    [gfs_recoverd]
 6820 -                 [gfs_logd]
 6821 glock_wait_intern [gfs_quotad]
 6822 -                 [gfs_inoded]
 6882 dlm_recoverd      [dlm_recoverd]
 6883 dlm_async         [lock_dlm1]
 6884 dlm_async         [lock_dlm2]
 6885 -                 [gfs_scand]
 6886 gfs_glockd        [gfs_glockd]
 6887 wait_on_buffer    [gfs_recoverd]
 6888 -                 [gfs_logd]
 6889 glock_wait_intern [gfs_quotad]
 6890 -                 [gfs_inoded]
 6959 dlm_recoverd      [dlm_recoverd]
 6960 dlm_async         [lock_dlm1]
 6961 dlm_async         [lock_dlm2]
 6962 -                 [gfs_scand]
 6963 gfs_glockd        [gfs_glockd]
 6964 -                 [gfs_recoverd]
 6965 -                 [gfs_logd]
 6966 -                 [gfs_quotad]
 6967 -                 [gfs_inoded]
 7036 dlm_recoverd      [dlm_recoverd]
 7037 dlm_async         [lock_dlm1]
 7038 dlm_async         [lock_dlm2]
 7039 -                 [gfs_scand]
 7040 -                 [gfs_glockd]
 7041 -                 [gfs_recoverd]
 7042 -                 [gfs_logd]
 7043 -                 [gfs_quotad]
 7044 -                 [gfs_inoded]
 7113 dlm_recoverd      [dlm_recoverd]
 7114 dlm_async         [lock_dlm1]
 7115 dlm_async         [lock_dlm2]
 7116 -                 [gfs_scand]
 7117 gfs_glockd        [gfs_glockd]
 7118 -                 [gfs_recoverd]
 7119 -                 [gfs_logd]
 7120 -                 [gfs_quotad]
 7121 -                 [gfs_inoded]
 7190 dlm_recoverd      [dlm_recoverd]
 7191 dlm_async         [lock_dlm1]
 7192 dlm_async         [lock_dlm2]
 7193 -                 [gfs_scand]
 7194 gfs_glockd        [gfs_glockd]
 7195 -                 [gfs_recoverd]
 7196 -                 [gfs_logd]
 7197 -                 [gfs_quotad]
 7198 -                 [gfs_inoded]
 7267 dlm_recoverd      [dlm_recoverd]
 7268 -                 [lock_dlm1]
 7269 -                 [lock_dlm2]
 7270 -                 [gfs_scand]
 7271 gfs_glockd        [gfs_glockd]
 7272 -                 [gfs_recoverd]
 7273 -                 [gfs_logd]
 7274 -                 [gfs_quotad]
 7275 -                 [gfs_inoded]
 7446 -                 sshd: root@notty
 7448 -                 sshd: root@notty
 7450 -                 sshd: root@notty
 7451 -                 sshd: root@notty
 7454 -                 sshd: root@notty
 7456 wait              bash -c
PATH=/usr/kerberos/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/home/msp/c
 7457 wait              bash -c
PATH=/usr/kerberos/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/home/msp/c
 7465 -                 sshd: root@notty
 7468 wait              bash -c
PATH=/usr/kerberos/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/home/msp/c
 7481 wait              bash -c
PATH=/usr/kerberos/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/home/msp/c
 7482 wait              bash -c
PATH=/usr/kerberos/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/home/msp/c
 7541 wait              bash -c
PATH=/usr/kerberos/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/home/msp/c
 7565 -                 sshd: root@notty
 7568 wait              bash -c
PATH=/usr/kerberos/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/home/msp/c
 7586 pipe_wait         /usr/bin/perl -w
/tmp/STS/gfs/bin/revolver_load_gen -r /tmp/STS -L HEAVY -m LOCK_DLM
 7587 pipe_wait         /usr/bin/perl -w
/tmp/STS/gfs/bin/revolver_load_gen -r /tmp/STS -L HEAVY -m LOCK_DLM
 7588 pipe_wait         /usr/bin/perl -w
/tmp/STS/gfs/bin/revolver_load_gen -r /tmp/STS -L HEAVY -m LOCK_DLM
 7597 wait              sh -c PATH=$PATH:/tmp/STS/bin;
/tmp/STS/bin/pan2 -x 5 -f /tmp/revolver_load_gen.7586 -
 7598 wait              sh -c PATH=$PATH:/tmp/STS/bin;
/tmp/STS/bin/pan2 -x 5 -f /tmp/revolver_load_gen.7587 -
 7599 pipe_wait         /usr/bin/perl -w
/tmp/STS/gfs/bin/revolver_load_gen -r /tmp/STS -L HEAVY -m LOCK_DLM
 7602 pipe_wait         /usr/bin/perl -w
/tmp/STS/gfs/bin/revolver_load_gen -r /tmp/STS -L HEAVY -m LOCK_DLM
 7605 pipe_wait         /usr/bin/perl -w
/tmp/STS/gfs/bin/revolver_load_gen -r /tmp/STS -L HEAVY -m LOCK_DLM
 7608 -                 /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.7587 -l /tmp/revolver/7587/revolver_l
 7609 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.7587 -l /tmp/revolver/7587/revolver_l
 7610 wait_on_buffer    growfiles -i 0 -N 500 -n 4 -b
 7611 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.7587 -l /tmp/revolver/7587/revolver_l
 7612 wait              sh -c iogen -f sync -m sequential -s
read,write,readv,writev -t 1b -T 30000 30000:rwsy
 7613 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.7587 -l /tmp/revolver/7587/revolver_l
 7614 wait              accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 7615 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.7587 -l /tmp/revolver/7587/revolver_l
 7616 wait              sh -c iogen -f buffered -m sequential -s
read,write,readv,writev -t 1b -T 6000b 6000b:
 7617 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.7587 -l /tmp/revolver/7587/revolver_l
 7618 wait              genesis -n 500 -d 150 -p 4
 7619 wait_local        accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 7620 -                 accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 7621 wait_on_buffer    accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 7622 wait_local        accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 7623 wait_on_buffer    genesis -n 500 -d 150 -p 4
 7624 wait_on_buffer    genesis -n 500 -d 150 -p 4
 7625 wait_on_buffer    genesis -n 500 -d 150 -p 4
 7626 wait_on_buffer    genesis -n 500 -d 150 -p 4
 7627 wait_on_buffer    growfiles -i 0 -N 500 -n 4 -b
 7628 -                 growfiles -i 0 -N 500 -n 4 -b
 7629 -                 growfiles -i 0 -N 500 -n 4 -b
 7630 -                 /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.7586 -l /tmp/revolver/7586/revolver_l
 7631 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.7586 -l /tmp/revolver/7586/revolver_l
 7632 wait              sh -c iogen -f buffered -m sequential -s
read,write,readv,writev -t 1b -T 6000b 6000b:
 7633 pipe_wait         iogen -f buffered -m sequential -s read write
readv writev -t 1b -T 6000b 6000b:rwbufl
 7634 lock_page         doio -avk
 7635 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.7586 -l /tmp/revolver/7586/revolver_l
 7636 wait              accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 7637 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.7586 -l /tmp/revolver/7586/revolver_l
 7638 wait              genesis -n 500 -d 150 -p 4
 7639 -                 accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 7640 -                 genesis -n 500 -d 150 -p 4
 7641 wait_local        accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 7642 wait_on_buffer    genesis -n 500 -d 150 -p 4
 7643 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.7586 -l /tmp/revolver/7586/revolver_l
 7644 wait_on_buffer    accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 7645 -                 growfiles -i 0 -N 500 -n 4 -b
 7646 -                 growfiles -i 0 -N 500 -n 4 -b
 7647 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.7586 -l /tmp/revolver/7586/revolver_l
 7648 wait              sh -c iogen -f sync -m sequential -s
read,write,readv,writev -t 1b -T 30000 30000:rwsy
 7649 -                 genesis -n 500 -d 150 -p 4
 7650 glock_wait_intern genesis -n 500 -d 150 -p 4
 7651 -                 accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 7652 wait_on_buffer    growfiles -i 0 -N 500 -n 4 -b
 7653 -                 growfiles -i 0 -N 500 -n 4 -b
 7654 pipe_wait         iogen -f sync -m sequential -s read write
readv writev -t 1b -T 30000 30000:rwsynclarg
 7655 -                 doio -avk
 7657 wait              sh -c PATH=$PATH:/tmp/STS/bin;
/tmp/STS/bin/pan2 -x 5 -f /tmp/revolver_load_gen.7599 -
 7659 wait              sh -c PATH=$PATH:/tmp/STS/bin;
/tmp/STS/bin/pan2 -x 5 -f /tmp/revolver_load_gen.7605 -
 7660 -                 /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.7605 -l /tmp/revolver/7605/revolver_l
 7661 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.7605 -l /tmp/revolver/7605/revolver_l
 7662 wait              accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 7663 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.7605 -l /tmp/revolver/7605/revolver_l
 7664 wait              genesis -n 500 -d 150 -p 4
 7665 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.7605 -l /tmp/revolver/7605/revolver_l
 7666 wait              sh -c iogen -f sync -m sequential -s
read,write,readv,writev -t 1b -T 30000 30000:rwsy
 7667 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.7605 -l /tmp/revolver/7605/revolver_l
 7668 wait_local        accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 7669 wait_local        accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 7671 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.7605 -l /tmp/revolver/7605/revolver_l
 7672 wait_local        accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 7673 -                 growfiles -i 0 -N 500 -n 4 -b
 7674 wait_async        accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 7675 -                 genesis -n 500 -d 150 -p 4
 7676 glock_wait_intern genesis -n 500 -d 150 -p 4
 7677 glock_wait_intern genesis -n 500 -d 150 -p 4
 7678 -                 genesis -n 500 -d 150 -p 4
 7670 wait              sh -c iogen -f buffered -m sequential -s
read,write,readv,writev -t 1b -T 6000b 6000b:
 7679 -                 growfiles -i 0 -N 500 -n 4 -b
 7680 glock_wait_intern growfiles -i 0 -N 500 -n 4 -b
 7681 glock_wait_intern growfiles -i 0 -N 500 -n 4 -b
 7682 pipe_wait         iogen -f sync -m sequential -s read write
readv writev -t 1b -T 30000 30000:rwsynclarg
 7683 pipe_wait         iogen -f buffered -m sequential -s read write
readv writev -t 1b -T 6000b 6000b:rwbufl
 7684 wait_async        doio -avk
 7685 wait_async        doio -avk
 7686 pipe_wait         iogen -f buffered -m sequential -s read write
readv writev -t 1b -T 6000b 6000b:rwbufl
 7687 pipe_wait         iogen -f sync -m sequential -s read write
readv writev -t 1b -T 30000 30000:rwsynclarg
 7688 -                 doio -avk
 7689 -                 doio -avk
 7728 wait              sh -c PATH=$PATH:/tmp/STS/bin;
/tmp/STS/bin/pan2 -x 5 -f /tmp/revolver_load_gen.7602 -
 7729 -                 /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.7602 -l /tmp/revolver/7602/revolver_l
 7730 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.7602 -l /tmp/revolver/7602/revolver_l
 7731 wait              sh -c iogen -f buffered -m sequential -s
read,write,readv,writev -t 1b -T 6000b 6000b:
 7732 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.7602 -l /tmp/revolver/7602/revolver_l
 7733 wait              sh -c iogen -f sync -m sequential -s
read,write,readv,writev -t 1b -T 30000 30000:rwsy
 7734 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.7602 -l /tmp/revolver/7602/revolver_l
 7735 wait              genesis -n 500 -d 150 -p 4
 7736 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.7602 -l /tmp/revolver/7602/revolver_l
 7737 -                 growfiles -i 0 -N 500 -n 4 -b
 7738 pipe_wait         iogen -f buffered -m sequential -s read write
readv writev -t 1b -T 6000b 6000b:rwbufl
 7739 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.7602 -l /tmp/revolver/7602/revolver_l
 7740 wait              accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 7741 wait_on_buffer    genesis -n 500 -d 150 -p 4
 7742 -                 doio -avk
 7743 -                 genesis -n 500 -d 150 -p 4
 7744 wait_on_buffer    genesis -n 500 -d 150 -p 4
 7745 wait_on_buffer    genesis -n 500 -d 150 -p 4
 7746 wait_local        accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 7747 wait_on_buffer    accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 7748 wait_local        accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 7749 -                 growfiles -i 0 -N 500 -n 4 -b
 7750 -                 growfiles -i 0 -N 500 -n 4 -b
 7751 -                 growfiles -i 0 -N 500 -n 4 -b
 7752 -                 accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 7753 pipe_wait         iogen -f sync -m sequential -s read write
readv writev -t 1b -T 30000 30000:rwsynclarg
 7754 -                 doio -avk
 7755 -                 /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.7599 -l /tmp/revolver/7599/revolver_l
 7756 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.7599 -l /tmp/revolver/7599/revolver_l
 7757 -                 growfiles -i 0 -N 500 -n 4 -b
 7758 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.7599 -l /tmp/revolver/7599/revolver_l
 7759 wait              accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 7761 -                 growfiles -i 0 -N 500 -n 4 -b
 7760 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.7599 -l /tmp/revolver/7599/revolver_l
 7762 wait_on_buffer    growfiles -i 0 -N 500 -n 4 -b
 7763 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.7599 -l /tmp/revolver/7599/revolver_l
 7764 wait              sh -c iogen -f buffered -m sequential -s
read,write,readv,writev -t 1b -T 6000b 6000b:
 7765 -                 growfiles -i 0 -N 500 -n 4 -b
 7766 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.7599 -l /tmp/revolver/7599/revolver_l
 7767 wait              sh -c iogen -f sync -m sequential -s
read,write,readv,writev -t 1b -T 30000 30000:rwsy
 7768 pipe_wait         iogen -f sync -m sequential -s read write
readv writev -t 1b -T 30000 30000:rwsynclarg
 7769 -                 doio -avk
 7770 wait              genesis -n 500 -d 150 -p 4
 7771 wait_on_buffer    genesis -n 500 -d 150 -p 4
 7773 wait_on_buffer    genesis -n 500 -d 150 -p 4
 7772 wait_on_buffer    accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 7774 -                 genesis -n 500 -d 150 -p 4
 7775 wait_on_buffer    genesis -n 500 -d 150 -p 4
 7776 wait_local        accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 7777 wait_on_buffer    accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 7778 pipe_wait         iogen -f buffered -m sequential -s read write
readv writev -t 1b -T 6000b 6000b:rwbufl
 7779 wait_on_buffer    accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 7780 pipe_wait         /usr/bin/perl -w
/tmp/STS/gfs/bin/revolver_load_gen -r /tmp/STS -L HEAVY -m LOCK_DLM
 7781 wait_on_buffer    doio -avk
 7785 wait              sh -c PATH=$PATH:/tmp/STS/bin;
/tmp/STS/bin/pan2 -x 5 -f /tmp/revolver_load_gen.7780 -
 7786 -                 /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.7780 -l /tmp/revolver/7780/revolver_l
 7787 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.7780 -l /tmp/revolver/7780/revolver_l
 7788 wait              accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 7789 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.7780 -l /tmp/revolver/7780/revolver_l
 7790 -                 growfiles -i 0 -N 500 -n 4 -b
 7792 wait_local        accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 7791 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.7780 -l /tmp/revolver/7780/revolver_l
 7794 -                 accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 7795 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.7780 -l /tmp/revolver/7780/revolver_l
 7793 wait              sh -c iogen -f sync -m sequential -s
read,write,readv,writev -t 1b -T 30000 30000:rwsy
 7797 wait_local        accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 7796 wait              genesis -n 500 -d 150 -p 4
 7798 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.7780 -l /tmp/revolver/7780/revolver_l
 7799 wait              sh -c iogen -f buffered -m sequential -s
read,write,readv,writev -t 1b -T 6000b 6000b:
 7800 -                 accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 7801 -                 growfiles -i 0 -N 500 -n 4 -b
 7802 wait_on_buffer    growfiles -i 0 -N 500 -n 4 -b
 7803 -                 growfiles -i 0 -N 500 -n 4 -b
 7804 wait_on_buffer    genesis -n 500 -d 150 -p 4
 7805 wait_on_buffer    genesis -n 500 -d 150 -p 4
 7806 wait_on_buffer    genesis -n 500 -d 150 -p 4
 7807 wait_on_buffer    genesis -n 500 -d 150 -p 4
 7808 pipe_wait         iogen -f sync -m sequential -s read write
readv writev -t 1b -T 30000 30000:rwsynclarg
 7809 -                 doio -avk
 7810 pipe_wait         iogen -f buffered -m sequential -s read write
readv writev -t 1b -T 6000b 6000b:rwbufl
 7811 -                 doio -avk
 7813 wait              sh -c PATH=$PATH:/tmp/STS/bin;
/tmp/STS/bin/pan2 -x 5 -f /tmp/revolver_load_gen.7588 -
 7814 -                 /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.7588 -l /tmp/revolver/7588/revolver_l
 7815 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.7588 -l /tmp/revolver/7588/revolver_l
 7817 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.7588 -l /tmp/revolver/7588/revolver_l
 7816 wait              sh -c iogen -f buffered -m sequential -s
read,write,readv,writev -t 1b -T 6000b 6000b:
 7818 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.7588 -l /tmp/revolver/7588/revolver_l
 7819 wait              accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 7821 pipe_wait         iogen -f buffered -m sequential -s read write
readv writev -t 1b -T 6000b 6000b:rwbufl
 7820 wait_local        accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 7822 wait_async        doio -avk
 7823 wait_async        accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 7824 wait_local        accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 7825 wait              genesis -n 500 -d 150 -p 4
 7826 -                 genesis -n 500 -d 150 -p 4
 7827 wait_local        accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 7828 glock_wait_intern genesis -n 500 -d 150 -p 4
 7829 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.7588 -l /tmp/revolver/7588/revolver_l
 7830 -                 genesis -n 500 -d 150 -p 4
 7831 -                 growfiles -i 0 -N 500 -n 4 -b
 7832 -                 genesis -n 500 -d 150 -p 4
 7833 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.7588 -l /tmp/revolver/7588/revolver_l
 7834 wait              sh -c iogen -f sync -m sequential -s
read,write,readv,writev -t 1b -T 30000 30000:rwsy
 7835 glock_wait_intern growfiles -i 0 -N 500 -n 4 -b
 7836 glock_wait_intern growfiles -i 0 -N 500 -n 4 -b
 7837 -                 growfiles -i 0 -N 500 -n 4 -b
 7838 pipe_wait         iogen -f sync -m sequential -s read write
readv writev -t 1b -T 30000 30000:rwsynclarg
 7839 wait_async        doio -avk
 9019 -                 sshd: root@pts/0
 9021 wait              -bash
 9059 -                 ps -e -o pid,wchan=WIDE-WCHAN-COLUMN -o cmd

Comment 9 Corey Marthaler 2005-03-01 23:23:46 UTC
WHOA! I take that last comment back! It just took about 15-20 minutes,
that's all. :(  I'll try more recoveries with very high lock counts to
see whether this is just taking a SUPER long time to recover.

Comment 10 David Teigland 2005-03-02 02:53:14 UTC
While gfs0 and gfs1 are in state "recover 2", gfs should be doing
journal recovery for those two fs's.  At the end there should be
a message from gfs stating how long it took to do each journal replay;
it would be interesting to see what that showed.

I'm betting that the journal recovery time will account for the 15-20
minutes you waited.  The ps output shows that a bunch of processes,
including the first two gfs_recoverd threads, are blocked on
"wait_on_buffer".  If I'm not mistaken, this means they are waiting
for i/o to complete.

I've seen situations where the storage device or drivers or something
below the fs hangs for a long time and all i/o hangs until the
problem is resolved.  This could be the situation.  It's also
possible that there's no i/o hang, but just an i/o bottleneck.
The i/o on all the running fs's could be starving the i/o from the
gfs_recoverd threads on the two unrecovered fs's.  Running top
during this time would probably show which of these is going on.
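The diagnosis above reads the WCHAN column by eye. A small, hypothetical Python helper (not part of the cluster tools) can tally the same `ps -e -o pid,wchan=WIDE-WCHAN-COLUMN -o cmd` output by wait channel so that a pile-up in wait_on_buffer stands out. It is a sketch that assumes the unwrapped ps format shown above; as the comment says, wait_on_buffer suggests, but does not prove, that a task is waiting on i/o.

```python
import collections


def tasks_by_wchan(ps_text):
    """Group (pid, cmd) pairs by wait channel.

    Expects the text printed by `ps -e -o pid,wchan=WIDE-WCHAN-COLUMN -o cmd`.
    The header row and any wrapped continuation lines are skipped because
    their first field is not a numeric PID.
    """
    groups = collections.defaultdict(list)
    for line in ps_text.splitlines():
        fields = line.split(None, 2)  # PID, WCHAN, rest of the command line
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        pid, wchan, cmd = int(fields[0]), fields[1], fields[2]
        groups[wchan].append((pid, cmd))
    return groups


def summarize(groups):
    """Return (task_count, wchan) tuples, largest count first."""
    return sorted(((len(tasks), wchan) for wchan, tasks in groups.items()),
                  reverse=True)
```

The counts only show where tasks are sleeping at one instant; comparing two snapshots taken a minute apart, or watching top as suggested, is still needed to tell a true i/o hang from an i/o bottleneck.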



Comment 11 Corey Marthaler 2005-03-14 21:05:17 UTC
Hit this bug for real this time. The DLM service is actually still
stuck in the recovery state. 5-node cluster, 4 GFS file systems. Shot
2 nodes (morph-03 and morph-04).

[root@morph-01 ~]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[2 3 1 4 5]

DLM Lock Space:  "clvmd"                             2   3 recover 4 -
[2 3 1]

DLM Lock Space:  "gfs0"                              3   4 recover 2 -
[2 3 1]

DLM Lock Space:  "gfs1"                              5   6 recover 2 -
[2 3 1]

DLM Lock Space:  "gfs2"                              7   8 recover 2 -
[2 3 1]

DLM Lock Space:  "gfs3"                              9  10 recover 2 -
[2 3 1]

GFS Mount Group: "gfs0"                              4   5 recover 0 -
[2 3 1]

GFS Mount Group: "gfs1"                              6   7 recover 0 -
[2 3 1]

GFS Mount Group: "gfs2"                              8   9 recover 0 -
[2 3 1]

GFS Mount Group: "gfs3"                             10  11 recover 0 -
[2 3 1]


[root@morph-01 ~]# cat /proc/cluster/lock_dlm/debug
nlock 0
6425 en plock 7,10412
6425 lk 11,10412 id 200ec 0,5 4
5905 qc 11,10412 0,5 id 200ec sts 0 0
6425 req 7,10412 ex 0-43b4 lkf 2000 wait 1
6425 lk 7,10412 id 0 -1,5 2000
6425 lk 11,10412 id 200ec 5,0 4
5905 qc 7,10412 -1,5 id a0209 sts 0 0
5905 qc 11,10412 5,0 id 200ec sts 0 0
6425 ex plock 0
6328 en punlock 7,2040f
6328 lk 11,2040f id 201e1 0,5 4
5905 qc 11,2040f 0,5 id 201e1 sts 0 0
6328 remove 7,2040f
6328 un 7,2040f d0039 5 0
5905 qc 7,2040f 5,5 id d0039 sts -65538 0
6328 lk 11,2040f id 201e1 5,0 4
5905 qc 11,2040f 5,0 id 201e1 sts 0 0
6328 ex punlock 0
6328 en plock 7,2040f
6328 lk 11,2040f id 201e1 0,5 4
5905 qc 11,2040f 0,5 id 201e1 sts 0 0
6328 req 7,2040f ex 2ec536-2edcef lkf 2000 wait 1
6328 lk 7,2040f id 0 -1,5 2000
6328 lk 11,2040f id 201e1 5,0 4
5905 qc 11,2040f 5,0 id 201e1 sts 0 0
5905 qc 7,2040f -1,5 id c03f6 sts 0 0
6328 ex plock 0
6328 en punlock 7,2040f
6328 lk 11,2040f id 201e1 0,5 4
5905 qc 11,2040f 0,5 id 201e1 sts 0 0
6328 remove 7,2040f
6328 un 7,2040f c03f6 5 0
5905 qc 7,2040f 5,5 id c03f6 sts -65538 0
6328 lk 11,2040f id 201e1 5,0 4
5905 qc 11,2040f 5,0 id 201e1 sts 0 0
6430 en punlock 7,55
6328 ex punlock 0
6430 lk 11,55 id 30332 0,5 4
6328 en plock 7,2040f
5905 qc 11,55 0,5 id 30332 sts 0 0
6328 lk 11,2040f id 201e1 0,5 4
5905 qc 11,2040f 0,5 id 201e1 sts 0 0
6430 remove 7,55
6430 un 7,55 d00df 5 0
6328 req 7,2040f ex 2edcf0-2edfe4 lkf 2000 wait 1
6328 lk 7,2040f id 0 -1,5 2000
5905 qc 7,55 5,5 id d00df sts -65538 0
6430 lk 11,55 id 30332 5,0 4
6328 lk 11,2040f id 201e1 5,0 4
5905 qc 11,55 5,0 id 30332 sts 0 0
5905 qc 11,2040f 5,0 id 201e1 sts 0 0
6430 ex punlock 0
5905 qc 7,2040f -1,5 id a0380 sts 0 0
6328 ex plock 0
6430 en plock 7,55
6430 lk 11,55 id 30332 0,5 4
5905 qc 11,55 0,5 id 30332 sts 0 0
6430 req 7,55 ex 2abfa7-2da36b lkf 2000 wait 1
6430 lk 7,55 id 0 -1,5 2000
6430 lk 11,55 id 30332 5,0 4
6328 en punlock 7,2040f
5905 qc 11,55 5,0 id 30332 sts 0 0
6328 lk 11,2040f id 201e1 0,5 4
5905 qc 11,2040f 0,5 id 201e1 sts 0 0
6328 remove 7,2040f
6328 un 7,2040f a0380 5 0
5905 qc 7,2040f 5,5 id a0380 sts -65538 0
6328 lk 11,2040f id 201e1 5,0 4
5905 qc 11,2040f 5,0 id 201e1 sts 0 0
6328 ex punlock 0
5905 qc 7,55 -1,5 id b033c sts 0 0
6430 ex plock 0
6272 en punlock 7,10415
6272 lk 11,10415 id 10148 0,5 4
5905 qc 11,10415 0,5 id 10148 sts 0 0
6272 remove 7,10415
6272 un 7,10415 b0169 5 0
5905 qc 7,10415 5,5 id b0169 sts -65538 0
6272 lk 11,10415 id 10148 5,0 4
5905 qc 11,10415 5,0 id 10148 sts 0 0
6272 ex punlock 0
6272 en plock 7,10415
6272 lk 11,10415 id 10148 0,5 4
5905 qc 11,10415 0,5 id 10148 sts 0 0
6425 en punlock 7,10412
6425 lk 11,10412 id 200ec 0,5 4
5905 qc 11,10412 0,5 id 200ec sts 0 0
6425 remove 7,10412
6425 un 7,10412 a0209 5 0
5905 qc 7,10412 5,5 id a0209 sts -65538 0
6425 lk 11,10412 id 200ec 5,0 4
5905 qc 11,10412 5,0 id 200ec sts 0 0
6425 ex punlock 0
6425 en plock 7,10412
6425 lk 11,10412 id 200ec 0,5 4
5905 qc 11,10412 0,5 id 200ec sts 0 0
6425 req 7,10412 ex 43b4-7127 lkf 2000 wait 1
6425 lk 7,10412 id 0 -1,5 2000
6425 lk 11,10412 id 200ec 5,0 4
5905 qc 7,10412 -1,5 id a0057 sts 0 0
5905 qc 11,10412 5,0 id 200ec sts 0 0
6425 ex plock 0
6381 en punlock 7,10414
6381 lk 11,10414 id 30372 0,5 4
5905 qc 11,10414 0,5 id 30372 sts 0 0
6381 remove 7,10414
6381 un 7,10414 c003b 5 0
6272 req 7,10415 ex 64f4-72e5 lkf 2000 wait 1
6272 lk 7,10415 id 0 -1,5 2000
5905 qc 7,10414 5,5 id c003b sts -65538 0
6430 en punlock 7,55
6430 lk 11,55 id 30332 0,5 4
5905 qc 7,10415 -1,5 id 90090 sts 0 0
5905 qc 11,55 0,5 id 30332 sts 0 0
6272 lk 11,10415 id 10148 5,0 4
6381 lk 11,10414 id 30372 5,0 4
6430 remove 7,55
6430 un 7,55 b033c 5 0
5905 qc 11,10415 5,0 id 10148 sts 0 0
5905 qc 11,10414 5,0 id 30372 sts 0 0
6272 ex plock 0
5905 qc 7,55 5,5 id b033c sts -65538 0
6430 lk 11,55 id 30332 5,0 4
5905 qc 11,55 5,0 id 30332 sts 0 0
6430 ex punlock 0
6430 en plock 7,55
6381 ex punlock 0
6328 en plock 7,2040f
6381 en plock 7,10414
6384 en punlock 7,10415
6425 en punlock 7,10412
6272 en punlock 7,10415
6026 un 2,2058c 30153 5 0
6065 un 2,19 100c3 3 0
6104 un 2,609b7 3018f 5 0
6143 un 2,19 102f3 3 0
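In this dump each "<pid> en plock|punlock <lock>" line appears to be paired with a later "<pid> ex plock|punlock <rv>" line (this reading is inferred from the pairing in the log, not from the lock_dlm source). Near the end of morph-01's output, pids 6430, 6328, 6381, 6384, 6425 and 6272 have an "en" with no matching "ex". A hypothetical helper like the following lists such unfinished calls. Because these dumps start mid-line, the buffer appears to be fixed-size, so an "ex" whose "en" scrolled out is simply ignored.

```python
def unfinished_ops(debug_text):
    """Return {pid: (op, lock)} for 'en' lines that have no later 'ex'.

    Assumes lines shaped like '6430 en plock 7,55' and '6430 ex plock 0';
    all other lines (lk, qc, req, un, remove, truncated lines) are ignored.
    """
    open_ops = {}
    for line in debug_text.splitlines():
        fields = line.split()
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        pid, tag, op = int(fields[0]), fields[1], fields[2]
        if tag == "en" and len(fields) >= 4:
            open_ops[pid] = (op, fields[3])  # entered op on lock "type,number"
        elif tag == "ex" and open_ops.get(pid, (None,))[0] == op:
            del open_ops[pid]  # matching exit seen, op completed
    return open_ops
```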

[root@morph-02 ~]# cat /proc/cluster/lock_dlm/debug
 5,0 4
5922 qc 11,1a00274 5,0 id 20320 sts 0 0
5922 qc 7,1a00274 -1,5 id 83037b sts 0 0
6456 ex plock 0
6443 en punlock 7,19e0273
6443 lk 11,19e0273 id 50274 0,5 4
5922 qc 11,19e0273 0,5 id 50274 sts 0 0
6443 remove 7,19e0273
6443 un 7,19e0273 7f014a 5 0
5922 qc 7,19e0273 5,5 id 7f014a sts -65538 0
6443 lk 11,19e0273 id 50274 5,0 4
5922 qc 11,19e0273 5,0 id 50274 sts 0 0
6443 ex punlock 0
6443 en plock 7,19e0273
6443 lk 11,19e0273 id 50274 0,5 4
5922 qc 11,19e0273 0,5 id 50274 sts 0 0
6443 req 7,19e0273 ex 5fc7-6cbb lkf 2000 wait 1
6443 lk 7,19e0273 id 0 -1,5 2000
6443 lk 11,19e0273 id 50274 5,0 4
5922 qc 7,19e0273 -1,5 id 8e02ec sts 0 0
5922 qc 11,19e0273 5,0 id 50274 sts 0 0
6443 ex plock 0
6415 en punlock 7,1a00270
6415 lk 11,1a00270 id 10134 0,5 4
5922 qc 11,1a00270 0,5 id 10134 sts 0 0
6415 remove 7,1a00270
6415 un 7,1a00270 8b005d 5 0
5922 qc 7,1a00270 5,5 id 8b005d sts -65538 0
6415 lk 11,1a00270 id 10134 5,0 4
5922 qc 11,1a00270 5,0 id 10134 sts 0 0
6415 ex punlock 0
6415 en plock 7,1a00270
6415 lk 11,1a00270 id 10134 0,5 4
5922 qc 11,1a00270 0,5 id 10134 sts 0 0
6415 req 7,1a00270 ex 0-101e lkf 2000 wait 1
6415 lk 7,1a00270 id 0 -1,5 2000
6415 lk 11,1a00270 id 10134 5,0 4
5922 qc 7,1a00270 -1,5 id 7900e0 sts 0 0
5922 qc 11,1a00270 5,0 id 10134 sts 0 0
6415 ex plock 0
6426 en punlock 7,19e0282
6426 lk 11,19e0282 id 30290 0,5 4
5922 qc 11,19e0282 0,5 id 30290 sts 0 0
6426 remove 7,19e0282
6426 un 7,19e0282 87025c 5 0
5922 qc 7,19e0282 5,5 id 87025c sts -65538 0
6426 lk 11,19e0282 id 30290 5,0 4
5922 qc 11,19e0282 5,0 id 30290 sts 0 0
6426 ex punlock 0
6426 en plock 7,19e0282
6426 lk 11,19e0282 id 30290 0,5 4
5922 qc 11,19e0282 0,5 id 30290 sts 0 0
6426 req 7,19e0282 ex 0-3ba4 lkf 2000 wait 1
6426 lk 7,19e0282 id 0 -1,5 2000
6426 lk 11,19e0282 id 30290 5,0 4
5922 qc 11,19e0282 5,0 id 30290 sts 0 0
6428 en punlock 7,19e0280
6428 lk 11,19e0280 id 300ae 0,5 4
5922 qc 11,19e0280 0,5 id 300ae sts 0 0
6428 remove 7,19e0280
6428 un 7,19e0280 8a01c3 5 0
5922 qc 7,19e0280 5,5 id 8a01c3 sts -65538 0
6428 lk 11,19e0280 id 300ae 5,0 4
5922 qc 7,19e0282 -1,5 id 930297 sts 0 0
5922 qc 11,19e0280 5,0 id 300ae sts 0 0
6428 ex punlock 0
6428 en plock 7,19e0280
6428 lk 11,19e0280 id 300ae 0,5 4
5922 qc 11,19e0280 0,5 id 300ae sts 0 0
6428 req 7,19e0280 ex 2ed897-2edab9 lkf 2000 wait 1
6428 lk 7,19e0280 id 0 -1,5 2000
6428 lk 11,19e0280 id 300ae 5,0 4
5922 qc 7,19e0280 -1,5 id 8902e7 sts 0 0
5922 qc 11,19e0280 5,0 id 300ae sts 0 0
6428 ex plock 0
6426 ex plock 0
6428 en punlock 7,19e0280
6428 lk 11,19e0280 id 300ae 0,5 4
5922 qc 11,19e0280 0,5 id 300ae sts 0 0
6428 remove 7,19e0280
6428 un 7,19e0280 8902e7 5 0
5922 qc 7,19e0280 5,5 id 8902e7 sts -65538 0
6428 lk 11,19e0280 id 300ae 5,0 4
6456 en punlock 7,1a00274
5922 qc 11,19e0280 5,0 id 300ae sts 0 0
6456 lk 11,1a00274 id 20320 0,5 4
6428 ex punlock 0
5922 qc 11,1a00274 0,5 id 20320 sts 0 0
6443 en punlock 7,19e0273
6443 lk 11,19e0273 id 50274 0,5 4
6456 remove 7,1a00274
5922 qc 11,19e0273 0,5 id 50274 sts 0 0
6456 un 7,1a00274 83037b 5 0
5922 qc 7,1a00274 5,5 id 83037b sts -65538 0
6456 lk 11,1a00274 id 20320 5,0 4
5922 qc 11,1a00274 5,0 id 20320 sts 0 0
6456 ex punlock 0
6456 en plock 7,1a00274
6456 lk 11,1a00274 id 20320 0,5 4
5922 qc 11,1a00274 0,5 id 20320 sts 0 0
6456 req 7,1a00274 ex 6060-70da lkf 2000 wait 1
6456 lk 7,1a00274 id 0 -1,5 2000
6456 lk 11,1a00274 id 20320 5,0 4
5922 qc 11,1a00274 5,0 id 20320 sts 0 0
6443 remove 7,19e0273
6443 un 7,19e0273 8e02ec 5 0
5922 qc 7,19e0273 5,5 id 8e02ec sts -65538 0
6443 lk 11,19e0273 id 50274 5,0 4
5922 qc 11,19e0273 5,0 id 50274 sts 0 0
6443 ex punlock 0
5922 qc 7,1a00274 -1,5 id 98007d sts 0 0
6443 en plock 7,19e0273
6443 lk 11,19e0273 id 50274 0,5 4
5922 qc 11,19e0273 0,5 id 50274 sts 0 0
6443 req 7,19e0273 ex 0-1dee lkf 2000 wait 1
6443 lk 7,19e0273 id 0 -1,5 2000
6443 lk 11,19e0273 id 50274 5,0 4
6456 ex plock 0
6428 en plock 7,19e0280
6426 en punlock 7,19e0282
6415 en punlock 7,1a00270
6456 en punlock 7,1a00274
6116 un 2,1a10fdd 1e018e 5 0
6196 un 2,1aa03ce 5303a4 5 0
6156 un 2,1a04889 630244 5 0
6085 un 2,1a70551 48030e 5 0

[root@morph-05 ~]# cat /proc/cluster/lock_dlm/debug
lk 11,678a01d id 202b7 0,5 4
5907 qc 11,678a01d 0,5 id 202b7 sts 0 0
6410 req 7,678a01d ex 2ed490-2ed762 lkf 2000 wait 1
6410 lk 7,678a01d id 0 -1,5 2000
6410 lk 11,678a01d id 202b7 5,0 4
5907 qc 7,678a01d -1,5 id 4000a6 sts 0 0
5907 qc 11,678a01d 5,0 id 202b7 sts 0 0
6410 ex plock 0
6317 en punlock 7,678a009
6317 lk 11,678a009 id 1026e 0,5 4
5907 qc 11,678a009 0,5 id 1026e sts 0 0
6317 remove 7,678a009
6317 un 7,678a009 390318 5 0
5907 qc 7,678a009 5,5 id 390318 sts -65538 0
6317 lk 11,678a009 id 1026e 5,0 4
5907 qc 11,678a009 5,0 id 1026e sts 0 0
6317 ex punlock 0
6317 en plock 7,678a009
6317 lk 11,678a009 id 1026e 0,5 4
5907 qc 11,678a009 0,5 id 1026e sts 0 0
6317 req 7,678a009 ex 0-6628 lkf 2000 wait 1
6317 lk 7,678a009 id 0 -1,5 2000
6317 lk 11,678a009 id 1026e 5,0 4
5907 qc 11,678a009 5,0 id 1026e sts 0 0
6255 en punlock 7,678a006
6255 lk 11,678a006 id 30185 0,5 4
5907 qc 11,678a006 0,5 id 30185 sts 0 0
6255 remove 7,678a006
6255 un 7,678a006 2c026c 5 0
5907 qc 7,678a006 5,5 id 2c026c sts -65538 0
6255 lk 11,678a006 id 30185 5,0 4
5907 qc 11,678a006 5,0 id 30185 sts 0 0
6255 ex punlock 0
6255 en plock 7,678a006
6255 lk 11,678a006 id 30185 0,5 4
5907 qc 11,678a006 0,5 id 30185 sts 0 0
6255 req 7,678a006 ex 6ee8-7453 lkf 2000 wait 1
6255 lk 7,678a006 id 0 -1,5 2000
6255 lk 11,678a006 id 30185 5,0 4
5907 qc 7,678a006 -1,5 id 230345 sts 0 0
5907 qc 11,678a006 5,0 id 30185 sts 0 0
6255 ex plock 0
6357 en punlock 7,678a015
6357 lk 11,678a015 id 302d4 0,5 4
5907 qc 11,678a015 0,5 id 302d4 sts 0 0
6357 remove 7,678a015
6357 un 7,678a015 2e028d 5 0
5907 qc 7,678a015 5,5 id 2e028d sts -65538 0
6357 lk 11,678a015 id 302d4 5,0 4
5907 qc 11,678a015 5,0 id 302d4 sts 0 0
6357 ex punlock 0
6357 en plock 7,678a015
6357 lk 11,678a015 id 302d4 0,5 4
5907 qc 11,678a015 0,5 id 302d4 sts 0 0
6357 req 7,678a015 ex 72c3-74ea lkf 2000 wait 1
6357 lk 7,678a015 id 0 -1,5 2000
6357 lk 11,678a015 id 302d4 5,0 4
5907 qc 11,678a015 5,0 id 302d4 sts 0 0
6413 en punlock 7,678a00e
6413 lk 11,678a00e id 40180 0,5 4
5907 qc 11,678a00e 0,5 id 40180 sts 0 0
6413 remove 7,678a00e
6413 un 7,678a00e 420182 5 0
5907 qc 7,678a00e 5,5 id 420182 sts -65538 0
6413 lk 11,678a00e id 40180 5,0 4
5907 qc 11,678a00e 5,0 id 40180 sts 0 0
6413 ex punlock 0
6413 en plock 7,678a010
6413 lk 11,678a010 id 400c4 0,5 4
5907 qc 11,678a010 0,5 id 400c4 sts 0 0
6413 req 7,678a010 ex 0-7fffffffffffffff lkf 2000 wait 1
6413 lk 7,678a010 id 0 -1,5 2000
6413 lk 11,678a010 id 400c4 5,0 4
5907 qc 11,678a010 5,0 id 400c4 sts 0 0
6412 lk 11,678a00e id 40180 0,5 4
5907 qc 11,678a00e 0,5 id 40180 sts 0 0
6412 req 7,678a00e ex 0-7fffffffffffffff lkf 2000 wait 1
6412 lk 7,678a00e id 0 -1,5 2000
6412 lk 11,678a00e id 40180 5,0 4
5907 qc 11,678a00e 5,0 id 40180 sts 0 0
6410 en punlock 7,678a01d
6410 lk 11,678a01d id 202b7 0,5 4
5907 qc 11,678a01d 0,5 id 202b7 sts 0 0
6410 remove 7,678a01d
6410 un 7,678a01d 4000a6 5 0
5907 qc 7,678a01d 5,5 id 4000a6 sts -65538 0
6410 lk 11,678a01d id 202b7 5,0 4
5907 qc 11,678a01d 5,0 id 202b7 sts 0 0
6410 ex punlock 0
5907 qc 7,679a007 -1,5 id 290032 sts 0 0
5907 qc 7,678a009 -1,5 id 3603ca sts 0 0
5907 qc 7,678a015 -1,5 id 37018d sts 0 0
5907 qc 7,678a00e -1,5 id 3f01d4 sts 0 0
6412 ex plock 0
6317 ex plock 0
6357 ex plock 0
6356 ex plock 0
6410 en plock 7,678a01d
6410 lk 11,678a01d id 202b7 0,5 4
5907 qc 11,678a01d 0,5 id 202b7 sts 0 0
6410 req 7,678a01d ex 0-2292b5 lkf 2000 wait 1
6410 lk 7,678a01d id 0 -1,5 2000
6410 lk 11,678a01d id 202b7 5,0 4
5907 qc 7,678a01d -1,5 id 310385 sts 0 0
5907 qc 11,678a01d 5,0 id 202b7 sts 0 0
6410 ex plock 0
5907 qc 7,678a018 -1,5 id 3b0112 sts 0 0
5907 qc 7,678a011 -1,5 id 3a0043 sts 0 0
5907 qc 7,678a010 -1,5 id 3001d6 sts 0 0
6415 ex plock 0
6411 ex plock 0
6413 ex plock 0
6412 en punlock 7,678a00e
6356 en punlock 7,679a007
6255 en punlock 7,678a006
6317 en punlock 7,678a009
6411 en punlock 7,678a011
6357 en punlock 7,678a015
6413 en punlock 7,678a010
6415 en punlock 7,678a018
6410 en punlock 7,678a01d
6009 un 2,67aa392 15027f 5 0
6048 un 2,67ba2c2 180132 5 0
6087 un 2,67da870 1203c7 5 0
6126 un 2,679ae48 220177 5 0
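
In these lock_dlm debug dumps, a pid's `en plock` or `en punlock` line is normally followed later by an `ex` line from the same pid. A call with no `ex` when the buffer was read is therefore a candidate for a stuck request. At the end of the first dump, pids 6272, 6328, 6381, 6384, 6425 and 6430 are left open. The same pids show `dlm_lock_sync` as their wait channel in the ps listing in Comment 12.

The sketch below automates that pairing. Some caveats:
- Reading `en`/`ex` as entry and exit of lock_dlm's plock handlers is an interpretation of the log format, not something documented in this report.
- The function name is hypothetical.
- Older entries get overwritten (see Comment 16), so an `ex` whose `en` has already scrolled out is ignored.

```python
from typing import Dict, Iterable, Tuple


def unfinished_plock_calls(lines: Iterable[str]) -> Dict[int, Tuple[str, str]]:
    """Return {pid: (op, resource)} for 'en' lines with no later 'ex' from that pid.

    Hypothetical helper; assumes records shaped like '<pid> en <op> <resource>'
    and '<pid> ex <op> <rc>', as in the lock_dlm debug dumps above.
    """
    open_calls: Dict[int, Tuple[str, str]] = {}
    for line in lines:
        fields = line.split()
        # Skip truncated fragments at the start of the buffer and short records.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        pid, tag, op = int(fields[0]), fields[1], fields[2]
        if tag == "en":
            # Remember the most recent call this pid entered.
            open_calls[pid] = (op, fields[3])
        elif tag == "ex" and open_calls.get(pid, ("",))[0] == op:
            # Matching exit: the call returned, so it is no longer pending.
            del open_calls[pid]
        # 'lk', 'qc', 'un', 'req', 'remove' records, and 'ex' lines whose
        # 'en' was already overwritten, are ignored.
    return open_calls
```

Feeding it the lines of a saved /proc/cluster/lock_dlm/debug dump returns the calls still pending at the end of the dump. Those pids can then be looked up in the ps output.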



Comment 12 Corey Marthaler 2005-03-14 21:09:25 UTC
more info:

[root@morph-01 ~]# cat /proc/cluster/dlm_debug
 0,1,0 ids 13,21,13
gfs0 move use event 21
gfs0 recover event 21
gfs0 remove node 4
gfs0 remove node 5
clvmd move flags 0,1,0 ids 4,21,4
clvmd move use event 21
clvmd recover event 21
clvmd remove node 4
clvmd remove node 5
gfs2 total nodes 3
gfs2 rebuild resource directory
gfs0 total nodes 3
gfs0 rebuild resource directory
gfs1 total nodes 3
gfs1 rebuild resource directory
clvmd total nodes 3
clvmd rebuild resource directory
clvmd rebuilt 1 resources
clvmd purge requests
clvmd purged 0 requests
gfs3 rebuilt 6411 resources
gfs3 purge requests
gfs3 purged 0 requests
gfs2 rebuilt 6559 resources
gfs2 purge requests
gfs2 purged 0 requests
gfs0 rebuilt 6638 resources
gfs0 purge requests
gfs0 purged 0 requests
gfs1 rebuilt 6698 resources
gfs1 purge requests
gfs1 purged 0 requests
clvmd mark waiting requests
clvmd marked 0 requests
clvmd purge locks of departed nodes
clvmd purged 0 locks
clvmd update remastered resources
clvmd updated 0 resources
clvmd rebuild locks
clvmd rebuilt 0 locks
clvmd recover event 21 done


[root@morph-02 ~]# cat /proc/cluster/dlm_debug
move flags 1,0,0 ids 81,81,81
gfs3 move flags 0,1,0 ids 81,83,81
gfs3 move use event 83
gfs3 recover event 83
gfs3 remove node 4
gfs3 remove node 5
gfs2 move flags 0,1,0 ids 79,83,79
gfs2 move use event 83
gfs2 recover event 83
gfs2 remove node 4
gfs2 remove node 5
gfs1 move flags 0,1,0 ids 77,83,77
gfs1 move use event 83
gfs1 recover event 83
gfs1 remove node 4
gfs1 remove node 5
gfs0 move flags 0,1,0 ids 75,83,75
gfs0 move use event 83
gfs0 recover event 83
gfs0 remove node 4
gfs0 remove node 5
clvmd move flags 0,1,0 ids 66,83,66
clvmd move use event 83
clvmd recover event 83
clvmd remove node 4
clvmd remove node 5
gfs3 total nodes 3
gfs3 rebuild resource directory
gfs2 total nodes 3
gfs2 rebuild resource directory
clvmd total nodes 3
clvmd rebuild resource directory
gfs0 total nodes 3
gfs0 rebuild resource directory
gfs1 total nodes 3
gfs1 rebuild resource directory
clvmd rebuilt 2 resources
clvmd purge requests
clvmd purged 0 requests
gfs3 rebuilt 6438 resources
gfs3 purge requests
gfs3 purged 0 requests


[root@morph-05 ~]# cat /proc/cluster/dlm_debug
 5
gfs0 move flags 0,1,0 ids 24,32,24
gfs0 move use event 32
gfs0 recover event 32
gfs0 remove node 4
gfs0 remove node 5
clvmd move flags 0,1,0 ids 15,32,15
clvmd move use event 32
clvmd recover event 32
clvmd remove node 4
clvmd remove node 5
gfs3 total nodes 3
gfs3 rebuild resource directory
gfs2 total nodes 3
gfs2 rebuild resource directory
gfs0 total nodes 3
clvmd total nodes 3
gfs0 rebuild resource directory
clvmd rebuild resource directory
gfs1 total nodes 3
gfs1 rebuild resource directory
clvmd rebuilt 1 resources
clvmd purge requests
clvmd purged 0 requests
gfs3 rebuilt 6327 resources
gfs3 purge requests
gfs3 purged 0 requests
gfs1 rebuilt 6695 resources
gfs1 purge requests
gfs1 purged 0 requests
gfs0 rebuilt 6705 resources
gfs0 purge requests
gfs0 purged 0 requests
clvmd mark waiting requests
clvmd marked 0 requests
clvmd purge locks of departed nodes
clvmd purged 0 locks
clvmd update remastered resources
clvmd updated 0 resources
clvmd rebuild locks
clvmd rebuilt 0 locks
clvmd recover event 32 done




[root@morph-01 ~]# ps -e -o pid,wchan=WIDE-WCHAN-COLUMN -o cmd
  PID WIDE-WCHAN-COLUMN CMD
    1 -                 init [3]
    2 migration_thread  [migration/0]
    3 ksoftirqd         [ksoftirqd/0]
    4 migration_thread  [migration/1]
    5 ksoftirqd         [ksoftirqd/1]
    6 migration_thread  [migration/2]
    7 ksoftirqd         [ksoftirqd/2]
    8 migration_thread  [migration/3]
    9 ksoftirqd         [ksoftirqd/3]
   10 worker_thread     [events/0]
   11 worker_thread     [events/1]
   12 worker_thread     [events/2]
   13 worker_thread     [events/3]
   14 worker_thread     [khelper]
   15 worker_thread     [kacpid]
   47 worker_thread     [kblockd/0]
   48 worker_thread     [kblockd/1]
   49 worker_thread     [kblockd/2]
   50 worker_thread     [kblockd/3]
   51 hub_thread        [khubd]
   60 pdflush           [pdflush]
   61 pdflush           [pdflush]
   63 worker_thread     [aio/0]
   64 worker_thread     [aio/1]
   65 worker_thread     [aio/2]
   66 worker_thread     [aio/3]
   62 kswapd            [kswapd0]
  139 serio_thread      [kseriod]
  208 -                 [scsi_eh_0]
  209 -                 [qla2300_0_dpc]
  234 worker_thread     [kmirrord/0]
  235 worker_thread     [kmirrord/1]
  236 worker_thread     [kmirrord/2]
  237 worker_thread     [kmirrord/3]
  250 kjournald         [kjournald]
 1071 -                 udevd
 1473 kjournald         [kjournald]
 1838 -                 syslogd -m 0
 1842 syslog            klogd -x
 1852 -                 irqbalance
 1862 -                 portmap
 1881 -                 rpc.statd
 1911 -                 rpc.idmapd
 1968 -                 /usr/sbin/smartd
 1977 -                 /usr/sbin/acpid
 1988 -                 cupsd
 2035 -                 /usr/sbin/sshd
 2075 -                 xinetd -stayalive -pidfile /var/run/xinetd.pid
 2106 -                 sendmail: rejecting connections on daemon MTA:
load average: 52
 2118 pause             sendmail: Queue runner@01:00:00 for
/var/spool/clientmqueue
 2136 -                 gpm -m /dev/input/mice -t imps2
 2181 -                 crond
 2206 -                 xfs -droppriv -daemon
 2240 -                 /usr/sbin/atd
 2274 -                 dbus-daemon-1 --system
 2303 -                 cups-config-daemon
 2315 -                 hald
 2322 -                 /sbin/agetty ttyS0 115200 vt100
 2323 -                 /sbin/mingetty tty1
 2324 -                 /sbin/mingetty tty2
 2325 -                 /sbin/mingetty tty3
 2326 -                 /sbin/mingetty tty4
 2327 -                 /sbin/mingetty tty5
 2328 -                 /sbin/mingetty tty6
 3708 -                 ccsd
 5781 cluster_kthread   [cman_comms]
 5783 serviced          [cman_serviced]
 5782 membership_kthrea [cman_memb]
 5784 hello_kthread     [cman_hbeat]
 5843 rt_sigsuspend     fenced
 5904 -                 clvmd
 5905 dlm_astd          [dlm_astd]
 5906 dlm_recvd         [dlm_recvd]
 5907 dlm_sendd         [dlm_sendd]
 5908 dlm_recoverd      [dlm_recoverd]
 6022 dlm_wait_function [dlm_recoverd]
 6023 dlm_async         [lock_dlm1]
 6024 dlm_async         [lock_dlm2]
 6025 -                 [gfs_scand]
 6026 -                 [gfs_glockd]
 6027 -                 [gfs_recoverd]
 6028 -                 [gfs_logd]
 6029 glock_wait_intern [gfs_quotad]
 6030 -                 [gfs_inoded]
 6052 dlm_wait_function [dlm_recoverd]
 6053 dlm_async         [lock_dlm1]
 6054 dlm_async         [lock_dlm2]
 6064 -                 [gfs_scand]
 6065 -                 [gfs_glockd]
 6066 -                 [gfs_recoverd]
 6067 -                 [gfs_logd]
 6068 glock_wait_intern [gfs_quotad]
 6069 -                 [gfs_inoded]
 6091 dlm_wait_function [dlm_recoverd]
 6101 dlm_async         [lock_dlm1]
 6102 dlm_async         [lock_dlm2]
 6103 -                 [gfs_scand]
 6104 -                 [gfs_glockd]
 6105 -                 [gfs_recoverd]
 6106 -                 [gfs_logd]
 6107 glock_wait_intern [gfs_quotad]
 6108 -                 [gfs_inoded]
 6130 dlm_wait_function [dlm_recoverd]
 6140 dlm_async         [lock_dlm1]
 6141 dlm_async         [lock_dlm2]
 6142 -                 [gfs_scand]
 6143 -                 [gfs_glockd]
 6144 -                 [gfs_recoverd]
 6145 -                 [gfs_logd]
 6146 glock_wait_intern [gfs_quotad]
 6147 -                 [gfs_inoded]
 6233 -                 sshd: root@notty
 6235 wait              bash -c
PATH=/usr/kerberos/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/home/msp/c
 6253 pipe_wait         /usr/bin/perl -w
/tmp/STS/gfs/bin/revolver_load_gen -r /tmp/STS -L HEAVY -m LOCK_DLM
 6257 wait              sh -c PATH=$PATH:/tmp/STS/bin;
/tmp/STS/bin/pan2 -x 5 -f /tmp/revolver_load_gen.6253 -
 6258 -                 /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.6253 -l /tmp/revolver/6253/revolver_l
 6259 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.6253 -l /tmp/revolver/6253/revolver_l
 6260 -                 growfiles -i 0 -N 500 -n 4 -b
 6261 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.6253 -l /tmp/revolver/6253/revolver_l
 6262 wait              sh -c iogen -f buffered -m sequential -s
read,write,readv,writev -t 1b -T 6000b 6000b:
 6263 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.6253 -l /tmp/revolver/6253/revolver_l
 6264 wait              genesis -n 500 -d 150 -p 4
 6265 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.6253 -l /tmp/revolver/6253/revolver_l
 6266 wait              accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 6267 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.6253 -l /tmp/revolver/6253/revolver_l
 6268 wait              sh -c iogen -f sync -m sequential -s
read,write,readv,writev -t 1b -T 30000 30000:rwsy
 6269 pipe_wait         iogen -f buffered -m sequential -s read write
readv writev -t 1b -T 6000b 6000b:rwbufl
 6270 wait_async        doio -avk
 6271 pipe_wait         iogen -f sync -m sequential -s read write
readv writev -t 1b -T 30000 30000:rwsynclarg
 6272 dlm_lock_sync     doio -avk
 6273 -                 genesis -n 500 -d 150 -p 4
 6274 -                 genesis -n 500 -d 150 -p 4
 6275 glock_wait_intern genesis -n 500 -d 150 -p 4
 6276 -                 genesis -n 500 -d 150 -p 4
 6277 -                 sshd: root@notty
 6278 wait_local        accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 6279 -                 growfiles -i 0 -N 500 -n 4 -b
 6280 wait_local        accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 6281 glock_wait_intern growfiles -i 0 -N 500 -n 4 -b
 6282 glock_wait_intern growfiles -i 0 -N 500 -n 4 -b
 6283 wait_async        accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 6284 wait_local        accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 6286 wait              bash -c
PATH=/usr/kerberos/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/home/msp/c
 6304 pipe_wait         /usr/bin/perl -w
/tmp/STS/gfs/bin/revolver_load_gen -r /tmp/STS -L HEAVY -m LOCK_DLM
 6308 wait              sh -c PATH=$PATH:/tmp/STS/bin;
/tmp/STS/bin/pan2 -x 5 -f /tmp/revolver_load_gen.6304 -
 6309 -                 /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.6304 -l /tmp/revolver/6304/revolver_l
 6310 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.6304 -l /tmp/revolver/6304/revolver_l
 6311 -                 growfiles -i 0 -N 500 -n 4 -b
 6312 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.6304 -l /tmp/revolver/6304/revolver_l
 6313 wait              sh -c iogen -f buffered -m sequential -s
read,write,readv,writev -t 1b -T 6000b 6000b:
 6314 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.6304 -l /tmp/revolver/6304/revolver_l
 6315 wait              genesis -n 500 -d 150 -p 4
 6316 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.6304 -l /tmp/revolver/6304/revolver_l
 6317 -                 growfiles -i 0 -N 500 -n 4 -b
 6318 glock_wait_intern growfiles -i 0 -N 500 -n 4 -b
 6319 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.6304 -l /tmp/revolver/6304/revolver_l
 6321 glock_wait_intern growfiles -i 0 -N 500 -n 4 -b
 6320 wait              sh -c iogen -f sync -m sequential -s
read,write,readv,writev -t 1b -T 30000 30000:rwsy
 6322 -                 genesis -n 500 -d 150 -p 4
 6323 wait              accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 6324 glock_wait_intern genesis -n 500 -d 150 -p 4
 6325 glock_wait_intern genesis -n 500 -d 150 -p 4
 6326 pipe_wait         iogen -f buffered -m sequential -s read write
readv writev -t 1b -T 6000b 6000b:rwbufl
 6327 -                 genesis -n 500 -d 150 -p 4
 6328 dlm_lock_sync     doio -avk
 6329 wait_async        accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 6330 pipe_wait         iogen -f sync -m sequential -s read write
readv writev -t 1b -T 30000 30000:rwsynclarg
 6331 wait_local        accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 6332 wait_async        doio -avk
 6333 wait_local        accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 6334 wait_local        accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 6335 -                 sshd: root@notty
 6337 wait              bash -c
PATH=/usr/kerberos/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/home/msp/c
 6355 pipe_wait         /usr/bin/perl -w
/tmp/STS/gfs/bin/revolver_load_gen -r /tmp/STS -L HEAVY -m LOCK_DLM
 6359 wait              sh -c PATH=$PATH:/tmp/STS/bin;
/tmp/STS/bin/pan2 -x 5 -f /tmp/revolver_load_gen.6355 -
 6360 -                 /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.6355 -l /tmp/revolver/6355/revolver_l
 6361 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.6355 -l /tmp/revolver/6355/revolver_l
 6362 -                 growfiles -i 0 -N 500 -n 4 -b
 6363 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.6355 -l /tmp/revolver/6355/revolver_l
 6364 wait              genesis -n 500 -d 150 -p 4
 6365 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.6355 -l /tmp/revolver/6355/revolver_l
 6366 wait              sh -c iogen -f buffered -m sequential -s
read,write,readv,writev -t 1b -T 6000b 6000b:
 6367 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.6355 -l /tmp/revolver/6355/revolver_l
 6368 glock_wait_intern growfiles -i 0 -N 500 -n 4 -b
 6369 wait              sh -c iogen -f sync -m sequential -s
read,write,readv,writev -t 1b -T 30000 30000:rwsy
 6370 dlm_lock_sync     genesis -n 500 -d 150 -p 4
 6371 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.6355 -l /tmp/revolver/6355/revolver_l
 6372 glock_wait_intern growfiles -i 0 -N 500 -n 4 -b
 6373 wait              accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 6374 -                 genesis -n 500 -d 150 -p 4
 6375 -                 growfiles -i 0 -N 500 -n 4 -b
 6376 -                 genesis -n 500 -d 150 -p 4
 6377 glock_wait_intern genesis -n 500 -d 150 -p 4
 6378 pipe_wait         iogen -f buffered -m sequential -s read write
readv writev -t 1b -T 6000b 6000b:rwbufl
 6379 wait_local        accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 6380 wait_async        accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 6381 dlm_lock_sync     doio -avk
 6382 pipe_wait         iogen -f sync -m sequential -s read write
readv writev -t 1b -T 30000 30000:rwsynclarg
 6383 wait_local        accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 6384 dlm_lock_sync     doio -avk
 6385 wait_local        accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 6386 -                 sshd: root@notty
 6388 wait              bash -c
PATH=/usr/kerberos/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/home/msp/c
 6406 pipe_wait         /usr/bin/perl -w
/tmp/STS/gfs/bin/revolver_load_gen -r /tmp/STS -L HEAVY -m LOCK_DLM
 6410 wait              sh -c PATH=$PATH:/tmp/STS/bin;
/tmp/STS/bin/pan2 -x 5 -f /tmp/revolver_load_gen.6406 -
 6411 -                 /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.6406 -l /tmp/revolver/6406/revolver_l
 6412 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.6406 -l /tmp/revolver/6406/revolver_l
 6413 wait              sh -c iogen -f sync -m sequential -s
read,write,readv,writev -t 1b -T 30000 30000:rwsy
 6414 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.6406 -l /tmp/revolver/6406/revolver_l
 6415 wait              accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 6416 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.6406 -l /tmp/revolver/6406/revolver_l
 6418 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.6406 -l /tmp/revolver/6406/revolver_l
 6417 wait              genesis -n 500 -d 150 -p 4
 6419 pipe_wait         /tmp/STS/bin/pan2 -x 5 -f
/tmp/revolver_load_gen.6406 -l /tmp/revolver/6406/revolver_l
 6420 wait              sh -c iogen -f buffered -m sequential -s
read,write,readv,writev -t 1b -T 6000b 6000b:
 6421 -                 growfiles -i 0 -N 500 -n 4 -b
 6422 wait_local        accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 6423 pipe_wait         iogen -f sync -m sequential -s read write
readv writev -t 1b -T 30000 30000:rwsynclarg
 6424 glock_wait_intern genesis -n 500 -d 150 -p 4
 6425 dlm_lock_sync     doio -avk
 6426 dlm_lock_sync     genesis -n 500 -d 150 -p 4
 6427 -                 genesis -n 500 -d 150 -p 4
 6428 glock_wait_intern genesis -n 500 -d 150 -p 4
 6429 pipe_wait         iogen -f buffered -m sequential -s read write
readv writev -t 1b -T 6000b 6000b:rwbufl
 6430 dlm_lock_sync     doio -avk
 6431 wait_local        accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 6432 glock_wait_intern growfiles -i 0 -N 500 -n 4 -b
 6433 glock_wait_intern growfiles -i 0 -N 500 -n 4 -b
 6434 -                 growfiles -i 0 -N 500 -n 4 -b
 6435 wait_async        accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 6436 wait_local        accordion -p 4 accrdfile1 accrdfile2
accrdfile3 accrdfile4 accrdfile5
 6990 -                 sshd: root@pts/0
 6992 wait              -bash
 7038 -                 ps -e -o pid,wchan=WIDE-WCHAN-COLUMN -o cmd



Comment 13 Corey Marthaler 2005-03-14 21:17:26 UTC
[root@morph-01 ~]# cat /proc/cluster/dlm_stats
DLM stats (HZ=1000)

Lock operations:      37236
Unlock operations:    21084
Convert operations:   78110
Completion ASTs:     136412
Blocking ASTs:            4

Lockqueue        num  waittime   ave
WAIT_RSB       26397    170661     6
WAIT_GRANT      8443      5663     0
WAIT_UNLOCK       44        86     1
Total          34884    176410     5


[root@morph-02 ~]# cat /proc/cluster/dlm_stats
DLM stats (HZ=1000)

Lock operations:     521649
Unlock operations:   483282
Convert operations: 1594283
Completion ASTs:    2599179
Blocking ASTs:           99

Lockqueue        num  waittime   ave
WAIT_RSB      343301   1773399     5
WAIT_CONV         11        58     5
WAIT_GRANT      8624      2673     0
WAIT_UNLOCK      190        96     0
Total         352126   1776226     5

[root@morph-05 ~]# cat /proc/cluster/dlm_stats
DLM stats (HZ=1000)

Lock operations:     185748
Unlock operations:   144044
Convert operations:  462750
Completion ASTs:     792510
Blocking ASTs:           51

Lockqueue        num  waittime   ave
WAIT_RSB      134880    980725     7
WAIT_CONV          5        52    10
WAIT_GRANT      8585     16370     1
WAIT_UNLOCK      184      1267     6
Total         143654    998414     6
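
The three Lockqueue tables above share two properties:
- The `ave` column is consistent with `waittime` divided by `num`, rounded down. On morph-01, for example, 170661 / 26397 ≈ 6.47 is shown as 6.
- Each `Total` row is the sum of the rows above it.

The sketch below checks both properties for a pasted dlm_stats dump. The function names are hypothetical. Reading `waittime` as jiffies (milliseconds at the reported HZ=1000) is an assumption; the output does not state it.

```python
from typing import Dict, Iterable, Tuple

Row = Tuple[int, int, int]  # (num, waittime, ave)


def parse_lockqueue(lines: Iterable[str]) -> Dict[str, Row]:
    """Collect the rows of the 'Lockqueue' table from a dlm_stats dump."""
    rows: Dict[str, Row] = {}
    in_table = False
    for line in lines:
        fields = line.split()
        if fields[:1] == ["Lockqueue"]:  # header line: num waittime ave
            in_table = True
            continue
        if in_table and len(fields) == 4 and fields[1].isdigit():
            name, num, wait, ave = fields
            rows[name] = (int(num), int(wait), int(ave))
    return rows


def ave_matches(row: Row) -> bool:
    """True if ave equals waittime // num (integer division)."""
    num, wait, ave = row
    return num > 0 and wait // num == ave


def total_matches(rows: Dict[str, Row]) -> bool:
    """True if the Total row sums the num and waittime columns of the other rows."""
    parts = [r for name, r in rows.items() if name != "Total"]
    num, wait, _ = rows["Total"]
    return sum(r[0] for r in parts) == num and sum(r[1] for r in parts) == wait
```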



Comment 14 David Teigland 2005-03-22 15:27:38 UTC
copied from email:

I don't suppose those nodes are still stuck there?  It would be helpful to
get the ps from the other two nodes, as well as /proc/meminfo and
/proc/slabinfo from all.
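
When comparing nodes for the memory question, MemFree and SwapFree against their totals in /proc/meminfo are a reasonable first check. Below is a minimal sketch, assuming the usual `Name:   value kB` layout. Some caveats:
- The helper names are hypothetical.
- What counts as "low" is left to the reader.
- /proc/slabinfo is not parsed here, because its column layout differs between kernel versions.

```python
from typing import Dict, Iterable


def parse_meminfo(lines: Iterable[str]) -> Dict[str, int]:
    """Map /proc/meminfo field names to their numeric values (mostly kB)."""
    values: Dict[str, int] = {}
    for line in lines:
        key, sep, rest = line.partition(":")
        fields = rest.split()
        # Keep only 'Name: <number> [unit]' lines; anything else is skipped.
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def free_percent(values: Dict[str, int], free_key: str, total_key: str) -> float:
    """Percentage of total_key still free, e.g. MemFree vs MemTotal."""
    total = values.get(total_key, 0)
    return 100.0 * values.get(free_key, 0) / total if total else 0.0
```

On a node, `parse_meminfo(open("/proc/meminfo"))` gives the dictionary. `free_percent(info, "MemFree", "MemTotal")` and `free_percent(info, "SwapFree", "SwapTotal")` can then be compared across the stuck and healthy nodes.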

There's actually nothing wrong on morph-01.  It's morph-02 and 05 where
dlm recovery has stalled in the rebuild resource directory stage.  That
stage can eat up a lot of memory and in the past I've seen nodes run out
of memory completely there.  There are usually other indications that the
system memory is gone, though, which is why I'm interested in
meminfo/slabinfo.
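
One way to see where each lockspace's recovery stopped is to take the last recovery message logged for it in /proc/cluster/dlm_debug. Applied to the morph-02 dump in Comment 12, this gives:
- gfs0, gfs1 and gfs2 stop at "rebuild resource directory".
- gfs3 and clvmd stop at "purged 0 requests".
- None of them reaches "recover event 83 done".

The message list below is copied from the dumps in this report rather than from the dlm source, so it may be incomplete. The function names are hypothetical.

```python
import re
from typing import Dict, Iterable

# Message prefixes copied from the dlm_debug dumps in this report (may be incomplete).
RECOVERY_MESSAGES = (
    "move flags", "move use event", "recover event", "remove node",
    "total nodes", "rebuild resource directory", "rebuilt",
    "purge requests", "purged", "mark waiting requests", "marked",
    "purge locks of departed nodes", "update remastered resources",
    "updated", "rebuild locks",
)
# A lockspace name followed by its message; names starting with a digit are
# rejected so truncated fragments like ' 0,1,0 ids 13,21,13' are skipped.
LOCKSPACE_LINE = re.compile(r"^([A-Za-z_][\w.-]*)\s+(.+)$")


def last_recovery_step(lines: Iterable[str]) -> Dict[str, str]:
    """Return {lockspace: last recovery message seen for it}."""
    last: Dict[str, str] = {}
    for raw in lines:
        m = LOCKSPACE_LINE.match(raw.strip())
        if not m:
            continue
        name, msg = m.groups()
        if msg.startswith(RECOVERY_MESSAGES):
            last[name] = msg
    return last


def pending(last: Dict[str, str]) -> Dict[str, str]:
    """Lockspaces whose last message is not 'recover event N done'."""
    return {ls: msg for ls, msg in last.items() if not msg.endswith(" done")}
```

Running `pending` over the dumps from all nodes shows which lockspaces are still mid-recovery on each node.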

If it is in fact a memory problem, the solution isn't too clear...

With the load you're running multiplied by 4 fs's, the situation seems
ripe for running memory dry.  It's a very high number of locks in the
dlm, combined with recovery, that can lead to this.

/proc/cluster/lock_dlm/drop_count is one crude method we have for keeping
the number of dlm/gfs locks down, to avoid running out of memory during
recovery.  By default I think it's set to 50000, and the only way to find
a better number is trial and error.  Lowering it makes out-of-memory
during recovery less likely, but can limit gfs caching and hurt
performance.
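
Trying out drop_count values amounts to reading and writing that one /proc file. Below is a minimal sketch, assuming two things:
- The file accepts a plain decimal integer, as `echo N > /proc/cluster/lock_dlm/drop_count` would write.
- The caller is root.

The helper names are hypothetical. The path is a parameter so the code can be tried against an ordinary file.

```python
from pathlib import Path

# Tunable named in the comment above; accepting a plain decimal integer is an assumption.
DROP_COUNT = Path("/proc/cluster/lock_dlm/drop_count")


def read_drop_count(path: Path = DROP_COUNT) -> int:
    """Return the current value as an integer."""
    return int(path.read_text().split()[0])


def set_drop_count(value: int, path: Path = DROP_COUNT) -> int:
    """Write a new value, then read it back so the caller can confirm it took."""
    if value < 0:  # our own guard; the kernel's accepted range is not stated here
        raise ValueError("drop_count must not be negative")
    path.write_text(f"{value}\n")
    return read_drop_count(path)
```

Following the trial-and-error approach above, you would lower the value in steps and rerun the recovery test at each step. This trades memory headroom during recovery against gfs caching.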


Comment 15 Corey Marthaler 2005-04-05 20:34:21 UTC
I reproduced and gathered all requested info and put it in:
/home/msp/cmarthal/pub/bugs/145683

Memory is low, but there is no OOM condition, and there is plenty of swap.
There were 3 filesystems and only one is hung; I can continue to read from
and write to the other two.

Comment 16 David Teigland 2005-04-06 02:29:06 UTC
It's back to being gfs/lock_dlm recovery that's stuck, not the dlm.
We can't see what happened in lock_dlm because logging from the other
running fs's has wiped out info from the stuck fs.

stuck lock_dlm: comments 1, 15, 8
stuck dlm: comments 5, 11

We eventually decided that comment 8 was a different bug (bz 152451)
since it completed after a long i/o delay.  Might comments 1 and 15 be
in the same category?  I don't know.

Kdb would probably give a quick and certain answer to what's going on
in both cases (stuck lock_dlm and dlm).  Collecting data as we've been
doing might work after a while if just the right clue pops out that can
be pieced together to say what's happening, but there's no telling how
many repetitions that might take.


Comment 17 Corey Marthaler 2005-04-13 19:49:16 UTC
Moving to need info.

After 17 hours of being stuck, a bad block was finally reported, which
caused recovery to finish. Thus, this may be bz 152451. I'll test this on the
"known good" MSA1000 storage to see if it can be reproduced there.

Comment 18 Corey Marthaler 2005-04-20 14:55:39 UTC
Blaming this on "cheap" storage; will reopen if ever seen on "good" storage.

