Bug 174531 - dlm recovery didn't work--bad accounting or message handling
Status: CLOSED WORKSFORME
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: dlm
Version: 4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: David Teigland
QA Contact: Cluster QE
Reported: 2005-11-29 14:47 EST by Corey Marthaler
Modified: 2009-04-16 16:30 EDT
CC List: 2 users

Doc Type: Bug Fix
Last Closed: 2006-11-06 14:47:21 EST
Attachments: None

Description Corey Marthaler 2005-11-29 14:47:29 EST
Description of problem:
Recovery is hung on my four-node x86_64 taft cluster. Taft-03 was shot and then
fenced by taft-01, but its rejoin attempt is stuck because recovery is hung on
the remaining nodes. I've left the machines in this state for 23 hours now. This
appears very similar to bz145683, which was eventually closed and blamed on cheap
storage; the taft cluster, however, is using the "good" winchester storage.

 [root@taft-03 ~]# cat /proc/cluster/nodes
Node  Votes Exp Sts  Name
   1    1    4   M   taft-03
   2    1    4   M   taft-01
   3    1    4   M   taft-04
   4    1    4   M   taft-02


[root@taft-01 ~]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       U-1,10,1
[3 2 4]

DLM Lock Space:  "clvmd"                             2   3 recover 4 -
[3 2 4]

DLM Lock Space:  "taft0"                             3   4 recover 4 -
[3 2 4]

DLM Lock Space:  "taft1"                             5   6 recover 4 -
[3 2 4]

GFS Mount Group: "taft0"                             4   5 recover 0 -
[3 2 4]

GFS Mount Group: "taft1"                             6   7 recover 0 -
[3 2 4]


[root@taft-02 ~]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       U-1,10,1
[2 3 4]

DLM Lock Space:  "clvmd"                             2   3 recover 2 -
[2 3 4]

DLM Lock Space:  "taft0"                             3   4 recover 4 -
[2 3 4]

DLM Lock Space:  "taft1"                             5   6 recover 4 -
[2 3 4]

GFS Mount Group: "taft0"                             4   5 recover 0 -
[2 3 4]

GFS Mount Group: "taft1"                             6   7 recover 0 -
[2 3 4]


[root@taft-03 ~]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 join      S-4,4,1
[3 2 4 1]


[root@taft-04 ~]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       U-1,10,1
[3 2 4]

DLM Lock Space:  "clvmd"                             2   3 recover 4 -
[3 2 4]

DLM Lock Space:  "taft0"                             3   4 recover 2 -
[3 2 4]

DLM Lock Space:  "taft1"                             5   6 recover 2 -
[3 2 4]

GFS Mount Group: "taft0"                             4   5 recover 0 -
[3 2 4]

GFS Mount Group: "taft1"                             6   7 recover 0 -
[3 2 4]
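
As a side note (not part of the original report), the stuck state is easy to spot mechanically in the dumps above: every lockspace on the surviving nodes sits in `recover` while only the fence domain is in `run`, and taft-03 is wedged in `join`. A minimal sketch of a parser for this `/proc/cluster/services`-style output, assuming the `Type: "name" GID LID state ...` line format shown above:

```python
import re

def stuck_services(text):
    """Return (type, name, state) tuples for every service line that is
    not in the 'run' state, given /proc/cluster/services-style output."""
    stuck = []
    for line in text.splitlines():
        # Matches e.g.: DLM Lock Space:  "clvmd"   2   3 recover 4 -
        m = re.match(r'(.+?):\s+"([^"]+)"\s+\d+\s+\d+\s+(\S+)', line)
        if m and m.group(3) != "run":
            stuck.append((m.group(1).strip(), m.group(2), m.group(3)))
    return stuck
```

Run against the taft-01 dump this flags all four DLM lockspaces and both GFS mount groups as stuck in `recover`.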


LOGS:
Nov 28 15:09:42 taft-01 kernel: CMAN: node taft-03 has been removed from the
cluster : Missed too many heartbeats
Nov 28 15:10:13 taft-01 fenced[3759]: taft-03 not a cluster member after 30 sec
post_fail_delay
Nov 28 15:10:13 taft-01 fenced[3759]: fencing node "taft-03"
Nov 28 15:10:13 taft-01 fenced[3759]: fence "taft-03" success
Nov 28 15:12:02 taft-01 kernel: CMAN: node taft-03 rejoining


Nov 28 15:10:58 taft-02 kernel: CMAN: node taft-03 has been removed from the
cluster : Missed too many heartbeats
Nov 28 15:10:59 taft-02 fenced[3997]: fencing deferred to taft-01
Nov 28 15:13:18 taft-02 kernel: CMAN: node taft-03 rejoining


Nov 28 15:09:44 taft-04 kernel: CMAN: removing node taft-03 from the cluster :
Missed too many heartbeats
Nov 28 15:09:45 taft-04 fenced[3508]: fencing deferred to taft-01
Nov 28 15:12:04 taft-04 kernel: CMAN: node taft-03 rejoining

The only suspicious messages possibly related to I/O are these:
Nov 28 15:10:33 taft-02 kernel: qla2300 0000:0a:02.0: scsi(1:0:0:1): Queue depth
adjusted to -1.

but the queue depth is still 64 according to sysfs:
[root@taft-02 ~]# cat /sys/block/sda/device/queue_depth
64


[root@taft-01 ~]#  cat /proc/cluster/dlm_stats
DLM stats (HZ=1000)

Lock operations:    18905030
Unlock operations:  18764410
Convert operations: 44704613
Completion ASTs:    82376637
Blocking ASTs:        21806

Lockqueue        num  waittime   ave
WAIT_RSB      13833766   260064409    18
WAIT_CONV      35083     58832     1
WAIT_GRANT    148038   1357422     9
WAIT_UNLOCK    43212    584744    13
Total         14060099   262065407    18


[root@taft-02 ~]#  cat /proc/cluster/dlm_stats
DLM stats (HZ=1000)

Lock operations:      26342
Unlock operations:     5199
Convert operations:   20175
Completion ASTs:      51709
Blocking ASTs:            5

Lockqueue        num  waittime   ave
WAIT_RSB        7186     90434    12
WAIT_CONV          2         0     0
WAIT_GRANT     17858      8659     0
WAIT_UNLOCK       36       406    11
Total          25082     99499     3


[root@taft-03 ~]#  cat /proc/cluster/dlm_stats
DLM stats (HZ=1000)

Lock operations:          0
Unlock operations:        0
Convert operations:       0
Completion ASTs:          0
Blocking ASTs:            0

Lockqueue        num  waittime   ave


[root@taft-04 ~]#  cat /proc/cluster/dlm_stats
DLM stats (HZ=1000)

Lock operations:    50511724
Unlock operations:  8699484
Convert operations: 21196276
Completion ASTs:    80408627
Blocking ASTs:        24687

Lockqueue        num  waittime   ave
WAIT_RSB      6318725   99897200    15
WAIT_CONV      10763     44975     4
WAIT_GRANT    41821869   21734846     0
WAIT_UNLOCK    37358    232860     6
Total         48188715   121909881     2
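
(Not from the report.) The zeroed counters on taft-03 are expected, since it was fenced and its DLM state reset on rejoin. The `ave` column above is internally consistent if one assumes `waittime` is in jiffies and, with the reported HZ=1000, one jiffy is one millisecond, so `ave` is simply the integer quotient `waittime // num`; a tiny sketch of that check:

```python
def avg_wait_ms(num, waittime, hz=1000):
    """Average lockqueue wait in ms: waittime is assumed to be total
    jiffies, so convert via HZ and divide by the request count.
    With hz=1000 this reduces to waittime // num."""
    return waittime * 1000 // hz // num if num else 0
```

For example, taft-01's WAIT_RSB row (13833766 requests, 260064409 jiffies) works out to 18 ms, matching the reported `ave`.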



Version-Release number of selected component (if applicable):
DLM 2.6.9-39.0 (built Nov 14 2005 17:38:14) installed
Lock_Harness 2.6.9-44.0 (built Nov 17 2005 15:43:18) installed
GFS 2.6.9-44.0 (built Nov 17 2005 15:43:35) installed
Lock_Nolock 2.6.9-44.0 (built Nov 17 2005 15:43:19)

How reproducible:
Only once so far
Comment 1 Corey Marthaler 2005-11-29 14:49:39 EST
more debug info:

[root@taft-01 ~]# cat /proc/cluster/lock_dlm/debug
 plock 0
6589 en punlock 7,1131326
6589 remove 7,1131326
6589 ex punlock 0
6589 en plock 7,1131326
6589 req 7,1131326 ex 45136ee8-4691641f lkf 2000 wait 1
6589 ex plock 0
6589 en punlock 7,1131326
6589 remove 7,1131326
6589 ex punlock 0
6589 en plock 7,1131326
6589 req 7,1131326 ex 46916420-46ab8700 lkf 2000 wait 1
6589 ex plock 0
6589 en punlock 7,1131326
6589 remove 7,1131326
6589 ex punlock 0
6589 en plock 7,1131326
6589 req 7,1131326 ex 46ab8701-46bcc087 lkf 2000 wait 1
6589 ex plock 0
6589 en punlock 7,1131326
6589 remove 7,1131326
6589 ex punlock 0
6589 en plock 7,1131326
6589 req 7,1131326 ex 46e9a7c7-46f8bfab lkf 2000 wait 1
6589 ex plock 0
6589 en punlock 7,1131326
6589 remove 7,1131326
6589 ex punlock 0
6589 en plock 7,1131326
6589 req 7,1131326 ex 46f8bfac-46fe4fa8 lkf 2000 wait 1
6589 ex plock 0
6589 en punlock 7,1131326
6589 remove 7,1131326
6589 ex punlock 0
6589 en plock 7,1131326
6589 req 7,1131326 ex 46fe4fa8-47076a99 lkf 2000 wait 1
6589 ex plock 0
6589 en punlock 7,1131326
6589 remove 7,1131326
6589 ex punlock 0
6589 en plock 7,1131326
6589 req 7,1131326 ex 47076a9a-4718b65a lkf 2000 wait 1
6589 ex plock 0
6589 en punlock 7,1131326
6589 remove 7,1131326
6589 ex punlock 0
6589 en plock 7,1131326
6589 req 7,1131326 ex 4718b65a-47223967 lkf 2000 wait 1
6589 ex plock 0
6589 en punlock 7,1131326
6589 remove 7,1131326
6589 ex punlock 0
6589 en plock 7,1131326
6589 req 7,1131326 ex 47223967-4722a9a7 lkf 2000 wait 1
6589 ex plock 0
6589 en punlock 7,1131326
6589 remove 7,1131326
6589 ex punlock 0
6589 en plock 7,1131326
6589 req 7,1131326 ex 4722a9a7-4722ef59 lkf 2000 wait 1
6589 ex plock 0
6589 en punlock 7,1131326
6589 remove 7,1131326
6589 ex punlock 0
6589 en plock 7,1131326
6589 req 7,1131326 ex 4724e129-4724f49e lkf 2000 wait 1
6589 ex plock 0
6589 en punlock 7,1131326
6589 remove 7,1131326
6589 ex punlock 0
6589 en plock 7,1131326
6589 req 7,1131326 ex 4724f49f-4725101e lkf 2000 wait 1
6585 en punlock 7,38
6585 remove 7,38
6585 ex punlock 0
6589 ex plock 0
6589 en punlock 7,1131326
6589 remove 7,1131326
6589 ex punlock 0
6589 en plock 7,1131326
6589 req 7,1131326 ex 0-a713ce2 lkf 2000 wait 1
6589 ex plock 0
6585 en plock 7,38
6585 req 7,38 ex 41b48b0a-49e24f0e lkf 2000 wait 1
6585 ex plock 0
6585 en punlock 7,38
6585 remove 7,38
6585 ex punlock 0
6585 en plock 7,38
6585 req 7,38 ex 4a32afea-4b9e61c1 lkf 2000 wait 1
6585 ex plock 0
6589 en punlock 7,1131326
6589 remove 7,1131326
6589 ex punlock 0
6585 en punlock 7,38
6585 remove 7,38
6585 ex punlock 0
6585 en plock 7,38
6585 req 7,38 ex 4b9e61c1-4bcfec8e lkf 2000 wait 1
6585 ex plock 0
6585 en punlock 7,38
6585 remove 7,38
6585 ex punlock 0
6585 en plock 7,38
6585 req 7,38 ex 4bcfec8f-4bd124c3 lkf 2000 wait 1
6585 ex plock 0
6585 en punlock 7,38
6585 remove 7,38
6585 ex punlock 0
6585 en plock 7,38
6585 req 7,38 ex 4bdf76e6-4be17591 lkf 2000 wait 1
6585 ex plock 0
6585 en punlock 7,38
6585 remove 7,38
6585 ex punlock 0
6589 en plock 7,1131326
6589 req 7,1131326 ex 42101a3b-4435a61c lkf 2000 wait 1
6589 ex plock 0
6589 en punlock 7,1131326
6589 remove 7,1131326
6589 ex punlock 0
6589 en plock 7,1131326
6589 req 7,1131326 ex 4435a61d-472389d0 lkf 2000 wait 1
6589 ex plock 0
6589 en punlock 7,1131326
6589 remove 7,1131326
6589 ex punlock 0
6589 en plock 7,1131326
6589 req 7,1131326 ex 4724b357-472500e8 lkf 2000 wait 1
6589 ex plock 0
6589 en punlock 7,1131326
6589 remove 7,1131326
6589 ex punlock 0
6589 en plock 7,1131326
6589 req 7,1131326 ex 472500e9-4725062e lkf 2000 wait 1
6589 ex plock 0
6589 en punlock 7,1131326
6589 remove 7,1131326
6589 ex punlock 0
6589 en plock 7,1131326
6589 req 7,1131326 ex 4725062f-472511d0 lkf 2000 wait 1
6589 ex plock 0
6589 en punlock 7,1131326
6589 remove 7,1131326
6589 ex punlock 0
6589 en plock 7,1131326
6589 req 7,1131326 ex 472511d0-4725170a lkf 2000 wait 1
6589 ex plock 0
6589 en punlock 7,1131326
6589 remove 7,1131326
6589 ex punlock 0
6585 en plock 7,38
6585 req 7,38 ex 43e0c2c5-46552550 lkf 2000 wait 1
6585 ex plock 0
6585 en punlock 7,38
6585 remove 7,38
6585 ex punlock 0
6585 en plock 7,38
6589 en plock 7,1131326



[root@taft-02 2.6.9-22.EL-x86_64]# cat /proc/cluster/lock_dlm/debug
 0
6249 en plock 7,10fd4d0
6249 req 7,10fd4d0 ex 45396710-470a291e lkf 2000 wait 1
6249 ex plock 0
6249 en punlock 7,10fd4d0
6249 remove 7,10fd4d0
6249 ex punlock 0
6249 en plock 7,10fd4d0
6249 req 7,10fd4d0 ex 470a291f-470b27e6 lkf 2000 wait 1
6249 ex plock 0
6249 en punlock 7,10fd4d0
6249 remove 7,10fd4d0
6249 ex punlock 0
6249 en plock 7,10fd4d0
6249 req 7,10fd4d0 ex 470ff90f-471af2a3 lkf 2000 wait 1
6203 en punlock 7,10fd4d9
6203 remove 7,10fd4d9
6203 ex punlock 0
6249 ex plock 0
6249 en punlock 7,10fd4d0
6249 remove 7,10fd4d0
6249 ex punlock 0
6249 en plock 7,10fd4d0
6249 req 7,10fd4d0 ex 471af2a3-471cb7a1 lkf 2000 wait 1
6249 ex plock 0
6249 en punlock 7,10fd4d0
6249 remove 7,10fd4d0
6249 ex punlock 0
6249 en plock 7,10fd4d0
6249 req 7,10fd4d0 ex 471cb7a1-4724f822 lkf 2000 wait 1
6249 ex plock 0
6249 en punlock 7,10fd4d0
6249 remove 7,10fd4d0
6249 ex punlock 0
6249 en plock 7,10fd4d0
6249 req 7,10fd4d0 ex 4724f822-47250bcd lkf 2000 wait 1
6249 ex plock 0
6249 en punlock 7,10fd4d0
6249 remove 7,10fd4d0
6249 ex punlock 0
6249 en plock 7,10fd4d0
6249 req 7,10fd4d0 ex 47250bce-4725103e lkf 2000 wait 1
6249 ex plock 0
6249 en punlock 7,10fd4d0
6249 remove 7,10fd4d0
6249 ex punlock 0
6203 en plock 7,10fd4d9
6203 req 7,10fd4d9 ex 43d9b2f9-4842b7a6 lkf 2000 wait 1
6203 ex plock 0
6203 en punlock 7,10fd4d9
6203 remove 7,10fd4d9
6203 ex punlock 0
6203 en plock 7,10fd4d9
6203 req 7,10fd4d9 ex 4842b7a6-48f07621 lkf 2000 wait 1
6203 ex plock 0
6203 en punlock 7,10fd4d9
6203 remove 7,10fd4d9
6203 ex punlock 0
6203 en plock 7,10fd4d9
6203 req 7,10fd4d9 ex 4a7c5d85-4b8cd2ce lkf 2000 wait 1
6203 ex plock 0
6203 en punlock 7,10fd4d9
6203 remove 7,10fd4d9
6203 ex punlock 0
6203 en plock 7,10fd4d9
6203 req 7,10fd4d9 ex 4be27edc-4be64dee lkf 2000 wait 1
6249 en plock 7,10fd4d0
6249 req 7,10fd4d0 ex 23175dce-3857436e lkf 2000 wait 1
6203 ex plock 0
6203 en punlock 7,10fd4d9
6203 remove 7,10fd4d9
6203 ex punlock 0
6203 en plock 7,10fd4d9
6203 req 7,10fd4d9 ex 4be7f58e-4be8cb32 lkf 2000 wait 1
6249 ex plock 0
6203 ex plock 0
6203 en punlock 7,10fd4d9
6203 remove 7,10fd4d9
6203 ex punlock 0
6203 en plock 7,10fd4d9
6203 req 7,10fd4d9 ex 4be90af6-4be939ed lkf 2000 wait 1
6203 ex plock 0
6203 en punlock 7,10fd4d9
6203 remove 7,10fd4d9
6203 ex punlock 0
6203 en plock 7,10fd4d9
6203 req 7,10fd4d9 ex 4be939ed-4be94bfb lkf 2000 wait 1
6203 ex plock 0
6203 en punlock 7,10fd4d9
6203 remove 7,10fd4d9
6203 ex punlock 0
6203 en plock 7,10fd4d9
6203 req 7,10fd4d9 ex 0-2903368e lkf 2000 wait 1
6203 ex plock 0
6249 en punlock 7,10fd4d0
6249 remove 7,10fd4d0
6249 ex punlock 0
6249 en plock 7,10fd4d0
6249 req 7,10fd4d0 ex 42efe459-46e60834 lkf 2000 wait 1
6249 ex plock 0
6249 en punlock 7,10fd4d0
6249 remove 7,10fd4d0
6249 ex punlock 0
6249 en plock 7,10fd4d0
6249 req 7,10fd4d0 ex 46e60835-47084344 lkf 2000 wait 1
6249 ex plock 0
6249 en punlock 7,10fd4d0
6249 remove 7,10fd4d0
6249 ex punlock 0
6249 en plock 7,10fd4d0
6249 req 7,10fd4d0 ex 47084344-47088246 lkf 2000 wait 1
6249 ex plock 0
6249 en punlock 7,10fd4d0
6249 remove 7,10fd4d0
6249 ex punlock 0
6249 en plock 7,10fd4d0
6249 req 7,10fd4d0 ex 4721da5a-47236c0a lkf 2000 wait 1
6249 ex plock 0
6249 en punlock 7,10fd4d0
6249 remove 7,10fd4d0
6249 ex punlock 0
6249 en plock 7,10fd4d0
6249 req 7,10fd4d0 ex 47236c0a-4723cb6c lkf 2000 wait 1
6249 ex plock 0
6249 en punlock 7,10fd4d0
6249 remove 7,10fd4d0
6249 ex punlock 0
6249 en plock 7,10fd4d0
6249 req 7,10fd4d0 ex 4724ee07-4724f5c4 lkf 2000 wait 1
6249 ex plock 0
6249 en punlock 7,10fd4d0
6249 remove 7,10fd4d0
6249 ex punlock 0
6249 en plock 7,10fd4d0
6249 req 7,10fd4d0 ex 4724f5c4-4725024a lkf 2000 wait 1
6249 ex plock 0
6249 en punlock 7,10fd4d0
6249 remove 7,10fd4d0
6249 ex punlock 0
6249 en plock 7,10fd4d0
6249 req 7,10fd4d0 ex 47250b79-47251519 lkf 2000 wait 1
6249 ex plock 0
6249 en punlock 7,10fd4d0
6249 remove 7,10fd4d0
6249 ex punlock 0
6203 en punlock 7,10fd4d9
6203 remove 7,10fd4d9
6203 ex punlock 0
6203 en plock 7,10fd4d9
6203 req 7,10fd4d9 ex 347e0420-3dbcdc91 lkf 2000 wait 1
6203 ex plock 0
6203 en punlock 7,10fd4d9
6249 en plock 7,10fd4d0



[root@taft-04 ~]# cat /proc/cluster/lock_dlm/debug
07 en plock 7,330fcd2
9308 req 7,330fcd2 ex 0-7fffffffffffffff lkf 2000 wait 1
9308 ex plock 0
9292 ex plock 0
9310 ex plock 0
9285 ex plock 0
9308 en punlock 7,330fcd2
9308 remove 7,330fcd2
9308 ex punlock 0
9308 en plock 7,330fcd1
9307 req 7,330fcd2 ex 0-7fffffffffffffff lkf 2000 wait 1
9308 req 7,330fcd1 ex 0-7fffffffffffffff lkf 2000 wait 1
9307 ex plock 0
9308 ex plock 0
9310 en punlock 7,330fcd3
9310 remove 7,330fcd3
9310 ex punlock 0
9310 en plock 7,330fcd2
9306 req 7,330fcd3 ex 0-7fffffffffffffff lkf 2000 wait 1
9306 ex plock 0
9307 en punlock 7,330fcd2
9307 remove 7,330fcd2
9307 ex punlock 0
9307 en plock 7,330fcd1
9310 req 7,330fcd2 ex 0-7fffffffffffffff lkf 2000 wait 1
9310 ex plock 0
9308 en punlock 7,330fcd1
9308 remove 7,330fcd1
9308 ex punlock 0
9308 en plock 7,330fcd2
9307 req 7,330fcd1 ex 0-7fffffffffffffff lkf 2000 wait 1
9306 en punlock 7,330fcd3
9306 remove 7,330fcd3
9306 ex punlock 0
9306 en plock 7,330fcd2
9310 en punlock 7,330fcd2
9310 remove 7,330fcd2
9310 ex punlock 0
9310 en plock 7,330fcd5
9308 req 7,330fcd2 ex 0-7fffffffffffffff lkf 2000 wait 1
9310 req 7,330fcd5 ex 0-7fffffffffffffff lkf 2000 wait 1
9308 ex plock 0
9310 ex plock 0
9308 en punlock 7,330fcd2
9308 remove 7,330fcd2
9308 ex punlock 0
9308 en plock 7,330fcd2
9306 req 7,330fcd2 ex 0-7fffffffffffffff lkf 2000 wait 1
9306 ex plock 0
9310 en punlock 7,330fcd5
9310 remove 7,330fcd5
9310 ex punlock 0
9310 en plock 7,330fcd3
9310 req 7,330fcd3 ex 0-7fffffffffffffff lkf 2000 wait 1
9307 ex plock 0
9306 en punlock 7,330fcd2
9306 remove 7,330fcd2
9306 ex punlock 0
9306 en plock 7,330fcd3
9308 req 7,330fcd2 ex 0-7fffffffffffffff lkf 2000 wait 1
9308 ex plock 0
9310 ex plock 0
9307 en punlock 7,330fcd1
9307 remove 7,330fcd1
9307 ex punlock 0
9307 en plock 7,330fcd1
9307 req 7,330fcd1 ex 0-7fffffffffffffff lkf 2000 wait 1
9308 en punlock 7,330fcd2
9308 remove 7,330fcd2
9308 ex punlock 0
9308 en plock 7,330fcd5
9308 req 7,330fcd5 ex 0-7fffffffffffffff lkf 2000 wait 1
9308 ex plock 0
9310 en punlock 7,330fcd3
9310 remove 7,330fcd3
9310 ex punlock 0
9310 en plock 7,330fcd4
9306 req 7,330fcd3 ex 0-7fffffffffffffff lkf 2000 wait 1
9310 req 7,330fcd4 ex 0-7fffffffffffffff lkf 2000 wait 1
9308 en punlock 7,330fcd5
9308 remove 7,330fcd5
9308 ex punlock 0
9308 en plock 7,330fcd3
9276 en punlock 7,34ef7e3
9276 remove 7,34ef7e3
9276 ex punlock 0
9276 en plock 7,34ef7ed
9276 req 7,34ef7ed ex 0-7fffffffffffffff lkf 2000 wait 1
9276 ex plock 0
9307 ex plock 0
9285 en punlock 7,34ef7ee
9292 en punlock 7,34ef7e4
9292 remove 7,34ef7e4
9276 en punlock 7,34ef7ed
9285 remove 7,34ef7ee
9292 ex punlock 0
9276 remove 7,34ef7ed
9292 en plock 7,34ef7e3
9292 req 7,34ef7e3 ex 0-7fffffffffffffff lkf 2000 wait 1
9285 ex punlock 0
9285 en plock 7,34ef7ee
9276 ex punlock 0
9276 en plock 7,34ef7ee
9285 req 7,34ef7ee ex 0-7fffffffffffffff lkf 2000 wait 1
9307 en punlock 7,330fcd1
9307 remove 7,330fcd1
9307 ex punlock 0
9307 en plock 7,330fcd4
9310 ex plock 0
9292 ex plock 0
9306 ex plock 0
9285 ex plock 0
9310 en punlock 7,330fcd4
9310 remove 7,330fcd4
9310 ex punlock 0
9310 en plock 7,330fcd3
9307 req 7,330fcd4 ex 0-7fffffffffffffff lkf 2000 wait 1
9292 en punlock 7,34ef7e3
9292 remove 7,34ef7e3
9292 ex punlock 0
9292 en plock 7,34ef7ee
9285 en punlock 7,34ef7ee
9285 remove 7,34ef7ee
9285 ex punlock 0
9285 en plock 7,34ef7e3
9285 req 7,34ef7e3 ex 0-7fffffffffffffff lkf 2000 wait 1
9276 req 7,34ef7ee ex 0-7fffffffffffffff lkf 2000 wait 1
9306 en punlock 7,330fcd3
9306 remove 7,330fcd3
9306 ex punlock 0
9310 req 7,330fcd3 ex 0-7fffffffffffffff lkf 2000 wait 1
9306 en plock 7,330fcd4
9276 ex plock 0
9285 ex plock 0
9307 ex plock 0
9310 ex plock 0
9276 en punlock 7,34ef7ee
9276 remove 7,34ef7ee
9276 ex punlock 0
9276 en plock 7,34ef7ed
9276 req 7,34ef7ed ex 0-7fffffffffffffff lkf 2000 wait 1
9276 ex plock 0
9281 req 7,34ef7ee ex 0-7fffffffffffffff lkf 2000 wait 1
9285 en punlock 7,34ef7e3
9285 remove 7,34ef7e3
9285 ex punlock 0
9285 en plock 7,34ef7ee
9276 en punlock 7,34ef7ed
9307 en punlock 7,330fcd4
9310 en punlock 7,330fcd3
9293 en punlock 7,3314504
9317 en punlock 7,3314195
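
(An aside, not part of the original debug dump.) What stands out in these traces is their tail: the taft-01 dump ends with pids 6585 and 6589 entering (`en`) plock sections with no matching exit (`ex`), consistent with those same pids showing a `dlm_lock_sync` wchan in the ps listing below. A sketch of that matching, assuming the `<pid> <event> ...` line format used by `/proc/cluster/lock_dlm/debug`:

```python
def last_entered(trace):
    """Given lock_dlm debug lines of the form '<pid> <event> ...',
    return a map pid -> operation for pids that entered ('en') a
    plock/punlock section and have not yet exited ('ex')."""
    open_ops = {}
    for line in trace:
        parts = line.split()
        if len(parts) < 3:
            continue
        pid, event, op = parts[0], parts[1], parts[2]
        if event == "en":
            open_ops[pid] = op
        elif event == "ex":
            # 'req'/'remove' lines between en and ex are ignored above
            open_ops.pop(pid, None)
    return open_ops
```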
Comment 2 Corey Marthaler 2005-11-29 14:51:20 EST
more:

[root@taft-01 ~]# ps -e -o pid,wchan=WIDE-WCHAN-COLUMN -o cmd
  PID WIDE-WCHAN-COLUMN CMD
    1 -                 init [5]
    2 migration_thread  [migration/0]
    3 ksoftirqd         [ksoftirqd/0]
    4 migration_thread  [migration/1]
    5 ksoftirqd         [ksoftirqd/1]
    6 migration_thread  [migration/2]
    7 ksoftirqd         [ksoftirqd/2]
    8 migration_thread  [migration/3]
    9 ksoftirqd         [ksoftirqd/3]
   10 worker_thread     [events/0]
   11 worker_thread     [events/1]
   12 worker_thread     [events/2]
   13 worker_thread     [events/3]
   14 worker_thread     [khelper]
   15 worker_thread     [kacpid]
   51 worker_thread     [kblockd/0]
   52 worker_thread     [kblockd/1]
   53 worker_thread     [kblockd/2]
   54 worker_thread     [kblockd/3]
   72 worker_thread     [aio/0]
   73 worker_thread     [aio/1]
   74 worker_thread     [aio/2]
   75 worker_thread     [aio/3]
   55 hub_thread        [khubd]
   71 kswapd            [kswapd0]
  148 serio_thread      [kseriod]
  212 16045567552327254 [scsi_eh_0]
  228 worker_thread     [kmirrord]
  229 worker_thread     [kmir_mon]
  237 kjournald         [kjournald]
 1017 16045567552327254 [scsi_eh_1]
 1018 16045567552327254 [qla2300_1_dpc]
 1030 -                 udevd
 1307 -                 [shpchpd_event]
 1705 kauditd_thread    [kauditd]
 1888 kjournald         [kjournald]
 2353 -                 syslogd -m 0
 2357 syslog            klogd -x
 2368 -                 irqbalance
 2379 -                 portmap
 2399 -                 rpc.statd
 2427 -                 rpc.idmapd
 2538 -                 /usr/sbin/acpid
 2589 -                 /usr/sbin/sshd
 2620 -                 xinetd -stayalive -pidfile /var/run/xinetd.pid
 2640 -                 sendmail: accepting connections
 2648 pause             sendmail: Queue runner@01:00:00 for /var/spool/clientmqueue
 2669 -                 gpm -m /dev/input/mice -t imps2
 2729 -                 crond
 2755 -                 xfs -droppriv -daemon
 2774 -                 /usr/sbin/atd
 2799 -                 dbus-daemon-1 --system
 2812 -                 cups-config-daemon
 2823 -                 hald
 2860 wait              login -- root
 2861 -                 /sbin/mingetty tty1
 2862 -                 /sbin/mingetty tty2
 2863 -                 /sbin/mingetty tty3
 2867 -                 /sbin/mingetty tty4
 2868 -                 /sbin/mingetty tty5
 2870 -                 /sbin/mingetty tty6
 2944 -                 /usr/bin/gdm-binary -nodaemon
 3410 pipe_wait         /usr/bin/gdm-binary -nodaemon
 3411 -                 /usr/X11R6/bin/X :0 -audit 0 -auth /var/gdm/:0.Xauth
-nolisten tcp
 3663 -                 /usr/bin/gdmgreeter
 3664 -                 sshd: root@pts/0
 3666 wait              -bash
 3705 -                 ccsd
 3730 cluster_kthread   [cman_comms]
 3732 serviced          [cman_serviced]
 3731 membership_kthrea [cman_memb]
 3733 hello_kthread     [cman_hbeat]
 3759 rt_sigsuspend     fenced -w
 3772 -                 clvmd
 3773 dlm_astd          [dlm_astd]
 3774 dlm_recvd         [dlm_recvd]
 3775 dlm_sendd         [dlm_sendd]
 3776 dlm_recoverd      [dlm_recoverd]
 3828 dlm_recoverd      [dlm_recoverd]
 3829 dlm_async         [lock_dlm1]
 3830 dlm_async         [lock_dlm2]
 3831 -                 [gfs_scand]
 3832 -                 [gfs_glockd]
 3833 -                 [gfs_recoverd]
 3834 -                 [gfs_logd]
 3835 glock_wait_intern [gfs_quotad]
 3836 glock_wait_intern [gfs_inoded]
 3840 dlm_recoverd      [dlm_recoverd]
 3841 dlm_async         [lock_dlm1]
 3842 dlm_async         [lock_dlm2]
 3843 -                 [gfs_scand]
 3844 -                 [gfs_glockd]
 3845 -                 [gfs_recoverd]
 3846 -                 [gfs_logd]
 3847 -                 [gfs_quotad]
 3848 glock_wait_intern [gfs_inoded]
 3672 -                 cupsd
 6524 -                 qarshd
 6525 -                 qarshd
 6526 wait              sh -c
PATH=/usr/kerberos/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/home/msp/cmarthal/bin:
 6527 wait              sh -c
PATH=/usr/kerberos/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/home/msp/cmarthal/bin:
 6529 pipe_wait         /usr/bin/perl -w
/usr/tests/sts/gfs/bin/revolver_load_gen -S 12437316 -r /usr/tests/sts -L HEAVY
 6531 pipe_wait         /usr/bin/perl -w
/usr/tests/sts/gfs/bin/revolver_load_gen -S 11656384 -r /usr/tests/sts -L HEAVY
 6538 wait              sh -c PATH=$PATH:/usr/tests/sts/bin; pan2 -x 5 -f
/tmp/revolver_load_gen.6529 -a /tmp/revolver_l
 6539 wait              sh -c PATH=$PATH:/usr/tests/sts/bin; pan2 -x 5 -f
/tmp/revolver_load_gen.6531 -a /tmp/revolver_l
 6540 -                 pan2 -x 5 -f /tmp/revolver_load_gen.6529 -a
/tmp/revolver_load_gen.6529.active -d /tmp/revolver_
 6541 -                 pan2 -x 5 -f /tmp/revolver_load_gen.6531 -a
/tmp/revolver_load_gen.6531.active -d /tmp/revolver_
 6544 pipe_wait         pan2 -x 5 -f /tmp/revolver_load_gen.6529 -a
/tmp/revolver_load_gen.6529.active -d /tmp/revolver_
 6545 wait              genesis -S 12723 -n 500 -d 150 -p 4
 6546 pipe_wait         pan2 -x 5 -f /tmp/revolver_load_gen.6531 -a
/tmp/revolver_load_gen.6531.active -d /tmp/revolver_
 6547 glock_wait_intern growfiles -S 12723 -i 0 -N 500 -n 4 -b
 6548 pipe_wait         pan2 -x 5 -f /tmp/revolver_load_gen.6529 -a
/tmp/revolver_load_gen.6529.active -d /tmp/revolver_
 6549 wait              accordion -S 12723 -p 4 accrdfile1 accrdfile2 accrdfile3
accrdfile4 accrdfile5
 6550 dlm_lock_sync     genesis -S 12723 -n 500 -d 150 -p 4
 6551 pipe_wait         pan2 -x 5 -f /tmp/revolver_load_gen.6531 -a
/tmp/revolver_load_gen.6531.active -d /tmp/revolver_
 6553 wait_async        genesis -S 12723 -n 500 -d 150 -p 4
 6552 wait              accordion -S 12723 -p 4 accrdfile1 accrdfile2 accrdfile3
accrdfile4 accrdfile5
 6554 pipe_wait         pan2 -x 5 -f /tmp/revolver_load_gen.6529 -a
/tmp/revolver_load_gen.6529.active -d /tmp/revolver_
 6557 glock_wait_intern genesis -S 12723 -n 500 -d 150 -p 4
 6558 pipe_wait         pan2 -x 5 -f /tmp/revolver_load_gen.6529 -a
/tmp/revolver_load_gen.6529.active -d /tmp/revolver_
 6559 -                 genesis -S 12723 -n 500 -d 150 -p 4
 6560 lm_dlm_plock      accordion -S 12723 -p 4 accrdfile1 accrdfile2 accrdfile3
accrdfile4 accrdfile5
 6561 glock_wait_intern growfiles -S 12723 -i 0 -N 500 -n 4 -b
 6562 -                 growfiles -S 12723 -i 0 -N 500 -n 4 -b
 6563 pipe_wait         pan2 -x 5 -f /tmp/revolver_load_gen.6531 -a
/tmp/revolver_load_gen.6531.active -d /tmp/revolver_
 6564 wait              genesis -S 12723 -n 500 -d 150 -p 4
 6556 wait              sh -c iogen -S 12723 -f buffered -m sequential -s
read,write,readv,writev -t 1b -T 2487463b 2487
 6565 -                 growfiles -S 12723 -i 0 -N 500 -n 4 -b
 6566 -                 growfiles -S 12723 -i 0 -N 500 -n 4 -b
 6568 lm_dlm_plock      accordion -S 12723 -p 4 accrdfile1 accrdfile2 accrdfile3
accrdfile4 accrdfile5
 6569 glock_wait_intern growfiles -S 12723 -i 0 -N 500 -n 4 -b
 6570 lm_dlm_plock      accordion -S 12723 -p 4 accrdfile1 accrdfile2 accrdfile3
accrdfile4 accrdfile5
 6571 -                 growfiles -S 12723 -i 0 -N 500 -n 4 -b
 6572 wait_async        accordion -S 12723 -p 4 accrdfile1 accrdfile2 accrdfile3
accrdfile4 accrdfile5
 6573 pipe_wait         iogen -S 12723 -f buffered -m sequential -s read write
readv writev -t 1b -T 2487463b 2487463b:r
 6576 pipe_wait         pan2 -x 5 -f /tmp/revolver_load_gen.6531 -a
/tmp/revolver_load_gen.6531.active -d /tmp/revolver_
 6577 wait              sh -c iogen -S 12723 -f buffered -m sequential -s
read,write,readv,writev -t 1b -T 2331276b 2331
 6578 glock_wait_intern growfiles -S 12723 -i 0 -N 500 -n 4 -b
 6580 dlm_lock_sync     genesis -S 12723 -n 500 -d 150 -p 4
 6582 wait_async        genesis -S 12723 -n 500 -d 150 -p 4
 6583 wait_async        genesis -S 12723 -n 500 -d 150 -p 4
 6584 lm_dlm_plock      genesis -S 12723 -n 500 -d 150 -p 4
 6585 dlm_lock_sync     doio -avk
 6586 wait_async        accordion -S 12723 -p 4 accrdfile1 accrdfile2 accrdfile3
accrdfile4 accrdfile5
 6588 wait_async        accordion -S 12723 -p 4 accrdfile1 accrdfile2 accrdfile3
accrdfile4 accrdfile5
 6587 pipe_wait         iogen -S 12723 -f buffered -m sequential -s read write
readv writev -t 1b -T 2331276b 2331276b:r
 6589 dlm_lock_sync     doio -avk
 6590 lm_dlm_plock      accordion -S 12723 -p 4 accrdfile1 accrdfile2 accrdfile3
accrdfile4 accrdfile5
 6591 lm_dlm_plock      accordion -S 12723 -p 4 accrdfile1 accrdfile2 accrdfile3
accrdfile4 accrdfile5
 8376 pdflush           [pdflush]
 8384 pdflush           [pdflush]
11174 wait              -bash
11207 -                 tail -f /var/log/messages
26921 pipe_wait         crond
26922 wait              /bin/bash /usr/bin/run-parts /etc/cron.daily
26923 pipe_wait         /usr/bin/perl -w /etc/cron.daily/00-logwatch
26924 pipe_wait         awk -v progname=/etc/cron.daily/00-logwatch progname
{?????   print progname ":\n"?????   progna
27158 pipe_wait         /bin/mail -s LogWatch for taft-01 root
27277 wait              /bin/bash /etc/log.d/scripts/services/zz-disk_space
27279 -                 df -hP
27280 pipe_wait         grep ^/dev/
27281 pipe_wait         sed s/\([^[:space:]]\{20,\}\)\(.*\)/\1\n               
    \2/
 8769 -                 ps -e -o pid,wchan=WIDE-WCHAN-COLUMN -o cmd
Comment 3 Corey Marthaler 2005-11-29 14:53:12 EST
even more: 

[root@taft-01 ~]# cat /proc/cluster/dlm_debug
d rebuilt 0 locks
clvmd recover event 28 done
taft1 mark waiting requests
taft1 mark 26dd02dd lq 1 nodeid -1
taft1 mark 26780399 lq 1 nodeid -1
taft1 mark 26a30197 lq 1 nodeid -1
taft1 mark 263500eb lq 1 nodeid -1
taft1 mark 263503ee lq 1 nodeid -1
taft1 mark 267b023a lq 1 nodeid -1
taft1 marked 6 requests
taft1 purge locks of departed nodes
taft0 mark waiting requests
taft0 mark 21a60124 lq 3 nodeid 1
taft0 mark 21c203cf lq 1 nodeid -1
taft0 mark 216c01bc lq 1 nodeid -1
taft0 mark 21fc00d5 lq 1 nodeid -1
taft0 mark 219b0067 lq 1 nodeid -1
taft0 mark 2170032b lq 1 nodeid -1
taft0 mark 22210220 lq 3 nodeid 1
taft0 mark 2130031b lq 1 nodeid -1
taft0 marked 8 requests
taft0 purge locks of departed nodes
taft0 purged 1 locks
taft0 update remastered resources
taft1 purged 1 locks
taft1 update remastered resources
taft0 updated 1102 resources
taft0 rebuild locks
taft1 updated 1100 resources
taft1 rebuild locks
taft0 rebuilt 2051 locks
taft0 recover event 28 done
taft1 rebuilt 1997 locks
taft1 recover event 28 done


[root@taft-02 ~]# cat /proc/cluster/dlm_debug
d resource directory
taft0 total nodes 3
taft0 rebuild resource directory
clvmd total nodes 3
clvmd rebuild resource directory
clvmd rebuilt 0 resources
clvmd purge requests
clvmd purged 0 requests
taft0 rebuilt 9885 resources
taft0 purge requests
taft0 purged 0 requests
taft1 rebuilt 11948 resources
taft1 purge requests
taft1 purged 0 requests
taft0 mark waiting requests
taft1 mark waiting requests
taft0 mark 1002d6 lq 1 nodeid -1
taft0 mark d00b2 lq 1 nodeid -1
taft0 mark d0360 lq 1 nodeid -1
taft0 mark 800d3 lq 1 nodeid -1
taft0 marked 4 requests
taft0 purge locks of departed nodes
taft1 mark 11030e lq 1 nodeid -1
taft1 mark c039b lq 1 nodeid -1
taft1 marked 2 requests
taft1 purge locks of departed nodes
taft0 purged 1 locks
taft0 update remastered resources
taft1 purged 1 locks
taft1 update remastered resources
taft0 updated 1103 resources
taft0 rebuild locks
taft0 rebuilt 2 locks
taft0 recover event 7 done
taft1 updated 1100 resources
taft1 rebuild locks
taft1 rebuilt 37 locks
taft1 recover event 7 done


[root@taft-04 ~]# cat /proc/cluster/dlm_debug
quests
taft1 purged 0 requests
clvmd mark waiting requests
clvmd marked 0 requests
clvmd purge locks of departed nodes
clvmd purged 0 locks
clvmd update remastered resources
clvmd updated 2 resources
clvmd rebuild locks
taft1 mark waiting requests
taft0 mark waiting requests
taft1 mark 828001c5 lq 1 nodeid -1
taft1 mark 80b7025c lq 1 nodeid -1
taft1 mark 827a03d3 lq 1 nodeid -1
taft1 mark 81ce0335 lq 1 nodeid -1
taft1 mark 83ca0207 lq 1 nodeid -1
taft1 mark 82300159 lq 1 nodeid -1
taft1 marked 6 requests
taft1 purge locks of departed nodes
taft0 mark 3e04021d lq 1 nodeid -1
taft0 mark 3e75033a lq 3 nodeid 1
taft0 mark 3e400208 lq 1 nodeid -1
taft0 mark 3df000b1 lq 1 nodeid -1
taft0 mark 3e4d03cd lq 1 nodeid -1
taft0 mark 3f0902c7 lq 1 nodeid -1
taft0 marked 6 requests
taft0 purge locks of departed nodes
clvmd rebuilt 2 locks
clvmd recover event 46 done
taft0 purged 2 locks
taft0 update remastered resources
taft1 purged 4 locks
taft1 update remastered resources
taft0 updated 1102 resources
taft0 rebuild locks

Comment 4 Corey Marthaler 2005-11-29 14:59:10 EST
one more  :)

[root@taft-01 ~]# cat /proc/meminfo
MemTotal:      8169156 kB
MemFree:       1798576 kB
Buffers:         44552 kB
Cached:        3164572 kB
SwapCached:          0 kB
Active:        5843884 kB
Inactive:       230056 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:      8169156 kB
LowFree:       1798576 kB
SwapTotal:     2031608 kB
SwapFree:      2031608 kB
Dirty:              60 kB
Writeback:           0 kB
Mapped:        2929164 kB
Slab:           268904 kB
Committed_AS:  3696092 kB
PageTables:       9656 kB
VmallocTotal: 536870911 kB
VmallocUsed:    264088 kB
VmallocChunk: 536606639 kB
HugePages_Total:     0
HugePages_Free:      0
Hugepagesize:     2048 kB

[root@taft-02 ~]# cat /proc/meminfo
MemTotal:      8169156 kB
MemFree:       2760112 kB
Buffers:         41328 kB
Cached:        2727564 kB
SwapCached:          0 kB
Active:        5115020 kB
Inactive:       155240 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:      8169156 kB
LowFree:       2760112 kB
SwapTotal:     2031608 kB
SwapFree:      2031608 kB
Dirty:              48 kB
Writeback:           0 kB
Mapped:        2568480 kB
Slab:           112744 kB
Committed_AS:  2669484 kB
PageTables:       8564 kB
VmallocTotal: 536870911 kB
VmallocUsed:    264088 kB
VmallocChunk: 536606639 kB
HugePages_Total:     0
HugePages_Free:      0
Hugepagesize:     2048 kB

[root@taft-03 ~]# cat /proc/meminfo
MemTotal:      8169152 kB
MemFree:       7832668 kB
Buffers:         42944 kB
Cached:         215728 kB
SwapCached:          0 kB
Active:         136596 kB
Inactive:       142428 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:      8169152 kB
LowFree:       7832668 kB
SwapTotal:     2031608 kB
SwapFree:      2031608 kB
Dirty:              88 kB
Writeback:           0 kB
Mapped:          36804 kB
Slab:            40608 kB
Committed_AS:   527004 kB
PageTables:       1792 kB
VmallocTotal: 536870911 kB
VmallocUsed:    263608 kB
VmallocChunk: 536606871 kB
HugePages_Total:     0
HugePages_Free:      0
Hugepagesize:     2048 kB

[root@taft-04 ~]# cat /proc/meminfo
MemTotal:      8169152 kB
MemFree:       2232432 kB
Buffers:         43264 kB
Cached:        2752352 kB
SwapCached:          0 kB
Active:        5563164 kB
Inactive:       124612 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:      8169152 kB
LowFree:       2232432 kB
SwapTotal:     2031608 kB
SwapFree:      2031608 kB
Dirty:              48 kB
Writeback:           0 kB
Mapped:        2960076 kB
Slab:           221996 kB
Committed_AS:  3701572 kB
PageTables:       9416 kB
VmallocTotal: 536870911 kB
VmallocUsed:    264088 kB
VmallocChunk: 536606639 kB
HugePages_Total:     0
HugePages_Free:      0
Hugepagesize:     2048 kB


[root@taft-01 ~]# cat /proc/slabinfo
slabinfo - version: 2.0
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab>
: tunables <batchcount> <limit> <sharedfactor> : slabdata <active_slabs>
<num_slabs> <sharedavail>
gfs_meta_header_cache  16220  16856     80   49    1 : tunables  120   60    8 :
slabdata    344    344      0
gfs_bufdata         5524   8000    160   25    1 : tunables  120   60    8 :
slabdata    320    320      0
gfs_inode          31541  31563    536    7    1 : tunables   54   27    8 :
slabdata   4509   4509      0
gfs_glock          67064  67248    312   12    1 : tunables   54   27    8 :
slabdata   5604   5604      0
gfs_bio_wrapper     1024   1125     16  225    1 : tunables  120   60    8 :
slabdata      5      5      0
dlm_conn               6     40    192   20    1 : tunables  120   60    8 :
slabdata      2      2      0
dlm_lvb/range         82    476     32  119    1 : tunables  120   60    8 :
slabdata      4      4      0
dlm_resdir(s)      22942  34638     56   69    1 : tunables  120   60    8 :
slabdata    502    502      0
dlm_resdir(l)          0      0     88   45    1 : tunables  120   60    8 :
slabdata      0      0      0
dlm_lkb            39794  40273    232   17    1 : tunables  120   60    8 :
slabdata   2369   2369      0
dlm_rsb(large)         2     26    304   13    1 : tunables   54   27    8 :
slabdata      2      2      0
dlm_rsb(small)     37849  38346    272   14    1 : tunables   54   27    8 :
slabdata   2739   2739      0
cluster_sock           5     15    768    5    1 : tunables   54   27    8 :
slabdata      3      3      0
fib6_nodes             7     61     64   61    1 : tunables  120   60    8 :
slabdata      1      1      0
ip6_dst_cache          5     10    384   10    1 : tunables   54   27    8 :
slabdata      1      1      0
ndisc_cache            2     15    256   15    1 : tunables  120   60    8 :
slabdata      1      1      0
rawv6_sock             6      8   1024    4    1 : tunables   54   27    8 :
slabdata      2      2      0
udpv6_sock             1      4   1024    4    1 : tunables   54   27    8 :
slabdata      1      1      0
tcpv6_sock             4      6   1792    2    1 : tunables   24   12    8 :
slabdata      3      3      0
rpc_buffers            8      8   2048    2    1 : tunables   24   12    8 :
slabdata      4      4      0
rpc_tasks              8     10    384   10    1 : tunables   54   27    8 :
slabdata      1      1      0
rpc_inode_cache        6      8    896    4    1 : tunables   54   27    8 :
slabdata      2      2      0
ip_fib_alias          10    119     32  119    1 : tunables  120   60    8 :
slabdata      1      1      0
ip_fib_hash           10     61     64   61    1 : tunables  120   60    8 :
slabdata      1      1      0
uhci_urb_priv          0      0     88   45    1 : tunables  120   60    8 :
slabdata      0      0      0
qla2xxx_srbs         128    135    256   15    1 : tunables  120   60    8 :
slabdata      9      9      0
dm-snapshot-in       128    164     96   41    1 : tunables  120   60    8 :
slabdata      4      4      0
dm-snapshot-ex         0      0     32  119    1 : tunables  120   60    8 :
slabdata      0      0      0
ext3_inode_cache   14638  14652    840    4    1 : tunables   54   27    8 :
slabdata   3663   3663      0
ext3_xattr             0      0     88   45    1 : tunables  120   60    8 :
slabdata      0      0      0
journal_handle        15     81     48   81    1 : tunables  120   60    8 :
slabdata      1      1      0
journal_head          37     90     88   45    1 : tunables  120   60    8 :
slabdata      2      2      0
revoke_table           4    225     16  225    1 : tunables  120   60    8 :
slabdata      1      1      0
revoke_record          0      0     32  119    1 : tunables  120   60    8 :
slabdata      0      0      0
dm_tio              1064   1248     24  156    1 : tunables  120   60    8 :
slabdata      8      8      0
dm_io               1064   1190     32  119    1 : tunables  120   60    8 :
slabdata     10     10      0
scsi_cmd_cache         6     14    512    7    1 : tunables   54   27    8 :
slabdata      2      2      0
sgpool-128            32     32   4096    1    1 : tunables   24   12    8 :
slabdata     32     32      0
sgpool-64             32     32   2048    2    1 : tunables   24   12    8 :
slabdata     16     16      0
sgpool-32             32     32   1024    4    1 : tunables   54   27    8 :
slabdata      8      8      0
sgpool-16             32     32    512    8    1 : tunables   54   27    8 :
slabdata      4      4      0
sgpool-8              37     45    256   15    1 : tunables  120   60    8 :
slabdata      3      3      0
unix_sock             43     85    768    5    1 : tunables   54   27    8 :
slabdata     17     17      0
ip_mrt_cache           0      0    128   31    1 : tunables  120   60    8 :
slabdata      0      0      0
tcp_tw_bucket          0      0    256   15    1 : tunables  120   60    8 :
slabdata      0      0      0
tcp_bind_bucket       18    119     32  119    1 : tunables  120   60    8 :
slabdata      1      1      0
tcp_open_request       0      0    128   31    1 : tunables  120   60    8 :
slabdata      0      0      0
inet_peer_cache        0      0    128   31    1 : tunables  120   60    8 :
slabdata      0      0      0
secpath_cache          0      0    256   15    1 : tunables  120   60    8 :
slabdata      0      0      0
xfrm_dst_cache         0      0    384   10    1 : tunables   54   27    8 :
slabdata      0      0      0
ip_dst_cache          42     80    384   10    1 : tunables   54   27    8 :
slabdata      8      8      0
arp_cache              5     15    256   15    1 : tunables  120   60    8 :
slabdata      1      1      0
raw_sock               5      8    896    4    1 : tunables   54   27    8 :
slabdata      2      2      0
udp_sock               7     12    896    4    1 : tunables   54   27    8 :
slabdata      3      3      0
tcp_sock              25     40   1536    5    2 : tunables   24   12    8 :
slabdata      8      8      0
flow_cache             0      0    128   31    1 : tunables  120   60    8 :
slabdata      0      0      0
mqueue_inode_cache      1      4    896    4    1 : tunables   54   27    8 :
slabdata      1      1      0
relayfs_inode_cache      0      0    576    7    1 : tunables   54   27    8 :
slabdata      0      0      0
isofs_inode_cache      0      0    616    6    1 : tunables   54   27    8 :
slabdata      0      0      0
hugetlbfs_inode_cache      1      6    608    6    1 : tunables   54   27    8 :
slabdata      1      1      0
ext2_inode_cache       0      0    736    5    1 : tunables   54   27    8 :
slabdata      0      0      0
ext2_xattr             0      0     88   45    1 : tunables  120   60    8 :
slabdata      0      0      0
dquot                  0      0    224   17    1 : tunables  120   60    8 :
slabdata      0      0      0
eventpoll_pwq          1     54     72   54    1 : tunables  120   60    8 :
slabdata      1      1      0
eventpoll_epi          1     15    256   15    1 : tunables  120   60    8 :
slabdata      1      1      0
kioctx                 0      0    384   10    1 : tunables   54   27    8 :
slabdata      0      0      0
kiocb                  0      0    256   15    1 : tunables  120   60    8 :
slabdata      0      0      0
dnotify_cache          2     96     40   96    1 : tunables  120   60    8 :
slabdata      1      1      0
fasync_cache           1    156     24  156    1 : tunables  120   60    8 :
slabdata      1      1      0
shmem_inode_cache    278    280    800    5    1 : tunables   54   27    8 :
slabdata     56     56      0
posix_timers_cache      0      0    184   21    1 : tunables  120   60    8 :
slabdata      0      0      0
uid_cache              6     31    128   31    1 : tunables  120   60    8 :
slabdata      1      1      0
cfq_pool              71    207     56   69    1 : tunables  120   60    8 :
slabdata      3      3      0
crq_pool              30    216     72   54    1 : tunables  120   60    8 :
slabdata      4      4      0
deadline_drq           0      0     96   41    1 : tunables  120   60    8 :
slabdata      0      0      0
as_arq                 0      0    112   35    1 : tunables  120   60    8 :
slabdata      0      0      0
blkdev_ioc            63    238     32  119    1 : tunables  120   60    8 :
slabdata      2      2      0
blkdev_queue          29     45    848    9    2 : tunables   54   27    8 :
slabdata      5      5      0
blkdev_requests       24    105    264   15    1 : tunables   54   27    8 :
slabdata      7      7      0
biovec-(256)         256    256   4096    1    1 : tunables   24   12    8 :
slabdata    256    256      0
biovec-128           256    256   2048    2    1 : tunables   24   12    8 :
slabdata    128    128      0
biovec-64            256    256   1024    4    1 : tunables   54   27    8 :
slabdata     64     64      0
biovec-16            256    270    256   15    1 : tunables  120   60    8 :
slabdata     18     18      0
biovec-4             256    305     64   61    1 : tunables  120   60    8 :
slabdata      5      5      0
biovec-1             278    450     16  225    1 : tunables  120   60    8 :
slabdata      2      2      0
bio                  278    310    128   31    1 : tunables  120   60    8 :
slabdata     10     10      0
file_lock_cache       29    250    160   25    1 : tunables  120   60    8 :
slabdata     10     10      0
sock_inode_cache      98    125    768    5    1 : tunables   54   27    8 :
slabdata     25     25      0
skbuff_head_cache    374    420    384   10    1 : tunables   54   27    8 :
slabdata     42     42      0
sock                   6     12    640    6    1 : tunables   54   27    8 :
slabdata      2      2      0
proc_inode_cache    1068   1068    600    6    1 : tunables   54   27    8 :
slabdata    178    178      0
sigqueue               4     23    168   23    1 : tunables  120   60    8 :
slabdata      1      1      0
radix_tree_node    53598  62321    536    7    1 : tunables   54   27    8 :
slabdata   8903   8903      0
bdev_cache            31     45    768    5    1 : tunables   54   27    8 :
slabdata      9      9      0
mnt_cache             29     60    192   20    1 : tunables  120   60    8 :
slabdata      3      3      0
audit_watch_cache      0      0     88   45    1 : tunables  120   60    8 :
slabdata      0      0      0
inode_cache        68758  68894    568    7    1 : tunables   54   27    8 :
slabdata   9842   9842      0
dentry_cache       59472  59472    240   16    1 : tunables  120   60    8 :
slabdata   3717   3717      0
filp                1005   1185    256   15    1 : tunables  120   60    8 :
slabdata     79     79      0
names_cache           26     26   4096    1    1 : tunables   24   12    8 :
slabdata     26     26      0
avc_node              12    486     72   54    1 : tunables  120   60    8 :
slabdata      9      9      0
key_jar               12     20    192   20    1 : tunables  120   60    8 :
slabdata      1      1      0
idr_layer_cache       90     98    528    7    1 : tunables   54   27    8 :
slabdata     14     14      0
buffer_head       747006 752310     88   45    1 : tunables  120   60    8 :
slabdata  16718  16718      0
mm_struct            119    140   1152    7    2 : tunables   24   12    8 :
slabdata     20     20      0
vm_area_struct      3127   3366    176   22    1 : tunables  120   60    8 :
slabdata    153    153      0
fs_cache             125    305     64   61    1 : tunables  120   60    8 :
slabdata      5      5      0
files_cache          119    153    832    9    2 : tunables   54   27    8 :
slabdata     17     17      0
signal_cache         189    255    256   15    1 : tunables  120   60    8 :
slabdata     17     17      0
sighand_cache        174    174   2112    3    2 : tunables   24   12    8 :
slabdata     58     58      0
task_struct          180    180   1984    2    1 : tunables   24   12    8 :
slabdata     90     90      0
anon_vma            1123   1560     24  156    1 : tunables  120   60    8 :
slabdata     10     10      0
shared_policy_node      0      0     56   69    1 : tunables  120   60    8 :
slabdata      0      0      0
numa_policy           45    225     16  225    1 : tunables  120   60    8 :
slabdata      1      1      0
size-131072(DMA)       0      0 131072    1   32 : tunables    8    4    0 :
slabdata      0      0      0
size-131072            1      1 131072    1   32 : tunables    8    4    0 :
slabdata      1      1      0
size-65536(DMA)        0      0  65536    1   16 : tunables    8    4    0 :
slabdata      0      0      0
size-65536             1      1  65536    1   16 : tunables    8    4    0 :
slabdata      1      1      0
size-32768(DMA)        0      0  32768    1    8 : tunables    8    4    0 :
slabdata      0      0      0
size-32768             6      6  32768    1    8 : tunables    8    4    0 :
slabdata      6      6      0
size-16384(DMA)        0      0  16384    1    4 : tunables    8    4    0 :
slabdata      0      0      0
size-16384             4      4  16384    1    4 : tunables    8    4    0 :
slabdata      4      4      0
size-8192(DMA)         0      0   8192    1    2 : tunables    8    4    0 :
slabdata      0      0      0
size-8192             22     22   8192    1    2 : tunables    8    4    0 :
slabdata     22     22      0
size-4096(DMA)         0      0   4096    1    1 : tunables   24   12    8 :
slabdata      0      0      0
size-4096            394    394   4096    1    1 : tunables   24   12    8 :
slabdata    394    394      0
size-2048(DMA)         0      0   2048    2    1 : tunables   24   12    8 :
slabdata      0      0      0
size-2048             82     82   2048    2    1 : tunables   24   12    8 :
slabdata     41     41      0
size-1620(DMA)         0      0   1664    4    2 : tunables   24   12    8 :
slabdata      0      0      0
size-1620             61     72   1664    4    2 : tunables   24   12    8 :
slabdata     18     18      0
size-1024(DMA)         0      0   1024    4    1 : tunables   54   27    8 :
slabdata      0      0      0
size-1024            503   1224   1024    4    1 : tunables   54   27    8 :
slabdata    306    306      0
size-512(DMA)          0      0    512    8    1 : tunables   54   27    8 :
slabdata      0      0      0
size-512            2603   2624    512    8    1 : tunables   54   27    8 :
slabdata    328    328      0
size-256(DMA)          0      0    256   15    1 : tunables  120   60    8 :
slabdata      0      0      0
size-256           70936  71310    256   15    1 : tunables  120   60    8 :
slabdata   4754   4754      0
size-128(DMA)          0      0    128   31    1 : tunables  120   60    8 :
slabdata      0      0      0
size-128            2910   3596    128   31    1 : tunables  120   60    8 :
slabdata    116    116      0
size-64(DMA)           0      0     64   61    1 : tunables  120   60    8 :
slabdata      0      0      0
size-64            10553  12383     64   61    1 : tunables  120   60    8 :
slabdata    203    203      0
size-32(DMA)           0      0     32  119    1 : tunables  120   60    8 :
slabdata      0      0      0
size-32             3180   3570     32  119    1 : tunables  120   60    8 :
slabdata     30     30      0
kmem_cache           180    180    256   15    1 : tunables  120   60    8 :
slabdata     12     12      0





[root@taft-01 ~]# cat /proc/cluster/sm_debug
01000003 recover state 2
01000002 recover state 3
01000003 cb recover state 2
01000005 recover state 2
01000003 recover state 3
01000002 recover state 4
01000005 cb recover state 2
01000005 recover state 3
01000003 recover state 4
01000002 recover state 4


[root@taft-02 ~]# cat /proc/cluster/sm_debug
01000003 recover state 2
01000002 recover state 1
01000003 cb recover state 2
01000005 recover state 2
01000003 recover state 3
01000002 recover state 2
01000005 cb recover state 2
01000005 recover state 3
01000003 recover state 4
01000002 recover state 2


[root@taft-03 ~]# cat /proc/cluster/sm_debug
00000000 sevent state 1
00000001 sevent state 3


[root@taft-04 ~]# cat /proc/cluster/sm_debug
 4
01000005 recover state 2
01000003 recover state 1
01000002 recover state 4
01000005 recover state 2
01000003 recover state 2
01000002 recover state 1
01000002 cb recover state 2
01000005 recover state 2
01000003 recover state 2
01000002 recover state 3
Comment 5 Corey Marthaler 2005-11-29 15:03:22 EST
One peculiar thing: due to the revolver I/O load, there were many of the
following messages:

Nov 28 15:10:33 taft-02 kernel: qla2300 0000:0a:02.0: scsi(1:0:0:0): Queue depth
adjusted to -1.

about which Tom Coughlan had this to say:

<coughlan> corey: You got a Queue Full error from the storage device. This can
happen when there are multiple initiators issuing commands at the same time. The
message indicates that the driver responded by turning off queueing and going to
one-at-a-time command issuing. This will kill performance, and is probably not
what it should have done, unless there were many Queue Full errors.

<corey> coughlan: I had a pretty "heavy" I/O load going at the time, my concern
however is that during this test, recovery hung and before I file a bug, I
wanted to know if this queueing could have somehow caused this by delaying or
dropping the needed recovery I/O

<coughlan> corey: anything is possible, but at least in theory, the queue full
error should have been retried, and the reduction in subsequent queue depth
should only impact performance. But any error path is certainly suspect, due to
lack of testing. Hard to say without a lot more details.
Comment 6 David Teigland 2005-11-29 16:57:34 EST
Storage errors in the past were blamed for i/o hangs
during gfs journal recovery.  This hang is during
dlm recovery, so the qla messages probably aren't relevant.

DLM recovery is hung in all three lockspaces.  In the
case of each lockspace, one node is hung at some stage
of dlm recovery while the other two nodes have finished.

For lockspace "clvmd", taft-02 is stuck after
"purged 0 requests".  taft-01 and taft-04 have
finished.

For lockspace "taft0", taft-04 is stuck after
"rebuild locks".  taft-01 and taft-02 have
finished.

For lockspace "taft1", taft-04 is stuck after
"update remastered resources". taft-01 and
taft-02 have finished.

I haven't found any clues yet as to why dlm recovery
is stuck in each case.  A 'ps' on taft-02 and taft-04
may have revealed a little more about what dlm_recoverd
is stuck on or waiting for.
Comment 7 Corey Marthaler 2005-11-29 18:40:18 EST
Appear to have hit this again (taft-04 was shot); here's the ps output:

[root@taft-01 ~]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       U-1,10,2
[3 4 1]

DLM Lock Space:  "clvmd"                             2   3 run       -
[3 4 1]

DLM Lock Space:  "taft0"                             4   5 recover 4 -
[3 4 1]

DLM Lock Space:  "taft1"                             6   7 run       -
[3 4 1]

GFS Mount Group: "taft0"                             5   6 recover 0 -
[3 4 1]

GFS Mount Group: "taft1"                             7   8 recover 0 -
[3 4 1]

User:            "usrm::manager"                     3   4 recover 0 -
[3 4 1]


[root@taft-02 ~]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       U-1,10,2
[3 4 1]

DLM Lock Space:  "clvmd"                             2   3 run       -
[3 4 1]

DLM Lock Space:  "taft0"                             4   5 recover 2 -
[3 4 1]

DLM Lock Space:  "taft1"                             6   7 run       -
[3 4 1]

GFS Mount Group: "taft0"                             5   6 recover 0 -
[3 4 1]

GFS Mount Group: "taft1"                             7   8 recover 0 -
[3 4 1]

User:            "usrm::manager"                     3   4 recover 0 -
[3 4 1]


[root@taft-03 ~]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       U-1,10,2
[4 3 1]

DLM Lock Space:  "clvmd"                             2   3 run       -
[4 3 1]

DLM Lock Space:  "taft0"                             4   5 recover 4 -
[4 3 1]

DLM Lock Space:  "taft1"                             6   7 run       -
[4 3 1]

GFS Mount Group: "taft0"                             5   6 recover 0 -
[4 3 1]

GFS Mount Group: "taft1"                             7   8 recover 0 -
[4 3 1]

User:            "usrm::manager"                     3   4 recover 0 -
[4 3 1]



[root@taft-01 ~]# ps -e -o pid,wchan=WIDE-WCHAN-COLUMN -o cmd
  PID WIDE-WCHAN-COLUMN CMD
    1 -                 init [5]
    2 migration_thread  [migration/0]
    3 ksoftirqd         [ksoftirqd/0]
    4 migration_thread  [migration/1]
    5 ksoftirqd         [ksoftirqd/1]
    6 migration_thread  [migration/2]
    7 ksoftirqd         [ksoftirqd/2]
    8 migration_thread  [migration/3]
    9 ksoftirqd         [ksoftirqd/3]
   10 worker_thread     [events/0]
   11 worker_thread     [events/1]
   12 worker_thread     [events/2]
   13 worker_thread     [events/3]
   14 worker_thread     [khelper]
   15 worker_thread     [kacpid]
   51 worker_thread     [kblockd/0]
   52 worker_thread     [kblockd/1]
   53 worker_thread     [kblockd/2]
   54 worker_thread     [kblockd/3]
   72 worker_thread     [aio/0]
   73 worker_thread     [aio/1]
   74 worker_thread     [aio/2]
   75 worker_thread     [aio/3]
   55 hub_thread        [khubd]
   71 kswapd            [kswapd0]
  148 serio_thread      [kseriod]
  212 16045567552327254 [scsi_eh_0]
  228 worker_thread     [kmirrord]
  229 worker_thread     [kmir_mon]
  237 kjournald         [kjournald]
 1023 16045567552327254 [scsi_eh_1]
 1028 -                 udevd
 1045 16045567552327254 [qla2300_1_dpc]
 1326 -                 [shpchpd_event]
 1680 kauditd_thread    [kauditd]
 1892 kjournald         [kjournald]
 2357 -                 syslogd -m 0
 2361 syslog            klogd -x
 2372 -                 irqbalance
 2383 -                 portmap
 2403 -                 rpc.statd
 2431 -                 rpc.idmapd
 2475 -                 ccsd
 2527 cluster_kthread   [cman_comms]
 2529 serviced          [cman_serviced]
 2528 membership_kthrea [cman_memb]
 2530 hello_kthread     [cman_hbeat]
 2565 rt_sigsuspend     fenced -w
 2577 -                 clvmd
 2578 dlm_astd          [dlm_astd]
 2579 dlm_recvd         [dlm_recvd]
 2580 dlm_sendd         [dlm_sendd]
 2581 dlm_recoverd      [dlm_recoverd]
 2707 -                 /usr/sbin/acpid
 2719 -                 cupsd
 2744 -                 /usr/sbin/sshd
 2759 -                 xinetd -stayalive -pidfile /var/run/xinetd.pid
 2779 -                 sendmail: accepting connections
 2787 pause             sendmail: Queue runner@01:00:00 for /var/spool/clientmqueue
 2803 -                 gpm -m /dev/input/mice -t imps2
 2848 -                 crond
 2874 -                 xfs -droppriv -daemon
 2884 rt_sigsuspend     anacron -s
 2893 -                 /usr/sbin/atd
 2923 -                 dbus-daemon-1 --system
 2937 -                 cups-config-daemon
 2948 -                 hald
 2965 -                 clurgmgrd
 2970 -                 /sbin/agetty ttyS0 115200 vt100-nav
 2971 -                 /sbin/mingetty tty1
 2972 -                 /sbin/mingetty tty2
 2973 -                 /sbin/mingetty tty3
 2975 -                 /sbin/mingetty tty4
 3002 -                 /sbin/mingetty tty5
 3006 -                 /sbin/mingetty tty6
 3009 -                 /usr/bin/gdm-binary -nodaemon
 3543 pipe_wait         /usr/bin/gdm-binary -nodaemon
 3544 -                 /usr/X11R6/bin/X :0 -audit 0 -auth /var/gdm/:0.Xauth
-nolisten tcp
 3787 dlm_recoverd      [dlm_recoverd]
 3788 dlm_async         [lock_dlm1]
 3789 dlm_async         [lock_dlm2]
 3816 -                 [gfs_scand]
 3817 -                 [gfs_glockd]
 3822 -                 /usr/bin/gdmgreeter
 3836 -                 [gfs_recoverd]
 3837 -                 [gfs_logd]
 3838 glock_wait_intern [gfs_quotad]
 3839 glock_wait_intern [gfs_inoded]
 3848 dlm_recoverd      [dlm_recoverd]
 3849 dlm_async         [lock_dlm1]
 3850 dlm_async         [lock_dlm2]
 3851 -                 [gfs_scand]
 3852 gfs_glockd        [gfs_glockd]
 3863 -                 [gfs_recoverd]
 3865 -                 [gfs_logd]
 3866 glock_wait_intern [gfs_quotad]
 3867 glock_wait_intern [gfs_inoded]
 3875 -                 qarshd
 3876 wait              sh -c
PATH=/usr/kerberos/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/home/msp/cmarthal/bin:
 3878 pipe_wait         /usr/bin/perl -w
/usr/tests/sts/gfs/bin/revolver_load_gen -S 12441653 -r /usr/tests/sts -L HEAVY
 3882 wait              sh -c PATH=$PATH:/usr/tests/sts/bin; pan2 -x 5 -f
/tmp/revolver_load_gen.3878 -a /tmp/revolver_l
 3883 -                 pan2 -x 5 -f /tmp/revolver_load_gen.3878 -a
/tmp/revolver_load_gen.3878.active -d /tmp/revolver_
 3884 pipe_wait         pan2 -x 5 -f /tmp/revolver_load_gen.3878 -a
/tmp/revolver_load_gen.3878.active -d /tmp/revolver_
 3885 wait              accordion -S 11654 -p 4 accrdfile1 accrdfile2 accrdfile3
accrdfile4 accrdfile5
 3888 pipe_wait         pan2 -x 5 -f /tmp/revolver_load_gen.3878 -a
/tmp/revolver_load_gen.3878.active -d /tmp/revolver_
 3889 wait              genesis -S 11654 -n 500 -d 150 -p 4
 3890 pipe_wait         pan2 -x 5 -f /tmp/revolver_load_gen.3878 -a
/tmp/revolver_load_gen.3878.active -d /tmp/revolver_
 3891 wait              sh -c iogen -S 11654 -f buffered -m sequential -s
read,write,readv,writev -t 1b -T 2488330b 2488
 3893 pipe_wait         pan2 -x 5 -f /tmp/revolver_load_gen.3878 -a
/tmp/revolver_load_gen.3878.active -d /tmp/revolver_
 3894 -                 growfiles -S 11654 -i 0 -N 500 -n 4 -b
 3896 pipe_wait         iogen -S 11654 -f buffered -m sequential -s read write
readv writev -t 1b -T 2488330b 2488330b:r
 3897 dlm_lock_sync     doio -avk
 3898 lm_dlm_plock      accordion -S 11654 -p 4 accrdfile1 accrdfile2 accrdfile3
accrdfile4 accrdfile5
 3899 lm_dlm_plock      accordion -S 11654 -p 4 accrdfile1 accrdfile2 accrdfile3
accrdfile4 accrdfile5
 3900 lm_dlm_plock      accordion -S 11654 -p 4 accrdfile1 accrdfile2 accrdfile3
accrdfile4 accrdfile5
 3901 wait_async        accordion -S 11654 -p 4 accrdfile1 accrdfile2 accrdfile3
accrdfile4 accrdfile5
 3902 wait_async        genesis -S 11654 -n 500 -d 150 -p 4
 3903 wait_async        genesis -S 11654 -n 500 -d 150 -p 4
 3904 -                 genesis -S 11654 -n 500 -d 150 -p 4
 3905 -                 genesis -S 11654 -n 500 -d 150 -p 4
 3906 -                 growfiles -S 11654 -i 0 -N 500 -n 4 -b
 3907 glock_wait_intern growfiles -S 11654 -i 0 -N 500 -n 4 -b
 3908 -                 growfiles -S 11654 -i 0 -N 500 -n 4 -b
 3909 -                 qarshd
 3910 wait              sh -c
PATH=/usr/kerberos/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/home/msp/cmarthal/bin:
 3912 pipe_wait         /usr/bin/perl -w
/usr/tests/sts/gfs/bin/revolver_load_gen -S 11661784 -r /usr/tests/sts -L HEAVY
 3916 wait              sh -c PATH=$PATH:/usr/tests/sts/bin; pan2 -x 5 -f
/tmp/revolver_load_gen.3912 -a /tmp/revolver_l
 3917 -                 pan2 -x 5 -f /tmp/revolver_load_gen.3912 -a
/tmp/revolver_load_gen.3912.active -d /tmp/revolver_
 3920 pipe_wait         pan2 -x 5 -f /tmp/revolver_load_gen.3912 -a
/tmp/revolver_load_gen.3912.active -d /tmp/revolver_
 3921 wait              genesis -S 11654 -n 500 -d 150 -p 4
 3922 pipe_wait         pan2 -x 5 -f /tmp/revolver_load_gen.3912 -a
/tmp/revolver_load_gen.3912.active -d /tmp/revolver_
 3923 wait              sh -c iogen -S 11654 -f buffered -m sequential -s
read,write,readv,writev -t 1b -T 2332356b 2332
 3924 pipe_wait         pan2 -x 5 -f /tmp/revolver_load_gen.3912 -a
/tmp/revolver_load_gen.3912.active -d /tmp/revolver_
 3925 wait_async        genesis -S 11654 -n 500 -d 150 -p 4
 3926 -                 growfiles -S 11654 -i 0 -N 500 -n 4 -b
 3927 dlm_lock_sync     genesis -S 11654 -n 500 -d 150 -p 4
 3929 pipe_wait         pan2 -x 5 -f /tmp/revolver_load_gen.3912 -a
/tmp/revolver_load_gen.3912.active -d /tmp/revolver_
 3930 dlm_lock_sync     genesis -S 11654 -n 500 -d 150 -p 4
 3931 wait              accordion -S 11654 -p 4 accrdfile1 accrdfile2 accrdfile3
accrdfile4 accrdfile5
 3933 wait_async        genesis -S 11654 -n 500 -d 150 -p 4
 3934 lm_dlm_plock      accordion -S 11654 -p 4 accrdfile1 accrdfile2 accrdfile3
accrdfile4 accrdfile5
 3935 -                 growfiles -S 11654 -i 0 -N 500 -n 4 -b
 3936 -                 growfiles -S 11654 -i 0 -N 500 -n 4 -b
 3937 glock_wait_intern growfiles -S 11654 -i 0 -N 500 -n 4 -b
 3938 wait_async        accordion -S 11654 -p 4 accrdfile1 accrdfile2 accrdfile3
accrdfile4 accrdfile5
 3939 pipe_wait         iogen -S 11654 -f buffered -m sequential -s read write
readv writev -t 1b -T 2332356b 2332356b:r
 3941 wait_async        accordion -S 11654 -p 4 accrdfile1 accrdfile2 accrdfile3
accrdfile4 accrdfile5
 3940 dlm_lock_sync     doio -avk
 3942 lm_dlm_plock      accordion -S 11654 -p 4 accrdfile1 accrdfile2 accrdfile3
accrdfile4 accrdfile5
 4312 pdflush           [pdflush]
 4322 pdflush           [pdflush]
 5159 -                 sshd: root@pts/0
 5161 wait              -bash
 5564 -                 ps -e -o pid,wchan=WIDE-WCHAN-COLUMN -o cmd


[root@taft-02 ~]# ps -e -o pid,wchan=WIDE-WCHAN-COLUMN -o cmd
  PID WIDE-WCHAN-COLUMN CMD
    1 -                 init [5]
    2 migration_thread  [migration/0]
    3 ksoftirqd         [ksoftirqd/0]
    4 migration_thread  [migration/1]
    5 ksoftirqd         [ksoftirqd/1]
    6 migration_thread  [migration/2]
    7 ksoftirqd         [ksoftirqd/2]
    8 migration_thread  [migration/3]
    9 ksoftirqd         [ksoftirqd/3]
   10 worker_thread     [events/0]
   11 worker_thread     [events/1]
   12 worker_thread     [events/2]
   13 worker_thread     [events/3]
   14 worker_thread     [khelper]
   15 worker_thread     [kacpid]
   51 worker_thread     [kblockd/0]
   52 worker_thread     [kblockd/1]
   53 worker_thread     [kblockd/2]
   54 worker_thread     [kblockd/3]
   55 hub_thread        [khubd]
   72 worker_thread     [aio/0]
   71 kswapd            [kswapd0]
   73 worker_thread     [aio/1]
   74 worker_thread     [aio/2]
   75 worker_thread     [aio/3]
  148 serio_thread      [kseriod]
  212 16045567552327254 [scsi_eh_0]
  228 worker_thread     [kmirrord]
  229 worker_thread     [kmir_mon]
  237 kjournald         [kjournald]
 1017 16045567552327254 [scsi_eh_1]
 1018 16045567552327254 [qla2300_1_dpc]
 1030 -                 udevd
 1699 kauditd_thread    [kauditd]
 1890 kjournald         [kjournald]
 2342 -                 syslogd -m 0
 2346 syslog            klogd -x
 2356 -                 irqbalance
 2366 -                 portmap
 2385 -                 rpc.statd
 2411 -                 rpc.idmapd
 2423 -                 ccsd
 2474 cluster_kthread   [cman_comms]
 2476 serviced          [cman_serviced]
 2475 membership_kthrea [cman_memb]
 2477 hello_kthread     [cman_hbeat]
 2510 rt_sigsuspend     fenced -w
 2521 -                 clvmd
 2522 dlm_astd          [dlm_astd]
 2523 dlm_recvd         [dlm_recvd]
 2524 dlm_sendd         [dlm_sendd]
 2525 dlm_recoverd      [dlm_recoverd]
 2644 -                 /usr/sbin/acpid
 2653 -                 cupsd
 2676 -                 /usr/sbin/sshd
 2689 -                 xinetd -stayalive -pidfile /var/run/xinetd.pid
 2708 -                 sendmail: accepting connections
 2716 pause             sendmail: Queue runner@01:00:00 for /var/spool/clientmqueue
 2732 -                 gpm -m /dev/input/mice -t imps2
 2775 -                 crond
 2800 -                 xfs -droppriv -daemon
 2817 -                 /usr/sbin/atd
 2834 -                 dbus-daemon-1 --system
 2845 -                 hald
 2858 -                 clurgmgrd
 2862 -                 /sbin/agetty ttyS0 115200 vt100-nav
 2863 -                 /sbin/mingetty tty1
 2864 -                 /sbin/mingetty tty2
 2865 -                 /sbin/mingetty tty3
 2867 -                 /sbin/mingetty tty4
 2869 -                 /sbin/mingetty tty5
 2873 -                 /sbin/mingetty tty6
 2874 -                 kdm -nodaemon
 3403 -                 /usr/X11R6/bin/X -auth /var/run/xauth/A:0-OzJ2mc vt1
 3404 pipe_wait         -:0
 3485 dlm_wait_function [dlm_recoverd]
 3528 dlm_async         [lock_dlm1]
 3529 dlm_async         [lock_dlm2]
 3556 -                 [gfs_scand]
 3557 -                 [gfs_glockd]
 3561 -                 /usr/bin/kdm_greet
 3562 -                 [gfs_recoverd]
 3564 -                 [gfs_logd]
 3565 glock_wait_intern [gfs_quotad]
 3566 -                 [gfs_inoded]
 3575 dlm_recoverd      [dlm_recoverd]
 3588 dlm_async         [lock_dlm1]
 3589 dlm_async         [lock_dlm2]
 3596 -                 [gfs_scand]
 3597 gfs_glockd        [gfs_glockd]
 3598 -                 [gfs_recoverd]
 3599 -                 [gfs_logd]
 3600 glock_wait_intern [gfs_quotad]
 3601 -                 [gfs_inoded]
 3618 -                 qarshd
 3619 wait              sh -c
PATH=/usr/kerberos/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/home/msp/cmarthal/bin:
 3621 pipe_wait         /usr/bin/perl -w
/usr/tests/sts/gfs/bin/revolver_load_gen -S 12441653 -r /usr/tests/sts -L HEAVY
 3625 wait              sh -c PATH=$PATH:/usr/tests/sts/bin; pan2 -x 5 -f
/tmp/revolver_load_gen.3621 -a /tmp/revolver_l
 3626 -                 pan2 -x 5 -f /tmp/revolver_load_gen.3621 -a
/tmp/revolver_load_gen.3621.active -d /tmp/revolver_
 3627 pipe_wait         pan2 -x 5 -f /tmp/revolver_load_gen.3621 -a
/tmp/revolver_load_gen.3621.active -d /tmp/revolver_
 3628 glock_wait_intern growfiles -S 11654 -i 0 -N 500 -n 4 -b
 3629 pipe_wait         pan2 -x 5 -f /tmp/revolver_load_gen.3621 -a
/tmp/revolver_load_gen.3621.active -d /tmp/revolver_
 3630 wait              accordion -S 11654 -p 4 accrdfile1 accrdfile2 accrdfile3
accrdfile4 accrdfile5
 3631 pipe_wait         pan2 -x 5 -f /tmp/revolver_load_gen.3621 -a
/tmp/revolver_load_gen.3621.active -d /tmp/revolver_
 3632 wait              sh -c iogen -S 11654 -f buffered -m sequential -s
read,write,readv,writev -t 1b -T 2488330b 2488
 3635 pipe_wait         pan2 -x 5 -f /tmp/revolver_load_gen.3621 -a
/tmp/revolver_load_gen.3621.active -d /tmp/revolver_
 3636 wait              genesis -S 11654 -n 500 -d 150 -p 4
 3637 pipe_wait         iogen -S 11654 -f buffered -m sequential -s read write
readv writev -t 1b -T 2488330b 2488330b:r
 3638 dlm_lock_sync     doio -avk
 3641 lm_dlm_plock      accordion -S 11654 -p 4 accrdfile1 accrdfile2 accrdfile3
accrdfile4 accrdfile5
 3642 -                 growfiles -S 11654 -i 0 -N 500 -n 4 -b
 3643 -                 growfiles -S 11654 -i 0 -N 500 -n 4 -b
 3644 dlm_lock_sync     accordion -S 11654 -p 4 accrdfile1 accrdfile2 accrdfile3
accrdfile4 accrdfile5
 3645 -                 growfiles -S 11654 -i 0 -N 500 -n 4 -b
 3646 -                 genesis -S 11654 -n 500 -d 150 -p 4
 3647 -                 genesis -S 11654 -n 500 -d 150 -p 4
 3648 lm_dlm_plock      accordion -S 11654 -p 4 accrdfile1 accrdfile2 accrdfile3
accrdfile4 accrdfile5
 3649 lm_dlm_plock      accordion -S 11654 -p 4 accrdfile1 accrdfile2 accrdfile3
accrdfile4 accrdfile5
 3650 -                 genesis -S 11654 -n 500 -d 150 -p 4
 3651 -                 genesis -S 11654 -n 500 -d 150 -p 4
 3652 -                 qarshd
 3653 wait              sh -c
PATH=/usr/kerberos/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/home/msp/cmarthal/bin:
 3655 pipe_wait         /usr/bin/perl -w
/usr/tests/sts/gfs/bin/revolver_load_gen -S 11661784 -r /usr/tests/sts -L HEAVY
 3659 wait              sh -c PATH=$PATH:/usr/tests/sts/bin; pan2 -x 5 -f
/tmp/revolver_load_gen.3655 -a /tmp/revolver_l
 3660 -                 pan2 -x 5 -f /tmp/revolver_load_gen.3655 -a
/tmp/revolver_load_gen.3655.active -d /tmp/revolver_
 3663 pipe_wait         pan2 -x 5 -f /tmp/revolver_load_gen.3655 -a
/tmp/revolver_load_gen.3655.active -d /tmp/revolver_
 3664 wait              genesis -S 11654 -n 500 -d 150 -p 4
 3665 pipe_wait         pan2 -x 5 -f /tmp/revolver_load_gen.3655 -a
/tmp/revolver_load_gen.3655.active -d /tmp/revolver_
 3666 wait              sh -c iogen -S 11654 -f buffered -m sequential -s
read,write,readv,writev -t 1b -T 2332356b 2332
 3667 glock_wait_intern genesis -S 11654 -n 500 -d 150 -p 4
 3669 glock_wait_intern genesis -S 11654 -n 500 -d 150 -p 4
 3670 pipe_wait         pan2 -x 5 -f /tmp/revolver_load_gen.3655 -a
/tmp/revolver_load_gen.3655.active -d /tmp/revolver_
 3672 wait              accordion -S 11654 -p 4 accrdfile1 accrdfile2 accrdfile3
accrdfile4 accrdfile5
 3673 glock_wait_intern genesis -S 11654 -n 500 -d 150 -p 4
 3674 glock_wait_intern genesis -S 11654 -n 500 -d 150 -p 4
 3675 pipe_wait         pan2 -x 5 -f /tmp/revolver_load_gen.3655 -a
/tmp/revolver_load_gen.3655.active -d /tmp/revolver_
 3676 glock_wait_intern growfiles -S 11654 -i 0 -N 500 -n 4 -b
 3677 lm_dlm_plock      accordion -S 11654 -p 4 accrdfile1 accrdfile2 accrdfile3
accrdfile4 accrdfile5
 3678 glock_wait_intern growfiles -S 11654 -i 0 -N 500 -n 4 -b
 3679 glock_wait_intern growfiles -S 11654 -i 0 -N 500 -n 4 -b
 3680 lm_dlm_plock      accordion -S 11654 -p 4 accrdfile1 accrdfile2 accrdfile3
accrdfile4 accrdfile5
 3681 glock_wait_intern growfiles -S 11654 -i 0 -N 500 -n 4 -b
 3683 wait_async        accordion -S 11654 -p 4 accrdfile1 accrdfile2 accrdfile3
accrdfile4 accrdfile5
 3682 pipe_wait         iogen -S 11654 -f buffered -m sequential -s read write
readv writev -t 1b -T 2332356b 2332356b:r
 3684 dlm_lock_sync     doio -avk
 3685 lm_dlm_plock      accordion -S 11654 -p 4 accrdfile1 accrdfile2 accrdfile3
accrdfile4 accrdfile5
 3910 pdflush           [pdflush]
 3925 pdflush           [pdflush]
 4766 -                 sshd: root@pts/0
 4768 wait              -bash
 5171 -                 ps -e -o pid,wchan=WIDE-WCHAN-COLUMN -o cmd


[root@taft-03 ~]# ps -e -o pid,wchan=WIDE-WCHAN-COLUMN -o cmd
  PID WIDE-WCHAN-COLUMN CMD
    1 -                 init [5]
    2 migration_thread  [migration/0]
    3 ksoftirqd         [ksoftirqd/0]
    4 migration_thread  [migration/1]
    5 ksoftirqd         [ksoftirqd/1]
    6 migration_thread  [migration/2]
    7 ksoftirqd         [ksoftirqd/2]
    8 migration_thread  [migration/3]
    9 ksoftirqd         [ksoftirqd/3]
   10 worker_thread     [events/0]
   11 worker_thread     [events/1]
   12 worker_thread     [events/2]
   13 worker_thread     [events/3]
   14 worker_thread     [khelper]
   15 worker_thread     [kacpid]
   51 worker_thread     [kblockd/0]
   52 worker_thread     [kblockd/1]
   53 worker_thread     [kblockd/2]
   54 worker_thread     [kblockd/3]
   55 hub_thread        [khubd]
   72 worker_thread     [aio/0]
   71 kswapd            [kswapd0]
   73 worker_thread     [aio/1]
   74 worker_thread     [aio/2]
   75 worker_thread     [aio/3]
  148 serio_thread      [kseriod]
  212 16045567552327254 [scsi_eh_0]
  228 worker_thread     [kmirrord]
  229 worker_thread     [kmir_mon]
  237 kjournald         [kjournald]
 1017 16045567552327254 [scsi_eh_1]
 1018 16045567552327254 [qla2300_1_dpc]
 1027 -                 udevd
 1683 kauditd_thread    [kauditd]
 1890 kjournald         [kjournald]
 2342 -                 syslogd -m 0
 2346 syslog            klogd -x
 2356 -                 irqbalance
 2366 -                 portmap
 2385 -                 rpc.statd
 2411 -                 rpc.idmapd
 2423 -                 ccsd
 2474 cluster_kthread   [cman_comms]
 2476 serviced          [cman_serviced]
 2475 membership_kthrea [cman_memb]
 2477 hello_kthread     [cman_hbeat]
 2510 rt_sigsuspend     fenced -w
 2521 -                 clvmd
 2522 dlm_astd          [dlm_astd]
 2523 -                 [dlm_recvd]
 2524 dlm_sendd         [dlm_sendd]
 2525 dlm_recoverd      [dlm_recoverd]
 2644 -                 /usr/sbin/acpid
 2653 -                 cupsd
 2676 -                 /usr/sbin/sshd
 2689 -                 xinetd -stayalive -pidfile /var/run/xinetd.pid
 2708 -                 sendmail: accepting connections
 2716 pause             sendmail: Queue runner@01:00:00 for /var/spool/clientmqueue
 2731 -                 gpm -m /dev/input/mice -t imps2
 2775 -                 crond
 2800 -                 xfs -droppriv -daemon
 2817 -                 /usr/sbin/atd
 2835 -                 dbus-daemon-1 --system
 2846 -                 hald
 2860 -                 clurgmgrd
 2864 -                 /sbin/agetty ttyS0 115200 vt100-nav
 2865 -                 /sbin/mingetty tty1
 2866 -                 /sbin/mingetty tty2
 2867 -                 /sbin/mingetty tty3
 2869 -                 /sbin/mingetty tty4
 2910 -                 /sbin/mingetty tty5
 2990 -                 /sbin/mingetty tty6
 2997 -                 kdm -nodaemon
 3405 -                 /usr/X11R6/bin/X -auth /var/run/xauth/A:0-i8kJuX vt1
 3406 pipe_wait         -:0
 3505 dlm_recoverd      [dlm_recoverd]
 3524 dlm_async         [lock_dlm1]
 3525 dlm_async         [lock_dlm2]
 3552 -                 [gfs_scand]
 3553 -                 [gfs_glockd]
 3557 -                 /usr/bin/kdm_greet
 3558 -                 [gfs_recoverd]
 3578 -                 [gfs_logd]
 3579 glock_wait_intern [gfs_quotad]
 3580 -                 [gfs_inoded]
 3589 dlm_recoverd      [dlm_recoverd]
 3590 dlm_async         [lock_dlm1]
 3591 dlm_async         [lock_dlm2]
 3592 -                 [gfs_scand]
 3593 gfs_glockd        [gfs_glockd]
 3604 -                 [gfs_recoverd]
 3605 -                 [gfs_logd]
 3606 -                 [gfs_quotad]
 3607 -                 [gfs_inoded]
 3615 -                 qarshd
 3616 wait              sh -c
PATH=/usr/kerberos/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/home/msp/cmarthal/bin:
 3618 pipe_wait         /usr/bin/perl -w
/usr/tests/sts/gfs/bin/revolver_load_gen -S 12441653 -r /usr/tests/sts -L HEAVY
 3622 wait              sh -c PATH=$PATH:/usr/tests/sts/bin; pan2 -x 5 -f
/tmp/revolver_load_gen.3618 -a /tmp/revolver_l
 3623 -                 pan2 -x 5 -f /tmp/revolver_load_gen.3618 -a
/tmp/revolver_load_gen.3618.active -d /tmp/revolver_
 3624 pipe_wait         pan2 -x 5 -f /tmp/revolver_load_gen.3618 -a
/tmp/revolver_load_gen.3618.active -d /tmp/revolver_
 3625 glock_wait_intern growfiles -S 11654 -i 0 -N 500 -n 4 -b
 3626 pipe_wait         pan2 -x 5 -f /tmp/revolver_load_gen.3618 -a
/tmp/revolver_load_gen.3618.active -d /tmp/revolver_
 3627 wait              sh -c iogen -S 11654 -f buffered -m sequential -s
read,write,readv,writev -t 1b -T 2488330b 2488
 3628 pipe_wait         pan2 -x 5 -f /tmp/revolver_load_gen.3618 -a
/tmp/revolver_load_gen.3618.active -d /tmp/revolver_
 3629 wait              accordion -S 11654 -p 4 accrdfile1 accrdfile2 accrdfile3
accrdfile4 accrdfile5
 3632 pipe_wait         pan2 -x 5 -f /tmp/revolver_load_gen.3618 -a
/tmp/revolver_load_gen.3618.active -d /tmp/revolver_
 3633 wait              genesis -S 11654 -n 500 -d 150 -p 4
 3634 pipe_wait         iogen -S 11654 -f buffered -m sequential -s read write
readv writev -t 1b -T 2488330b 2488330b:r
 3635 dlm_lock_sync     doio -avk
 3638 wait_async        genesis -S 11654 -n 500 -d 150 -p 4
 3639 glock_wait_intern genesis -S 11654 -n 500 -d 150 -p 4
 3640 -                 genesis -S 11654 -n 500 -d 150 -p 4
 3641 -                 genesis -S 11654 -n 500 -d 150 -p 4
 3642 wait_async        accordion -S 11654 -p 4 accrdfile1 accrdfile2 accrdfile3
accrdfile4 accrdfile5
 3643 lm_dlm_plock      accordion -S 11654 -p 4 accrdfile1 accrdfile2 accrdfile3
accrdfile4 accrdfile5
 3644 lm_dlm_plock      accordion -S 11654 -p 4 accrdfile1 accrdfile2 accrdfile3
accrdfile4 accrdfile5
 3645 lm_dlm_plock      accordion -S 11654 -p 4 accrdfile1 accrdfile2 accrdfile3
accrdfile4 accrdfile5
 3646 -                 growfiles -S 11654 -i 0 -N 500 -n 4 -b
 3647 -                 growfiles -S 11654 -i 0 -N 500 -n 4 -b
 3648 -                 growfiles -S 11654 -i 0 -N 500 -n 4 -b
 3649 -                 qarshd
 3650 wait              sh -c
PATH=/usr/kerberos/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/home/msp/cmarthal/bin:
 3652 pipe_wait         /usr/bin/perl -w
/usr/tests/sts/gfs/bin/revolver_load_gen -S 11661784 -r /usr/tests/sts -L HEAVY
 3656 wait              sh -c PATH=$PATH:/usr/tests/sts/bin; pan2 -x 5 -f
/tmp/revolver_load_gen.3652 -a /tmp/revolver_l
 3657 -                 pan2 -x 5 -f /tmp/revolver_load_gen.3652 -a
/tmp/revolver_load_gen.3652.active -d /tmp/revolver_
 3658 pipe_wait         pan2 -x 5 -f /tmp/revolver_load_gen.3652 -a
/tmp/revolver_load_gen.3652.active -d /tmp/revolver_
 3659 -                 growfiles -S 11654 -i 0 -N 500 -n 4 -b
 3662 dlm_lock_sync     growfiles -S 11654 -i 0 -N 500 -n 4 -b
 3663 pipe_wait         pan2 -x 5 -f /tmp/revolver_load_gen.3652 -a
/tmp/revolver_load_gen.3652.active -d /tmp/revolver_
 3664 -                 growfiles -S 11654 -i 0 -N 500 -n 4 -b
 3665 wait              accordion -S 11654 -p 4 accrdfile1 accrdfile2 accrdfile3
accrdfile4 accrdfile5
 3666 -                 growfiles -S 11654 -i 0 -N 500 -n 4 -b
 3667 pipe_wait         pan2 -x 5 -f /tmp/revolver_load_gen.3652 -a
/tmp/revolver_load_gen.3652.active -d /tmp/revolver_
 3668 wait              genesis -S 11654 -n 500 -d 150 -p 4
 3669 pipe_wait         pan2 -x 5 -f /tmp/revolver_load_gen.3652 -a
/tmp/revolver_load_gen.3652.active -d /tmp/revolver_
 3670 wait              sh -c iogen -S 11654 -f buffered -m sequential -s
read,write,readv,writev -t 1b -T 2332356b 2332
 3671 -                 genesis -S 11654 -n 500 -d 150 -p 4
 3673 glock_wait_intern genesis -S 11654 -n 500 -d 150 -p 4
 3674 -                 genesis -S 11654 -n 500 -d 150 -p 4
 3676 glock_wait_intern genesis -S 11654 -n 500 -d 150 -p 4
 3677 lm_dlm_plock      accordion -S 11654 -p 4 accrdfile1 accrdfile2 accrdfile3
accrdfile4 accrdfile5
 3678 wait_async        accordion -S 11654 -p 4 accrdfile1 accrdfile2 accrdfile3
accrdfile4 accrdfile5
 3679 pipe_wait         iogen -S 11654 -f buffered -m sequential -s read write
readv writev -t 1b -T 2332356b 2332356b:r
 3680 lm_dlm_plock      accordion -S 11654 -p 4 accrdfile1 accrdfile2 accrdfile3
accrdfile4 accrdfile5
 3681 lm_dlm_plock      accordion -S 11654 -p 4 accrdfile1 accrdfile2 accrdfile3
accrdfile4 accrdfile5
 3682 wait_async        doio -avk
 3722 pdflush           [pdflush]
 3749 pdflush           [pdflush]
 4592 -                 sshd: root@pts/0
 4594 wait              -bash
 4997 -                 ps -e -o pid,wchan=WIDE-WCHAN-COLUMN -o cmd
Comment 8 Christine Caulfield 2005-11-30 03:51:16 EST
I wonder if this is related to a very similar problem that HP India are seeing.

I have some detailed tcpdumps from them and it appears that the DLM connections
are being closed during recovery. Why, I'm not sure. I've sent Ravi a patch with
some printks in it so we can see if it's the DLM doing the closing (and if so, why).
Comment 9 David Teigland 2005-11-30 11:02:39 EST
It's the combination of /proc/cluster/dlm_debug and ps that usually
pinpoints where it's hung.  This second case didn't have the dlm_debug.
It probably doesn't matter much, though.  In both cases it looks like
one node is waiting for a reply from another that it never gets.  A
while back we had the very same symptoms for another bug that we
eventually fixed.  That previous bug was related to connections being
closed, much like what Patrick mentioned in comment 8.  We should
probably see if we can get a tcpdump of the hang.

Are you using fixed nodeids (specified in cluster.conf)?
Comment 10 Corey Marthaler 2005-11-30 12:07:54 EST
This cluster is still in the same state from yesterday, so let me know if there's
any other info you'd like to see as well.

[root@taft-01 ~]# cat /proc/cluster/dlm_debug
s
taft1 purge locks of departed nodes
taft0 purged 1 locks
taft0 update remastered resources
taft1 purged 1 locks
taft1 update remastered resources
taft0 updated 1102 resources
taft0 rebuild locks
taft1 updated 1102 resources
taft1 rebuild locks
taft0 rebuilt 1147 locks
taft0 recover event 24 done
clvmd move flags 0,0,1 ids 18,24,24
clvmd process held requests
clvmd processed 0 requests
clvmd resend marked requests
clvmd resent 0 requests
clvmd recover event 24 finished
taft1 rebuilt 1162 locks
taft1 recover event 24 done
taft1 move flags 0,0,1 ids 22,24,24
taft1 process held requests
taft1 processed 0 requests
taft1 resend marked requests
taft1 resend 1100fe lq 1 flg 200000 node -1/-1 "       7
taft1 resend 15002a lq 1 flg 200000 node -1/-1 "       2
taft1 resend 1000a8 lq 1 flg 200000 node -1/-1 "       7
taft1 resend 160223 lq 1 flg 200000 node -1/-1 "       7
taft1 resend 120298 lq 1 flg 200000 node -1/-1 "       7
taft1 resent 5 requests
taft1 recover event 24 finished


[root@taft-02 ~]# cat /proc/cluster/dlm_debug
s of departed nodes
taft1 purged 2 locks
taft1 update remastered resources
taft1 updated 1102 resources
taft1 rebuild locks
clvmd mark waiting requests
clvmd marked 0 requests
clvmd purge locks of departed nodes
clvmd purged 0 locks
clvmd update remastered resources
clvmd updated 2 resources
clvmd rebuild locks
clvmd rebuilt 2 locks
clvmd recover event 16 done
clvmd move flags 0,0,1 ids 10,16,16
clvmd process held requests
clvmd processed 0 requests
clvmd resend marked requests
clvmd resent 0 requests
clvmd recover event 16 finished
taft1 rebuilt 1151 locks
taft0 mark waiting requests
taft0 marked 0 requests
taft0 purge locks of departed nodes
taft1 recover event 16 done
taft0 purged 3 locks
taft0 update remastered resources
taft1 move flags 0,0,1 ids 14,16,16
taft1 process held requests
taft1 processed 0 requests
taft1 resend marked requests
taft1 resend 70369 lq 1 flg 200000 node -1/-1 "       7
taft1 resent 1 requests
taft1 recover event 16 finished
taft0 updated 1102 resources
taft0 rebuild locks


[root@taft-03 ~]# cat /proc/cluster/dlm_debug
rk 302b4 lq 3 nodeid 2
taft1 mark 403eb lq 1 nodeid -1
taft1 marked 3 requests
taft1 purge locks of departed nodes
taft0 purged 1 locks
taft0 update remastered resources
taft1 purged 1 locks
taft1 update remastered resources
taft0 updated 1102 resources
taft0 rebuild locks
taft0 rebuilt 0 locks
taft0 recover event 8 done
taft1 updated 1101 resources
taft1 rebuild locks
taft1 rebuilt 0 locks
taft1 recover event 8 done
clvmd move flags 0,0,1 ids 2,8,8
clvmd process held requests
clvmd processed 0 requests
clvmd resend marked requests
clvmd resent 0 requests
clvmd recover event 8 finished
taft1 add_to_requestq cmd 9 fr 3
taft1 move flags 0,0,1 ids 6,8,8
taft1 process held requests
taft1 process_requestq cmd 9 fr 3
taft1 processed 1 requests
taft1 resend marked requests
taft1 resend 50022 lq 1 flg 200000 node -1/-1 "       7
taft1 resend 302b4 lq 3 flg 1200008 node 0/0 "       8
taft1 resend 403eb lq 1 flg 200000 node -1/-1 "       7
taft1 resent 3 requests
taft1 recover event 8 finished
Comment 11 David Teigland 2005-11-30 12:35:39 EST
taft-02 is stuck recovering lockspace taft0. It has called dlm_wait_function()
at the end of rebuild_rsbs_send().  It is not a synchronous reply from another
node that it's hung on as I suggested in comment 9.  Instead, it's waiting for
its recover_list to become empty.  So, we need to find out why there are rsbs
still sitting in the recover_list.  Bugs in a variety of places could be
responsible for that.

To start, a patch (or better, systemtap?) should allow us to see exactly
what's left in the recover_list.  That might give us a clue as to whether
bad accounting or logic has left an rsb there incorrectly, or whether a
NEWLOCKIDS message from another node has not been received.
Comment 12 Corey Marthaler 2006-04-27 10:55:17 EDT
Just a note that this was seen last night.
Comment 13 Kiersten (Kerri) Anderson 2006-09-22 15:09:16 EDT
Hmm - could this one be related to the recent mount/umount issues we have been
chasing?  Devel ACK for 4.5.
Comment 14 David Teigland 2006-10-17 12:23:57 EDT
This looks like a one-off case of dlm recovery not working properly,
some odd condition or combination of events that's not being handled
right.  Fixing will require a reproducible test I can run while adding
progressively refined debugging.
Comment 15 Corey Marthaler 2006-11-06 14:47:21 EST
This has not been seen in almost 7 months; I'll reopen if it's reproduced again.
