Bug 147682

Summary: Filesystem hung while running moderate IO load
Product: [Retired] Red Hat Cluster Suite
Reporter: Dean Jansa <djansa>
Component: dlm
Assignee: David Teigland <teigland>
Status: CLOSED NOTABUG
QA Contact: Cluster QE <mspqa-list>
Severity: medium
Priority: medium
Version: 4
CC: cluster-maint
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2006-02-02 14:52:18 UTC

Description Dean Jansa 2005-02-10 15:50:07 UTC
A single GFS filesystem, mounted on /mnt/vedder, appears hung while
running the 'QUICK' IO load.  5-node cluster (morph-01 through
morph-05).

Not all operations hang, however.

[root@morph-01 ~]# touch /mnt/vedder/foo
[root@morph-01 ~]# ls /mnt/vedder/foo
/mnt/vedder/foo
[root@morph-01 ~]# cd /mnt/vedder
[root@morph-01 vedder]# ls foo
foo

(As an aside, foo shows up on the other nodes as well; ls
/mnt/vedder/foo on morph-02 works.)


But trying an ls of a directory I know is in there:

[root@morph-01 vedder]# ls d_io
...hang...


Also a simple ls hangs:

[root@morph-01 ~]# cd /mnt/vedder
[root@morph-01 vedder]# ls
...hang...



All of the outstanding IO load processes on the nodes are stuck
(state D, wait channel glock_):

morph-01 was running:

0 S 500      10963 10962  0  82   0 -   547 wait   Feb09 ?        00:00:00 genesis -i 10s -n 100 -d 10 -p 5 -L flock
1 D 500      10968 10963  0  78   0 -   550 glock_ Feb09 ?        00:00:00 genesis -i 10s -n 100 -d 10 -p 5 -L flock
1 D 500      10969 10963  0  78   0 -   550 glock_ Feb09 ?        00:00:00 genesis -i 10s -n 100 -d 10 -p 5 -L flock
1 D 500      10970 10963  0  78   0 -   550 glock_ Feb09 ?        00:00:00 genesis -i 10s -n 100 -d 10 -p 5 -L flock
1 D 500      10971 10963  0  78   0 -   550 glock_ Feb09 ?        00:00:00 genesis -i 10s -n 100 -d 10 -p 5 -L flock
1 D 500      10972 10963  0  78   0 -   550 glock_ Feb09 ?        00:00:00 genesis -i 10s -n 100 -d 10 -p 5 -L flock
0 S 500      11082 11081  0  81   0 -  1215 wait   Feb09 ?        00:00:00 sh -c iogen -i 10s -m reverse -o -t 1b -T 1000b 1000b:lock3_small | doio -av
1 D 500      11084 11082  0  81   0 -  1215 glock_ Feb09 ?        00:00:00 sh -c iogen -i 10s -m reverse -o -t 1b -T 1000b 1000b:lock3_small | doio -av


morph-02 was running:

0 S 500       8950  8949  0  77   0 -  1083 wait   Feb09 ?        00:00:00 sh -c iogen -i 10s -m random -o -t 1b -T 1000b 1000b:lock2_small | doio -av
1 D 500       8952  8950  0  77   0 -  1083 glock_ Feb09 ?        00:00:00 sh -c iogen -i 10s -m random -o -t 1b -T 1000b 1000b:lock2_small | doio -av


morph-03 was running:

0 S 500       8695  8694  0  77   0 -  1208 wait   Feb09 ?        00:00:00 sh -c iogen -i 10s -m sequential -o -t 1b -T 1000b 1000b:lock1_small | doio -av
1 D 500       8697  8695  0  77   0 -  1208 glock_ Feb09 ?        00:00:00 sh -c iogen -i 10s -m sequential -o -t 1b -T 1000b 1000b:lock1_small | doio -av


morph-04 was running:

0 S 500       8683  8682  0  76   0 -  1084 wait   Feb09 ?        00:00:00 sh -c iogen -i 10s -m reverse -o -t 1b -T 1000b 1000b:lock3_small | doio -av
1 D 500       8685  8683  0  76   0 -  1084 glock_ Feb09 ?        00:00:00 sh -c iogen -i 10s -m reverse -o -t 1b -T 1000b 1000b:lock3_small | doio -av


morph-05 was not running any IO at the point of the hang.
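The stuck workers in these listings share two columns: state D (uninterruptible sleep) and a WCHAN of glock_ (the wait channel name as truncated by ps). A minimal sketch for picking such processes out of a listing, assuming the output comes from ps -elf with the default procps column layout (F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD), which is what the listings above appear to be. glock_waiters is a hypothetical helper name, not an existing tool:

```shell
# Sketch: list processes in uninterruptible sleep (state D) whose wait
# channel starts with "glock_", reading `ps -elf` output on stdin.
# ASSUMPTION: default procps `ps -elf` layout, i.e.
#   F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
# so the state is field 2, the PID field 4, WCHAN field 11 and the
# command line starts at field 15.
glock_waiters() {
    awk '$2 == "D" && $11 ~ /^glock_/ {
        # Rebuild the command line from field 15 onward.
        cmd = $15
        for (i = 16; i <= NF; i++) cmd = cmd " " $i
        print $4, cmd
    }'
}
```

On a live node this would be run as ps -elf | glock_waiters. Because awk rebuilds the command field by field, runs of spaces inside a command line come back as single spaces.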


All of the dlm info from /proc/cluster

morph-01:

dlm_debug:
ve flags 0,1,0 ids 33,35,33
vedder move use event 35
vedder recover event 35
vedder add node 4
vedder total nodes 4
vedder rebuild resource directory
vedder rebuilt 7 resources
vedder purge requests
vedder purged 0 requests
vedder mark waiting requests
vedder marked 0 requests
vedder recover event 35 done
vedder move flags 0,0,1 ids 33,35,35
vedder process held requests
vedder processed 0 requests
vedder resend marked requests
vedder resent 0 requests
vedder recover event 35 finished
vedder move flags 1,0,0 ids 35,35,35
vedder move flags 0,1,0 ids 35,37,35
vedder move use event 37
vedder recover event 37
vedder add node 5
vedder total nodes 5
vedder rebuild resource directory
vedder rebuilt 4 resources
vedder purge requests
vedder purged 0 requests
vedder mark waiting requests
vedder marked 0 requests
vedder recover event 37 done
vedder move flags 0,0,1 ids 35,37,37
vedder process held requests
vedder processed 0 requests
vedder resend marked requests
vedder resent 0 requests
vedder recover event 37 finished

dlm_dir:
dlm_locks:
dlm_stats:
DLM stats (HZ=1000)

Lock operations:      22390
Unlock operations:     9254
Convert operations:   15014
Completion ASTs:      46657
Blocking ASTs:         4514

Lockqueue        num  waittime   ave
WAIT_RSB       11230     63596     5
WAIT_CONV          1         0     0
WAIT_GRANT        63       104     1
WAIT_UNLOCK       30        10     0
Total          11324     63710     5
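The ave column in these Lockqueue tables appears to be waittime divided by num, truncated to an integer, with the Total row holding the column sums: for morph-01, WAIT_RSB gives 63596 / 11230 ≈ 5.66, printed as 5, and 11230 + 1 + 63 + 30 = 11324. That reading is inferred from the numbers rather than from DLM documentation, but every row of the four dlm_stats tables in this report fits it. A minimal sketch that checks a pasted table against it; dlm_stats_check is a hypothetical helper name:

```shell
# Sketch: check a dlm_stats "Lockqueue" table read on stdin.
# ASSUMPTION (inferred from this report, not from DLM documentation):
#   ave   = waittime / num, truncated to an integer
#   Total = column sums of the WAIT_* rows
dlm_stats_check() {
    awk '
    function avg(num, wt) { return (num > 0) ? int(wt / num) : 0 }
    $1 ~ /^WAIT_/ && NF == 4 {
        sum_num += $2; sum_wt += $3
        print $1, ((avg($2, $3) == $4) ? "ok" : "ave mismatch")
    }
    $1 == "Total" && NF == 4 {
        ok = ($2 == sum_num && $3 == sum_wt && avg($2, $3) == $4)
        print $1, (ok ? "ok" : "mismatch")
    }'
}
```

Fed the morph-01 table above, it prints ok for all five rows; the morph-02, morph-03 and morph-04 tables check out the same way.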

lock_dlm/debug:
8 3 0
10705 un 5,677f36f 44003f 3 0
10408 qc 5,136979e7 3,3 id 660086 sts -65538 0
10408 qc 5,677f36f 3,3 id 44003f sts -65538 0
10408 qc 5,cefe67f 3,3 id 4e0298 sts -65538 0
11864 lk 2,cefe67f id 0 -1,3 10001
11864 lk 5,cefe67f id 0 -1,3 1
11864 lk 2,677f36f id 0 -1,3 10001
11864 lk 5,677f36f id 0 -1,3 1
11864 lk 2,19e16cf7 id 0 -1,3 10001
11864 lk 5,19e16cf7 id 0 -1,3 1
11864 lk 2,136979e7 id 0 -1,3 10001
11864 lk 5,136979e7 id 0 -1,3 1
10408 qc 2,677f36f -1,3 id 5001ac sts -11 0
10408 qc 2,cefe67f -1,3 id 590367 sts -11 0
10408 qc 5,19e16cf7 -1,3 id 520248 sts 0 0
10408 qc 5,cefe67f -1,3 id 4d0322 sts 0 0
10408 qc 5,677f36f -1,3 id 5f031d sts 0 0
10408 qc 2,136979e7 -1,3 id 560016 sts -11 0
10408 qc 5,136979e7 -1,3 id 57039e sts 0 0
10408 qc 2,19e16cf7 -1,3 id 4d00ad sts -11 0
11865 lk 2,cefe67f id 0 -1,3 10001
11865 lk 2,677f36f id 0 -1,3 10001
11865 lk 2,19e16cf7 id 0 -1,3 10001
11865 lk 2,136979e7 id 0 -1,3 10001
10408 qc 2,677f36f -1,3 id 4901cc sts -11 0
10408 qc 2,cefe67f -1,3 id 5100be sts -11 0
10408 qc 2,19e16cf7 -1,3 id 4b030f sts -11 0
10408 qc 2,136979e7 -1,3 id 4c03c7 sts -11 0
10705 un 5,19e16cf7 520248 3 0
10408 qc 5,19e16cf7 3,3 id 520248 sts -65538 0
10705 un 5,136979e7 57039e 3 0
10705 un 5,cefe67f 4d0322 3 0
10705 un 5,677f36f 5f031d 3 0
10408 qc 5,136979e7 3,3 id 57039e sts -65538 0
10408 qc 5,cefe67f 3,3 id 4d0322 sts -65538 0
10408 qc 5,677f36f 3,3 id 5f031d sts -65538 0
11901 lk 2,cefe67f id 0 -1,3 10001
11901 lk 5,cefe67f id 0 -1,3 1
11901 lk 2,677f36f id 0 -1,3 10001
11901 lk 5,677f36f id 0 -1,3 1
11901 lk 2,19e16cf7 id 0 -1,3 10001
11901 lk 5,19e16cf7 id 0 -1,3 1
11901 lk 2,136979e7 id 0 -1,3 10001
11901 lk 5,136979e7 id 0 -1,3 1
10408 qc 2,cefe67f -1,3 id 5800b7 sts -11 0
10408 qc 2,677f36f -1,3 id 3702d8 sts -11 0
10408 qc 2,19e16cf7 -1,3 id 4703b2 sts -11 0
10408 qc 5,677f36f -1,3 id 5c01b4 sts 0 0
10408 qc 5,cefe67f -1,3 id 5101b6 sts 0 0
10408 qc 5,19e16cf7 -1,3 id 4a0089 sts 0 0
10408 qc 2,136979e7 -1,3 id 4c038a sts -11 0
10408 qc 5,136979e7 -1,3 id 4e0211 sts 0 0
10705 un 5,136979e7 4e0211 3 0
10705 un 5,19e16cf7 4a0089 3 0
10705 un 5,cefe67f 5101b6 3 0
10705 un 5,677f36f 5c01b4 3 0
10408 qc 5,136979e7 3,3 id 4e0211 sts -65538 0
10408 qc 5,cefe67f 3,3 id 5101b6 sts -65538 0
10408 qc 5,19e16cf7 3,3 id 4a0089 sts -65538 0
10408 qc 5,677f36f 3,3 id 5c01b4 sts -65538 0
10703 lk 2,1a id 4f00e3 5,3 5
10408 qc 2,1a 5,3 id 4f00e3 sts 0 0
10703 lk 2,19e16cf8 id 650002 5,3 5
10408 qc 2,19e16cf8 5,3 id 650002 sts 0 0
10705 un 2,1a 4f00e3 3 0
10408 qc 2,1a 3,3 id 4f00e3 sts -65538 0
10705 un 2,19e16cf8 650002 3 0
10408 qc 2,19e16cf8 3,3 id 650002 sts -65538 0
11909 lk 2,1a id 0 -1,3 10000
10408 qc 2,1a -1,3 id 45023e sts 0 0
11909 lk 2,1a id 45023e 3,5 44
10408 qc 2,1a 3,5 id 45023e sts 0 0
11909 lk 3,11 id f03eb 3,5 d
10408 qc 3,11 3,5 id f03eb sts 0 0
11909 lk 2,8e9 id 0 -1,5 0
10408 qc 2,8e9 -1,5 id 4c02d2 sts 0 0
11909 lk 5,8e9 id 0 -1,3 0
10408 qc 5,8e9 -1,3 id 4a02c8 sts 0 0
11913 lk 2,cefe67f id 0 -1,3 10001
11913 lk 5,cefe67f id 0 -1,3 1
11913 lk 2,677f36f id 0 -1,3 10001
11913 lk 5,677f36f id 0 -1,3 1
11913 lk 2,19e16cf8 id 0 -1,3 10001
10408 qc 2,19e16cf8 -1,3 id 500154 sts 0 0
11913 lk 2,19e16cf7 id 0 -1,3 10001
11913 lk 5,19e16cf7 id 0 -1,3 1
11913 lk 2,136979e7 id 0 -1,3 10001
11913 lk 5,136979e7 id 0 -1,3 1
10408 qc 2,677f36f -1,3 id 5d01b4 sts -11 0
10408 qc 2,cefe67f -1,3 id 4e0319 sts -11 0
10408 qc 5,677f36f -1,3 id 450372 sts 0 0
10408 qc 5,cefe67f -1,3 id 6203ad sts 0 0
10408 qc 2,136979e7 -1,3 id 5902ce sts -11 0
10408 qc 2,19e16cf7 -1,3 id 40013c sts -11 0
10408 qc 5,136979e7 -1,3 id 47020d sts 0 0
10408 qc 5,19e16cf7 -1,3 id 4e0129 sts 0 0
10705 un 5,19e16cf7 4e0129 3 0
10408 qc 5,19e16cf7 3,3 id 4e0129 sts -65538 0
10705 un 5,136979e7 47020d 3 0
10705 un 5,cefe67f 6203ad 3 0
10705 un 5,677f36f 450372 3 0
10408 qc 5,cefe67f 3,3 id 6203ad sts -65538 0
10408 qc 5,677f36f 3,3 id 450372 sts -65538 0
10408 qc 5,136979e7 3,3 id 47020d sts -65538 0
10702 lk 2,1a id 45023e 5,3 5
10408 qc 2,1a 5,3 id 45023e sts 0 0
10702 lk 2,8e9 id 4c02d2 5,3 5
10408 qc 2,8e9 5,3 id 4c02d2 sts 0 0

lock_dlm/drop_count:
50000
lock_dlm/drop_period:
60
lock_dlm/max_nodes:
128


morph-02:

dlm_debug:
ve flags 0,1,0 ids 29,31,29
vedder move use event 31
vedder recover event 31
vedder add node 4
vedder total nodes 4
vedder rebuild resource directory
vedder rebuilt 5 resources
vedder purge requests
vedder purged 0 requests
vedder mark waiting requests
vedder marked 0 requests
vedder recover event 31 done
vedder move flags 0,0,1 ids 29,31,31
vedder process held requests
vedder processed 0 requests
vedder resend marked requests
vedder resent 0 requests
vedder recover event 31 finished
vedder move flags 1,0,0 ids 31,31,31
vedder move flags 0,1,0 ids 31,33,31
vedder move use event 33
vedder recover event 33
vedder add node 5
vedder total nodes 5
vedder rebuild resource directory
vedder rebuilt 4 resources
vedder purge requests
vedder purged 0 requests
vedder mark waiting requests
vedder marked 0 requests
vedder recover event 33 done
vedder move flags 0,0,1 ids 31,33,33
vedder process held requests
vedder processed 0 requests
vedder resend marked requests
vedder resent 0 requests
vedder recover event 33 finished

dlm_dir:
dlm_locks:
dlm_stats:
DLM stats (HZ=1000)

Lock operations:      21161
Unlock operations:     8722
Convert operations:   11093
Completion ASTs:      40975
Blocking ASTs:         3786

Lockqueue        num  waittime   ave
WAIT_RSB       10389     64487     6
WAIT_CONV       7536      6022     0
WAIT_GRANT     16641      7954     0
WAIT_UNLOCK     4565     11998     2
Total          39131     90461     2

lock_dlm/debug:

8587 un 3,203a6040 a006c 3 8
8587 un 3,20366048 1002f0 3 8
8587 un 3,201e6078 1a0007 3 8
8587 un 3,1fe460ec 1f01ba 3 8
8587 un 3,20276066 180140 3 8
8587 un 3,1fc76126 120120 3 8
8587 un 3,20106094 1003f7 3 8
8587 un 3,2045602a 12008b 3 8
8587 un 3,1fa4616c 150233 3 8
8587 un 3,2024606c 1501cf 3 8
8587 un 3,1ffe60b8 17039b 3 8
8587 un 3,1fc26130 1a0181 3 8
8587 un 3,1fb06154 1201c3 3 8
8587 un 3,1fcf6116 1301b9 3 8
8587 un 3,1fe160f2 170325 3 8
8587 un 3,202c605c c00bc 3 8
8587 un 3,1fa86164 1401b4 3 8
8587 un 3,203b603e 1600da 3 8
8587 un 3,200160b2 1c0000 3 8
8587 un 3,1fed60da 1a0134 3 8
8587 un 3,1fa96162 1b0273 3 8
8587 un 3,204e6018 1602cc 3 8
8587 un 3,204f6016 1502e4 3 8
8587 un 3,1fc5612a 1a0080 3 8
8587 un 3,1f966188 11010f 3 8
8587 un 3,2055600a 140272 3 8
8587 un 3,1fff60b6 150329 3 8
8587 un 3,1fb96142 1b0159 3 8
8587 un 3,1fee60d8 100331 3 8
8587 un 3,20206074 12036b 3 8
8587 un 3,1faa6160 1f00f1 3 8
8454 qc 3,1fa16172 3,3 id 1802c0 sts -65538 0
8454 qc 3,204b601e 3,3 id 18009f sts -65538 0
8454 qc 3,1fb66148 3,3 id 1a0384 sts -65538 0
8454 qc 3,1fac615c 3,3 id 150333 sts -65538 0
8454 qc 3,1fea60e0 3,3 id f020f sts -65538 0
8454 qc 3,2015608a 3,3 id 1301a1 sts -65538 0
8454 qc 3,20186084 3,3 id 13036a sts -65538 0
8454 qc 3,203a6040 3,3 id a006c sts -65538 0
8454 qc 3,20366048 3,3 id 1002f0 sts -65538 0
8454 qc 3,201e6078 3,3 id 1a0007 sts -65538 0
8454 qc 3,1fe460ec 3,3 id 1f01ba sts -65538 0
8454 qc 3,20276066 3,3 id 180140 sts -65538 0
8454 qc 3,1fc76126 3,3 id 120120 sts -65538 0
8454 qc 3,20106094 3,3 id 1003f7 sts -65538 0
8454 qc 3,2045602a 3,3 id 12008b sts -65538 0
8454 qc 3,1fa4616c 3,3 id 150233 sts -65538 0
8454 qc 3,2024606c 3,3 id 1501cf sts -65538 0
8454 qc 3,1ffe60b8 3,3 id 17039b sts -65538 0
8454 qc 3,1fc26130 3,3 id 1a0181 sts -65538 0
8454 qc 3,1fb06154 3,3 id 1201c3 sts -65538 0
8454 qc 3,1fcf6116 3,3 id 1301b9 sts -65538 0
8454 qc 3,1fe160f2 3,3 id 170325 sts -65538 0
8454 qc 3,202c605c 3,3 id c00bc sts -65538 0
8454 qc 3,1fa86164 3,3 id 1401b4 sts -65538 0
8454 qc 3,203b603e 3,3 id 1600da sts -65538 0
8454 qc 3,200160b2 3,3 id 1c0000 sts -65538 0
8454 qc 3,1fed60da 3,3 id 1a0134 sts -65538 0
8454 qc 3,1fa96162 3,3 id 1b0273 sts -65538 0
8454 qc 3,204e6018 3,3 id 1602cc sts -65538 0
8454 qc 3,204f6016 3,3 id 1502e4 sts -65538 0
8454 qc 3,1fc5612a 3,3 id 1a0080 sts -65538 0
8454 qc 3,1f966188 3,3 id 11010f sts -65538 0
8454 qc 3,2055600a 3,3 id 140272 sts -65538 0
8454 qc 3,1fff60b6 3,3 id 150329 sts -65538 0
8454 qc 3,1fb96142 3,3 id 1b0159 sts -65538 0
8454 qc 3,1fee60d8 3,3 id 100331 sts -65538 0
8454 qc 3,20206074 3,3 id 12036b sts -65538 0
8454 qc 3,1faa6160 3,3 id 1f00f1 sts -65538 0
9629 lk 2,1a id 0 -1,3 10000
8454 qc 2,1a -1,3 id 120152 sts 0 0
9629 lk 2,cefe67f id 0 -1,3 10001
9629 lk 5,cefe67f id 0 -1,3 1
9629 lk 2,19e16cf8 id 0 -1,3 10001
9629 lk 2,19e16cf7 id 0 -1,3 10001
9629 lk 5,19e16cf7 id 0 -1,3 1
9629 lk 2,136979e7 id 0 -1,3 10001
9629 lk 5,136979e7 id 0 -1,3 1
8454 qc 5,cefe67f -1,3 id 1402d1 sts 0 0
9629 lk 2,df id 0 -1,3 10001
9629 lk 5,df id 0 -1,3 1
8454 qc 5,136979e7 -1,3 id d025e sts 0 0
8454 qc 2,cefe67f -1,3 id 18020d sts -11 0
8454 qc 2,19e16cf8 -1,3 id e0068 sts -11 0
8454 qc 2,136979e7 -1,3 id 1c0330 sts -11 0
8454 qc 2,19e16cf7 -1,3 id 1e00d4 sts -11 0
8454 qc 5,19e16cf7 -1,3 id 1b02d3 sts 0 0
8584 lk 2,19e16cf8 id 0 -1,3 10000
8454 qc 5,df -1,3 id 1c00d3 sts 0 0
8454 qc 2,df -1,3 id 1201fc sts -11 0
8454 qc 2,19e16cf8 -1,3 id 170183 sts 0 0
8587 un 5,136979e7 d025e 3 0
8587 un 5,19e16cf7 1b02d3 3 0
8587 un 5,df 1c00d3 3 0
8587 un 5,cefe67f 1402d1 3 0
8454 qc 5,df 3,3 id 1c00d3 sts -65538 0
8454 qc 5,cefe67f 3,3 id 1402d1 sts -65538 0
8454 qc 5,136979e7 3,3 id d025e sts -65538 0
8454 qc 5,19e16cf7 3,3 id 1b02d3 sts -65538 0
8587 un 2,1a 120152 3 0
8454 qc 2,1a 3,3 id 120152 sts -65538 0
8587 un 2,19e16cf8 170183 3 0
8454 qc 2,19e16cf8 3,3 id 170183 sts -65538 0
9667 lk 2,1a id 0 -1,3 10000
8454 qc 2,1a -1,3 id 1b0384 sts 0 0
9667 lk 2,8e9 id 0 -1,3 10000
8454 qc 2,8e9 -1,3 id 140334 sts 0 0
9667 lk 5,8e9 id 0 -1,3 0
8454 qc 5,8e9 -1,3 id 12003b sts 0 0

lock_dlm/drop_count:
50000

lock_dlm/drop_period:
60

lock_dlm/max_nodes:
128


morph-03:

dlm_debug:
ve flags 0,1,0 ids 25,27,25
vedder move use event 27
vedder recover event 27
vedder add node 4
vedder total nodes 4
vedder rebuild resource directory
vedder rebuilt 4 resources
vedder purge requests
vedder purged 0 requests
vedder mark waiting requests
vedder marked 0 requests
vedder recover event 27 done
vedder move flags 0,0,1 ids 25,27,27
vedder process held requests
vedder processed 0 requests
vedder resend marked requests
vedder resent 0 requests
vedder recover event 27 finished
vedder move flags 1,0,0 ids 27,27,27
vedder move flags 0,1,0 ids 27,29,27
vedder move use event 29
vedder recover event 29
vedder add node 5
vedder total nodes 5
vedder rebuild resource directory
vedder rebuilt 8 resources
vedder purge requests
vedder purged 0 requests
vedder mark waiting requests
vedder marked 0 requests
vedder recover event 29 done
vedder move flags 0,0,1 ids 27,29,29
vedder process held requests
vedder processed 0 requests
vedder resend marked requests
vedder resent 0 requests
vedder recover event 29 finished

dlm_dir:
dlm_locks:
dlm_stats:
DLM stats (HZ=1000)

Lock operations:      13516
Unlock operations:     4906
Convert operations:     944
Completion ASTs:      19365
Blocking ASTs:           27

Lockqueue        num  waittime   ave
WAIT_RSB       10825     48502     4
WAIT_CONV         15        15     1
WAIT_GRANT      8350      3822     0
WAIT_UNLOCK       39       113     2
Total          19229     52452     2

lock_dlm/debug:
6e8 7026e 5 0
8196 qc 2,cf0e6e8 5,5 id 7026e sts -65538 0
8329 un 2,cefe6ad 9001f 5 0
8196 qc 2,cefe6ad 5,5 id 9001f sts -65538 0
8329 un 2,cefe8a3 30364 5 0
8196 qc 2,cefe8a3 5,5 id 30364 sts -65538 0
8329 un 2,cf3e6a2 d02da 5 0
8196 qc 2,cf3e6a2 5,5 id d02da sts -65538 0
8329 un 2,cefe84a 503d5 5 0
8196 qc 2,cefe84a 5,5 id 503d5 sts -65538 0
8329 un 2,cefe6ae d02d3 5 0
8196 qc 2,cefe6ae 5,5 id d02d3 sts -65538 0
8329 un 2,cefe8d1 a0288 5 0
8196 qc 2,cefe8d1 5,5 id a0288 sts -65538 0
8329 un 2,cf1e681 a015c 5 0
8196 qc 2,cf1e681 5,5 id a015c sts -65538 0
8329 un 2,cf0e68a 90115 5 0
8196 qc 2,cf0e68a 5,5 id 90115 sts -65538 0
8329 un 2,cf0e738 1102ff 5 0
8196 qc 2,cf0e738 5,5 id 1102ff sts -65538 0
8329 un 2,cf0e6ce 50011 5 0
8196 qc 2,cf0e6ce 5,5 id 50011 sts -65538 0
8329 un 2,cf2e67a 90030 5 0
8196 qc 2,cf2e67a 5,5 id 90030 sts -65538 0
8329 un 2,cefe8ed c01db 5 0
8196 qc 2,cefe8ed 5,5 id c01db sts -65538 0
8329 un 2,cf0e698 a01db 5 0
8196 qc 2,cf0e698 5,5 id a01db sts -65538 0
8329 un 2,cefe896 d0367 5 0
8196 qc 2,cefe896 5,5 id d0367 sts -65538 0
8329 un 2,cefe90f c01fc 5 0
8196 qc 2,cefe90f 5,5 id c01fc sts -65538 0
8329 un 2,cf2e6b4 f03e0 5 0
8196 qc 2,cf2e6b4 5,5 id f03e0 sts -65538 0
8329 un 2,cefe854 501d1 5 0
8196 qc 2,cefe854 5,5 id 501d1 sts -65538 0
8329 un 2,cf0e6fb 80289 5 0
8196 qc 2,cf0e6fb 5,5 id 80289 sts -65538 0
8329 un 2,cf2e69b b02bb 5 0
8196 qc 2,cf2e69b 5,5 id b02bb sts -65538 0
8329 un 2,cefe89b 702da 5 0
8196 qc 2,cefe89b 5,5 id 702da sts -65538 0
8329 un 2,cf4e68c 10037d 5 0
8196 qc 2,cf4e68c 5,5 id 10037d sts -65538 0
8329 un 2,cefe8d0 90065 5 0
8196 qc 2,cefe8d0 5,5 id 90065 sts -65538 0
8329 un 2,cefe85e 8018a 5 0
8196 qc 2,cefe85e 5,5 id 8018a sts -65538 0
8329 un 2,cf0e6c8 b02e7 5 0
8196 qc 2,cf0e6c8 5,5 id b02e7 sts -65538 0
8329 un 2,cefe690 3026a 5 0
8196 qc 2,cefe690 5,5 id 3026a sts -65538 0
8329 un 2,cf4e680 140284 5 0
8196 qc 2,cf4e680 5,5 id 140284 sts -65538 0
8329 un 2,cf3e684 130092 5 0
8196 qc 2,cf3e684 5,5 id 130092 sts -65538 0
8329 un 2,cf1e6da 60138 5 0
8196 qc 2,cf1e6da 5,5 id 60138 sts -65538 0
8329 un 2,cefe879 e02b0 5 0
8196 qc 2,cefe879 5,5 id e02b0 sts -65538 0
8329 un 2,cefe887 90308 5 0
8196 qc 2,cefe887 5,5 id 90308 sts -65538 0
8329 un 2,cf1e6d2 e01a1 5 0
8196 qc 2,cf1e6d2 5,5 id e01a1 sts -65538 0
8329 un 2,cefe837 503d8 5 0
8196 qc 2,cefe837 5,5 id 503d8 sts -65538 0
8329 un 2,cefe6b0 10029a 5 0
8196 qc 2,cefe6b0 5,5 id 10029a sts -65538 0
8329 un 2,cefe8a6 600e4 5 0
8196 qc 2,cefe8a6 5,5 id 600e4 sts -65538 0
8329 un 2,cefe86e 701d8 5 0
8196 qc 2,cefe86e 5,5 id 701d8 sts -65538 0
8329 un 2,cf0e6c9 a02a7 5 0
8196 qc 2,cf0e6c9 5,5 id a02a7 sts -65538 0
8329 un 2,cf0e6b6 c014a 5 0
8196 qc 2,cf0e6b6 5,5 id c014a sts -65538 0
8329 un 2,cf0e6c5 901aa 5 0
8196 qc 2,cf0e6c5 5,5 id 901aa sts -65538 0
8329 un 2,cf0e6d2 e01dc 5 0
8196 qc 2,cf0e6d2 5,5 id e01dc sts -65538 0
8329 un 2,cefe8c1 7005b 5 0
8196 qc 2,cefe8c1 5,5 id 7005b sts -65538 0
8329 un 2,cf0e680 100135 5 0
8196 qc 2,cf0e680 5,5 id 100135 sts -65538 0
8329 un 2,cf0e6fa c0087 5 0
8196 qc 2,cf0e6fa 5,5 id c0087 sts -65538 0
8329 un 2,cf0e732 d028e 5 0
8196 qc 2,cf0e732 5,5 id d028e sts -65538 0
8329 un 2,cf0e68d 90378 5 0
8196 qc 2,cf0e68d 5,5 id 90378 sts -65538 0
8329 un 2,1a a013b 3 0
8196 qc 2,1a 3,3 id a013b sts -65538 0
8329 un 2,19e16cf8 40004 3 0
8196 qc 2,19e16cf8 3,3 id 40004 sts -65538 0
8329 un 2,cf0e736 f00cc 5 0
8196 qc 2,cf0e736 5,5 id f00cc sts -65538 0
8329 un 2,cf0e734 c03af 5 0
8196 qc 2,cf0e734 5,5 id c03af sts -65538 0
8329 un 2,cf0e735 d01fb 5 0
8196 qc 2,cf0e735 5,5 id d01fb sts -65538 0
8326 lk 3,cefe67a id a0221 5,3 d
8196 qc 3,cefe67a 5,3 id a0221 sts 0 0
8326 lk 3,cf0e678 id 11029a 5,3 d
8196 qc 3,cf0e678 5,3 id 11029a sts 0 0
8326 lk 3,cf1e676 id b01d0 5,3 d
8196 qc 3,cf1e676 5,3 id b01d0 sts 0 0
8326 lk 3,cf2e674 id e038d 5,3 d
8196 qc 3,cf2e674 5,3 id e038d sts 0 0
8326 lk 3,cf3e672 id d0365 5,3 d
8196 qc 3,cf3e672 5,3 id d0365 sts 0 0
8326 lk 3,cf4e670 id d029c 5,3 d
8196 qc 3,cf4e670 5,3 id d029c sts 0 0
8326 lk 3,cf5e66e id a0078 5,3 d
8196 qc 3,cf5e66e 5,3 id a0078 sts 0 0

lock_dlm/drop_count:
50000
lock_dlm/drop_period:
60
lock_dlm/max_nodes:
128


morph-04:
dlm_debug:
22
clvmd process held requests
clvmd processed 0 requests
clvmd resend marked requests
clvmd resent 0 requests
clvmd recover event 22 finished
vedder move flags 0,1,0 ids 0,23,0
vedder move use event 23
vedder recover event 23 (first)
vedder add nodes
vedder total nodes 4
vedder rebuild resource directory
vedder rebuilt 6 resources
vedder recover event 23 done
vedder move flags 0,0,1 ids 0,23,23
vedder process held requests
vedder processed 0 requests
vedder recover event 23 finished
vedder move flags 1,0,0 ids 23,23,23
vedder move flags 0,1,0 ids 23,25,23
vedder move use event 25
vedder recover event 25
vedder add node 5
vedder total nodes 5
vedder rebuild resource directory
vedder rebuilt 4 resources
vedder purge requests
vedder purged 0 requests
vedder mark waiting requests
vedder marked 0 requests
vedder recover event 25 done
vedder move flags 0,0,1 ids 23,25,25
vedder process held requests
vedder processed 0 requests
vedder resend marked requests
vedder resent 0 requests
vedder recover event 25 finished
dlm_dir:
dlm_locks:
dlm_stats:
DLM stats (HZ=1000)

Lock operations:      22499
Unlock operations:    13863
Convert operations:    4014
Completion ASTs:      40375
Blocking ASTs:           88

Lockqueue        num  waittime   ave
WAIT_RSB       11472     69914     6
WAIT_CONV         12         4     0
WAIT_GRANT     16690     12093     0
WAIT_UNLOCK     8379     12587     1
Total          36553     94598     2

lock_dlm/debug:
3 8
8320 un 3,1fcf6116 120345 3 8
8320 un 3,1fe160f2 1b00c5 3 8
8320 un 3,202c605c 18000b 3 8
8320 un 3,1fa86164 1c0024 3 8
8320 un 3,203b603e 1b030b 3 8
8320 un 3,200160b2 1803e9 3 8
8320 un 3,1fed60da 1b01de 3 8
8320 un 3,1fa96162 190223 3 8
8320 un 3,204e6018 12022f 3 8
8320 un 3,204f6016 12016e 3 8
8320 un 3,1fc5612a 1503b2 3 8
8320 un 3,2055600a d0314 3 8
8320 un 3,1fff60b6 13031a 3 8
8320 un 3,1fb96142 1502cb 3 8
8320 un 3,1fee60d8 180180 3 8
8320 un 3,20206074 1b02ca 3 8
8320 un 3,1faa6160 1a007f 3 8
8187 qc 3,20406034 3,3 id 1f0290 sts -65538 0
8187 qc 3,202f6056 3,3 id 2003a9 sts -65538 0
8187 qc 3,1fd3610e 3,3 id 1d033e sts -65538 0
8187 qc 3,200d609a 3,3 id 1c0335 sts -65538 0
8187 qc 3,201c607c 3,3 id 1a021e sts -65538 0
8187 qc 3,20396042 3,3 id 1a0306 sts -65538 0
8187 qc 3,1fb16152 3,3 id 1d0363 sts -65538 0
8187 qc 3,1ff760c6 3,3 id 150001 sts -65538 0
8187 qc 3,1fb5614a 3,3 id 1402f0 sts -65538 0
8187 qc 3,201b607e 3,3 id 170168 sts -65538 0
8187 qc 3,20486024 3,3 id 140204 sts -65538 0
8187 qc 3,1faf6156 3,3 id 1600e9 sts -65538 0
8187 qc 3,1fc4612c 3,3 id 180016 sts -65538 0
8187 qc 3,1fde60f8 3,3 id 1d038a sts -65538 0
8187 qc 3,200f6096 3,3 id 14013f sts -65538 0
8187 qc 3,203f6036 3,3 id 180339 sts -65538 0
8187 qc 3,1fb26150 3,3 id 1f0120 sts -65538 0
8187 qc 3,1fd96102 3,3 id 20013a sts -65538 0
8187 qc 3,1fca6120 3,3 id 1901a8 sts -65538 0
8187 qc 3,1fab615e 3,3 id 1d0352 sts -65538 0
8187 qc 3,1ff360ce 3,3 id 1f032e sts -65538 0
8187 qc 3,1fb76146 3,3 id 1b01d2 sts -65538 0
8187 qc 3,2053600e 3,3 id 190395 sts -65538 0
8187 qc 3,1fe760e6 3,3 id 130246 sts -65538 0
8187 qc 3,1fb4614c 3,3 id 1001e3 sts -65538 0
8187 qc 3,20316052 3,3 id 190057 sts -65538 0
8187 qc 3,1ff260d0 3,3 id 1b0131 sts -65538 0
8187 qc 3,20306054 3,3 id 15012d sts -65538 0
8187 qc 3,1fdf60f6 3,3 id 12012a sts -65538 0
8187 qc 3,201d607a 3,3 id 1e03ae sts -65538 0
8187 qc 3,1ff960c2 3,3 id 1803ab sts -65538 0
8187 qc 3,1fd5610a 3,3 id 130321 sts -65538 0
8187 qc 3,1fad615a 3,3 id 16038d sts -65538 0
8187 qc 3,204d601a 3,3 id 1303da sts -65538 0
8187 qc 3,1fb3614e 3,3 id 190002 sts -65538 0
8187 qc 3,203e6038 3,3 id 100269 sts -65538 0
8187 qc 3,2043602e 3,3 id 190106 sts -65538 0
8187 qc 3,2023606e 3,3 id a02ee sts -65538 0
8187 qc 3,1ffc60bc 3,3 id 140000 sts -65538 0
8187 qc 3,1fcb611e 3,3 id f03d2 sts -65538 0
8187 qc 3,20526010 3,3 id 1a0167 sts -65538 0
8187 qc 3,200e6098 3,3 id 1a0169 sts -65538 0
8187 qc 3,200260b0 3,3 id e02cc sts -65538 0
8187 qc 3,204b601e 3,3 id 15019d sts -65538 0
8187 qc 3,1fb66148 3,3 id 1d00bc sts -65538 0
8187 qc 3,1fac615c 3,3 id 230174 sts -65538 0
8187 qc 3,1fea60e0 3,3 id 180266 sts -65538 0
8187 qc 3,2015608a 3,3 id 12036f sts -65538 0
8187 qc 3,20186084 3,3 id e00cb sts -65538 0
8187 qc 3,203a6040 3,3 id 1601ee sts -65538 0
8187 qc 3,20366048 3,3 id 16026a sts -65538 0
8187 qc 3,201e6078 3,3 id 1202da sts -65538 0
8187 qc 3,1fe460ec 3,3 id 1300e2 sts -65538 0
8187 qc 3,20276066 3,3 id 1200b7 sts -65538 0
8187 qc 3,1fc76126 3,3 id 180043 sts -65538 0
8187 qc 3,20106094 3,3 id 140213 sts -65538 0
8187 qc 3,2045602a 3,3 id 1100ac sts -65538 0
8187 qc 3,2024606c 3,3 id 1a00f9 sts -65538 0
8187 qc 3,1ffe60b8 3,3 id 160168 sts -65538 0
8187 qc 3,1fc26130 3,3 id 1a0287 sts -65538 0
8187 qc 3,1fb06154 3,3 id 1701f3 sts -65538 0
8187 qc 3,1fcf6116 3,3 id 120345 sts -65538 0
8187 qc 3,1fe160f2 3,3 id 1b00c5 sts -65538 0
8187 qc 3,202c605c 3,3 id 18000b sts -65538 0
8187 qc 3,1fa86164 3,3 id 1c0024 sts -65538 0
8187 qc 3,203b603e 3,3 id 1b030b sts -65538 0
8187 qc 3,200160b2 3,3 id 1803e9 sts -65538 0
8187 qc 3,1fed60da 3,3 id 1b01de sts -65538 0
8187 qc 3,1fa96162 3,3 id 190223 sts -65538 0
8187 qc 3,204e6018 3,3 id 12022f sts -65538 0
8187 qc 3,204f6016 3,3 id 12016e sts -65538 0
8187 qc 3,1fc5612a 3,3 id 1503b2 sts -65538 0
8187 qc 3,2055600a 3,3 id d0314 sts -65538 0
8187 qc 3,1fff60b6 3,3 id 13031a sts -65538 0
8187 qc 3,1fb96142 3,3 id 1502cb sts -65538 0
8187 qc 3,1fee60d8 3,3 id 180180 sts -65538 0
8187 qc 3,20206074 3,3 id 1b02ca sts -65538 0
8187 qc 3,1faa6160 3,3 id 1a007f sts -65538 0

lock_dlm/drop_count:
50000
lock_dlm/drop_period:
60
lock_dlm/max_nodes:
128
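Each node's dump above follows the same layout: the proc file name, a colon, then the file's contents. A minimal sketch that reproduces that layout on one node, assuming all eight files live under /proc/cluster as the heading of the dump implies; dump_cluster_proc and the PROC_CLUSTER override are hypothetical names. As Comments 2 to 5 show, dlm_locks only returns locks after a lockspace name has been written to it, so it prints nothing here, just as in the dumps above.

```shell
# Sketch: print each DLM / lock_dlm proc file as "name:" followed by its
# contents, the layout used for the per-node dumps in this report.
# ASSUMPTION: the files live under /proc/cluster. PROC_CLUSTER is a
# hypothetical override so the function can be pointed at a copy.
PROC_CLUSTER=${PROC_CLUSTER:-/proc/cluster}

dump_cluster_proc() {
    for f in dlm_debug dlm_dir dlm_locks dlm_stats \
             lock_dlm/debug lock_dlm/drop_count \
             lock_dlm/drop_period lock_dlm/max_nodes; do
        echo "$f:"
        # Report a missing file instead of aborting the whole dump.
        cat "$PROC_CLUSTER/$f" 2>/dev/null || echo "(not readable)"
        echo
    done
}
```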






FWIW, morph-01 panicked while I was gathering the dlm debug info.

The fs hung early last night and I gathered the info this morning,
so it had been hung for 8+ hours.

Unable to handle kernel NULL pointer dereference at virtual address 00000004
 printing eip:
c014a340
*pde = f5c6f067
Oops: 0002 [#1]
Modules linked in: gnbd(U) lock_nolock(U) gfs(U) lock_dlm(U) dlm(U) cman(U) lock_harness(U) parport_pc lp parport autofs4 md5 ipv6 sunrpc uhci_hcd hw_random e1000 floppy dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
CPU:    0
EIP:    0060:[<c014a340>]    Tainted: GF     VLI
EFLAGS: 00010092   (2.6.9-5.EL)
EIP is at cache_alloc_refill+0x146/0x227
eax: 00000000   ebx: c32e4b80   ecx: c32e4b00   edx: c32e4b8c
esi: 00000010   edi: c32e4b8c   ebp: c32e8000   esp: f5b24efc
ds: 007b   es: 007b   ss: 0068
Process atd (pid: 2200, threadinfo=f5b24000 task=f5b3e230)
Stack: 00000050 00000050 00000246 c32e4b80 f5b2ebf4 c014a663 00000002 c3123600
       00000000 f8862694 00000000 f5b24000 f886271b 00000001 00000000 f5b2ebf4
       f88e2497 00000001 f5b2ebf4 f5b2ec30 c018812c 001d31db 00000000 c333c400
Call Trace:
 [<c014a663>] kmem_cache_alloc+0x46/0x4c
 [<f8862694>] new_handle+0x15/0x40 [jbd]
 [<f886271b>] journal_start+0x5c/0x9e [jbd]
 [<f88e2497>] ext3_dirty_inode+0x24/0x66 [ext3]
 [<c018812c>] __mark_inode_dirty+0x28/0x23b
 [<c0176346>] filldir64+0x0/0x11a
 [<c0180a8b>] update_atime+0x6a/0x90
 [<c0176091>] vfs_readdir+0x9d/0xb7
 [<c01764c5>] sys_getdents64+0x65/0x9f
 [<c0301bfb>] syscall_call+0x7/0xb
Code: af 43 34 03 41 0c 89 44 95 10 ff 45 00 8b 51 10 0f b7 41 14 42 89 51 10 0f b7 44 41 18 66 89 41 14 3b 53 3c 72 cc 8b 51 04 8b 01 <89> 50 04 89 02 66 83 79 14 ff c7 01 00 01 10 00 c7 41 04 00 02
 <0>Fatal exception: panic in 5 seconds
Kernel panic - not syncing: mm/slab.c:1984: spin_lock(mm/slab.c:c32e4bc4) already locked by mm/slab.c/1984

Comment 1 Dean Jansa 2005-02-10 15:53:59 UTC
Dave, if this looks more like a GFS issue, please reassign.  I had to
pick one, so I'm picking on you.  :)

Versions of the modules:

DLM 2.6.9-18.0 (built Feb  9 2005 14:56:57)
Lock_DLM (built Feb  9 2005 15:07:12)
GFS 2.6.9-18.3 (built Feb  9 2005 15:07:30)
CMAN 2.6.9-17.2 (built Feb  9 2005 14:52:26)
Lock_Harness 2.6.9-18.3 (built Feb  9 2005 15:07:09)

Comment 2 David Teigland 2005-02-10 16:07:56 UTC
I'm guessing this is a plock/flock problem.  A dump of the dlm
locks would help here:  
echo "name of lockspace" >> /proc/cluster/dlm_locks
cat /proc/cluster/dlm_locks > locks.txt

Is there a "quick" way for me to run this load on my
machines?


Comment 3 Dean Jansa 2005-02-10 16:14:18 UTC
Hmm, all of the /proc/cluster/dlm_locks dumps are empty....

Comment 4 Dean Jansa 2005-02-10 16:18:00 UTC
As for a quick way to run the load:

You have the sistina-test tree, correct?  (You run revolver, if I recall.)

You can run sistina-test/vedder/bin/vedder -R <your cluster resource
file> -l <path to sistina-test root> -S QUICK

For example I ran: 
vedder -R ../../var/share/resource_files/morph-cluster.xml -l
~/src/sistina-test -S QUICK 

That said, I'm not sure you will hit it; I have not tried to
reproduce it yet.



Comment 5 Dean Jansa 2005-02-10 16:28:42 UTC
Oops, the dlm_locks are not empty; I had to paste the correct
lockspace name.  I'm gathering them now.

Comment 6 Dean Jansa 2005-02-10 16:37:07 UTC
Dave, you can find the dlm_locks output from each node at:

/home/msp/djansa/pub/bugs/147682/morph*.dlm_locks
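Comment 2's two commands can be wrapped so that an empty dump is flagged rather than saved silently: in Comments 3 and 5 the dumps first came back empty because the wrong lockspace name had been written. A minimal sketch; dump_dlm_locks and the DLM_LOCKS override (there only so the function can be exercised against a scratch file) are hypothetical names, and the write-name-then-read protocol is taken from Comment 2 rather than from DLM documentation:

```shell
# Sketch of Comment 2's dump procedure: write the lockspace name to
# dlm_locks, then read the file back into an output file.
# DLM_LOCKS is a hypothetical override, not a DLM setting.
DLM_LOCKS=${DLM_LOCKS:-/proc/cluster/dlm_locks}

dump_dlm_locks() {
    ls_name=$1
    out=${2:-locks.txt}
    # Select the lockspace (same "echo >>" as in Comment 2).
    echo "$ls_name" >> "$DLM_LOCKS" || return 1
    cat "$DLM_LOCKS" > "$out" || return 1
    # An empty dump was the symptom of a wrong lockspace name in
    # Comments 3 and 5, so report it with a distinct exit status.
    if [ ! -s "$out" ]; then
        echo "empty dump for lockspace '$ls_name'; check the name" >&2
        return 2
    fi
    return 0
}
```

Run on each node as dump_dlm_locks <lockspace> "$(hostname -s).dlm_locks", this would give per-node files named like the morph*.dlm_locks files in Comment 6.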



Comment 7 David Teigland 2005-02-11 10:00:34 UTC
The lock dump shows one problem, related to the quota lock, that I've
just checked in a fix for.  I don't know if it explains the hang,
though; I would probably need a kdb trace to know for sure.


Comment 8 David Teigland 2005-02-21 03:36:03 UTC
Neither Dean nor I have been able to reproduce this since the
fix mentioned above.  That could mean the problem is solved, that it
is difficult to reproduce, or both.