Bug 147682 - Filesystem hung while running moderate IO load
Summary: Filesystem hung while running moderate IO load
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: dlm
Version: 4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: David Teigland
QA Contact: Cluster QE
Reported: 2005-02-10 15:50 UTC by Dean Jansa
Modified: 2009-04-16 20:30 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Last Closed: 2006-02-02 14:52:18 UTC

Description Dean Jansa 2005-02-10 15:50:07 UTC
Single GFS filesystem mounted on /mnt/vedder on a 5-node cluster
(morph-01 through morph-05).  While running the 'QUICK' IO load, the
fs appears hung.

Not all operations hang, however.

[root@morph-01 ~]# touch /mnt/vedder/foo
[root@morph-01 ~]# ls /mnt/vedder/foo
/mnt/vedder/foo
[root@morph-01 ~]# cd /mnt/vedder
[root@morph-01 vedder]# ls foo
foo

(As an aside, foo shows up on the other nodes as well; I can do an
 ls /mnt/vedder/foo on morph-02 and it works.)


But trying an ls of a dir I know is in there hangs:

[root@morph-01 vedder]# ls d_io
...hang...


Also a simple ls hangs:

[root@morph-01 ~]# cd /mnt/vedder
[root@morph-01 vedder]# ls
...hang...



All of the outstanding IO load processes on the nodes are stuck:

morph-01 was running:

0 S 500      10963 10962  0  82   0 -   547 wait   Feb09 ?        00:00:00 genesis -i 10s -n 100 -d 10 -p 5 -L flock
1 D 500      10968 10963  0  78   0 -   550 glock_ Feb09 ?        00:00:00 genesis -i 10s -n 100 -d 10 -p 5 -L flock
1 D 500      10969 10963  0  78   0 -   550 glock_ Feb09 ?        00:00:00 genesis -i 10s -n 100 -d 10 -p 5 -L flock
1 D 500      10970 10963  0  78   0 -   550 glock_ Feb09 ?        00:00:00 genesis -i 10s -n 100 -d 10 -p 5 -L flock
1 D 500      10971 10963  0  78   0 -   550 glock_ Feb09 ?        00:00:00 genesis -i 10s -n 100 -d 10 -p 5 -L flock
1 D 500      10972 10963  0  78   0 -   550 glock_ Feb09 ?        00:00:00 genesis -i 10s -n 100 -d 10 -p 5 -L flock
0 S 500      11082 11081  0  81   0 -  1215 wait   Feb09 ?        00:00:00 sh -c iogen -i 10s -m reverse -o -t 1b -T 1000b 1000b:lock3_small | doio -av
1 D 500      11084 11082  0  81   0 -  1215 glock_ Feb09 ?        00:00:00 sh -c iogen -i 10s -m reverse -o -t 1b -T 1000b 1000b:lock3_small | doio -av


morph-02 was running:

0 S 500       8950  8949  0  77   0 -  1083 wait   Feb09 ?        00:00:00 sh -c iogen -i 10s -m random -o -t 1b -T 1000b 1000b:lock2_small | doio -av
1 D 500       8952  8950  0  77   0 -  1083 glock_ Feb09 ?        00:00:00 sh -c iogen -i 10s -m random -o -t 1b -T 1000b 1000b:lock2_small | doio -av


morph-03 was running:

0 S 500       8695  8694  0  77   0 -  1208 wait   Feb09 ?        00:00:00 sh -c iogen -i 10s -m sequential -o -t 1b -T 1000b 1000b:lock1_small | doio -av
1 D 500       8697  8695  0  77   0 -  1208 glock_ Feb09 ?        00:00:00 sh -c iogen -i 10s -m sequential -o -t 1b -T 1000b 1000b:lock1_small | doio -av


morph-04 was running:
0 S 500       8683  8682  0  76   0 -  1084 wait   Feb09 ?        00:00:00 sh -c iogen -i 10s -m reverse -o -t 1b -T 1000b 1000b:lock3_small | doio -av
1 D 500       8685  8683  0  76   0 -  1084 glock_ Feb09 ?        00:00:00 sh -c iogen -i 10s -m reverse -o -t 1b -T 1000b 1000b:lock3_small | doio -av


morph-05 was not running any IO at the point of the hang.
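
The listings above can be summarized by counting processes in
uninterruptible sleep (state D) per wait channel. The following is a
minimal Python sketch, not part of the original report. It assumes the
rows come from `ps -elf`-style output (the report does not name the
command), so the column positions are an assumption. The sample rows
are copied from the morph-01 listing with wrapped lines joined.

```python
from collections import Counter

# Three rows from the morph-01 listing (wrapped lines joined).
# Assumed column order: F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
PS_ROWS = """\
0 S 500      10963 10962  0  82   0 -   547 wait   Feb09 ?        00:00:00 genesis -i 10s -n 100 -d 10 -p 5 -L flock
1 D 500      10968 10963  0  78   0 -   550 glock_ Feb09 ?        00:00:00 genesis -i 10s -n 100 -d 10 -p 5 -L flock
1 D 500      11084 11082  0  81   0 -  1215 glock_ Feb09 ?        00:00:00 sh -c iogen -i 10s -m reverse -o -t 1b -T 1000b 1000b:lock3_small | doio -av
"""


def blocked_by_wchan(text):
    """Count state-D processes, keyed by the (truncated) WCHAN column."""
    counts = Counter()
    for line in text.splitlines():
        # Split off 14 leading columns; the remainder is the command line.
        fields = line.split(None, 14)
        if len(fields) < 15:
            continue  # not a full ps row
        state, wchan = fields[1], fields[10]
        if state == "D":
            counts[wchan] += 1
    return counts


if __name__ == "__main__":
    for wchan, n in blocked_by_wchan(PS_ROWS).items():
        print(wchan, n)
```

Run against the full listings, a tally like this would make it easy to
confirm that every blocked process on every node is waiting in the same
glock-related channel.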


All of the dlm info from /proc/cluster:

morph-01:

dlm_debug:
ve flags 0,1,0 ids 33,35,33
vedder move use event 35
vedder recover event 35
vedder add node 4
vedder total nodes 4
vedder rebuild resource directory
vedder rebuilt 7 resources
vedder purge requests
vedder purged 0 requests
vedder mark waiting requests
vedder marked 0 requests
vedder recover event 35 done
vedder move flags 0,0,1 ids 33,35,35
vedder process held requests
vedder processed 0 requests
vedder resend marked requests
vedder resent 0 requests
vedder recover event 35 finished
vedder move flags 1,0,0 ids 35,35,35
vedder move flags 0,1,0 ids 35,37,35
vedder move use event 37
vedder recover event 37
vedder add node 5
vedder total nodes 5
vedder rebuild resource directory
vedder rebuilt 4 resources
vedder purge requests
vedder purged 0 requests
vedder mark waiting requests
vedder marked 0 requests
vedder recover event 37 done
vedder move flags 0,0,1 ids 35,37,37
vedder process held requests
vedder processed 0 requests
vedder resend marked requests
vedder resent 0 requests
vedder recover event 37 finished

dlm_dir:
dlm_locks:
dlm_stats:
DLM stats (HZ=1000)

Lock operations:      22390
Unlock operations:     9254
Convert operations:   15014
Completion ASTs:      46657
Blocking ASTs:         4514

Lockqueue        num  waittime   ave
WAIT_RSB       11230     63596     5
WAIT_CONV          1         0     0
WAIT_GRANT        63       104     1
WAIT_UNLOCK       30        10     0
Total          11324     63710     5
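
When reading the Lockqueue table, it helps to know how the columns
relate. The sketch below is an editorial aid, not part of the original
report. It checks two things that are inferred from the numbers alone:
that the Total row is the sum of the four queues, and that "ave" looks
like waittime divided by num with the fraction dropped. The unit of
waittime is not stated in the report (jiffies would be a plausible
guess given the HZ=1000 header, but that is an assumption).

```python
# Lockqueue table copied from the morph-01 dlm_stats output above.
LOCKQUEUE = """\
WAIT_RSB       11230     63596     5
WAIT_CONV          1         0     0
WAIT_GRANT        63       104     1
WAIT_UNLOCK       30        10     0
Total          11324     63710     5
"""


def check_lockqueue(text):
    """Return (sums_ok, ave_ok) for a Lockqueue table."""
    rows = {}
    for line in text.splitlines():
        name, num, waittime, ave = line.split()
        rows[name] = (int(num), int(waittime), int(ave))

    queues = [v for k, v in rows.items() if k != "Total"]
    total_num, total_wait, _ = rows["Total"]
    sums_ok = (
        sum(q[0] for q in queues) == total_num
        and sum(q[1] for q in queues) == total_wait
    )
    # Assumption: ave == waittime // num (integer division), 0 if num is 0.
    ave_ok = all(
        ave == (wait // num if num else 0) for num, wait, ave in rows.values()
    )
    return sums_ok, ave_ok


if __name__ == "__main__":
    print(check_lockqueue(LOCKQUEUE))
```

Both checks hold for the morph-01 table, so a small "ave" can hide a
large total wait when the queue count is high, as in the WAIT_RSB row.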

lock_dlm/debug:
8 3 0
10705 un 5,677f36f 44003f 3 0
10408 qc 5,136979e7 3,3 id 660086 sts -65538 0
10408 qc 5,677f36f 3,3 id 44003f sts -65538 0
10408 qc 5,cefe67f 3,3 id 4e0298 sts -65538 0
11864 lk 2,cefe67f id 0 -1,3 10001
11864 lk 5,cefe67f id 0 -1,3 1
11864 lk 2,677f36f id 0 -1,3 10001
11864 lk 5,677f36f id 0 -1,3 1
11864 lk 2,19e16cf7 id 0 -1,3 10001
11864 lk 5,19e16cf7 id 0 -1,3 1
11864 lk 2,136979e7 id 0 -1,3 10001
11864 lk 5,136979e7 id 0 -1,3 1
10408 qc 2,677f36f -1,3 id 5001ac sts -11 0
10408 qc 2,cefe67f -1,3 id 590367 sts -11 0
10408 qc 5,19e16cf7 -1,3 id 520248 sts 0 0
10408 qc 5,cefe67f -1,3 id 4d0322 sts 0 0
10408 qc 5,677f36f -1,3 id 5f031d sts 0 0
10408 qc 2,136979e7 -1,3 id 560016 sts -11 0
10408 qc 5,136979e7 -1,3 id 57039e sts 0 0
10408 qc 2,19e16cf7 -1,3 id 4d00ad sts -11 0
11865 lk 2,cefe67f id 0 -1,3 10001
11865 lk 2,677f36f id 0 -1,3 10001
11865 lk 2,19e16cf7 id 0 -1,3 10001
11865 lk 2,136979e7 id 0 -1,3 10001
10408 qc 2,677f36f -1,3 id 4901cc sts -11 0
10408 qc 2,cefe67f -1,3 id 5100be sts -11 0
10408 qc 2,19e16cf7 -1,3 id 4b030f sts -11 0
10408 qc 2,136979e7 -1,3 id 4c03c7 sts -11 0
10705 un 5,19e16cf7 520248 3 0
10408 qc 5,19e16cf7 3,3 id 520248 sts -65538 0
10705 un 5,136979e7 57039e 3 0
10705 un 5,cefe67f 4d0322 3 0
10705 un 5,677f36f 5f031d 3 0
10408 qc 5,136979e7 3,3 id 57039e sts -65538 0
10408 qc 5,cefe67f 3,3 id 4d0322 sts -65538 0
10408 qc 5,677f36f 3,3 id 5f031d sts -65538 0
11901 lk 2,cefe67f id 0 -1,3 10001
11901 lk 5,cefe67f id 0 -1,3 1
11901 lk 2,677f36f id 0 -1,3 10001
11901 lk 5,677f36f id 0 -1,3 1
11901 lk 2,19e16cf7 id 0 -1,3 10001
11901 lk 5,19e16cf7 id 0 -1,3 1
11901 lk 2,136979e7 id 0 -1,3 10001
11901 lk 5,136979e7 id 0 -1,3 1
10408 qc 2,cefe67f -1,3 id 5800b7 sts -11 0
10408 qc 2,677f36f -1,3 id 3702d8 sts -11 0
10408 qc 2,19e16cf7 -1,3 id 4703b2 sts -11 0
10408 qc 5,677f36f -1,3 id 5c01b4 sts 0 0
10408 qc 5,cefe67f -1,3 id 5101b6 sts 0 0
10408 qc 5,19e16cf7 -1,3 id 4a0089 sts 0 0
10408 qc 2,136979e7 -1,3 id 4c038a sts -11 0
10408 qc 5,136979e7 -1,3 id 4e0211 sts 0 0
10705 un 5,136979e7 4e0211 3 0
10705 un 5,19e16cf7 4a0089 3 0
10705 un 5,cefe67f 5101b6 3 0
10705 un 5,677f36f 5c01b4 3 0
10408 qc 5,136979e7 3,3 id 4e0211 sts -65538 0
10408 qc 5,cefe67f 3,3 id 5101b6 sts -65538 0
10408 qc 5,19e16cf7 3,3 id 4a0089 sts -65538 0
10408 qc 5,677f36f 3,3 id 5c01b4 sts -65538 0
10703 lk 2,1a id 4f00e3 5,3 5
10408 qc 2,1a 5,3 id 4f00e3 sts 0 0
10703 lk 2,19e16cf8 id 650002 5,3 5
10408 qc 2,19e16cf8 5,3 id 650002 sts 0 0
10705 un 2,1a 4f00e3 3 0
10408 qc 2,1a 3,3 id 4f00e3 sts -65538 0
10705 un 2,19e16cf8 650002 3 0
10408 qc 2,19e16cf8 3,3 id 650002 sts -65538 0
11909 lk 2,1a id 0 -1,3 10000
10408 qc 2,1a -1,3 id 45023e sts 0 0
11909 lk 2,1a id 45023e 3,5 44
10408 qc 2,1a 3,5 id 45023e sts 0 0
11909 lk 3,11 id f03eb 3,5 d
10408 qc 3,11 3,5 id f03eb sts 0 0
11909 lk 2,8e9 id 0 -1,5 0
10408 qc 2,8e9 -1,5 id 4c02d2 sts 0 0
11909 lk 5,8e9 id 0 -1,3 0
10408 qc 5,8e9 -1,3 id 4a02c8 sts 0 0
11913 lk 2,cefe67f id 0 -1,3 10001
11913 lk 5,cefe67f id 0 -1,3 1
11913 lk 2,677f36f id 0 -1,3 10001
11913 lk 5,677f36f id 0 -1,3 1
11913 lk 2,19e16cf8 id 0 -1,3 10001
10408 qc 2,19e16cf8 -1,3 id 500154 sts 0 0
11913 lk 2,19e16cf7 id 0 -1,3 10001
11913 lk 5,19e16cf7 id 0 -1,3 1
11913 lk 2,136979e7 id 0 -1,3 10001
11913 lk 5,136979e7 id 0 -1,3 1
10408 qc 2,677f36f -1,3 id 5d01b4 sts -11 0
10408 qc 2,cefe67f -1,3 id 4e0319 sts -11 0
10408 qc 5,677f36f -1,3 id 450372 sts 0 0
10408 qc 5,cefe67f -1,3 id 6203ad sts 0 0
10408 qc 2,136979e7 -1,3 id 5902ce sts -11 0
10408 qc 2,19e16cf7 -1,3 id 40013c sts -11 0
10408 qc 5,136979e7 -1,3 id 47020d sts 0 0
10408 qc 5,19e16cf7 -1,3 id 4e0129 sts 0 0
10705 un 5,19e16cf7 4e0129 3 0
10408 qc 5,19e16cf7 3,3 id 4e0129 sts -65538 0
10705 un 5,136979e7 47020d 3 0
10705 un 5,cefe67f 6203ad 3 0
10705 un 5,677f36f 450372 3 0
10408 qc 5,cefe67f 3,3 id 6203ad sts -65538 0
10408 qc 5,677f36f 3,3 id 450372 sts -65538 0
10408 qc 5,136979e7 3,3 id 47020d sts -65538 0
10702 lk 2,1a id 45023e 5,3 5
10408 qc 2,1a 5,3 id 45023e sts 0 0
10702 lk 2,8e9 id 4c02d2 5,3 5
10408 qc 2,8e9 5,3 id 4c02d2 sts 0 0

lock_dlm/drop_count:
50000
lock_dlm/drop_period:
60
lock_dlm/max_nodes:
128
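
The lock_dlm/debug lines above are easier to scan when the "qc" entries
are tallied by lock-name prefix and status. The following Python sketch
is an editorial aid, not part of the original report. The field meanings
are assumptions drawn from the line layout: the token after "qc" is
taken to be a lock name of the form prefix,number, and the value after
"sts" is taken to be a completion status. For orientation only: 11 is
the value of EAGAIN on Linux and 65538 is 0x10002; what those statuses
mean inside lock_dlm is not confirmed anywhere in this report.

```python
from collections import Counter

# Six lines copied from the morph-01 lock_dlm/debug output above.
DEBUG_LINES = """\
11864 lk 2,cefe67f id 0 -1,3 10001
11864 lk 5,cefe67f id 0 -1,3 1
10408 qc 2,cefe67f -1,3 id 590367 sts -11 0
10408 qc 5,cefe67f -1,3 id 4d0322 sts 0 0
10705 un 5,cefe67f 4d0322 3 0
10408 qc 5,cefe67f 3,3 id 4d0322 sts -65538 0
"""


def tally_completions(text):
    """Count 'qc' lines by (lock-name prefix, status value)."""
    counts = Counter()
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) < 3 or tokens[1] != "qc" or "sts" not in tokens:
            continue  # skip lk/un lines and truncated fragments
        prefix = tokens[2].split(",")[0]
        status = int(tokens[tokens.index("sts") + 1])
        counts[(prefix, status)] += 1
    return counts


if __name__ == "__main__":
    for (prefix, status), n in sorted(tally_completions(DEBUG_LINES).items()):
        print(prefix, status, n)
```

Applied to a node's full debug buffer, the tally shows at a glance
which lock-name prefixes complete with a non-zero status and how often.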


morph-02:

dlm_debug:
ve flags 0,1,0 ids 29,31,29
vedder move use event 31
vedder recover event 31
vedder add node 4
vedder total nodes 4
vedder rebuild resource directory
vedder rebuilt 5 resources
vedder purge requests
vedder purged 0 requests
vedder mark waiting requests
vedder marked 0 requests
vedder recover event 31 done
vedder move flags 0,0,1 ids 29,31,31
vedder process held requests
vedder processed 0 requests
vedder resend marked requests
vedder resent 0 requests
vedder recover event 31 finished
vedder move flags 1,0,0 ids 31,31,31
vedder move flags 0,1,0 ids 31,33,31
vedder move use event 33
vedder recover event 33
vedder add node 5
vedder total nodes 5
vedder rebuild resource directory
vedder rebuilt 4 resources
vedder purge requests
vedder purged 0 requests
vedder mark waiting requests
vedder marked 0 requests
vedder recover event 33 done
vedder move flags 0,0,1 ids 31,33,33
vedder process held requests
vedder processed 0 requests
vedder resend marked requests
vedder resent 0 requests
vedder recover event 33 finished

dlm_dir:
dlm_locks:
dlm_stats:
DLM stats (HZ=1000)

Lock operations:      21161
Unlock operations:     8722
Convert operations:   11093
Completion ASTs:      40975
Blocking ASTs:         3786

Lockqueue        num  waittime   ave
WAIT_RSB       10389     64487     6
WAIT_CONV       7536      6022     0
WAIT_GRANT     16641      7954     0
WAIT_UNLOCK     4565     11998     2
Total          39131     90461     2

lock_dlm/debug:

8587 un 3,203a6040 a006c 3 8
8587 un 3,20366048 1002f0 3 8
8587 un 3,201e6078 1a0007 3 8
8587 un 3,1fe460ec 1f01ba 3 8
8587 un 3,20276066 180140 3 8
8587 un 3,1fc76126 120120 3 8
8587 un 3,20106094 1003f7 3 8
8587 un 3,2045602a 12008b 3 8
8587 un 3,1fa4616c 150233 3 8
8587 un 3,2024606c 1501cf 3 8
8587 un 3,1ffe60b8 17039b 3 8
8587 un 3,1fc26130 1a0181 3 8
8587 un 3,1fb06154 1201c3 3 8
8587 un 3,1fcf6116 1301b9 3 8
8587 un 3,1fe160f2 170325 3 8
8587 un 3,202c605c c00bc 3 8
8587 un 3,1fa86164 1401b4 3 8
8587 un 3,203b603e 1600da 3 8
8587 un 3,200160b2 1c0000 3 8
8587 un 3,1fed60da 1a0134 3 8
8587 un 3,1fa96162 1b0273 3 8
8587 un 3,204e6018 1602cc 3 8
8587 un 3,204f6016 1502e4 3 8
8587 un 3,1fc5612a 1a0080 3 8
8587 un 3,1f966188 11010f 3 8
8587 un 3,2055600a 140272 3 8
8587 un 3,1fff60b6 150329 3 8
8587 un 3,1fb96142 1b0159 3 8
8587 un 3,1fee60d8 100331 3 8
8587 un 3,20206074 12036b 3 8
8587 un 3,1faa6160 1f00f1 3 8
8454 qc 3,1fa16172 3,3 id 1802c0 sts -65538 0
8454 qc 3,204b601e 3,3 id 18009f sts -65538 0
8454 qc 3,1fb66148 3,3 id 1a0384 sts -65538 0
8454 qc 3,1fac615c 3,3 id 150333 sts -65538 0
8454 qc 3,1fea60e0 3,3 id f020f sts -65538 0
8454 qc 3,2015608a 3,3 id 1301a1 sts -65538 0
8454 qc 3,20186084 3,3 id 13036a sts -65538 0
8454 qc 3,203a6040 3,3 id a006c sts -65538 0
8454 qc 3,20366048 3,3 id 1002f0 sts -65538 0
8454 qc 3,201e6078 3,3 id 1a0007 sts -65538 0
8454 qc 3,1fe460ec 3,3 id 1f01ba sts -65538 0
8454 qc 3,20276066 3,3 id 180140 sts -65538 0
8454 qc 3,1fc76126 3,3 id 120120 sts -65538 0
8454 qc 3,20106094 3,3 id 1003f7 sts -65538 0
8454 qc 3,2045602a 3,3 id 12008b sts -65538 0
8454 qc 3,1fa4616c 3,3 id 150233 sts -65538 0
8454 qc 3,2024606c 3,3 id 1501cf sts -65538 0
8454 qc 3,1ffe60b8 3,3 id 17039b sts -65538 0
8454 qc 3,1fc26130 3,3 id 1a0181 sts -65538 0
8454 qc 3,1fb06154 3,3 id 1201c3 sts -65538 0
8454 qc 3,1fcf6116 3,3 id 1301b9 sts -65538 0
8454 qc 3,1fe160f2 3,3 id 170325 sts -65538 0
8454 qc 3,202c605c 3,3 id c00bc sts -65538 0
8454 qc 3,1fa86164 3,3 id 1401b4 sts -65538 0
8454 qc 3,203b603e 3,3 id 1600da sts -65538 0
8454 qc 3,200160b2 3,3 id 1c0000 sts -65538 0
8454 qc 3,1fed60da 3,3 id 1a0134 sts -65538 0
8454 qc 3,1fa96162 3,3 id 1b0273 sts -65538 0
8454 qc 3,204e6018 3,3 id 1602cc sts -65538 0
8454 qc 3,204f6016 3,3 id 1502e4 sts -65538 0
8454 qc 3,1fc5612a 3,3 id 1a0080 sts -65538 0
8454 qc 3,1f966188 3,3 id 11010f sts -65538 0
8454 qc 3,2055600a 3,3 id 140272 sts -65538 0
8454 qc 3,1fff60b6 3,3 id 150329 sts -65538 0
8454 qc 3,1fb96142 3,3 id 1b0159 sts -65538 0
8454 qc 3,1fee60d8 3,3 id 100331 sts -65538 0
8454 qc 3,20206074 3,3 id 12036b sts -65538 0
8454 qc 3,1faa6160 3,3 id 1f00f1 sts -65538 0
9629 lk 2,1a id 0 -1,3 10000
8454 qc 2,1a -1,3 id 120152 sts 0 0
9629 lk 2,cefe67f id 0 -1,3 10001
9629 lk 5,cefe67f id 0 -1,3 1
9629 lk 2,19e16cf8 id 0 -1,3 10001
9629 lk 2,19e16cf7 id 0 -1,3 10001
9629 lk 5,19e16cf7 id 0 -1,3 1
9629 lk 2,136979e7 id 0 -1,3 10001
9629 lk 5,136979e7 id 0 -1,3 1
8454 qc 5,cefe67f -1,3 id 1402d1 sts 0 0
9629 lk 2,df id 0 -1,3 10001
9629 lk 5,df id 0 -1,3 1
8454 qc 5,136979e7 -1,3 id d025e sts 0 0
8454 qc 2,cefe67f -1,3 id 18020d sts -11 0
8454 qc 2,19e16cf8 -1,3 id e0068 sts -11 0
8454 qc 2,136979e7 -1,3 id 1c0330 sts -11 0
8454 qc 2,19e16cf7 -1,3 id 1e00d4 sts -11 0
8454 qc 5,19e16cf7 -1,3 id 1b02d3 sts 0 0
8584 lk 2,19e16cf8 id 0 -1,3 10000
8454 qc 5,df -1,3 id 1c00d3 sts 0 0
8454 qc 2,df -1,3 id 1201fc sts -11 0
8454 qc 2,19e16cf8 -1,3 id 170183 sts 0 0
8587 un 5,136979e7 d025e 3 0
8587 un 5,19e16cf7 1b02d3 3 0
8587 un 5,df 1c00d3 3 0
8587 un 5,cefe67f 1402d1 3 0
8454 qc 5,df 3,3 id 1c00d3 sts -65538 0
8454 qc 5,cefe67f 3,3 id 1402d1 sts -65538 0
8454 qc 5,136979e7 3,3 id d025e sts -65538 0
8454 qc 5,19e16cf7 3,3 id 1b02d3 sts -65538 0
8587 un 2,1a 120152 3 0
8454 qc 2,1a 3,3 id 120152 sts -65538 0
8587 un 2,19e16cf8 170183 3 0
8454 qc 2,19e16cf8 3,3 id 170183 sts -65538 0
9667 lk 2,1a id 0 -1,3 10000
8454 qc 2,1a -1,3 id 1b0384 sts 0 0
9667 lk 2,8e9 id 0 -1,3 10000
8454 qc 2,8e9 -1,3 id 140334 sts 0 0
9667 lk 5,8e9 id 0 -1,3 0
8454 qc 5,8e9 -1,3 id 12003b sts 0 0

lock_dlm/drop_count:
50000

lock_dlm/drop_period:
60

lock_dlm/max_nodes:
128


morph-03:

dlm_debug:
ve flags 0,1,0 ids 25,27,25
vedder move use event 27
vedder recover event 27
vedder add node 4
vedder total nodes 4
vedder rebuild resource directory
vedder rebuilt 4 resources
vedder purge requests
vedder purged 0 requests
vedder mark waiting requests
vedder marked 0 requests
vedder recover event 27 done
vedder move flags 0,0,1 ids 25,27,27
vedder process held requests
vedder processed 0 requests
vedder resend marked requests
vedder resent 0 requests
vedder recover event 27 finished
vedder move flags 1,0,0 ids 27,27,27
vedder move flags 0,1,0 ids 27,29,27
vedder move use event 29
vedder recover event 29
vedder add node 5
vedder total nodes 5
vedder rebuild resource directory
vedder rebuilt 8 resources
vedder purge requests
vedder purged 0 requests
vedder mark waiting requests
vedder marked 0 requests
vedder recover event 29 done
vedder move flags 0,0,1 ids 27,29,29
vedder process held requests
vedder processed 0 requests
vedder resend marked requests
vedder resent 0 requests
vedder recover event 29 finished

dlm_dir:
dlm_locks:
dlm_stats:
DLM stats (HZ=1000)

Lock operations:      13516
Unlock operations:     4906
Convert operations:     944
Completion ASTs:      19365
Blocking ASTs:           27

Lockqueue        num  waittime   ave
WAIT_RSB       10825     48502     4
WAIT_CONV         15        15     1
WAIT_GRANT      8350      3822     0
WAIT_UNLOCK       39       113     2
Total          19229     52452     2

lock_dlm/debug:
6e8 7026e 5 0
8196 qc 2,cf0e6e8 5,5 id 7026e sts -65538 0
8329 un 2,cefe6ad 9001f 5 0
8196 qc 2,cefe6ad 5,5 id 9001f sts -65538 0
8329 un 2,cefe8a3 30364 5 0
8196 qc 2,cefe8a3 5,5 id 30364 sts -65538 0
8329 un 2,cf3e6a2 d02da 5 0
8196 qc 2,cf3e6a2 5,5 id d02da sts -65538 0
8329 un 2,cefe84a 503d5 5 0
8196 qc 2,cefe84a 5,5 id 503d5 sts -65538 0
8329 un 2,cefe6ae d02d3 5 0
8196 qc 2,cefe6ae 5,5 id d02d3 sts -65538 0
8329 un 2,cefe8d1 a0288 5 0
8196 qc 2,cefe8d1 5,5 id a0288 sts -65538 0
8329 un 2,cf1e681 a015c 5 0
8196 qc 2,cf1e681 5,5 id a015c sts -65538 0
8329 un 2,cf0e68a 90115 5 0
8196 qc 2,cf0e68a 5,5 id 90115 sts -65538 0
8329 un 2,cf0e738 1102ff 5 0
8196 qc 2,cf0e738 5,5 id 1102ff sts -65538 0
8329 un 2,cf0e6ce 50011 5 0
8196 qc 2,cf0e6ce 5,5 id 50011 sts -65538 0
8329 un 2,cf2e67a 90030 5 0
8196 qc 2,cf2e67a 5,5 id 90030 sts -65538 0
8329 un 2,cefe8ed c01db 5 0
8196 qc 2,cefe8ed 5,5 id c01db sts -65538 0
8329 un 2,cf0e698 a01db 5 0
8196 qc 2,cf0e698 5,5 id a01db sts -65538 0
8329 un 2,cefe896 d0367 5 0
8196 qc 2,cefe896 5,5 id d0367 sts -65538 0
8329 un 2,cefe90f c01fc 5 0
8196 qc 2,cefe90f 5,5 id c01fc sts -65538 0
8329 un 2,cf2e6b4 f03e0 5 0
8196 qc 2,cf2e6b4 5,5 id f03e0 sts -65538 0
8329 un 2,cefe854 501d1 5 0
8196 qc 2,cefe854 5,5 id 501d1 sts -65538 0
8329 un 2,cf0e6fb 80289 5 0
8196 qc 2,cf0e6fb 5,5 id 80289 sts -65538 0
8329 un 2,cf2e69b b02bb 5 0
8196 qc 2,cf2e69b 5,5 id b02bb sts -65538 0
8329 un 2,cefe89b 702da 5 0
8196 qc 2,cefe89b 5,5 id 702da sts -65538 0
8329 un 2,cf4e68c 10037d 5 0
8196 qc 2,cf4e68c 5,5 id 10037d sts -65538 0
8329 un 2,cefe8d0 90065 5 0
8196 qc 2,cefe8d0 5,5 id 90065 sts -65538 0
8329 un 2,cefe85e 8018a 5 0
8196 qc 2,cefe85e 5,5 id 8018a sts -65538 0
8329 un 2,cf0e6c8 b02e7 5 0
8196 qc 2,cf0e6c8 5,5 id b02e7 sts -65538 0
8329 un 2,cefe690 3026a 5 0
8196 qc 2,cefe690 5,5 id 3026a sts -65538 0
8329 un 2,cf4e680 140284 5 0
8196 qc 2,cf4e680 5,5 id 140284 sts -65538 0
8329 un 2,cf3e684 130092 5 0
8196 qc 2,cf3e684 5,5 id 130092 sts -65538 0
8329 un 2,cf1e6da 60138 5 0
8196 qc 2,cf1e6da 5,5 id 60138 sts -65538 0
8329 un 2,cefe879 e02b0 5 0
8196 qc 2,cefe879 5,5 id e02b0 sts -65538 0
8329 un 2,cefe887 90308 5 0
8196 qc 2,cefe887 5,5 id 90308 sts -65538 0
8329 un 2,cf1e6d2 e01a1 5 0
8196 qc 2,cf1e6d2 5,5 id e01a1 sts -65538 0
8329 un 2,cefe837 503d8 5 0
8196 qc 2,cefe837 5,5 id 503d8 sts -65538 0
8329 un 2,cefe6b0 10029a 5 0
8196 qc 2,cefe6b0 5,5 id 10029a sts -65538 0
8329 un 2,cefe8a6 600e4 5 0
8196 qc 2,cefe8a6 5,5 id 600e4 sts -65538 0
8329 un 2,cefe86e 701d8 5 0
8196 qc 2,cefe86e 5,5 id 701d8 sts -65538 0
8329 un 2,cf0e6c9 a02a7 5 0
8196 qc 2,cf0e6c9 5,5 id a02a7 sts -65538 0
8329 un 2,cf0e6b6 c014a 5 0
8196 qc 2,cf0e6b6 5,5 id c014a sts -65538 0
8329 un 2,cf0e6c5 901aa 5 0
8196 qc 2,cf0e6c5 5,5 id 901aa sts -65538 0
8329 un 2,cf0e6d2 e01dc 5 0
8196 qc 2,cf0e6d2 5,5 id e01dc sts -65538 0
8329 un 2,cefe8c1 7005b 5 0
8196 qc 2,cefe8c1 5,5 id 7005b sts -65538 0
8329 un 2,cf0e680 100135 5 0
8196 qc 2,cf0e680 5,5 id 100135 sts -65538 0
8329 un 2,cf0e6fa c0087 5 0
8196 qc 2,cf0e6fa 5,5 id c0087 sts -65538 0
8329 un 2,cf0e732 d028e 5 0
8196 qc 2,cf0e732 5,5 id d028e sts -65538 0
8329 un 2,cf0e68d 90378 5 0
8196 qc 2,cf0e68d 5,5 id 90378 sts -65538 0
8329 un 2,1a a013b 3 0
8196 qc 2,1a 3,3 id a013b sts -65538 0
8329 un 2,19e16cf8 40004 3 0
8196 qc 2,19e16cf8 3,3 id 40004 sts -65538 0
8329 un 2,cf0e736 f00cc 5 0
8196 qc 2,cf0e736 5,5 id f00cc sts -65538 0
8329 un 2,cf0e734 c03af 5 0
8196 qc 2,cf0e734 5,5 id c03af sts -65538 0
8329 un 2,cf0e735 d01fb 5 0
8196 qc 2,cf0e735 5,5 id d01fb sts -65538 0
8326 lk 3,cefe67a id a0221 5,3 d
8196 qc 3,cefe67a 5,3 id a0221 sts 0 0
8326 lk 3,cf0e678 id 11029a 5,3 d
8196 qc 3,cf0e678 5,3 id 11029a sts 0 0
8326 lk 3,cf1e676 id b01d0 5,3 d
8196 qc 3,cf1e676 5,3 id b01d0 sts 0 0
8326 lk 3,cf2e674 id e038d 5,3 d
8196 qc 3,cf2e674 5,3 id e038d sts 0 0
8326 lk 3,cf3e672 id d0365 5,3 d
8196 qc 3,cf3e672 5,3 id d0365 sts 0 0
8326 lk 3,cf4e670 id d029c 5,3 d
8196 qc 3,cf4e670 5,3 id d029c sts 0 0
8326 lk 3,cf5e66e id a0078 5,3 d
8196 qc 3,cf5e66e 5,3 id a0078 sts 0 0

lock_dlm/drop_count:
50000
lock_dlm/drop_period:
60
lock_dlm/max_nodes:
128


morph-04:
dlm_debug:
22
clvmd process held requests
clvmd processed 0 requests
clvmd resend marked requests
clvmd resent 0 requests
clvmd recover event 22 finished
vedder move flags 0,1,0 ids 0,23,0
vedder move use event 23
vedder recover event 23 (first)
vedder add nodes
vedder total nodes 4
vedder rebuild resource directory
vedder rebuilt 6 resources
vedder recover event 23 done
vedder move flags 0,0,1 ids 0,23,23
vedder process held requests
vedder processed 0 requests
vedder recover event 23 finished
vedder move flags 1,0,0 ids 23,23,23
vedder move flags 0,1,0 ids 23,25,23
vedder move use event 25
vedder recover event 25
vedder add node 5
vedder total nodes 5
vedder rebuild resource directory
vedder rebuilt 4 resources
vedder purge requests
vedder purged 0 requests
vedder mark waiting requests
vedder marked 0 requests
vedder recover event 25 done
vedder move flags 0,0,1 ids 23,25,25
vedder process held requests
vedder processed 0 requests
vedder resend marked requests
vedder resent 0 requests
vedder recover event 25 finished
dlm_dir:
dlm_locks:
dlm_stats:
DLM stats (HZ=1000)

Lock operations:      22499
Unlock operations:    13863
Convert operations:    4014
Completion ASTs:      40375
Blocking ASTs:           88

Lockqueue        num  waittime   ave
WAIT_RSB       11472     69914     6
WAIT_CONV         12         4     0
WAIT_GRANT     16690     12093     0
WAIT_UNLOCK     8379     12587     1
Total          36553     94598     2

lock_dlm/debug:
3 8
8320 un 3,1fcf6116 120345 3 8
8320 un 3,1fe160f2 1b00c5 3 8
8320 un 3,202c605c 18000b 3 8
8320 un 3,1fa86164 1c0024 3 8
8320 un 3,203b603e 1b030b 3 8
8320 un 3,200160b2 1803e9 3 8
8320 un 3,1fed60da 1b01de 3 8
8320 un 3,1fa96162 190223 3 8
8320 un 3,204e6018 12022f 3 8
8320 un 3,204f6016 12016e 3 8
8320 un 3,1fc5612a 1503b2 3 8
8320 un 3,2055600a d0314 3 8
8320 un 3,1fff60b6 13031a 3 8
8320 un 3,1fb96142 1502cb 3 8
8320 un 3,1fee60d8 180180 3 8
8320 un 3,20206074 1b02ca 3 8
8320 un 3,1faa6160 1a007f 3 8
8187 qc 3,20406034 3,3 id 1f0290 sts -65538 0
8187 qc 3,202f6056 3,3 id 2003a9 sts -65538 0
8187 qc 3,1fd3610e 3,3 id 1d033e sts -65538 0
8187 qc 3,200d609a 3,3 id 1c0335 sts -65538 0
8187 qc 3,201c607c 3,3 id 1a021e sts -65538 0
8187 qc 3,20396042 3,3 id 1a0306 sts -65538 0
8187 qc 3,1fb16152 3,3 id 1d0363 sts -65538 0
8187 qc 3,1ff760c6 3,3 id 150001 sts -65538 0
8187 qc 3,1fb5614a 3,3 id 1402f0 sts -65538 0
8187 qc 3,201b607e 3,3 id 170168 sts -65538 0
8187 qc 3,20486024 3,3 id 140204 sts -65538 0
8187 qc 3,1faf6156 3,3 id 1600e9 sts -65538 0
8187 qc 3,1fc4612c 3,3 id 180016 sts -65538 0
8187 qc 3,1fde60f8 3,3 id 1d038a sts -65538 0
8187 qc 3,200f6096 3,3 id 14013f sts -65538 0
8187 qc 3,203f6036 3,3 id 180339 sts -65538 0
8187 qc 3,1fb26150 3,3 id 1f0120 sts -65538 0
8187 qc 3,1fd96102 3,3 id 20013a sts -65538 0
8187 qc 3,1fca6120 3,3 id 1901a8 sts -65538 0
8187 qc 3,1fab615e 3,3 id 1d0352 sts -65538 0
8187 qc 3,1ff360ce 3,3 id 1f032e sts -65538 0
8187 qc 3,1fb76146 3,3 id 1b01d2 sts -65538 0
8187 qc 3,2053600e 3,3 id 190395 sts -65538 0
8187 qc 3,1fe760e6 3,3 id 130246 sts -65538 0
8187 qc 3,1fb4614c 3,3 id 1001e3 sts -65538 0
8187 qc 3,20316052 3,3 id 190057 sts -65538 0
8187 qc 3,1ff260d0 3,3 id 1b0131 sts -65538 0
8187 qc 3,20306054 3,3 id 15012d sts -65538 0
8187 qc 3,1fdf60f6 3,3 id 12012a sts -65538 0
8187 qc 3,201d607a 3,3 id 1e03ae sts -65538 0
8187 qc 3,1ff960c2 3,3 id 1803ab sts -65538 0
8187 qc 3,1fd5610a 3,3 id 130321 sts -65538 0
8187 qc 3,1fad615a 3,3 id 16038d sts -65538 0
8187 qc 3,204d601a 3,3 id 1303da sts -65538 0
8187 qc 3,1fb3614e 3,3 id 190002 sts -65538 0
8187 qc 3,203e6038 3,3 id 100269 sts -65538 0
8187 qc 3,2043602e 3,3 id 190106 sts -65538 0
8187 qc 3,2023606e 3,3 id a02ee sts -65538 0
8187 qc 3,1ffc60bc 3,3 id 140000 sts -65538 0
8187 qc 3,1fcb611e 3,3 id f03d2 sts -65538 0
8187 qc 3,20526010 3,3 id 1a0167 sts -65538 0
8187 qc 3,200e6098 3,3 id 1a0169 sts -65538 0
8187 qc 3,200260b0 3,3 id e02cc sts -65538 0
8187 qc 3,204b601e 3,3 id 15019d sts -65538 0
8187 qc 3,1fb66148 3,3 id 1d00bc sts -65538 0
8187 qc 3,1fac615c 3,3 id 230174 sts -65538 0
8187 qc 3,1fea60e0 3,3 id 180266 sts -65538 0
8187 qc 3,2015608a 3,3 id 12036f sts -65538 0
8187 qc 3,20186084 3,3 id e00cb sts -65538 0
8187 qc 3,203a6040 3,3 id 1601ee sts -65538 0
8187 qc 3,20366048 3,3 id 16026a sts -65538 0
8187 qc 3,201e6078 3,3 id 1202da sts -65538 0
8187 qc 3,1fe460ec 3,3 id 1300e2 sts -65538 0
8187 qc 3,20276066 3,3 id 1200b7 sts -65538 0
8187 qc 3,1fc76126 3,3 id 180043 sts -65538 0
8187 qc 3,20106094 3,3 id 140213 sts -65538 0
8187 qc 3,2045602a 3,3 id 1100ac sts -65538 0
8187 qc 3,2024606c 3,3 id 1a00f9 sts -65538 0
8187 qc 3,1ffe60b8 3,3 id 160168 sts -65538 0
8187 qc 3,1fc26130 3,3 id 1a0287 sts -65538 0
8187 qc 3,1fb06154 3,3 id 1701f3 sts -65538 0
8187 qc 3,1fcf6116 3,3 id 120345 sts -65538 0
8187 qc 3,1fe160f2 3,3 id 1b00c5 sts -65538 0
8187 qc 3,202c605c 3,3 id 18000b sts -65538 0
8187 qc 3,1fa86164 3,3 id 1c0024 sts -65538 0
8187 qc 3,203b603e 3,3 id 1b030b sts -65538 0
8187 qc 3,200160b2 3,3 id 1803e9 sts -65538 0
8187 qc 3,1fed60da 3,3 id 1b01de sts -65538 0
8187 qc 3,1fa96162 3,3 id 190223 sts -65538 0
8187 qc 3,204e6018 3,3 id 12022f sts -65538 0
8187 qc 3,204f6016 3,3 id 12016e sts -65538 0
8187 qc 3,1fc5612a 3,3 id 1503b2 sts -65538 0
8187 qc 3,2055600a 3,3 id d0314 sts -65538 0
8187 qc 3,1fff60b6 3,3 id 13031a sts -65538 0
8187 qc 3,1fb96142 3,3 id 1502cb sts -65538 0
8187 qc 3,1fee60d8 3,3 id 180180 sts -65538 0
8187 qc 3,20206074 3,3 id 1b02ca sts -65538 0
8187 qc 3,1faa6160 3,3 id 1a007f sts -65538 0

lock_dlm/drop_count:
50000
lock_dlm/drop_period:
60
lock_dlm/max_nodes:
128






FWIW, morph-01 panicked while I was gathering up the dlm debug info.

The fs hung early last night and I was gathering info this morning,
so it had been hung for 8+ hours.

Unable to handle kernel NULL pointer dereference at virtual address 00000004
 printing eip:
c014a340
*pde = f5c6f067
Oops: 0002 [#1]
Modules linked in: gnbd(U) lock_nolock(U) gfs(U) lock_dlm(U) dlm(U) cman(U) lock_harness(U) parport_pc lp parport autofs4 md5 ipv6 sunrpc uhci_hcd hw_random e1000 floppy dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
CPU:    0
EIP:    0060:[<c014a340>]    Tainted: GF     VLI
EFLAGS: 00010092   (2.6.9-5.EL)
EIP is at cache_alloc_refill+0x146/0x227
eax: 00000000   ebx: c32e4b80   ecx: c32e4b00   edx: c32e4b8c
esi: 00000010   edi: c32e4b8c   ebp: c32e8000   esp: f5b24efc
ds: 007b   es: 007b   ss: 0068
Process atd (pid: 2200, threadinfo=f5b24000 task=f5b3e230)
Stack: 00000050 00000050 00000246 c32e4b80 f5b2ebf4 c014a663 00000002 c3123600
       00000000 f8862694 00000000 f5b24000 f886271b 00000001 00000000 f5b2ebf4
       f88e2497 00000001 f5b2ebf4 f5b2ec30 c018812c 001d31db 00000000 c333c400
Call Trace:
 [<c014a663>] kmem_cache_alloc+0x46/0x4c
 [<f8862694>] new_handle+0x15/0x40 [jbd]
 [<f886271b>] journal_start+0x5c/0x9e [jbd]
 [<f88e2497>] ext3_dirty_inode+0x24/0x66 [ext3]
 [<c018812c>] __mark_inode_dirty+0x28/0x23b
 [<c0176346>] filldir64+0x0/0x11a
 [<c0180a8b>] update_atime+0x6a/0x90
 [<c0176091>] vfs_readdir+0x9d/0xb7
 [<c01764c5>] sys_getdents64+0x65/0x9f
 [<c0301bfb>] syscall_call+0x7/0xb
Code: af 43 34 03 41 0c 89 44 95 10 ff 45 00 8b 51 10 0f b7 41 14 42 89 51 10 0f b7 44 41 18 66 89 41 14 3b 53 3c 72 cc 8b 51 04 8b 01 <89> 50 04 89 02 66 83 79 14 ff c7 01 00 01 10 00 c7 41 04 00 02
 <0>Fatal exception: panic in 5 seconds
/Kernel panic - not syncing: mm/slab.c:1984: spin_lock(mm/slab.c:c32e4bc4) already locked by mm/slab.c/1984
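
To see which modules the panic actually passed through, the Call Trace
above can be split into (symbol, module) pairs. This Python sketch is
an editorial aid, not part of the original report. Frames with no
bracketed module name are labeled "kernel" here, which is a naming
choice of the sketch. The trace text is copied from the oops above.

```python
import re

# Call Trace lines copied from the oops above.
TRACE = """\
 [<c014a663>] kmem_cache_alloc+0x46/0x4c
 [<f8862694>] new_handle+0x15/0x40 [jbd]
 [<f886271b>] journal_start+0x5c/0x9e [jbd]
 [<f88e2497>] ext3_dirty_inode+0x24/0x66 [ext3]
 [<c018812c>] __mark_inode_dirty+0x28/0x23b
 [<c0176346>] filldir64+0x0/0x11a
 [<c0180a8b>] update_atime+0x6a/0x90
 [<c0176091>] vfs_readdir+0x9d/0xb7
 [<c01764c5>] sys_getdents64+0x65/0x9f
 [<c0301bfb>] syscall_call+0x7/0xb
"""

# [<address>] symbol+0xoffset/0xsize, optionally followed by [module]
FRAME = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def parse_frames(text):
    """Return a list of (symbol, module) pairs, in trace order."""
    return [(m.group(1), m.group(2) or "kernel") for m in FRAME.finditer(text)]


if __name__ == "__main__":
    for symbol, module in parse_frames(TRACE):
        print(f"{module:8s} {symbol}")
```

For this trace the only modules that appear are jbd and ext3, in the
atd process, with no gfs, lock_dlm, or dlm frames on the stack.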

Comment 1 Dean Jansa 2005-02-10 15:53:59 UTC
Dave, if this looks more like a GFS issue, please reassign.  I had to
pick one, so I'm picking on you.  :)

Versions of the modules:

DLM 2.6.9-18.0 (built Feb  9 2005 14:56:57)
Lock_DLM (built Feb  9 2005 15:07:12)
GFS 2.6.9-18.3 (built Feb  9 2005 15:07:30)
CMAN 2.6.9-17.2 (built Feb  9 2005 14:52:26)
Lock_Harness 2.6.9-18.3 (built Feb  9 2005 15:07:09)

Comment 2 David Teigland 2005-02-10 16:07:56 UTC
I'm guessing this is a plock/flock problem.  A dump of the dlm
locks would help here:  
echo "name of lockspace" >> /proc/cluster/dlm_locks
cat /proc/cluster/dlm_locks > locks.txt

Is there a "quick" way for me to run this load on my
machines?


Comment 3 Dean Jansa 2005-02-10 16:14:18 UTC
Hmm, all of the /proc/cluster/dlm_locks are empty....

Comment 4 Dean Jansa 2005-02-10 16:18:00 UTC
As for the quick way to run the load....

You have the sistina-test tree, correct?  (You run revolver, if I recall.)

You can run:
sistina-test/vedder/bin/vedder -R <your cluster resource file> -l <path to sistina-test root> -S QUICK

For example, I ran:
vedder -R ../../var/share/resource_files/morph-cluster.xml -l ~/src/sistina-test -S QUICK

Having said that, I'm not sure you will hit it; I have not tried to
reproduce it yet.



Comment 5 Dean Jansa 2005-02-10 16:28:42 UTC
Oops, the dlm_locks are not empty.  I had to paste the correct
lockspace name...  I'm gathering them now.
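
The two-step dump from comment 2 (write the lockspace name, then read
the file back) can be scripted so the name is passed explicitly and not
mistyped. This is a minimal Python sketch, not part of the original
thread. The function name and the output file name are hypothetical.
Using "vedder" as the lockspace name is an assumption based on the
prefix of the dlm_debug lines; the thread never states the name.

```python
def dump_dlm_locks(lockspace, out_path, proc_path="/proc/cluster/dlm_locks"):
    """Select a lockspace, then save the lock dump to out_path.

    Mirrors the two commands from comment 2:
      echo "name of lockspace" >> /proc/cluster/dlm_locks
      cat /proc/cluster/dlm_locks > locks.txt
    Returns the number of characters read.
    """
    # Step 1: select the lockspace by appending its name (the echo step).
    with open(proc_path, "a") as f:
        f.write(lockspace + "\n")

    # Step 2: read the dump back (the cat step) and save it.
    with open(proc_path) as f:
        dump = f.read()
    with open(out_path, "w") as f:
        f.write(dump)
    return len(dump)


# Example call on a cluster node (lockspace name is an assumption):
#   dump_dlm_locks("vedder", "morph-01.dlm_locks")
```

An empty result is worth treating as a sign that the wrong lockspace
name was written, which is what happened in comment 3.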

Comment 6 Dean Jansa 2005-02-10 16:37:07 UTC
Dave, you can find the dlm_lock output from each node at:

/home/msp/djansa/pub/bugs/147682/morph*.dlm_locks



Comment 7 David Teigland 2005-02-11 10:00:34 UTC
The lock dump shows one problem that I've just checked in a fix for.
It was related to the quota lock.  I don't know if it explains the
hang, though; would probably need a kdb trace to know for sure.


Comment 8 David Teigland 2005-02-21 03:36:03 UTC
Neither Dean nor I have been able to reproduce this since the
fix mentioned above.  That could indicate that the problem is solved,
that it is difficult to reproduce, or both.


