Single GFS filesystem mounted on /mnt/vedder. While running the 'QUICK' IO load, the fs appears hung. 5-node cluster (morph-01 through morph-05).

Not all operations hang, however:

[root@morph-01 ~]# touch /mnt/vedder/foo
[root@morph-01 ~]# ls /mnt/vedder/foo
/mnt/vedder/foo
[root@morph-01 ~]# cd /mnt/vedder
[root@morph-01 vedder]# ls foo
foo

(As an aside, foo shows up on the other nodes as well; I can do an ls /mnt/vedder/foo on morph-02 and it works.)

But trying an ls of a dir I know is in there:

[root@morph-01 vedder]# ls d_io
...hang...

Also a simple ls hangs:

[root@morph-01 ~]# cd /mnt/vedder
[root@morph-01 vedder]# ls
...hang...

All of the outstanding IO load processes on the nodes are stuck. morph-01 was running:

0 S 500 10963 10962 0 82 0 - 547 wait Feb09 ? 00:00:00 genesis -i 10s -n 100 -d 10 -p 5 -L flock
1 D 500 10968 10963 0 78 0 - 550 glock_ Feb09 ? 00:00:00 genesis -i 10s -n 100 -d 10 -p 5 -L flock
1 D 500 10969 10963 0 78 0 - 550 glock_ Feb09 ? 00:00:00 genesis -i 10s -n 100 -d 10 -p 5 -L flock
1 D 500 10970 10963 0 78 0 - 550 glock_ Feb09 ? 00:00:00 genesis -i 10s -n 100 -d 10 -p 5 -L flock
1 D 500 10971 10963 0 78 0 - 550 glock_ Feb09 ? 00:00:00 genesis -i 10s -n 100 -d 10 -p 5 -L flock
1 D 500 10972 10963 0 78 0 - 550 glock_ Feb09 ? 00:00:00 genesis -i 10s -n 100 -d 10 -p 5 -L flock
0 S 500 11082 11081 0 81 0 - 1215 wait Feb09 ? 00:00:00 sh -c iogen -i 10s -m reverse -o -t 1b -T 1000b 1000b:lock3_small | doio -av
1 D 500 11084 11082 0 81 0 - 1215 glock_ Feb09 ? 00:00:00 sh -c iogen -i 10s -m reverse -o -t 1b -T 1000b 1000b:lock3_small | doio -av

morph-02 was running:

0 S 500 8950 8949 0 77 0 - 1083 wait Feb09 ? 00:00:00 sh -c iogen -i 10s -m random -o -t 1b -T 1000b 1000b:lock2_small | doio -av
1 D 500 8952 8950 0 77 0 - 1083 glock_ Feb09 ? 00:00:00 sh -c iogen -i 10s -m random -o -t 1b -T 1000b 1000b:lock2_small | doio -av

morph-03 was running:

0 S 500 8695 8694 0 77 0 - 1208 wait Feb09 ? 00:00:00 sh -c iogen -i 10s -m sequential -o -t 1b -T 1000b 1000b:lock1_small | doio -av
1 D 500 8697 8695 0 77 0 - 1208 glock_ Feb09 ? 00:00:00 sh -c iogen -i 10s -m sequential -o -t 1b -T 1000b 1000b:lock1_small | doio -av

morph-04 was running:

0 S 500 8683 8682 0 76 0 - 1084 wait Feb09 ? 00:00:00 sh -c iogen -i 10s -m reverse -o -t 1b -T 1000b 1000b:lock3_small | doio -av
1 D 500 8685 8683 0 76 0 - 1084 glock_ Feb09 ? 00:00:00 sh -c iogen -i 10s -m reverse -o -t 1b -T 1000b 1000b:lock3_small | doio -av

morph-05 was not running any IO at the point of the hang.

All of the dlm info from /proc/cluster:

morph-01:

dlm_debug:
ve flags 0,1,0 ids 33,35,33
vedder move use event 35
vedder recover event 35
vedder add node 4
vedder total nodes 4
vedder rebuild resource directory
vedder rebuilt 7 resources
vedder purge requests
vedder purged 0 requests
vedder mark waiting requests
vedder marked 0 requests
vedder recover event 35 done
vedder move flags 0,0,1 ids 33,35,35
vedder process held requests
vedder processed 0 requests
vedder resend marked requests
vedder resent 0 requests
vedder recover event 35 finished
vedder move flags 1,0,0 ids 35,35,35
vedder move flags 0,1,0 ids 35,37,35
vedder move use event 37
vedder recover event 37
vedder add node 5
vedder total nodes 5
vedder rebuild resource directory
vedder rebuilt 4 resources
vedder purge requests
vedder purged 0 requests
vedder mark waiting requests
vedder marked 0 requests
vedder recover event 37 done
vedder move flags 0,0,1 ids 35,37,37
vedder process held requests
vedder processed 0 requests
vedder resend marked requests
vedder resent 0 requests
vedder recover event 37 finished

dlm_dir:

dlm_locks:

dlm_stats:
DLM stats (HZ=1000)

Lock operations: 22390
Unlock operations: 9254
Convert operations: 15014
Completion ASTs: 46657
Blocking ASTs: 4514

Lockqueue        num  waittime  ave
WAIT_RSB       11230     63596    5
WAIT_CONV          1         0    0
WAIT_GRANT        63       104    1
WAIT_UNLOCK       30        10    0
Total          11324     63710    5

lock_dlm/debug:
8 3 0
10705 un 5,677f36f 44003f 3 0
10408 qc 5,136979e7 3,3 id 660086 sts -65538 0
10408 qc 5,677f36f 3,3 id 44003f sts -65538 0
10408 qc 5,cefe67f 3,3 id 4e0298 sts -65538 0
11864 lk 2,cefe67f id 0 -1,3 10001
11864 lk 5,cefe67f id 0 -1,3 1
11864 lk 2,677f36f id 0 -1,3 10001
11864 lk 5,677f36f id 0 -1,3 1
11864 lk 2,19e16cf7 id 0 -1,3 10001
11864 lk 5,19e16cf7 id 0 -1,3 1
11864 lk 2,136979e7 id 0 -1,3 10001
11864 lk 5,136979e7 id 0 -1,3 1
10408 qc 2,677f36f -1,3 id 5001ac sts -11 0
10408 qc 2,cefe67f -1,3 id 590367 sts -11 0
10408 qc 5,19e16cf7 -1,3 id 520248 sts 0 0
10408 qc 5,cefe67f -1,3 id 4d0322 sts 0 0
10408 qc 5,677f36f -1,3 id 5f031d sts 0 0
10408 qc 2,136979e7 -1,3 id 560016 sts -11 0
10408 qc 5,136979e7 -1,3 id 57039e sts 0 0
10408 qc 2,19e16cf7 -1,3 id 4d00ad sts -11 0
11865 lk 2,cefe67f id 0 -1,3 10001
11865 lk 2,677f36f id 0 -1,3 10001
11865 lk 2,19e16cf7 id 0 -1,3 10001
11865 lk 2,136979e7 id 0 -1,3 10001
10408 qc 2,677f36f -1,3 id 4901cc sts -11 0
10408 qc 2,cefe67f -1,3 id 5100be sts -11 0
10408 qc 2,19e16cf7 -1,3 id 4b030f sts -11 0
10408 qc 2,136979e7 -1,3 id 4c03c7 sts -11 0
10705 un 5,19e16cf7 520248 3 0
10408 qc 5,19e16cf7 3,3 id 520248 sts -65538 0
10705 un 5,136979e7 57039e 3 0
10705 un 5,cefe67f 4d0322 3 0
10705 un 5,677f36f 5f031d 3 0
10408 qc 5,136979e7 3,3 id 57039e sts -65538 0
10408 qc 5,cefe67f 3,3 id 4d0322 sts -65538 0
10408 qc 5,677f36f 3,3 id 5f031d sts -65538 0
11901 lk 2,cefe67f id 0 -1,3 10001
11901 lk 5,cefe67f id 0 -1,3 1
11901 lk 2,677f36f id 0 -1,3 10001
11901 lk 5,677f36f id 0 -1,3 1
11901 lk 2,19e16cf7 id 0 -1,3 10001
11901 lk 5,19e16cf7 id 0 -1,3 1
11901 lk 2,136979e7 id 0 -1,3 10001
11901 lk 5,136979e7 id 0 -1,3 1
10408 qc 2,cefe67f -1,3 id 5800b7 sts -11 0
10408 qc 2,677f36f -1,3 id 3702d8 sts -11 0
10408 qc 2,19e16cf7 -1,3 id 4703b2 sts -11 0
10408 qc 5,677f36f -1,3 id 5c01b4 sts 0 0
10408 qc 5,cefe67f -1,3 id 5101b6 sts 0 0
10408 qc 5,19e16cf7 -1,3 id 4a0089 sts 0 0
10408 qc 2,136979e7 -1,3 id 4c038a sts -11 0
10408 qc 5,136979e7 -1,3 id 4e0211 sts 0 0
10705 un 5,136979e7 4e0211 3 0
10705 un 5,19e16cf7 4a0089 3 0
10705 un 5,cefe67f 5101b6 3 0
10705 un 5,677f36f 5c01b4 3 0
10408 qc 5,136979e7 3,3 id 4e0211 sts -65538 0
10408 qc 5,cefe67f 3,3 id 5101b6 sts -65538 0
10408 qc 5,19e16cf7 3,3 id 4a0089 sts -65538 0
10408 qc 5,677f36f 3,3 id 5c01b4 sts -65538 0
10703 lk 2,1a id 4f00e3 5,3 5
10408 qc 2,1a 5,3 id 4f00e3 sts 0 0
10703 lk 2,19e16cf8 id 650002 5,3 5
10408 qc 2,19e16cf8 5,3 id 650002 sts 0 0
10705 un 2,1a 4f00e3 3 0
10408 qc 2,1a 3,3 id 4f00e3 sts -65538 0
10705 un 2,19e16cf8 650002 3 0
10408 qc 2,19e16cf8 3,3 id 650002 sts -65538 0
11909 lk 2,1a id 0 -1,3 10000
10408 qc 2,1a -1,3 id 45023e sts 0 0
11909 lk 2,1a id 45023e 3,5 44
10408 qc 2,1a 3,5 id 45023e sts 0 0
11909 lk 3,11 id f03eb 3,5 d
10408 qc 3,11 3,5 id f03eb sts 0 0
11909 lk 2,8e9 id 0 -1,5 0
10408 qc 2,8e9 -1,5 id 4c02d2 sts 0 0
11909 lk 5,8e9 id 0 -1,3 0
10408 qc 5,8e9 -1,3 id 4a02c8 sts 0 0
11913 lk 2,cefe67f id 0 -1,3 10001
11913 lk 5,cefe67f id 0 -1,3 1
11913 lk 2,677f36f id 0 -1,3 10001
11913 lk 5,677f36f id 0 -1,3 1
11913 lk 2,19e16cf8 id 0 -1,3 10001
10408 qc 2,19e16cf8 -1,3 id 500154 sts 0 0
11913 lk 2,19e16cf7 id 0 -1,3 10001
11913 lk 5,19e16cf7 id 0 -1,3 1
11913 lk 2,136979e7 id 0 -1,3 10001
11913 lk 5,136979e7 id 0 -1,3 1
10408 qc 2,677f36f -1,3 id 5d01b4 sts -11 0
10408 qc 2,cefe67f -1,3 id 4e0319 sts -11 0
10408 qc 5,677f36f -1,3 id 450372 sts 0 0
10408 qc 5,cefe67f -1,3 id 6203ad sts 0 0
10408 qc 2,136979e7 -1,3 id 5902ce sts -11 0
10408 qc 2,19e16cf7 -1,3 id 40013c sts -11 0
10408 qc 5,136979e7 -1,3 id 47020d sts 0 0
10408 qc 5,19e16cf7 -1,3 id 4e0129 sts 0 0
10705 un 5,19e16cf7 4e0129 3 0
10408 qc 5,19e16cf7 3,3 id 4e0129 sts -65538 0
10705 un 5,136979e7 47020d 3 0
10705 un 5,cefe67f 6203ad 3 0
10705 un 5,677f36f 450372 3 0
10408 qc 5,cefe67f 3,3 id 6203ad sts -65538 0
10408 qc 5,677f36f 3,3 id 450372 sts -65538 0
10408 qc 5,136979e7 3,3 id 47020d sts -65538 0
10702 lk 2,1a id 45023e 5,3 5
10408 qc 2,1a 5,3 id 45023e sts 0 0
10702 lk 2,8e9 id 4c02d2 5,3 5
10408 qc 2,8e9 5,3 id 4c02d2 sts 0 0

lock_dlm/drop_count: 50000
lock_dlm/drop_period: 60
lock_dlm/max_nodes: 128

morph-02:

dlm_debug:
ve flags 0,1,0 ids 29,31,29
vedder move use event 31
vedder recover event 31
vedder add node 4
vedder total nodes 4
vedder rebuild resource directory
vedder rebuilt 5 resources
vedder purge requests
vedder purged 0 requests
vedder mark waiting requests
vedder marked 0 requests
vedder recover event 31 done
vedder move flags 0,0,1 ids 29,31,31
vedder process held requests
vedder processed 0 requests
vedder resend marked requests
vedder resent 0 requests
vedder recover event 31 finished
vedder move flags 1,0,0 ids 31,31,31
vedder move flags 0,1,0 ids 31,33,31
vedder move use event 33
vedder recover event 33
vedder add node 5
vedder total nodes 5
vedder rebuild resource directory
vedder rebuilt 4 resources
vedder purge requests
vedder purged 0 requests
vedder mark waiting requests
vedder marked 0 requests
vedder recover event 33 done
vedder move flags 0,0,1 ids 31,33,33
vedder process held requests
vedder processed 0 requests
vedder resend marked requests
vedder resent 0 requests
vedder recover event 33 finished

dlm_dir:

dlm_locks:

dlm_stats:
DLM stats (HZ=1000)

Lock operations: 21161
Unlock operations: 8722
Convert operations: 11093
Completion ASTs: 40975
Blocking ASTs: 3786

Lockqueue        num  waittime  ave
WAIT_RSB       10389     64487    6
WAIT_CONV       7536      6022    0
WAIT_GRANT     16641      7954    0
WAIT_UNLOCK     4565     11998    2
Total          39131     90461    2

lock_dlm/debug:
8587 un 3,203a6040 a006c 3 8
8587 un 3,20366048 1002f0 3 8
8587 un 3,201e6078 1a0007 3 8
8587 un 3,1fe460ec 1f01ba 3 8
8587 un 3,20276066 180140 3 8
8587 un 3,1fc76126 120120 3 8
8587 un 3,20106094 1003f7 3 8
8587 un 3,2045602a 12008b 3 8
8587 un 3,1fa4616c 150233 3 8
8587 un 3,2024606c 1501cf 3 8
8587 un 3,1ffe60b8 17039b 3 8
8587 un 3,1fc26130 1a0181 3 8
8587 un 3,1fb06154 1201c3 3 8
8587 un 3,1fcf6116 1301b9 3 8
8587 un 3,1fe160f2 170325 3 8
8587 un 3,202c605c c00bc 3 8
8587 un 3,1fa86164 1401b4 3 8
8587 un 3,203b603e 1600da 3 8
8587 un 3,200160b2 1c0000 3 8
8587 un 3,1fed60da 1a0134 3 8
8587 un 3,1fa96162 1b0273 3 8
8587 un 3,204e6018 1602cc 3 8
8587 un 3,204f6016 1502e4 3 8
8587 un 3,1fc5612a 1a0080 3 8
8587 un 3,1f966188 11010f 3 8
8587 un 3,2055600a 140272 3 8
8587 un 3,1fff60b6 150329 3 8
8587 un 3,1fb96142 1b0159 3 8
8587 un 3,1fee60d8 100331 3 8
8587 un 3,20206074 12036b 3 8
8587 un 3,1faa6160 1f00f1 3 8
8454 qc 3,1fa16172 3,3 id 1802c0 sts -65538 0
8454 qc 3,204b601e 3,3 id 18009f sts -65538 0
8454 qc 3,1fb66148 3,3 id 1a0384 sts -65538 0
8454 qc 3,1fac615c 3,3 id 150333 sts -65538 0
8454 qc 3,1fea60e0 3,3 id f020f sts -65538 0
8454 qc 3,2015608a 3,3 id 1301a1 sts -65538 0
8454 qc 3,20186084 3,3 id 13036a sts -65538 0
8454 qc 3,203a6040 3,3 id a006c sts -65538 0
8454 qc 3,20366048 3,3 id 1002f0 sts -65538 0
8454 qc 3,201e6078 3,3 id 1a0007 sts -65538 0
8454 qc 3,1fe460ec 3,3 id 1f01ba sts -65538 0
8454 qc 3,20276066 3,3 id 180140 sts -65538 0
8454 qc 3,1fc76126 3,3 id 120120 sts -65538 0
8454 qc 3,20106094 3,3 id 1003f7 sts -65538 0
8454 qc 3,2045602a 3,3 id 12008b sts -65538 0
8454 qc 3,1fa4616c 3,3 id 150233 sts -65538 0
8454 qc 3,2024606c 3,3 id 1501cf sts -65538 0
8454 qc 3,1ffe60b8 3,3 id 17039b sts -65538 0
8454 qc 3,1fc26130 3,3 id 1a0181 sts -65538 0
8454 qc 3,1fb06154 3,3 id 1201c3 sts -65538 0
8454 qc 3,1fcf6116 3,3 id 1301b9 sts -65538 0
8454 qc 3,1fe160f2 3,3 id 170325 sts -65538 0
8454 qc 3,202c605c 3,3 id c00bc sts -65538 0
8454 qc 3,1fa86164 3,3 id 1401b4 sts -65538 0
8454 qc 3,203b603e 3,3 id 1600da sts -65538 0
8454 qc 3,200160b2 3,3 id 1c0000 sts -65538 0
8454 qc 3,1fed60da 3,3 id 1a0134 sts -65538 0
8454 qc 3,1fa96162 3,3 id 1b0273 sts -65538 0
8454 qc 3,204e6018 3,3 id 1602cc sts -65538 0
8454 qc 3,204f6016 3,3 id 1502e4 sts -65538 0
8454 qc 3,1fc5612a 3,3 id 1a0080 sts -65538 0
8454 qc 3,1f966188 3,3 id 11010f sts -65538 0
8454 qc 3,2055600a 3,3 id 140272 sts -65538 0
8454 qc 3,1fff60b6 3,3 id 150329 sts -65538 0
8454 qc 3,1fb96142 3,3 id 1b0159 sts -65538 0
8454 qc 3,1fee60d8 3,3 id 100331 sts -65538 0
8454 qc 3,20206074 3,3 id 12036b sts -65538 0
8454 qc 3,1faa6160 3,3 id 1f00f1 sts -65538 0
9629 lk 2,1a id 0 -1,3 10000
8454 qc 2,1a -1,3 id 120152 sts 0 0
9629 lk 2,cefe67f id 0 -1,3 10001
9629 lk 5,cefe67f id 0 -1,3 1
9629 lk 2,19e16cf8 id 0 -1,3 10001
9629 lk 2,19e16cf7 id 0 -1,3 10001
9629 lk 5,19e16cf7 id 0 -1,3 1
9629 lk 2,136979e7 id 0 -1,3 10001
9629 lk 5,136979e7 id 0 -1,3 1
8454 qc 5,cefe67f -1,3 id 1402d1 sts 0 0
9629 lk 2,df id 0 -1,3 10001
9629 lk 5,df id 0 -1,3 1
8454 qc 5,136979e7 -1,3 id d025e sts 0 0
8454 qc 2,cefe67f -1,3 id 18020d sts -11 0
8454 qc 2,19e16cf8 -1,3 id e0068 sts -11 0
8454 qc 2,136979e7 -1,3 id 1c0330 sts -11 0
8454 qc 2,19e16cf7 -1,3 id 1e00d4 sts -11 0
8454 qc 5,19e16cf7 -1,3 id 1b02d3 sts 0 0
8584 lk 2,19e16cf8 id 0 -1,3 10000
8454 qc 5,df -1,3 id 1c00d3 sts 0 0
8454 qc 2,df -1,3 id 1201fc sts -11 0
8454 qc 2,19e16cf8 -1,3 id 170183 sts 0 0
8587 un 5,136979e7 d025e 3 0
8587 un 5,19e16cf7 1b02d3 3 0
8587 un 5,df 1c00d3 3 0
8587 un 5,cefe67f 1402d1 3 0
8454 qc 5,df 3,3 id 1c00d3 sts -65538 0
8454 qc 5,cefe67f 3,3 id 1402d1 sts -65538 0
8454 qc 5,136979e7 3,3 id d025e sts -65538 0
8454 qc 5,19e16cf7 3,3 id 1b02d3 sts -65538 0
8587 un 2,1a 120152 3 0
8454 qc 2,1a 3,3 id 120152 sts -65538 0
8587 un 2,19e16cf8 170183 3 0
8454 qc 2,19e16cf8 3,3 id 170183 sts -65538 0
9667 lk 2,1a id 0 -1,3 10000
8454 qc 2,1a -1,3 id 1b0384 sts 0 0
9667 lk 2,8e9 id 0 -1,3 10000
8454 qc 2,8e9 -1,3 id 140334 sts 0 0
9667 lk 5,8e9 id 0 -1,3 0
8454 qc 5,8e9 -1,3 id 12003b sts 0 0

lock_dlm/drop_count: 50000
lock_dlm/drop_period: 60
lock_dlm/max_nodes: 128

morph-03:

dlm_debug:
ve flags 0,1,0 ids 25,27,25
vedder move use event 27
vedder recover event 27
vedder add node 4
vedder total nodes 4
vedder rebuild resource directory
vedder rebuilt 4 resources
vedder purge requests
vedder purged 0 requests
vedder mark waiting requests
vedder marked 0 requests
vedder recover event 27 done
vedder move flags 0,0,1 ids 25,27,27
vedder process held requests
vedder processed 0 requests
vedder resend marked requests
vedder resent 0 requests
vedder recover event 27 finished
vedder move flags 1,0,0 ids 27,27,27
vedder move flags 0,1,0 ids 27,29,27
vedder move use event 29
vedder recover event 29
vedder add node 5
vedder total nodes 5
vedder rebuild resource directory
vedder rebuilt 8 resources
vedder purge requests
vedder purged 0 requests
vedder mark waiting requests
vedder marked 0 requests
vedder recover event 29 done
vedder move flags 0,0,1 ids 27,29,29
vedder process held requests
vedder processed 0 requests
vedder resend marked requests
vedder resent 0 requests
vedder recover event 29 finished

dlm_dir:

dlm_locks:

dlm_stats:
DLM stats (HZ=1000)

Lock operations: 13516
Unlock operations: 4906
Convert operations: 944
Completion ASTs: 19365
Blocking ASTs: 27

Lockqueue        num  waittime  ave
WAIT_RSB       10825     48502    4
WAIT_CONV         15        15    1
WAIT_GRANT      8350      3822    0
WAIT_UNLOCK       39       113    2
Total          19229     52452    2

lock_dlm/debug:
6e8 7026e 5 0
8196 qc 2,cf0e6e8 5,5 id 7026e sts -65538 0
8329 un 2,cefe6ad 9001f 5 0
8196 qc 2,cefe6ad 5,5 id 9001f sts -65538 0
8329 un 2,cefe8a3 30364 5 0
8196 qc 2,cefe8a3 5,5 id 30364 sts -65538 0
8329 un 2,cf3e6a2 d02da 5 0
8196 qc 2,cf3e6a2 5,5 id d02da sts -65538 0
8329 un 2,cefe84a 503d5 5 0
8196 qc 2,cefe84a 5,5 id 503d5 sts -65538 0
8329 un 2,cefe6ae d02d3 5 0
8196 qc 2,cefe6ae 5,5 id d02d3 sts -65538 0
8329 un 2,cefe8d1 a0288 5 0
8196 qc 2,cefe8d1 5,5 id a0288 sts -65538 0
8329 un 2,cf1e681 a015c 5 0
8196 qc 2,cf1e681 5,5 id a015c sts -65538 0
8329 un 2,cf0e68a 90115 5 0
8196 qc 2,cf0e68a 5,5 id 90115 sts -65538 0
8329 un 2,cf0e738 1102ff 5 0
8196 qc 2,cf0e738 5,5 id 1102ff sts -65538 0
8329 un 2,cf0e6ce 50011 5 0
8196 qc 2,cf0e6ce 5,5 id 50011 sts -65538 0
8329 un 2,cf2e67a 90030 5 0
8196 qc 2,cf2e67a 5,5 id 90030 sts -65538 0
8329 un 2,cefe8ed c01db 5 0
8196 qc 2,cefe8ed 5,5 id c01db sts -65538 0
8329 un 2,cf0e698 a01db 5 0
8196 qc 2,cf0e698 5,5 id a01db sts -65538 0
8329 un 2,cefe896 d0367 5 0
8196 qc 2,cefe896 5,5 id d0367 sts -65538 0
8329 un 2,cefe90f c01fc 5 0
8196 qc 2,cefe90f 5,5 id c01fc sts -65538 0
8329 un 2,cf2e6b4 f03e0 5 0
8196 qc 2,cf2e6b4 5,5 id f03e0 sts -65538 0
8329 un 2,cefe854 501d1 5 0
8196 qc 2,cefe854 5,5 id 501d1 sts -65538 0
8329 un 2,cf0e6fb 80289 5 0
8196 qc 2,cf0e6fb 5,5 id 80289 sts -65538 0
8329 un 2,cf2e69b b02bb 5 0
8196 qc 2,cf2e69b 5,5 id b02bb sts -65538 0
8329 un 2,cefe89b 702da 5 0
8196 qc 2,cefe89b 5,5 id 702da sts -65538 0
8329 un 2,cf4e68c 10037d 5 0
8196 qc 2,cf4e68c 5,5 id 10037d sts -65538 0
8329 un 2,cefe8d0 90065 5 0
8196 qc 2,cefe8d0 5,5 id 90065 sts -65538 0
8329 un 2,cefe85e 8018a 5 0
8196 qc 2,cefe85e 5,5 id 8018a sts -65538 0
8329 un 2,cf0e6c8 b02e7 5 0
8196 qc 2,cf0e6c8 5,5 id b02e7 sts -65538 0
8329 un 2,cefe690 3026a 5 0
8196 qc 2,cefe690 5,5 id 3026a sts -65538 0
8329 un 2,cf4e680 140284 5 0
8196 qc 2,cf4e680 5,5 id 140284 sts -65538 0
8329 un 2,cf3e684 130092 5 0
8196 qc 2,cf3e684 5,5 id 130092 sts -65538 0
8329 un 2,cf1e6da 60138 5 0
8196 qc 2,cf1e6da 5,5 id 60138 sts -65538 0
8329 un 2,cefe879 e02b0 5 0
8196 qc 2,cefe879 5,5 id e02b0 sts -65538 0
8329 un 2,cefe887 90308 5 0
8196 qc 2,cefe887 5,5 id 90308 sts -65538 0
8329 un 2,cf1e6d2 e01a1 5 0
8196 qc 2,cf1e6d2 5,5 id e01a1 sts -65538 0
8329 un 2,cefe837 503d8 5 0
8196 qc 2,cefe837 5,5 id 503d8 sts -65538 0
8329 un 2,cefe6b0 10029a 5 0
8196 qc 2,cefe6b0 5,5 id 10029a sts -65538 0
8329 un 2,cefe8a6 600e4 5 0
8196 qc 2,cefe8a6 5,5 id 600e4 sts -65538 0
8329 un 2,cefe86e 701d8 5 0
8196 qc 2,cefe86e 5,5 id 701d8 sts -65538 0
8329 un 2,cf0e6c9 a02a7 5 0
8196 qc 2,cf0e6c9 5,5 id a02a7 sts -65538 0
8329 un 2,cf0e6b6 c014a 5 0
8196 qc 2,cf0e6b6 5,5 id c014a sts -65538 0
8329 un 2,cf0e6c5 901aa 5 0
8196 qc 2,cf0e6c5 5,5 id 901aa sts -65538 0
8329 un 2,cf0e6d2 e01dc 5 0
8196 qc 2,cf0e6d2 5,5 id e01dc sts -65538 0
8329 un 2,cefe8c1 7005b 5 0
8196 qc 2,cefe8c1 5,5 id 7005b sts -65538 0
8329 un 2,cf0e680 100135 5 0
8196 qc 2,cf0e680 5,5 id 100135 sts -65538 0
8329 un 2,cf0e6fa c0087 5 0
8196 qc 2,cf0e6fa 5,5 id c0087 sts -65538 0
8329 un 2,cf0e732 d028e 5 0
8196 qc 2,cf0e732 5,5 id d028e sts -65538 0
8329 un 2,cf0e68d 90378 5 0
8196 qc 2,cf0e68d 5,5 id 90378 sts -65538 0
8329 un 2,1a a013b 3 0
8196 qc 2,1a 3,3 id a013b sts -65538 0
8329 un 2,19e16cf8 40004 3 0
8196 qc 2,19e16cf8 3,3 id 40004 sts -65538 0
8329 un 2,cf0e736 f00cc 5 0
8196 qc 2,cf0e736 5,5 id f00cc sts -65538 0
8329 un 2,cf0e734 c03af 5 0
8196 qc 2,cf0e734 5,5 id c03af sts -65538 0
8329 un 2,cf0e735 d01fb 5 0
8196 qc 2,cf0e735 5,5 id d01fb sts -65538 0
8326 lk 3,cefe67a id a0221 5,3 d
8196 qc 3,cefe67a 5,3 id a0221 sts 0 0
8326 lk 3,cf0e678 id 11029a 5,3 d
8196 qc 3,cf0e678 5,3 id 11029a sts 0 0
8326 lk 3,cf1e676 id b01d0 5,3 d
8196 qc 3,cf1e676 5,3 id b01d0 sts 0 0
8326 lk 3,cf2e674 id e038d 5,3 d
8196 qc 3,cf2e674 5,3 id e038d sts 0 0
8326 lk 3,cf3e672 id d0365 5,3 d
8196 qc 3,cf3e672 5,3 id d0365 sts 0 0
8326 lk 3,cf4e670 id d029c 5,3 d
8196 qc 3,cf4e670 5,3 id d029c sts 0 0
8326 lk 3,cf5e66e id a0078 5,3 d
8196 qc 3,cf5e66e 5,3 id a0078 sts 0 0

lock_dlm/drop_count: 50000
lock_dlm/drop_period: 60
lock_dlm/max_nodes: 128

morph-04:

dlm_debug:
22
clvmd process held requests
clvmd processed 0 requests
clvmd resend marked requests
clvmd resent 0 requests
clvmd recover event 22 finished
vedder move flags 0,1,0 ids 0,23,0
vedder move use event 23
vedder recover event 23 (first)
vedder add nodes
vedder total nodes 4
vedder rebuild resource directory
vedder rebuilt 6 resources
vedder recover event 23 done
vedder move flags 0,0,1 ids 0,23,23
vedder process held requests
vedder processed 0 requests
vedder recover event 23 finished
vedder move flags 1,0,0 ids 23,23,23
vedder move flags 0,1,0 ids 23,25,23
vedder move use event 25
vedder recover event 25
vedder add node 5
vedder total nodes 5
vedder rebuild resource directory
vedder rebuilt 4 resources
vedder purge requests
vedder purged 0 requests
vedder mark waiting requests
vedder marked 0 requests
vedder recover event 25 done
vedder move flags 0,0,1 ids 23,25,25
vedder process held requests
vedder processed 0 requests
vedder resend marked requests
vedder resent 0 requests
vedder recover event 25 finished

dlm_dir:

dlm_locks:

dlm_stats:
DLM stats (HZ=1000)

Lock operations: 22499
Unlock operations: 13863
Convert operations: 4014
Completion ASTs: 40375
Blocking ASTs: 88

Lockqueue        num  waittime  ave
WAIT_RSB       11472     69914    6
WAIT_CONV         12         4    0
WAIT_GRANT     16690     12093    0
WAIT_UNLOCK     8379     12587    1
Total          36553     94598    2

lock_dlm/debug:
3 8
8320 un 3,1fcf6116 120345 3 8
8320 un 3,1fe160f2 1b00c5 3 8
8320 un 3,202c605c 18000b 3 8
8320 un 3,1fa86164 1c0024 3 8
8320 un 3,203b603e 1b030b 3 8
8320 un 3,200160b2 1803e9 3 8
8320 un 3,1fed60da 1b01de 3 8
8320 un 3,1fa96162 190223 3 8
8320 un 3,204e6018 12022f 3 8
8320 un 3,204f6016 12016e 3 8
8320 un 3,1fc5612a 1503b2 3 8
8320 un 3,2055600a d0314 3 8
8320 un 3,1fff60b6 13031a 3 8
8320 un 3,1fb96142 1502cb 3 8
8320 un 3,1fee60d8 180180 3 8
8320 un 3,20206074 1b02ca 3 8
8320 un 3,1faa6160 1a007f 3 8
8187 qc 3,20406034 3,3 id 1f0290 sts -65538 0
8187 qc 3,202f6056 3,3 id 2003a9 sts -65538 0
8187 qc 3,1fd3610e 3,3 id 1d033e sts -65538 0
8187 qc 3,200d609a 3,3 id 1c0335 sts -65538 0
8187 qc 3,201c607c 3,3 id 1a021e sts -65538 0
8187 qc 3,20396042 3,3 id 1a0306 sts -65538 0
8187 qc 3,1fb16152 3,3 id 1d0363 sts -65538 0
8187 qc 3,1ff760c6 3,3 id 150001 sts -65538 0
8187 qc 3,1fb5614a 3,3 id 1402f0 sts -65538 0
8187 qc 3,201b607e 3,3 id 170168 sts -65538 0
8187 qc 3,20486024 3,3 id 140204 sts -65538 0
8187 qc 3,1faf6156 3,3 id 1600e9 sts -65538 0
8187 qc 3,1fc4612c 3,3 id 180016 sts -65538 0
8187 qc 3,1fde60f8 3,3 id 1d038a sts -65538 0
8187 qc 3,200f6096 3,3 id 14013f sts -65538 0
8187 qc 3,203f6036 3,3 id 180339 sts -65538 0
8187 qc 3,1fb26150 3,3 id 1f0120 sts -65538 0
8187 qc 3,1fd96102 3,3 id 20013a sts -65538 0
8187 qc 3,1fca6120 3,3 id 1901a8 sts -65538 0
8187 qc 3,1fab615e 3,3 id 1d0352 sts -65538 0
8187 qc 3,1ff360ce 3,3 id 1f032e sts -65538 0
8187 qc 3,1fb76146 3,3 id 1b01d2 sts -65538 0
8187 qc 3,2053600e 3,3 id 190395 sts -65538 0
8187 qc 3,1fe760e6 3,3 id 130246 sts -65538 0
8187 qc 3,1fb4614c 3,3 id 1001e3 sts -65538 0
8187 qc 3,20316052 3,3 id 190057 sts -65538 0
8187 qc 3,1ff260d0 3,3 id 1b0131 sts -65538 0
8187 qc 3,20306054 3,3 id 15012d sts -65538 0
8187 qc 3,1fdf60f6 3,3 id 12012a sts -65538 0
8187 qc 3,201d607a 3,3 id 1e03ae sts -65538 0
8187 qc 3,1ff960c2 3,3 id 1803ab sts -65538 0
8187 qc 3,1fd5610a 3,3 id 130321 sts -65538 0
8187 qc 3,1fad615a 3,3 id 16038d sts -65538 0
8187 qc 3,204d601a 3,3 id 1303da sts -65538 0
8187 qc 3,1fb3614e 3,3 id 190002 sts -65538 0
8187 qc 3,203e6038 3,3 id 100269 sts -65538 0
8187 qc 3,2043602e 3,3 id 190106 sts -65538 0
8187 qc 3,2023606e 3,3 id a02ee sts -65538 0
8187 qc 3,1ffc60bc 3,3 id 140000 sts -65538 0
8187 qc 3,1fcb611e 3,3 id f03d2 sts -65538 0
8187 qc 3,20526010 3,3 id 1a0167 sts -65538 0
8187 qc 3,200e6098 3,3 id 1a0169 sts -65538 0
8187 qc 3,200260b0 3,3 id e02cc sts -65538 0
8187 qc 3,204b601e 3,3 id 15019d sts -65538 0
8187 qc 3,1fb66148 3,3 id 1d00bc sts -65538 0
8187 qc 3,1fac615c 3,3 id 230174 sts -65538 0
8187 qc 3,1fea60e0 3,3 id 180266 sts -65538 0
8187 qc 3,2015608a 3,3 id 12036f sts -65538 0
8187 qc 3,20186084 3,3 id e00cb sts -65538 0
8187 qc 3,203a6040 3,3 id 1601ee sts -65538 0
8187 qc 3,20366048 3,3 id 16026a sts -65538 0
8187 qc 3,201e6078 3,3 id 1202da sts -65538 0
8187 qc 3,1fe460ec 3,3 id 1300e2 sts -65538 0
8187 qc 3,20276066 3,3 id 1200b7 sts -65538 0
8187 qc 3,1fc76126 3,3 id 180043 sts -65538 0
8187 qc 3,20106094 3,3 id 140213 sts -65538 0
8187 qc 3,2045602a 3,3 id 1100ac sts -65538 0
8187 qc 3,2024606c 3,3 id 1a00f9 sts -65538 0
8187 qc 3,1ffe60b8 3,3 id 160168 sts -65538 0
8187 qc 3,1fc26130 3,3 id 1a0287 sts -65538 0
8187 qc 3,1fb06154 3,3 id 1701f3 sts -65538 0
8187 qc 3,1fcf6116 3,3 id 120345 sts -65538 0
8187 qc 3,1fe160f2 3,3 id 1b00c5 sts -65538 0
8187 qc 3,202c605c 3,3 id 18000b sts -65538 0
8187 qc 3,1fa86164 3,3 id 1c0024 sts -65538 0
8187 qc 3,203b603e 3,3 id 1b030b sts -65538 0
8187 qc 3,200160b2 3,3 id 1803e9 sts -65538 0
8187 qc 3,1fed60da 3,3 id 1b01de sts -65538 0
8187 qc 3,1fa96162 3,3 id 190223 sts -65538 0
8187 qc 3,204e6018 3,3 id 12022f sts -65538 0
8187 qc 3,204f6016 3,3 id 12016e sts -65538 0
8187 qc 3,1fc5612a 3,3 id 1503b2 sts -65538 0
8187 qc 3,2055600a 3,3 id d0314 sts -65538 0
8187 qc 3,1fff60b6 3,3 id 13031a sts -65538 0
8187 qc 3,1fb96142 3,3 id 1502cb sts -65538 0
8187 qc 3,1fee60d8 3,3 id 180180 sts -65538 0
8187 qc 3,20206074 3,3 id 1b02ca sts -65538 0
8187 qc 3,1faa6160 3,3 id 1a007f sts -65538 0

lock_dlm/drop_count: 50000
lock_dlm/drop_period: 60
lock_dlm/max_nodes: 128

FWIW, morph-01 panicked while I was gathering up the dlm debug info. The fs hung early last night and I was gathering info this morning, so it had been hung for 8+ hours.

Unable to handle kernel NULL pointer dereference at virtual address 00000004
 printing eip:
c014a340
*pde = f5c6f067
Oops: 0002 [#1]
Modules linked in: gnbd(U) lock_nolock(U) gfs(U) lock_dlm(U) dlm(U) cman(U) lock_harness(U) parport_pc lp parport autofs4 md5 ipv6 sunrpc uhci_hcd hw_random e1000 floppy dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
CPU:    0
EIP:    0060:[<c014a340>]    Tainted: GF VLI
EFLAGS: 00010092   (2.6.9-5.EL)
EIP is at cache_alloc_refill+0x146/0x227
eax: 00000000   ebx: c32e4b80   ecx: c32e4b00   edx: c32e4b8c
esi: 00000010   edi: c32e4b8c   ebp: c32e8000   esp: f5b24efc
ds: 007b   es: 007b   ss: 0068
Process atd (pid: 2200, threadinfo=f5b24000 task=f5b3e230)
Stack: 00000050 00000050 00000246 c32e4b80 f5b2ebf4 c014a663 00000002 c3123600
       00000000 f8862694 00000000 f5b24000 f886271b 00000001 00000000 f5b2ebf4
       f88e2497 00000001 f5b2ebf4 f5b2ec30 c018812c 001d31db 00000000 c333c400
Call Trace:
 [<c014a663>] kmem_cache_alloc+0x46/0x4c
 [<f8862694>] new_handle+0x15/0x40 [jbd]
 [<f886271b>] journal_start+0x5c/0x9e [jbd]
 [<f88e2497>] ext3_dirty_inode+0x24/0x66 [ext3]
 [<c018812c>] __mark_inode_dirty+0x28/0x23b
 [<c0176346>] filldir64+0x0/0x11a
 [<c0180a8b>] update_atime+0x6a/0x90
 [<c0176091>] vfs_readdir+0x9d/0xb7
 [<c01764c5>] sys_getdents64+0x65/0x9f
 [<c0301bfb>] syscall_call+0x7/0xb
Code: af 43 34 03 41 0c 89 44 95 10 ff 45 00 8b 51 10 0f b7 41 14 42 89 51 10 0f b7 44 41 18 66 89 41 14 3b 53 3c 72 cc 8b 51 04 8b 01 <89> 50 04 89 02 66 83 79 14 ff c7 01 00 01 10 00 c7 41 04 00 02
<0>Fatal exception: panic in 5 seconds
Kernel panic - not syncing: mm/slab.c:1984: spin_lock(mm/slab.c:c32e4bc4) already locked by mm/slab.c/1984
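When a mount hangs like this, one quick way to list the stuck tasks on each node is to filter `ps -elf` output. The filter keeps processes in uninterruptible sleep (state `D`) whose WCHAN begins with `glock_`, which is what the genesis and iogen/doio listings above show. The sketch below assumes the procps column order seen in those listings (F S UID PID PPID C PRI NI ADDR SZ WCHAN ...). The function name `glock_waiters` is hypothetical.

```shell
#!/bin/sh
# glock_waiters (hypothetical helper): read `ps -elf` or `ps -el` output on
# stdin and print the PID of every task whose S column is "D" (uninterruptible
# sleep) and whose WCHAN starts with "glock_".
# Assumption: procps column order F S UID PID PPID C PRI NI ADDR SZ WCHAN ...,
# so S is field 2, PID is field 4 and WCHAN is field 11. The header line has
# "S" in field 2, so it is skipped automatically.
glock_waiters() {
    awk '$2 == "D" && $11 ~ /^glock_/ { print $4 }'
}

# Usage on each node while the filesystem is hung:
#   ps -elf | glock_waiters
```

The resulting PIDs identify the tasks whose kernel stacks are worth capturing, for example with SysRq-T (`echo t > /proc/sysrq-trigger`). SysRq-T dumps every task's state and stack to the kernel log.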
Dave, if this looks more like a GFS issue, please reassign. I had to pick one, so I'm picking on you. :)

Versions of the modules:

DLM 2.6.9-18.0 (built Feb 9 2005 14:56:57)
Lock_DLM (built Feb 9 2005 15:07:12)
GFS 2.6.9-18.3 (built Feb 9 2005 15:07:30)
CMAN 2.6.9-17.2 (built Feb 9 2005 14:52:26)
Lock_Harness 2.6.9-18.3 (built Feb 9 2005 15:07:09)
I'm guessing this is a plock/flock problem. A dump of the dlm locks would help here:

echo "name of lockspace" >> /proc/cluster/dlm_locks
cat /proc/cluster/dlm_locks > locks.txt

Is there a "quick" way for me to run this load on my machines?
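The two commands above can be wrapped in a small helper that saves one lockspace's dump per call. This is a sketch, not part of any cluster tooling. The function name `dump_dlm_locks` and the `DLM_LOCKS` override variable are hypothetical. The sketch assumes the /proc/cluster/dlm_locks interface behaves as described above: writing a lockspace name, such as `vedder` from the dlm_debug output, selects what the next read reports. As the follow-ups found, a wrong name yields an empty dump, so the helper warns in that case.

```shell
#!/bin/sh
# dump_dlm_locks (hypothetical helper): select a DLM lockspace and save its
# lock dump.
#   $1 = lockspace name (e.g. "vedder"), $2 = output file
# DLM_LOCKS (hypothetical) overrides the /proc path, e.g. for testing.
dump_dlm_locks() {
    ls_name=$1
    out_file=$2
    proc=${DLM_LOCKS:-/proc/cluster/dlm_locks}
    if [ -z "$ls_name" ] || [ -z "$out_file" ]; then
        echo "usage: dump_dlm_locks <lockspace> <outfile>" >&2
        return 2
    fi
    # Writing the name selects the lockspace (assumed /proc semantics).
    echo "$ls_name" >> "$proc" || return 1
    cat "$proc" > "$out_file" || return 1
    # An empty dump can mean the lockspace name did not match.
    if [ ! -s "$out_file" ]; then
        echo "warning: $out_file is empty; check the lockspace name" >&2
    fi
}

# Usage on each node:
#   dump_dlm_locks vedder "$(hostname).dlm_locks"
```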
Hmm, all of the /proc/cluster/dlm_locks are empty....
As for a quick way to run the load: you have the sistina-test tree, correct? (You run revolver, if I recall.) You can run:

sistina-test/vedder/bin/vedder -R <your cluster resource file> -l <path to sistina-test root> -S QUICK

For example, I ran:

vedder -R ../../var/share/resource_files/morph-cluster.xml -l ~/src/sistina-test -S QUICK

Having said that, I'm not sure you will hit it; I have not tried to reproduce it yet.
Oops, the dlm_locks are not empty; I had to paste in the correct lockspace name. I'm gathering them now.
Dave, you can find the dlm_lock output from each node at: /home/msp/djansa/pub/bugs/147682/morph*.dlm_locks
The lock dump shows one problem, related to the quota lock, that I've just checked in a fix for. I don't know whether it explains the hang, though; I would probably need a kdb trace to know for sure.
Neither Dean nor I have been able to reproduce this since the fix mentioned above. That could mean the problem is solved, that it is difficult to reproduce, or both.