Created attachment 331778 [details]
collected debug information for mount hang

Description of problem:
While running revolver during 5.3.z testing of GFS2, I hit a state where the mount on one node hung. Running gfs2_hangalyzer found four locks with waiters but no holders. Attached is the output of gfs2_hangalyzer, SysRq-T, the dlm lock dump and the glock dump.

Version-Release number of selected component (if applicable):
kernel-2.6.18-128.1.1.el5
gfs2-utils-0.1.53-1.el5_3.1

How reproducible:
Unknown

Steps to Reproduce:
1. run revolver and pray

Actual results:

Expected results:

Additional info:
The attachment seems to be corrupt:

[steve@dolmen hang]$ tar -zxf ./z-mount-hang.tar.gz
tar: This does not look like a tar archive
tar: Skipping to next header
tar: Error exit delayed from previous errors
(In reply to comment #1)
> The attachment seems to be corrupt:
>
> [steve@dolmen hang]$ tar -zxf ./z-mount-hang.tar.gz
> tar: This does not look like a tar archive
> tar: Skipping to next header
> tar: Error exit delayed from previous errors

When I downloaded it, it was double-gzipped. I changed the mime-type to application/octet-stream and now it downloads as a gzipped tarball.
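For anyone who hits the same tar error on a downloaded attachment, a quick way to confirm double-gzipping is to count the gzip wrappers by looking for the gzip magic bytes (0x1f 0x8b) after each decompression. This is only a rough sketch: the helper name gzip_layers and the max_layers cap are made up for illustration and are not part of any tool mentioned in this bug.

```python
import gzip

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def gzip_layers(data: bytes, max_layers: int = 5) -> int:
    """Count how many nested gzip wrappers surround the payload.

    Hypothetical helper: keeps decompressing while the data still
    starts with the gzip magic bytes, up to max_layers times.
    """
    layers = 0
    while data[:2] == GZIP_MAGIC and layers < max_layers:
        data = gzip.decompress(data)
        layers += 1
    return layers


# Example use against the attachment from this bug (not run here):
#   with open("z-mount-hang.tar.gz", "rb") as f:
#       print(gzip_layers(f.read()))
```

A result of 1 means a normal .tar.gz; a result of 2 matches the double-gzipped download described above, in which case running gunzip once before tar -zxf should be enough.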
Looking at the hangalyzer output, I've spotted this:

z2 : Z_Cluster1: G: s:SH n:1/2 f:lsDpr t:UN d:UN/3644766000 l:0 a:0 r:6
z2 : (locked, sticky, demote, demote in prog, reply pending)
z2 : Z_Cluster1: H: s:SH f:W e:0 p:5725 [glock_workqueue] gfs2_do_trans_begin+0xce/0x144 [gfs2]

i.e. the glock_workqueue trying to get the transaction lock. This looks like a dup of #483541 to me. We already have a patch which should solve that problem, but it's currently untested so far as I know. If you have no objections, I'll close this as a dup of #483541.
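As a reading aid for the G: line quoted in this comment, here is a small sketch that splits it into its key:value fields and expands the f: flag letters. The letter-to-name table is inferred only from matching f:lsDpr position by position against the decoded annotation "(locked, sticky, demote, demote in prog, reply pending)" in this output, so it covers just these five letters. The function names parse_glock_line and decode_flags are hypothetical and not part of gfs2_hangalyzer.

```python
# Flag letters and meanings inferred from the single G: line and its
# decoded annotation in this bug; any other letter is reported as unknown.
FLAG_NAMES = {
    "l": "locked",
    "s": "sticky",
    "D": "demote",
    "p": "demote in prog",
    "r": "reply pending",
}


def parse_glock_line(line: str) -> dict:
    """Split the 'k:v k:v ...' part after 'G:' into a field -> value dict."""
    _, _, body = line.partition("G:")
    fields = {}
    for token in body.split():
        key, sep, value = token.partition(":")
        if sep:  # skip tokens that are not key:value pairs
            fields[key] = value
    return fields


def decode_flags(flags: str) -> list:
    """Expand each flag letter using the table above."""
    return [FLAG_NAMES.get(ch, "unknown(%s)" % ch) for ch in flags]


line = "z2 : Z_Cluster1: G: s:SH n:1/2 f:lsDpr t:UN d:UN/3644766000 l:0 a:0 r:6"
fields = parse_glock_line(line)
print(fields["n"], decode_flags(fields["f"]))
```

Unknown letters are reported rather than guessed, since the table is not a full list of glock flags.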
*** This bug has been marked as a duplicate of bug 483541 ***