Bug 485336

Summary: GFS2 mount hung after recovery
Product: Red Hat Enterprise Linux 5
Reporter: Nate Straz <nstraz>
Component: kernel
Assignee: Steve Whitehouse <swhiteho>
Status: CLOSED DUPLICATE
QA Contact: Cluster QE <mspqa-list>
Severity: medium
Priority: low
Version: 5.3
CC: cluster-maint, edamato
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2009-02-17 14:57:51 UTC
Attachments:
collected debug information for mount hang (flags: none)

Description Nate Straz 2009-02-12 22:35:28 UTC
Created attachment 331778
collected debug information for mount hang

Description of problem:

While running revolver during 5.3.z testing of GFS2, I hit a state where the mount on one node hung.  Running gfs2_hangalyzer found four locks with waiters but no holders.

Attached is the output of gfs2_hangalyzer, SysRq-T, dlm lock dump and glock dump.

Version-Release number of selected component (if applicable):
kernel-2.6.18-128.1.1.el5
gfs2-utils-0.1.53-1.el5_3.1


How reproducible:
Unknown

Steps to Reproduce:
1. run revolver and pray
  
Actual results:


Expected results:


Additional info:

Comment 1 Steve Whitehouse 2009-02-13 10:12:23 UTC
The attachment seems to be corrupt:

[steve@dolmen hang]$ tar -zxf ./z-mount-hang.tar.gz 
tar: This does not look like a tar archive
tar: Skipping to next header
tar: Error exit delayed from previous errors

Comment 2 Nate Straz 2009-02-13 14:34:16 UTC
(In reply to comment #1)
> The attachment seems to be corrupt:
> 
> [steve@dolmen hang]$ tar -zxf ./z-mount-hang.tar.gz 
> tar: This does not look like a tar archive
> tar: Skipping to next header
> tar: Error exit delayed from previous errors

When I downloaded it, it was double-gzipped.  I changed the mime-type to application/octet-stream and now it downloads as a gzipped tarball.
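
For anyone who hits the same double-gzipped download, here is a minimal Python sketch of one way to detect and unwrap the extra layer before handing the data to tar. The helper names (strip_gzip_layers, open_tarball) are made up for illustration and are not part of any GFS2 or Bugzilla tooling. It assumes the payload is small enough to hold in memory and that a plain tar stream never starts with the gzip magic bytes.

```python
import gzip
import io
import tarfile
from typing import Tuple

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def strip_gzip_layers(data: bytes, max_layers: int = 5) -> Tuple[bytes, int]:
    """Decompress repeatedly while the payload still looks gzipped.

    Returns the innermost payload and the number of layers removed.
    max_layers is an arbitrary safety limit (an assumption of this sketch).
    """
    layers = 0
    while data[:2] == GZIP_MAGIC and layers < max_layers:
        data = gzip.decompress(data)
        layers += 1
    return data, layers


def open_tarball(data: bytes) -> tarfile.TarFile:
    """Open a tarball that may have been gzipped once, twice, or not at all."""
    raw, _ = strip_gzip_layers(data)
    # mode "r:" means an uncompressed tar stream, since all gzip layers are gone.
    return tarfile.open(fileobj=io.BytesIO(raw), mode="r:")
```

Reading the downloaded file as bytes and passing it to open_tarball gives a normal TarFile object; a layer count of 2 from strip_gzip_layers would confirm the double compression described above.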

Comment 3 Steve Whitehouse 2009-02-17 13:52:23 UTC
Looking at the hangalyzer output, I've spotted this:

z2        : Z_Cluster1: G:  s:SH n:1/2 f:lsDpr t:UN d:UN/3644766000 l:0 a:0 r:6
z2        :                         (locked, sticky, demote, demote in prog, reply pending)
z2        : Z_Cluster1:  H: s:SH f:W e:0 p:5725 [glock_workqueue] gfs2_do_trans_begin+0xce/0x144 [gfs2]


That is, the glock_workqueue is trying to get the transaction lock. This looks like a dup of #483541 to me. We already have a patch which should solve that problem, but it's currently untested so far as I know.

If you have no objections, I'll close this as a dup of #483541.
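
As a rough illustration of how the G: and H: lines quoted above can be read mechanically, here is a small Python sketch. The glock flag names are copied from the decoded line in the hangalyzer output itself. Treating a holder flag of W as "waiting" and H as "granted" is an assumption taken from the upstream GFS2 glock documentation, not from this report, and the function names are hypothetical rather than part of gfs2_hangalyzer.

```python
import re

# Glock flag letters and names, as decoded by the hangalyzer output quoted above.
GLOCK_FLAGS = {
    "l": "locked",
    "s": "sticky",
    "D": "demote",
    "p": "demote in prog",
    "r": "reply pending",
}

# key:value tokens such as "s:SH" or "d:UN/3644766000"
FIELD_RE = re.compile(r"(?<!\S)(\w+):(\S+)")

G_LINE = "z2        : Z_Cluster1: G:  s:SH n:1/2 f:lsDpr t:UN d:UN/3644766000 l:0 a:0 r:6"
H_LINE = ("z2        : Z_Cluster1:  H: s:SH f:W e:0 p:5725 [glock_workqueue] "
          "gfs2_do_trans_begin+0xce/0x144 [gfs2]")


def parse_fields(line: str, tag: str) -> dict:
    """Return the key:value fields that follow 'G:' or 'H:' on a dump line."""
    match = re.search(r"(?:^|\s)" + re.escape(tag) + r":\s+(.*)", line)
    if match is None:
        raise ValueError("no %s: record on this line" % tag)
    return dict(FIELD_RE.findall(match.group(1)))


def decode_glock_flags(flags: str) -> list:
    """Expand a glock flag string such as 'lsDpr' into readable names."""
    return [GLOCK_FLAGS.get(c, "unknown(%s)" % c) for c in flags]


def is_waiter(holder: dict) -> bool:
    """Assumption: 'W' marks a waiting request and 'H' a granted holder."""
    flags = holder.get("f", "")
    return "W" in flags and "H" not in flags


glock = parse_fields(G_LINE, "G")
holder = parse_fields(H_LINE, "H")
print(decode_glock_flags(glock["f"]))
print("pid", holder["p"], "waiting" if is_waiter(holder) else "holding")
```

Run against the two lines above, this reports pid 5725 as a waiter on a glock that is mid-demote with a reply pending, which is the pattern described in this comment.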

Comment 4 Steve Whitehouse 2009-02-17 14:57:51 UTC

*** This bug has been marked as a duplicate of bug 483541 ***