Bug 485336 - GFS2 mount hung after recovery
Summary: GFS2 mount hung after recovery
Keywords:
Status: CLOSED DUPLICATE of bug 483541
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.3
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Steve Whitehouse
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-02-12 22:35 UTC by Nate Straz
Modified: 2009-05-28 03:40 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-02-17 14:57:51 UTC
Target Upstream Version:
Embargoed:


Attachments
collected debug information for mount hang (66.25 KB, application/octet-stream)
2009-02-12 22:35 UTC, Nate Straz

Description Nate Straz 2009-02-12 22:35:28 UTC
Created attachment 331778
collected debug information for mount hang

Description of problem:

While running revolver during 5.3.z testing of GFS2, I hit a state where the mount on one node hung.  Running gfs2_hangalyzer found four locks with waiters but no holders.

Attached is the output of gfs2_hangalyzer, SysRq-T, dlm lock dump and glock dump.

Version-Release number of selected component (if applicable):
kernel-2.6.18-128.1.1.el5
gfs2-utils-0.1.53-1.el5_3.1


How reproducible:
Unknown

Steps to Reproduce:
1. run revolver and pray
  
Actual results:


Expected results:


Additional info:

Comment 1 Steve Whitehouse 2009-02-13 10:12:23 UTC
The attachment seems to be corrupt:

[steve@dolmen hang]$ tar -zxf ./z-mount-hang.tar.gz 
tar: This does not look like a tar archive
tar: Skipping to next header
tar: Error exit delayed from previous errors

Comment 2 Nate Straz 2009-02-13 14:34:16 UTC
(In reply to comment #1)
> The attachment seems to be corrupt:
> 
> [steve@dolmen hang]$ tar -zxf ./z-mount-hang.tar.gz 
> tar: This does not look like a tar archive
> tar: Skipping to next header
> tar: Error exit delayed from previous errors

When I downloaded it, it was double-gzipped.  I changed the mime-type to application/octet-stream and now it downloads as a gzipped tarball.
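
For anyone who ends up with a copy that is still double-gzipped, the extra layer can be peeled off before handing the data to tar. The following is only a sketch, not something used in this bug: the helper names strip_gzip_layers and list_tar_members are made up for illustration, and it assumes the innermost payload is a plain tar archive that does not itself start with the gzip magic bytes.

```python
import gzip
import io
import tarfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def strip_gzip_layers(data: bytes, max_layers: int = 5):
    """Decompress repeatedly while the payload still looks like gzip.

    Returns the innermost payload and the number of layers removed.
    max_layers is an arbitrary safety limit (an assumption, not a standard).
    """
    layers = 0
    while data[:2] == GZIP_MAGIC and layers < max_layers:
        data = gzip.decompress(data)
        layers += 1
    return data, layers


def list_tar_members(data: bytes):
    """List member names of a tarball wrapped in any number of gzip layers."""
    payload, _ = strip_gzip_layers(data)
    # mode "r:" means: read an uncompressed tar archive
    with tarfile.open(fileobj=io.BytesIO(payload), mode="r:") as tar:
        return tar.getnames()
```

Reading the downloaded file as bytes and passing it to list_tar_members should then work whether the server delivered it gzipped once or twice.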

Comment 3 Steve Whitehouse 2009-02-17 13:52:23 UTC
Looking at the hangalyzer output, I've spotted this:

z2        : Z_Cluster1: G:  s:SH n:1/2 f:lsDpr t:UN d:UN/3644766000 l:0 a:0 r:6
z2        :                         (locked, sticky, demote, demote in prog, reply pending)
z2        : Z_Cluster1:  H: s:SH f:W e:0 p:5725 [glock_workqueue] gfs2_do_trans_begin+0xce/0x144 [gfs2]


i.e. the glock_workqueue is trying to get the transaction lock. This looks like a dup of #483541 to me. We already have a patch which should solve that problem, but it's currently untested so far as I know.
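
As a rough aid for reading lines like the G: entry quoted above, here is a small sketch that splits such a line into its key:value fields and expands the flag letters. It is not part of gfs2_hangalyzer; parse_glock is a hypothetical helper, and the flag table covers only the five letters whose meanings appear in the hangalyzer output in this comment.

```python
import re

# Flag letters and meanings taken from the hangalyzer output quoted in this
# comment (f:lsDpr). Any other letter is reported as unknown.
FLAG_NAMES = {
    "l": "locked",
    "s": "sticky",
    "D": "demote",
    "p": "demote in prog",
    "r": "reply pending",
}

# One-letter key, a colon, then a value running up to the next whitespace.
FIELD_RE = re.compile(r"\b(\w):(\S+)")


def parse_glock(line: str) -> dict:
    """Parse the fields that follow the 'G:' marker on a glock dump line."""
    start = line.index("G:") + len("G:")
    fields = dict(FIELD_RE.findall(line[start:]))
    fields["flags"] = [
        FLAG_NAMES.get(letter, "unknown flag " + letter)
        for letter in fields.get("f", "")
    ]
    return fields


line = ("z2        : Z_Cluster1: G:  s:SH n:1/2 f:lsDpr t:UN "
        "d:UN/3644766000 l:0 a:0 r:6")
glock = parse_glock(line)
print(glock["n"], glock["flags"])
```

Searching a full dump for glocks whose flags include both "demote in prog" and "reply pending" is one way to narrow down candidates, though the interpretation still has to be done by hand.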

If you have no objections, I'll close this as a dup of #483541.

Comment 4 Steve Whitehouse 2009-02-17 14:57:51 UTC

*** This bug has been marked as a duplicate of bug 483541 ***

