Bug 475312

Summary: GFS2: mount attempt hangs if no more journals available
Product: Red Hat Enterprise Linux 5 Reporter: Nate Straz <nstraz>
Component: kernel Assignee: Robert Peterson <rpeterso>
Status: CLOSED ERRATA QA Contact: Cluster QE <mspqa-list>
Severity: low Docs Contact:
Priority: low    
Version: 5.3 CC: dzickus, edamato, sghosh, swhiteho, teigland
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-09-02 09:01:51 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
Bug Depends On: 425421    
Bug Blocks:    
Attachments:
Description                            Flags
Small patch                            none
RHEL5 version of the upstream patch    none

Description Nate Straz 2008-12-08 20:51:32 UTC
I tried mounting a GFS2 file system on more nodes than there were journals and it hung.

+++ This bug was initially created as a clone of Bug #425421 +++

Description of problem:
I created a gfs filesystem on a cluster with three nodes.

[root@grant-02 ~]# gfs_mkfs -O -j 2 -p lock_dlm -t GRANT-CLUSTER:gfs2 /dev/grant/gfs2
Device:                    /dev/grant/gfs2
Blocksize:                 4096
Filesystem Size:           19593744
Journals:                  2
Resource Groups:           300
Locking Protocol:          lock_dlm
Lock Table:                GRANT-CLUSTER:gfs2

Syncing...
All Done

When I attempt to mount it on the third node, the mount hangs.

Trying to join cluster "lock_dlm", "GRANT-CLUSTER:gfs2"
Joined cluster. Now mounting FS...
GFS: fsid=GRANT-CLUSTER:gfs2.2: can't mount journal #2
GFS: fsid=GRANT-CLUSTER:gfs2.2: there are only 2 journals (0 - 1)
Dec 14 14:34:57 grant-03 kernel: GFS 0.1.19-7.el5_1.1 (built Nov 12 2007 19:27:d
Dec 14 14:34:57 grant-03 kernel: Trying to join cluster "lock_dlm", "GRANT-CLUS"
Dec 14 14:34:57 grant-03 kernel: Joined cluster. Now mounting FS...
Dec 14 14:34:57 grant-03 kernel: GFS: fsid=GRANT-CLUSTER:gfs2.2: can't mount jo2
Dec 14 14:34:57 grant-03 kernel: GFS: fsid=GRANT-CLUSTER:gfs2.2: there are only)
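These messages come from a bounds check that behaves as intended: journal IDs are 0-based, so a file system made with "-j 2" has only journals 0 and 1, and the third node, assigned journal #2, is refused. The hang happens afterwards, in the cleanup path. A minimal user-space sketch of such a check, assuming a hypothetical check_jid() helper (this is not the GFS kernel source):

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical sketch of the journal bounds check implied by the log.
 * check_jid() is an illustrative name, not a GFS function.
 * Returns 0 if jid names an existing journal, -1 if the mount must fail.
 */
static int check_jid(const char *fsid, unsigned int jid, unsigned int num_journals)
{
    assert(fsid != NULL);

    if (num_journals == 0)
        return -1; /* no journals at all; avoid printing an empty range */

    /* Journals are numbered 0 .. num_journals - 1. */
    if (jid >= num_journals) {
        fprintf(stderr, "GFS: fsid=%s: can't mount journal #%u\n", fsid, jid);
        fprintf(stderr, "GFS: fsid=%s: there are only %u journals (0 - %u)\n",
                fsid, num_journals, num_journals - 1);
        return -1;
    }
    return 0;
}
```

For example, check_jid("GRANT-CLUSTER:gfs2.2", 2, 2) prints both messages and returns -1, while journals 0 and 1 are accepted.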


[root@grant-03 ~]# strace mount /dev/grant/gfs2 /mnt/gfs2
execve("/bin/mount", ["mount", "/dev/grant/gfs2", "/mnt/gfs2"], [/* 21 vars */]) = 0
brk(0)                                  = 0xee55000

[...]

stat("/dev/grant/gfs2", {st_mode=S_IFBLK|0660, st_rdev=makedev(253, 3), ...}) = 0
rt_sigprocmask(SIG_BLOCK, ~[TRAP SEGV RTMIN RT_1], NULL, 8) = 0
open("/dev/grant/gfs2", O_RDONLY)       = 3
fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(253, 3), ...}) = 0
lseek(3, 0, SEEK_SET)                   = 0
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
69632) = 69632
close(3)                                = 0
stat("/sbin/mount.gfs", {st_mode=S_IFREG|0755, st_size=42000, ...}) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2aaaaaaba540) = 3262
wait4(-1,

Version-Release number of selected component (if applicable):
[root@grant-03 ~]# rpm -qa | grep gfs
gfs2-utils-0.1.38-1.el5
gfs-utils-0.1.12-1.el5
kmod-gfs-0.1.19-7.el5
kmod-gfs-0.1.19-7.el5_1.1

How reproducible:
Every time

Comment 1 Steve Whitehouse 2008-12-09 10:05:06 UTC
It's worth trying this with the "remove lock_dlm" patch applied, since that patch changes this particular area and might just fix it.

Something else I noticed while testing that patch: if the kernel refuses the mount and returns an error code, the usual result was a segfault in mount.gfs2. So there is still something in this area that needs looking at.

Comment 2 David Teigland 2009-01-19 16:17:19 UTC
First, this is gfs1, not gfs2.
Second, there's no way lock_dlm will be removed in RHEL5.

Comment 3 Steve Whitehouse 2009-01-19 16:39:07 UTC
Yes, but that's not what I was pointing out... the issue was that we seem to have some kind of problem when mounts fail (e.g. arguments don't parse correctly) which results in mount.gfs2 getting stuck. I'd have thought that gfs1 was probably using the same or very similar code in that area.

Comment 4 Nate Straz 2009-01-19 16:47:00 UTC
(In reply to comment #2)
> First, this is gfs1, not gfs2.

It was fixed in 5.3 for gfs1.  I was running the same test on gfs2 and found it failed.  This affects the RHEL 5.3 release for GFS2.

Comment 5 Nate Straz 2009-01-19 17:10:11 UTC
Log messages on morph-04, which tried to mount a GFS2 file system which didn't have a journal free:

GFS2: fsid=: Trying to join cluster "lock_dlm", "morph-cluster:morph-cluster0"
GFS2: fsid=morph-cluster:morph-cluster0.2: Joined cluster. Now mounting FS...
GFS2: fsid=morph-cluster:morph-cluster0.2: can't mount journal #2
GFS2: fsid=morph-cluster:morph-cluster0.2: there are only 2 journals (0 - 1)


group_tool shows that it did join the dlm lockspaces.

[root@morph-04 ~]# group_tool
type             level name            id       state
fence            0     default         00010001 none
[1 3 4]
dlm              1     clvmd           00020001 none
[1 3 4]
dlm              1     morph-cluster0  00060001 none
[1 3 4]
gfs              2     morph-cluster0  00050001 none
[1 3 4]

strace output shows that it is hung in the mount system call.

3439  connect(3, {sa_family=AF_FILE, path=@"gfs_controld_sock"...}, 20) = 0
3439  write(3, "join /mnt/morph-cluster0 gfs2 lo"..., 256) = 256
3439  read(3, "0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0".
.., 256) = 256
3439  read(3, "hostdata=jid=2:id=262145:first=0"..., 256) = 256
3439  mount("/dev/mapper/morph--cluster-morph--cluster0", "/mnt/morph-cluster0", "gfs2", 0, "hostdata=jid=2:id=262145:first=0"

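The strace sequence shows how the journal ID reaches the kernel: the mounting process sends a "join" request over gfs_controld_sock, reads back a hostdata= string containing jid=2, and passes that string as the mount data. strace truncates long strings, so only the leading jid, id and first fields are visible here. A minimal sketch of extracting a numeric field from that visible prefix, assuming the colon-separated key=value layout seen above; hostdata_get() is a hypothetical helper, not part of gfs2-utils:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return the non-negative decimal value of `key` in a
 * mount data string of the form "hostdata=k1=v1:k2=v2:...", or -1 if the
 * key is absent or its value is not a plain decimal number.
 */
static long hostdata_get(const char *opts, const char *key)
{
    static const char prefix[] = "hostdata=";
    size_t plen = sizeof(prefix) - 1;
    size_t klen;
    const char *p;

    assert(opts != NULL && key != NULL);
    klen = strlen(key);
    if (strncmp(opts, prefix, plen) != 0)
        return -1;

    for (p = opts + plen; p != NULL && *p != '\0'; ) {
        /* Match "key=" only at the start of a field, so "id" never
         * matches inside "jid". */
        if (strncmp(p, key, klen) == 0 && p[klen] == '=') {
            const char *val = p + klen + 1;
            char *end;
            long v;

            if (*val < '0' || *val > '9')
                return -1;
            v = strtol(val, &end, 10);
            if (*end != ':' && *end != '\0')
                return -1;
            return v;
        }
        p = strchr(p, ':'); /* advance to the next field, if any */
        if (p != NULL)
            p++;
    }
    return -1;
}
```

With the visible prefix, hostdata_get(s, "jid") returns 2, matching the journal number in the kernel messages above.
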
Comment 6 Nate Straz 2009-01-19 17:32:17 UTC
Backtrace of the mount.gfs2 process

crash> bt 3727
PID: 3727   TASK: f3203550  CPU: 1   COMMAND: "mount.gfs2"
 #0 [f3213c64] schedule at c060e785
 #1 [f3213cd0] schedule_timeout at c060eedf
 #2 [f3213cf4] msleep at c042c2ad
 #3 [f3213cf8] gfs2_gl_hash_clear at f8eb9f07
 #4 [f3213d10] fill_super at f8ec621d
 #5 [f3213da0] get_sb_bdev at c04787ee
 #6 [f3213dd4] gfs2_get_sb at f8ec494d
 #7 [f3213de4] vfs_kern_mount at c04782b4
 #8 [f3213e0c] do_kern_mount at c0478359
 #9 [f3213e24] do_mount at c048b374
#10 [f3213f98] sys_mount at c048b451
#11 [f3213fb8] system_call at c0404f10
    EAX: ffffffda  EBX: bfb89198  ECX: bfb8a199  EDX: 0804ed05
    DS:  007b      ESI: 00000000  ES:  007b      EDI: bfb8e19d
    SS:  007b      ESP: bfb8915c  EBP: bfb90698
    CS:  0073      EIP: 00f66402  ERR: 00000015  EFLAGS: 00000246

Comment 7 Robert Peterson 2009-01-19 17:46:53 UTC
In the case of GFS, the problem was that the error path when mounting
failed to release resources associated with the license file inode,
which had been retooled as the fast statfs file.  The fix should
therefore not need to be crosswritten to gfs2. Here is a link to the
fix:

http://git.fedoraproject.org/git/?p=cluster.git;a=blobdiff;f=gfs-kernel/src/gfs/ops_fstype.c;h=e01ea32a8bd670f98463a8ddc8f1ce1f04904e49;hp=10b08385275ef17130a5032fcef3db5c7cad9315;hb=b5cc95a48417758429752998be12c059b7ac2b95;hpb=1d56fb441d78faf375eb26a84e775cc8dde7e705

However, it might be a clue as to what's going wrong.  I'll check
gfs2's error path during mounting to see if there's a similar inode
not being released.

Comment 8 Nate Straz 2009-01-19 18:25:51 UTC
I left the system in the hung state for an hour and I started seeing these messages on the console:

GFS2: fsid=morph-cluster:morph-cluster0.2: Unmount seems to be stalled. Dumping lock state...
 G:  s:SH n:5/16 f: t:SH d:EX/0 l:0 a:0 r:2
  H: s:SH f:EH e:0 p:3727 [mount.gfs2] gfs2_inode_lookup+0x12d/0x1f0 [gfs2]
 G:  s:UN n:2/16 f: t:UN d:EX/0 l:0 a:0 r:2

Comment 9 Robert Peterson 2009-01-19 19:42:54 UTC
So the iopen glock is still held for the root inode (5/16).
Hopefully easy to find and fix.
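Comment 9 reads the n: field of the dump as TYPE/NUMBER: type 5 is the iopen glock and type 2 the inode glock, both printed with the number 16. A minimal sketch that decodes that field from a "G:" dump line; parse_glock_name() is hypothetical, and the number is kept as text because the dump line itself does not say which base it is printed in:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* GFS2 glock type numbers (the LM_TYPE_* values of the lock interface). */
static const char *glock_type_name(unsigned int type)
{
    switch (type) {
    case 1: return "nondisk";
    case 2: return "inode";
    case 3: return "rgrp";
    case 4: return "meta";
    case 5: return "iopen";
    case 6: return "flock";
    case 7: return "plock";
    case 8: return "quota";
    case 9: return "journal";
    default: return "unknown";
    }
}

/*
 * Hypothetical parser for the "n:TYPE/NUMBER" field of a "G:" dump line.
 * Copies NUMBER as text into `number`. Returns 0 on success, -1 if the
 * field is missing, malformed, or does not fit in `number`.
 */
static int parse_glock_name(const char *line, unsigned int *type,
                            char *number, size_t number_len)
{
    const char *p;
    char buf[32];

    assert(line != NULL && type != NULL && number != NULL && number_len > 0);
    p = strstr(line, " n:");
    if (p == NULL)
        return -1;
    if (sscanf(p + 3, "%u/%31[0-9a-fA-F]", type, buf) != 2)
        return -1;
    if (strlen(buf) >= number_len)
        return -1;
    strcpy(number, buf);
    return 0;
}
```

Applied to the first dump line, this yields type 5 ("iopen") and number "16"; the second line yields type 2 ("inode") for the same number.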

Comment 10 Robert Peterson 2009-01-20 04:32:52 UTC
I've recreated the problem and am confident I can fix this easily.
Changing status to assigned and requesting ack flags for 5.4.

Comment 11 Steve Whitehouse 2009-01-20 13:20:28 UTC
I suspect that we need to add
dput(sb->s_root);
just before
sb->s_root = NULL;
in fill_super() since the dcache seems to be holding a ref to the root inode at that point.
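The reasoning here is a reference leak on the error path: the root dentry still holds a reference to the root inode, that reference keeps the inode's iopen glock held, and clearing sb->s_root without dput() leaves it held forever, so gfs2_gl_hash_clear() (sleeping in msleep() in the backtrace) never finishes. A minimal user-space model of that chain, assuming simplified hypothetical structures (inode_model, dentry_model, sb_model) rather than the kernel's real struct dentry and struct inode:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for kernel objects. */
struct inode_model {
    int refcount;         /* references held on the inode */
    int iopen_glock_held; /* stays 1 until the last reference is dropped */
};

struct dentry_model {
    int refcount;
    struct inode_model *inode; /* the dentry pins this inode */
};

struct sb_model {
    struct dentry_model *s_root;
};

static void iput_model(struct inode_model *ip)
{
    assert(ip->refcount > 0);
    if (--ip->refcount == 0)
        ip->iopen_glock_held = 0; /* last reference releases the glock */
}

static void dput_model(struct dentry_model *d)
{
    if (d == NULL)
        return;
    assert(d->refcount > 0);
    if (--d->refcount == 0)
        iput_model(d->inode); /* last dentry ref drops the inode ref */
}

/* Error path of a failed mount; apply_fix selects the proposed dput(). */
static void fail_mount(struct sb_model *sb, int apply_fix)
{
    if (apply_fix)
        dput_model(sb->s_root);
    sb->s_root = NULL; /* without the fix, the reference is simply lost */
}

/* Stand-in for the teardown condition: 1 once no glock is held. In the
 * hung mount, the real wait loop never saw this become true. */
static int glocks_released(const struct inode_model *ip)
{
    return !ip->iopen_glock_held;
}
```

The model collapses the dentry, inode and glock chain into counters so that the effect of the single missing dput() is visible; the real teardown involves more objects than shown here.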

Comment 12 Steve Whitehouse 2009-01-20 15:53:10 UTC
Created attachment 329478 [details]
Small patch

This appears to do the trick.

Comment 13 Steve Whitehouse 2009-01-20 17:44:27 UTC
Patch is now upstream

Comment 14 Robert Peterson 2009-01-20 18:01:42 UTC
Created attachment 329495 [details]
RHEL5 version of the upstream patch

This is the RHEL5 version of the upstream patch.  The patch is identical
except for the diff offsets.  I have tested this patch and verified that
it fixes the failing scenario on system roth-01.  I'll post this one to
rhkernel-list for inclusion into the 5.4 kernel.

Comment 15 Robert Peterson 2009-01-20 18:14:16 UTC
The patch was posted to rhkernel-list, so I'm changing the status to POST
and adding Don Zickus to the cc list.

Comment 16 Don Zickus 2009-01-27 16:02:31 UTC
in kernel-2.6.18-129.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 17 RHEL Program Management 2009-02-16 15:45:16 UTC
Updating PM score.

Comment 19 Nate Straz 2009-04-24 15:55:11 UTC
Verified against kernel-2.6.18-140.gfs2abhi.004.

Comment 21 errata-xmlrpc 2009-09-02 09:01:51 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1243.html