459647 – gfs issues with overlapping bind mounts

Keywords:
Status:	CLOSED UPSTREAM
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	gfs-utils
Sub Component:
Version:	5.3
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	rc
Target Release:	5.5
Assignee:	Ben Marzinski
QA Contact:	Cluster QE
Docs Contact:
URL:
Whiteboard:
Depends On:	318281
Blocks:

Reported:	2008-08-20 20:05 UTC by Corey Marthaler
Modified:	2010-01-12 03:34 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2009-11-04 16:11:30 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Description Corey Marthaler 2008-08-20 20:05:32 UTC

Description of problem:

With a gfs filesystem:
# mount /dev/grant/grant0 /mnt/GRANT0/
# touch /mnt/GRANT0/dir/myfile
# mount --bind /mnt/GRANT0/dir /mnt/dir

# mount
/dev/mapper/grant-grant0 on /mnt/GRANT0 type gfs (rw,hostdata=jid=0:id=65538:first=1)
/mnt/GRANT0/dir on /mnt/dir type none (rw,bind)


# ls /mnt/dir
myfile
# umount /mnt/GRANT0/
# ls /mnt/dir
myfile
# touch /mnt/dir/myfile 
# vi /mnt/dir/myfile
[HANG]

A umount attempt at this point hangs as well.

Version-Release number of selected component (if applicable):
2.6.18-104.el5
gfs-utils-0.1.17-1.el5
gfs2-utils-0.1.44-1.el5
kmod-gfs-0.1.23-5.el5

How reproducible:
Every time

Comment 1 Steve Whitehouse 2008-12-10 17:13:12 UTC

Hasn't this already been fixed? AFAIK, it was a umount.gfs issue, and that was resolved by moving the umount code into gfs_controld.

Comment 2 Robert Peterson 2009-06-23 13:31:22 UTC

It's been in NEEDINFO for more than six months; closing.

Comment 3 Nate Straz 2009-09-18 21:35:26 UTC

I found that the test case for this was commented out of mount_stress, so I re-enabled it and tried again. The umount still hangs.

Steps to Reproduce:
mount -t gfs /dev/mapper/tankmorph-tankmorph9 /mnt/tankmorph9
mkdir -p /mnt/tankmorph9/binddir
mkdir -p /mnt/binddir
mount --bind /mnt/tankmorph9/binddir /mnt/binddir
umount /mnt/tankmorph9
umount /mnt/binddir
[HANG]


Version-Release number of selected component (if applicable):
kernel-2.6.18-164.el5
gfs-utils-0.1.20-1.el5
gfs2-utils-0.1.62-1.el5
kmod-gfs-0.1.34-2.el5
cman-2.0.115-1.el5_4.2

Comment 4 Nate Straz 2009-09-21 15:37:23 UTC

I was able to reproduce this with GFS2 also.

Comment 5 Nate Straz 2009-09-21 19:54:16 UTC

Here's the glock information for the stuck umount process in my GFS2 recreation:

G:  s:UN n:2/c699 f:l t:EX d:EX/0 l:0 a:0 r:4
 H: s:EX f:cW e:0 p:7793 [umount] gfs2_statfs_sync+0x46/0x173 [gfs2]

I could not find the corresponding DLM lock on any node in the cluster.
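
For anyone repeating this on a larger glock dump, a rough sketch of a filter that pulls out only the holders that are still waiting. It assumes the dump format shown above, and that a "W" in a holder's f: field marks a request that is still waiting, which is how I read the GFS2 glock documentation. The function name waiting_holders is made up for illustration and is not part of gfs2-utils:

```shell
#!/bin/sh
# waiting_holders [FILE...]
# Hypothetical helper: reads a GFS2 glock dump (from FILE or stdin) and prints
# "<glock number> <pid>" for each holder whose flags include W.
# Assumption: W in the holder's f: field means the request is still waiting.
waiting_holders() {
    awk '
        # Glock line: remember its n:<type>/<number> field.
        $1 == "G:" {
            glock = ""
            for (i = 2; i <= NF; i++)
                if ($i ~ /^n:/) glock = substr($i, 3)
        }
        # Holder line: pick out the f: (flags) and p: (pid) fields.
        $1 == "H:" {
            flags = ""; pid = ""
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^f:/) flags = substr($i, 3)
                if ($i ~ /^p:/) pid = substr($i, 3)
            }
            if (index(flags, "W") > 0) print glock, pid
        }
    ' "$@"
}
```

Fed the two lines from this comment, it prints the glock number and the PID of the stuck umount. On a live system the dump would normally come from the gfs2 directory under debugfs; the exact path depends on where debugfs is mounted and on the filesystem name, so it is left out here.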

Comment 6 Robert Peterson 2009-09-21 21:05:46 UTC

Here's the call trace for the umount process waiting for the
statfs glock (corresponds to comment #5):

umount        D 00000208  2844  7793   7792                     (NOTLB)
       f33dde14 00000086 c80479d6 00000208 f33dde28 c078f3c4 c31128e0 00000007 
       f57a8000 c804a93c 00000208 00002f66 00000001 f57a810c c3119724 f367aac0 
       c311a0c4 c311a0c4 00000000 c078f3cc 00000000 00000000 00000000 ffffffff 
Call Trace:
 [<f9007123>] just_schedule+0x5/0x8 [gfs2]
 [<c061642d>] __wait_on_bit+0x33/0x58
 [<f900711e>] just_schedule+0x0/0x8 [gfs2]
 [<f900711e>] just_schedule+0x0/0x8 [gfs2]
 [<c06164b4>] out_of_line_wait_on_bit+0x62/0x6a
 [<c0434d40>] wake_bit_function+0x0/0x3c
 [<f9007117>] gfs2_glock_wait+0x27/0x2e [gfs2]
 [<f901c312>] gfs2_statfs_sync+0x4d/0x173 [gfs2]
 [<f901c30b>] gfs2_statfs_sync+0x46/0x173 [gfs2]
 [<f9015d10>] gfs2_make_fs_ro+0x20/0x86 [gfs2]
 [<c0615d27>] wait_for_completion+0x7f/0x8f
 [<c041e847>] default_wake_function+0x0/0xc
 [<f9015e92>] gfs2_put_super+0x61/0x160 [gfs2]
 [<c0478e95>] generic_shutdown_super+0x64/0xd5
 [<c0478f23>] kill_block_super+0x1d/0x2d
 [<f901258e>] gfs2_kill_sb+0x54/0x64 [gfs2]
 [<c0478fcb>] deactivate_super+0x52/0x65
 [<c048ccd6>] sys_umount+0x1f0/0x218
 [<c044840a>] audit_syscall_entry+0x15a/0x18c
 [<c048cd09>] sys_oldumount+0xb/0xe
 [<c0404f17>] syscall_call+0x7/0xb

Comment 7 Nate Straz 2009-09-23 15:19:28 UTC

Reassigning to Ben as it may be related to statfs.

Comment 8 Steve Whitehouse 2009-10-02 13:05:26 UTC

I tried reproducing this on upstream gfs2 a few days ago and wasn't able to. So it might be something we've not backported yet.

Comment 9 Ben Marzinski 2009-10-27 04:37:53 UTC

Actually, this has nothing to do with statfs. The statfs sync just happened to be the operation that got stuck. If you do a bind mount and then unmount the original mount, it appears that the next operation that actually needs to talk to the DLM gets stuck.
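
The situation described above (another mount of the same filesystem still present when the original is unmounted) can be checked for before running umount. A minimal sketch, assuming /proc/mounts reports a bind mount with the same source device as the mount it was bound from (unlike the mount output from /etc/mtab, which shows "type none (rw,bind)"). The function name other_mounts_of is hypothetical and not part of gfs-utils:

```shell
#!/bin/sh
# other_mounts_of MOUNTPOINT [MOUNTS_FILE]
# Hypothetical helper: prints every other mount point that shares a source
# device with MOUNTPOINT. Reads /proc/mounts unless MOUNTS_FILE is given.
# Returns non-zero if MOUNTPOINT is not listed at all.
# Limitation: mount points containing spaces appear escaped (\040) in
# /proc/mounts and are not unescaped here.
other_mounts_of() {
    target=${1%/}                      # drop a trailing slash
    [ -n "$target" ] || target=/       # "/" itself would otherwise become empty
    mounts=${2:-/proc/mounts}
    awk -v target="$target" '
        # Field 1 is the source device, field 2 the mount point.
        { dev[$2] = $1; order[NR] = $2 }
        END {
            if (!(target in dev)) exit 1
            for (i = 1; i <= NR; i++)
                if (order[i] != target && dev[order[i]] == dev[target])
                    print order[i]
        }
    ' "$mounts"
}
```

With the mounts from the original description, `other_mounts_of /mnt/GRANT0` should list /mnt/dir. This only detects the condition; whether unmounting in a different order avoids the hang is not something this bug establishes. A second ordinary mount of the same device would be reported as well, since the check compares only the source device.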

Comment 10 Steve Whitehouse 2009-10-27 09:00:25 UTC

Hmm, do we still have umount.gfs, I wonder? It sounds like the age-old issue that we had before using uevents for umount.

If not, then it's most likely a ref count issue on the gfs super block.

Comment 11 Ben Marzinski 2009-10-27 23:57:17 UTC

Yes, umount.gfs2 and umount.gfs still exist in RHEL5.

Comment 14 Ben Marzinski 2009-11-04 16:11:30 UTC

This is fixed in RHEL6.  It will not be backported to RHEL5.
