Description of problem:
With a gfs filesystem:

  # mount /dev/grant/grant0 /mnt/GRANT0/
  # touch /mnt/GRANT0/dir/myfile
  # mount --bind /mnt/GRANT0/dir /mnt/dir
  # mount
  /dev/mapper/grant-grant0 on /mnt/GRANT0 type gfs (rw,hostdata=jid=0:id=65538:first=1)
  /mnt/GRANT0/dir on /mnt/dir type none (rw,bind)
  # ls /mnt/dir
  myfile
  # umount /mnt/GRANT0/
  # ls /mnt/dir
  myfile
  # touch /mnt/dir/myfile
  # vi /mnt/dir/myfile
  [HANG]

A umount attempt at this point hangs as well.

Version-Release number of selected component (if applicable):
  2.6.18-104.el5
  gfs-utils-0.1.17-1.el5
  gfs2-utils-0.1.44-1.el5
  kmod-gfs-0.1.23-5.el5

How reproducible:
  Every time
Hasn't this already been fixed? AFAIK, it was a umount.gfs issue and that was resolved by moving the umount code into gfs_controld.
It's been in NEEDINFO for more than six months; closing.
I found that the test case for this was commented out of mount_stress, so I enabled it and tried it again. The umount still hangs.

Steps to Reproduce:
  mount -t gfs /dev/mapper/tankmorph-tankmorph9 /mnt/tankmorph9
  mkdir -p /mnt/tankmorph9/binddir
  mkdir -p /mnt/binddir
  mount --bind /mnt/tankmorph9/binddir /mnt/binddir
  umount /mnt/tankmorph9
  umount /mnt/binddir
  [HANG]

Version-Release number of selected component (if applicable):
  kernel-2.6.18-164.el5
  gfs-utils-0.1.20-1.el5
  gfs2-utils-0.1.62-1.el5
  kmod-gfs-0.1.34-2.el5
  cman-2.0.115-1.el5_4.2
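The steps above could be wrapped in a small script so the hang is detected instead of blocking the shell. This is only a sketch, not the mount_stress test itself: the variable names (DEV, MNT, BIND, HANG_SECS, DRY_RUN) and the helper functions run and repro are made up for illustration, and the device and mount point defaults are simply the ones from this comment. The final umount is started in the background and polled, because a process stuck in uninterruptible sleep cannot be killed by a timeout wrapper.

```shell
#!/bin/sh
# Sketch of the bind-mount reproducer. All names below are hypothetical.
DEV=${DEV:-/dev/mapper/tankmorph-tankmorph9}
MNT=${MNT:-/mnt/tankmorph9}
BIND=${BIND:-/mnt/binddir}
HANG_SECS=${HANG_SECS:-60}   # how long to wait before declaring a hang
DRY_RUN=${DRY_RUN:-1}        # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

repro() {
    run mount -t gfs "$DEV" "$MNT" &&
    run mkdir -p "$MNT/binddir" &&
    run mkdir -p "$BIND" &&
    run mount --bind "$MNT/binddir" "$BIND" &&
    run umount "$MNT" || return 1

    if [ "$DRY_RUN" = 1 ]; then
        echo "+ umount $BIND"
        return 0
    fi

    # Start the last umount in the background and poll it.
    umount "$BIND" &
    pid=$!
    i=0
    while [ "$i" -lt "$HANG_SECS" ]; do
        # kill -0 only checks whether the process still exists.
        kill -0 "$pid" 2>/dev/null || { wait "$pid"; return $?; }
        sleep 1
        i=$((i + 1))
    done
    echo "umount $BIND (pid $pid) still running after ${HANG_SECS}s: likely hung" >&2
    return 2
}
```

With the defaults, calling repro only prints the six commands. Running it for real would be something like DRY_RUN=0 repro as root on a cluster node, and a return code of 2 would indicate the hang described here.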
I was able to reproduce this with GFS2 also.
Here's the glock information for the stuck umount process in my GFS2 recreation:

  G:  s:UN n:2/c699 f:l t:EX d:EX/0 l:0 a:0 r:4
   H: s:EX f:cW e:0 p:7793 [umount] gfs2_statfs_sync+0x46/0x173 [gfs2]

I could not find the corresponding DLM lock on any node in the cluster.
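When looking through a large glock dump for entries like this one, it can help to pull individual fields out of a G: or H: line. The helper below is a rough sketch that assumes only the space-separated key:value layout visible in the dump above; glock_field is a made-up name, not part of gfs2-utils.

```shell
# glock_field LINE KEY
# Print the value of the first "KEY:value" token in LINE.
# Returns 1 if no such token is present.
glock_field() {
    line=$1
    key=$2
    # Unquoted on purpose: split the line into whitespace-separated tokens.
    for tok in $line; do
        case $tok in
            "$key":*)
                # Strip everything up to and including the first colon.
                printf '%s\n' "${tok#*:}"
                return 0
                ;;
        esac
    done
    return 1
}
```

For the G: line above, glock_field "$line" s prints UN and glock_field "$line" t prints EX. If s is the current state and t the target state, as the layout suggests, the glock is unlocked while an exclusive request is pending, which fits the umount holder waiting on it.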
Here's the call trace for the umount process waiting for the statfs glock (corresponds to comment #5):

umount        D 00000208  2844  7793   7792                     (NOTLB)
       f33dde14 00000086 c80479d6 00000208 f33dde28 c078f3c4 c31128e0 00000007
       f57a8000 c804a93c 00000208 00002f66 00000001 f57a810c c3119724 f367aac0
       c311a0c4 c311a0c4 00000000 c078f3cc 00000000 00000000 00000000 ffffffff
Call Trace:
 [<f9007123>] just_schedule+0x5/0x8 [gfs2]
 [<c061642d>] __wait_on_bit+0x33/0x58
 [<f900711e>] just_schedule+0x0/0x8 [gfs2]
 [<f900711e>] just_schedule+0x0/0x8 [gfs2]
 [<c06164b4>] out_of_line_wait_on_bit+0x62/0x6a
 [<c0434d40>] wake_bit_function+0x0/0x3c
 [<f9007117>] gfs2_glock_wait+0x27/0x2e [gfs2]
 [<f901c312>] gfs2_statfs_sync+0x4d/0x173 [gfs2]
 [<f901c30b>] gfs2_statfs_sync+0x46/0x173 [gfs2]
 [<f9015d10>] gfs2_make_fs_ro+0x20/0x86 [gfs2]
 [<c0615d27>] wait_for_completion+0x7f/0x8f
 [<c041e847>] default_wake_function+0x0/0xc
 [<f9015e92>] gfs2_put_super+0x61/0x160 [gfs2]
 [<c0478e95>] generic_shutdown_super+0x64/0xd5
 [<c0478f23>] kill_block_super+0x1d/0x2d
 [<f901258e>] gfs2_kill_sb+0x54/0x64 [gfs2]
 [<c0478fcb>] deactivate_super+0x52/0x65
 [<c048ccd6>] sys_umount+0x1f0/0x218
 [<c044840a>] audit_syscall_entry+0x15a/0x18c
 [<c048cd09>] sys_oldumount+0xb/0xe
 [<c0404f17>] syscall_call+0x7/0xb
Reassigning to Ben as it may be related to statfs.
I tried reproducing this on upstream gfs2 a few days ago and wasn't able to. So it might be something we've not backported yet.
Actually, this has nothing to do with statfs. That just happens to be where it got stuck. If you do a bind mount and then remove the original mount, it appears that the next time you actually need to talk to the DLM, you get stuck.
Hmm, do we still have umount.gfs, I wonder? It sounds like the age-old issue that we had before using uevents for umount. If not, then it's most likely a ref count issue on the gfs super block.
Yes, umount.gfs2 and umount.gfs still exist in RHEL5.
This is fixed in RHEL6. It will not be backported to RHEL5.