Description of problem:
We're using nfs4/krb5 mounts via autofs (although I get the same result without autofs, mounting the directory directly):

earth:/export/home/orion on /home/orion type nfs4 (rw,noatime,vers=4,rsize=32768,wsize=32768,namlen=255,acregmin=1,acregmax=1,acdirmin=1,acdirmax=1,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=krb5,clientaddr=10.10.30.4,minorversion=0,local_lock=none,addr=10.10.10.1)

If the network is disconnected, it is impossible to unmount, even if no processes are accessing the mount. umount -f and umount -l both hang on readlink("/home/orion"). At this point it is impossible to shut down cleanly, and you must hold the power button down until the machine powers off.

Sometimes I get:

May 10 12:00:10 makani kernel: [ 2018.272071] nfs: server earth not responding, still trying

but that's it.

Version-Release number of selected component (if applicable):
nfs-utils-1.2.5-5.fc16.i686
3.3.4-3.fc16.i686

How reproducible:
Every time

Steps to Reproduce:
1. Mount an nfs4/krb5 filesystem
2. Disconnect the network
3. umount -l <mount>

Actual results:
umount -l hangs

Expected results:
umount succeeds, perhaps with a delay.
Can reproduce on F17 as well with:
nfs-utils-1.2.5-15.fc17.x86_64
3.3.4-5.fc17.x86_64

I can unmount non-krb5 nfs4/nfs3 mounts just fine. Just not nfs4 sec=krb5 ones.
The Debian reporter notes, and I can confirm, that a program like:

#include <sys/mount.h>

int main(int argc, char **argv)
{
    umount2(argv[1], MNT_FORCE);
    return 0;
}

works to unmount the directory. So can we fix readlink() or skip it?
(In reply to comment #2)
> The Debian reporter notes and I can confirm that a program like:
>
> #include <sys/mount.h>
> int main(int argc, char **argv) {
>     umount2(argv[1], MNT_FORCE);
>     return 0;
> }
>
> works to unmount the directory. So can we fix readlink() or skip it?

So if this works... umount -f should work... at least in theory...
In practice umount needs to canonicalize the path:

Breakpoint 1, readlink () at ../sysdeps/unix/syscall-template.S:82
82      T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) bt
#0  readlink () at ../sysdeps/unix/syscall-template.S:82
#1  0x0065929f in readlink (__len=4096, __buf=0xbf98b0e9 ",indirect", __path=0xbf98c0ea "/home/orion") at /usr/include/bits/unistd.h:151
#2  myrealpath (resolved_path=0xbf98c0ea "/home/orion", path=0x2191ec5b "", maxreslth=<optimized out>) at ../../lib/canonicalize.c:96
#3  canonicalize_path (path=0x2191ec50 "/home/orion") at ../../lib/canonicalize.c:177
#4  0x006437e7 in mnt_resolve_path (path=path@entry=0x2191ec50 "/home/orion", cache=0x21920ec0) at cache.c:472
#5  0x00651084 in mnt_context_prepare_target (cxt=cxt@entry=0x2191e878) at context.c:1249
#6  0x006559f8 in mnt_context_prepare_umount (cxt=cxt@entry=0x2191e878) at context_umount.c:606
#7  0x006572a1 in mnt_context_umount (cxt=cxt@entry=0x2191e878) at context_umount.c:753
#8  0x0094a85c in umount_one (spec=<optimized out>, cxt=0x2191e878) at umount.c:273
#9  main (argc=<optimized out>, argv=0xbf98d400) at umount.c:392

It hangs after this.
Re-assigning to util-linux since that's where this is happening. But then we probably need to address the hang in the kernel?
This is going to make it impossible to run krb5 nfs4 on laptops where the network can go away at any moment.
[94630.673017] umount.nfs D 0000009c 0 14999 14882 0x00000080
[94630.673017] c30f5c38 00000086 00000001 0000009c ed110004 1b928142 0000560e 00000000
[94630.673017] c0c4b180 ed37c000 c0c4b180 f5007180 f6b37110 c32ef110 c30f5c28 f7fd6243
[94630.673017] c2f9c580 c30f5c20 f7fd9ff2 f82520c0 00000246 c30f5c0c c0927c33 c30f5c30
[94630.673017] Call Trace:
[94630.673017] [<f7fd6243>] ? xs_sendpages+0x63/0x1f0 [sunrpc]
[94630.673017] [<f7fd9ff2>] ? __rpc_sleep_on_priority+0x122/0x210 [sunrpc]
[94630.673017] [<c0927c33>] ? _raw_spin_unlock_bh+0x13/0x20
[94630.673017] [<c0927c33>] ? _raw_spin_unlock_bh+0x13/0x20
[94630.673017] [<c0926ed5>] schedule+0x35/0x50
[94630.673017] [<f7fd96fd>] rpc_wait_bit_killable+0x2d/0x70 [sunrpc]
[94630.673017] [<c09259a1>] __wait_on_bit+0x51/0x70
[94630.673017] [<f7fd96d0>] ? __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
[94630.673017] [<f7fd96d0>] ? __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
[94630.673017] [<c0925a21>] out_of_line_wait_on_bit+0x61/0x70
[94630.673017] [<c0455480>] ? autoremove_wake_function+0x50/0x50
[94630.673017] [<f7fda2e7>] __rpc_execute+0x187/0x2a0 [sunrpc]
[94630.673017] [<c0455423>] ? wake_up_bit+0x23/0x30
[94630.673017] [<f7fda548>] rpc_execute+0x38/0x40 [sunrpc]
[94630.673017] [<f7fd30a9>] rpc_run_task+0x59/0x70 [sunrpc]
[94630.673017] [<f7fd31bc>] rpc_call_sync+0x3c/0x60 [sunrpc]
[94630.673017] [<f84aff63>] _nfs4_call_sync+0x23/0x30 [nfs]
[94630.673017] [<f84afc3e>] _nfs4_proc_getattr+0x8e/0xa0 [nfs]
[94630.673017] [<f84b385b>] nfs4_proc_getattr+0x3b/0x60 [nfs]
[94630.673017] [<f849d311>] __nfs_revalidate_inode+0x81/0x210 [nfs]
[94630.673017] [<f849d5df>] nfs_revalidate_inode+0x2f/0x50 [nfs]
[94630.673017] [<f8496b3f>] nfs_check_verifier+0x4f/0x80 [nfs]
[94630.673017] [<f8498ca2>] nfs_lookup_revalidate+0x232/0x450 [nfs]
[94630.673017] [<c05ead5e>] ? autofs4_d_manage+0x8e/0xf0
[94630.673017] [<f8499811>] nfs_open_revalidate+0x41/0x220 [nfs]
[94630.673017] [<c053e79b>] ? follow_managed+0x19b/0x1f0
[94630.673017] [<c053ff00>] ? unlazy_walk+0xd0/0x180
[94630.673017] [<c0540153>] ? do_lookup+0x1a3/0x350
[94630.673017] [<c053f748>] complete_walk+0x88/0xc0
[94630.673017] [<c0540cc3>] path_lookupat+0x63/0x620
[94630.673017] [<c0523b89>] ? kmem_cache_alloc+0x29/0x120
[94630.673017] [<c065a998>] ? strncpy_from_user+0x38/0x70
[94630.673017] [<c05412aa>] do_path_lookup+0x2a/0xb0
[94630.673017] [<c0542466>] user_path_at_empty+0x46/0x80
[94630.673017] [<c092b557>] ? do_page_fault+0x1b7/0x450
[94630.673017] [<c050c074>] ? remove_vma+0x44/0x60
[94630.673017] [<c054e233>] ? mntput_no_expire+0x23/0x100
[94630.673017] [<c0539313>] sys_readlinkat+0x43/0xb0
[94630.673017] [<c05393ac>] sys_readlink+0x2c/0x30
[94630.673017] [<c0927ed4>] syscall_call+0x7/0xb
You can try to use "umount --no-canonicalize", but it's a workaround. It seems that we don't have to call mnt_context_prepare_target() (which canonicalizes the mountpoint) if the mountpoint is already found in the mtab (or /proc/self/mountinfo) file. I'll try to optimize the code. Thanks.
Fixed by upstream commit fa705b5441bdb93c36702f7db6c54ec1133bd1cc; the Fedora package(s) will be updated ASAP. The target path canonicalization is necessary only if you umount /foo/bar-symlink; otherwise we can rely on the fact that all mountpoint paths in /proc/self/mountinfo are already canonicalized by the kernel.
Excellent! I built a local version with that patch and can now unmount fine. Thanks! It would be great to see this in F16 soon if possible.
util-linux-2.21.2-1.fc17 has been submitted as an update for Fedora 17. https://admin.fedoraproject.org/updates/util-linux-2.21.2-1.fc17
Package util-linux-2.21.2-1.fc17:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing util-linux-2.21.2-1.fc17'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-8440/util-linux-2.21.2-1.fc17
then log in and leave karma (feedback).
util-linux-2.21.2-1.fc17 has been pushed to the Fedora 17 stable repository. If problems still persist, please make note of it in this bug report.