Bug 820707

Summary: Impossible to unmount nfsv4/krb5 mounts after network disconnect
Product: [Fedora] Fedora
Reporter: Orion Poplawski <orion>
Component: util-linux
Assignee: Karel Zak <kzak>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: unspecified
Version: 16
CC: bfields, jlayton, jonathan, kzak, mluscon, steved
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
URL: http://thread.gmane.org/gmane.linux.nfs/49551
Doc Type: Bug Fix
Last Closed: 2012-05-31 00:54:19 UTC
Type: Bug

Description Orion Poplawski 2012-05-10 18:03:35 UTC
Description of problem:

We're using nfs4/krb5 mounts via autofs (although I get the same result without autofs, mounting the directory directly):

earth:/export/home/orion on /home/orion type nfs4 (rw,noatime,vers=4,rsize=32768,wsize=32768,namlen=255,acregmin=1,acregmax=1,acdirmin=1,acdirmax=1,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=krb5,clientaddr=10.10.30.4,minorversion=0,local_lock=none,addr=10.10.10.1)

If the network is disconnected, it is impossible to unmount, even if no processes are accessing the mount. umount -f and umount -l both hang on readlink("/home/orion"). At this point it is impossible to shut down cleanly, and you must hold the power button down until the machine powers off.

Sometimes I get:

May 10 12:00:10 makani kernel: [ 2018.272071] nfs: server earth not responding, still trying

but that's it.

Version-Release number of selected component (if applicable):
nfs-utils-1.2.5-5.fc16.i686
kernel-3.3.4-3.fc16.i686

How reproducible:
Every time

Steps to Reproduce:
1. Mount an nfs4 sec=krb5 export
2. Disconnect the network
3. umount -l <mount>

Actual results:
umount -l hangs

Expected results:
umount succeeds, perhaps after a delay.

Comment 1 Orion Poplawski 2012-05-10 20:09:28 UTC
Can reproduce on F17 as well with:

nfs-utils-1.2.5-15.fc17.x86_64
kernel-3.3.4-5.fc17.x86_64

I can unmount non-krb5 nfs4 and nfs3 mounts just fine; only nfs4 sec=krb5 mounts hang.

Comment 2 Orion Poplawski 2012-05-15 20:49:53 UTC
The Debian reporter notes and I can confirm that a program like:

#include <sys/mount.h>
int main(int argc, char **argv) {
    umount2(argv[1], MNT_FORCE);
    return 0;
}

works to unmount the directory.  So can we fix readlink() or skip it?
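For reference, here is a slightly more defensive sketch of the same idea as a reusable function: it checks its argument and reports the errno from umount2(2) instead of always returning 0. The name raw_unmount() and its lazy switch are hypothetical, for illustration only. MNT_FORCE and MNT_DETACH are the flags behind umount -f and umount -l respectively; like the test program, the function hands the path to the kernel as-is and does no userspace canonicalization.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/*
 * Hypothetical helper (illustration only): unmount TARGET directly via
 * umount2(2), skipping any userspace path canonicalization.
 * lazy != 0 selects MNT_DETACH (what "umount -l" requests),
 * otherwise MNT_FORCE (what "umount -f" requests).
 * Returns 0 on success or -errno on failure.
 */
int raw_unmount(const char *target, int lazy)
{
    int flags = lazy ? MNT_DETACH : MNT_FORCE;

    if (target == NULL || *target == '\0')
        return -EINVAL;

    if (umount2(target, flags) != 0) {
        int err = errno;            /* save it before stdio can clobber errno */
        fprintf(stderr, "umount2(%s): %s\n", target, strerror(err));
        return -err;
    }
    return 0;
}
```

For example, raw_unmount("/home/orion", 0) attempts a forced unmount of the mount from the description; unmounting normally requires root, so an unprivileged caller gets a negative errno back rather than a silent 0.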

Comment 3 Steve Dickson 2012-05-15 21:54:29 UTC
(In reply to comment #2)
> The Debian reporter notes and I can confirm that a program like:
> 
> #include <sys/mount.h>
> int main(int argc, char **argv) {
>     umount2(argv[1], MNT_FORCE);
>     return 0;
> }
> 
> works to unmount the directory.  So can we fix readlink() or skip it?
So if this works, umount -f should work too... at least in theory.

Comment 4 Orion Poplawski 2012-05-15 22:20:28 UTC
In practice umount needs to canonicalize the path:

Breakpoint 1, readlink () at ../sysdeps/unix/syscall-template.S:82
82      T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) bt
#0  readlink () at ../sysdeps/unix/syscall-template.S:82
#1  0x0065929f in readlink (__len=4096, __buf=0xbf98b0e9 ",indirect", __path=0xbf98c0ea "/home/orion")
    at /usr/include/bits/unistd.h:151
#2  myrealpath (resolved_path=0xbf98c0ea "/home/orion", path=0x2191ec5b "", maxreslth=<optimized out>)
    at ../../lib/canonicalize.c:96
#3  canonicalize_path (path=0x2191ec50 "/home/orion") at ../../lib/canonicalize.c:177
#4  0x006437e7 in mnt_resolve_path (path=path@entry=0x2191ec50 "/home/orion", cache=0x21920ec0) at cache.c:472
#5  0x00651084 in mnt_context_prepare_target (cxt=cxt@entry=0x2191e878) at context.c:1249
#6  0x006559f8 in mnt_context_prepare_umount (cxt=cxt@entry=0x2191e878) at context_umount.c:606
#7  0x006572a1 in mnt_context_umount (cxt=cxt@entry=0x2191e878) at context_umount.c:753
#8  0x0094a85c in umount_one (spec=<optimized out>, cxt=0x2191e878) at umount.c:273
#9  main (argc=<optimized out>, argv=0xbf98d400) at umount.c:392

It hangs after this.
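The backtrace shows canonicalize_path() reaching readlink() on "/home/orion" itself. The following is a minimal sketch of why a component-by-component canonicalizer has to touch the mountpoint: it probes each prefix of the path with readlink(2), and every probe is a kernel path lookup. count_symlink_components() is a hypothetical name. This is not the lib/canonicalize.c code, and unlike a real realpath() it only counts symlinks instead of splicing in their targets.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <limits.h>
#include <string.h>
#include <unistd.h>

/*
 * Illustrative sketch (not the util-linux implementation): call readlink(2)
 * on every prefix of the absolute PATH, as a component-by-component
 * realpath() does. For "/home/orion" the last probe is the mountpoint itself.
 * Returns the number of prefixes that are symlinks, or -1 on error
 * (relative path, path too long, or a lookup failure such as ENOENT).
 */
int count_symlink_components(const char *path)
{
    char buf[PATH_MAX];
    char link[PATH_MAX];
    size_t len = strlen(path);
    int symlinks = 0;

    if (path[0] != '/' || len >= sizeof(buf))
        return -1;
    memcpy(buf, path, len + 1);

    for (size_t i = 1; i <= len; i++) {
        /* A prefix ends at the next '/' or at the end of the string. */
        if (buf[i] != '/' && buf[i] != '\0')
            continue;
        if (buf[i - 1] == '/')          /* skip empty components such as "//" */
            continue;
        char saved = buf[i];
        buf[i] = '\0';
        /* Each call makes the kernel look up this prefix. */
        ssize_t n = readlink(buf, link, sizeof(link));
        buf[i] = saved;
        if (n >= 0)
            symlinks++;
        else if (errno != EINVAL)       /* EINVAL: exists but is not a symlink */
            return -1;
    }
    return symlinks;
}
```

On Linux, count_symlink_components("/proc/self") returns 1, because /proc/self is a symlink and /proc is not. Pointed at "/home/orion" with the server unreachable, the final readlink() is the call one would expect to block, which is consistent with the sys_readlink frames in the kernel trace in comment 7.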

Comment 5 Orion Poplawski 2012-05-16 20:51:55 UTC
Reassigning to util-linux, since that's where this is happening. But we probably also need to address the hang in the kernel?

Comment 6 Orion Poplawski 2012-05-16 21:12:57 UTC
This is going to make it impossible to run krb5 nfs4 on laptops where the network can go away at any moment.

Comment 7 Orion Poplawski 2012-05-16 21:29:46 UTC
[94630.673017] umount.nfs      D 0000009c     0 14999  14882 0x00000080
[94630.673017]  c30f5c38 00000086 00000001 0000009c ed110004 1b928142 0000560e 00000000
[94630.673017]  c0c4b180 ed37c000 c0c4b180 f5007180 f6b37110 c32ef110 c30f5c28 f7fd6243
[94630.673017]  c2f9c580 c30f5c20 f7fd9ff2 f82520c0 00000246 c30f5c0c c0927c33 c30f5c30
[94630.673017] Call Trace:
[94630.673017]  [<f7fd6243>] ? xs_sendpages+0x63/0x1f0 [sunrpc]
[94630.673017]  [<f7fd9ff2>] ? __rpc_sleep_on_priority+0x122/0x210 [sunrpc]
[94630.673017]  [<c0927c33>] ? _raw_spin_unlock_bh+0x13/0x20
[94630.673017]  [<c0927c33>] ? _raw_spin_unlock_bh+0x13/0x20
[94630.673017]  [<c0926ed5>] schedule+0x35/0x50
[94630.673017]  [<f7fd96fd>] rpc_wait_bit_killable+0x2d/0x70 [sunrpc]
[94630.673017]  [<c09259a1>] __wait_on_bit+0x51/0x70
[94630.673017]  [<f7fd96d0>] ? __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
[94630.673017]  [<f7fd96d0>] ? __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
[94630.673017]  [<c0925a21>] out_of_line_wait_on_bit+0x61/0x70
[94630.673017]  [<c0455480>] ? autoremove_wake_function+0x50/0x50
[94630.673017]  [<f7fda2e7>] __rpc_execute+0x187/0x2a0 [sunrpc]
[94630.673017]  [<c0455423>] ? wake_up_bit+0x23/0x30
[94630.673017]  [<f7fda548>] rpc_execute+0x38/0x40 [sunrpc]
[94630.673017]  [<f7fd30a9>] rpc_run_task+0x59/0x70 [sunrpc]
[94630.673017]  [<f7fd31bc>] rpc_call_sync+0x3c/0x60 [sunrpc]
[94630.673017]  [<f84aff63>] _nfs4_call_sync+0x23/0x30 [nfs]
[94630.673017]  [<f84afc3e>] _nfs4_proc_getattr+0x8e/0xa0 [nfs]
[94630.673017]  [<f84b385b>] nfs4_proc_getattr+0x3b/0x60 [nfs]
[94630.673017]  [<f849d311>] __nfs_revalidate_inode+0x81/0x210 [nfs]
[94630.673017]  [<f849d5df>] nfs_revalidate_inode+0x2f/0x50 [nfs]
[94630.673017]  [<f8496b3f>] nfs_check_verifier+0x4f/0x80 [nfs]
[94630.673017]  [<f8498ca2>] nfs_lookup_revalidate+0x232/0x450 [nfs]
[94630.673017]  [<c05ead5e>] ? autofs4_d_manage+0x8e/0xf0
[94630.673017]  [<f8499811>] nfs_open_revalidate+0x41/0x220 [nfs]
[94630.673017]  [<c053e79b>] ? follow_managed+0x19b/0x1f0
[94630.673017]  [<c053ff00>] ? unlazy_walk+0xd0/0x180
[94630.673017]  [<c0540153>] ? do_lookup+0x1a3/0x350
[94630.673017]  [<c053f748>] complete_walk+0x88/0xc0
[94630.673017]  [<c0540cc3>] path_lookupat+0x63/0x620
[94630.673017]  [<c0523b89>] ? kmem_cache_alloc+0x29/0x120
[94630.673017]  [<c065a998>] ? strncpy_from_user+0x38/0x70
[94630.673017]  [<c05412aa>] do_path_lookup+0x2a/0xb0
[94630.673017]  [<c0542466>] user_path_at_empty+0x46/0x80
[94630.673017]  [<c092b557>] ? do_page_fault+0x1b7/0x450
[94630.673017]  [<c050c074>] ? remove_vma+0x44/0x60
[94630.673017]  [<c054e233>] ? mntput_no_expire+0x23/0x100
[94630.673017]  [<c0539313>] sys_readlinkat+0x43/0xb0
[94630.673017]  [<c05393ac>] sys_readlink+0x2c/0x30
[94630.673017]  [<c0927ed4>] syscall_call+0x7/0xb

Comment 8 Karel Zak 2012-05-16 22:52:19 UTC
You can try to use "umount --no-canonicalize", but it's only a workaround.

It seems that we don't have to call mnt_context_prepare_target() (which canonicalizes the mountpoint) if the mountpoint is already listed in the mtab (or /proc/self/mountinfo) file. I'll try to optimize the code. Thanks.
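A minimal sketch of that idea, assuming the mount table is read in /proc/self/mountinfo format, where the mount point is the fifth space-separated field and characters such as spaces are written as octal escapes (\040). The user's path is compared with each entry as a plain string, so deciding "this is a mountpoint" never issues a readlink() or stat() on the possibly dead NFS path. mountinfo_has_target() and unescape_octal() are hypothetical helpers, not libmount functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Decode, in place, the octal escapes (e.g. \040 for a space) that the
 * kernel uses for mount points in /proc/self/mountinfo. */
static void unescape_octal(char *s)
{
    char *out = s;
    while (*s) {
        if (s[0] == '\\' && s[1] >= '0' && s[1] <= '7' &&
            s[2] >= '0' && s[2] <= '7' && s[3] >= '0' && s[3] <= '7') {
            *out++ = (char)((s[1] - '0') * 64 + (s[2] - '0') * 8 + (s[3] - '0'));
            s += 4;
        } else {
            *out++ = *s++;
        }
    }
    *out = '\0';
}

/*
 * Hypothetical helper: is TARGET listed, byte for byte, as a mount point in
 * a mountinfo-format stream? Only the stream is read; TARGET itself is never
 * looked up in the filesystem. Returns 1 if found, 0 otherwise.
 */
int mountinfo_has_target(FILE *fp, const char *target)
{
    char *line = NULL;
    size_t cap = 0;
    int found = 0;

    while (!found && getline(&line, &cap, fp) != -1) {
        /* Fields: ID PARENT MAJ:MIN ROOT MOUNTPOINT ...; advance to field 5. */
        char *save = NULL;
        char *field = strtok_r(line, " \n", &save);
        for (int i = 1; field != NULL && i < 5; i++)
            field = strtok_r(NULL, " \n", &save);
        if (field == NULL)
            continue;
        unescape_octal(field);
        found = strcmp(field, target) == 0;
    }
    free(line);
    return found;
}
```

In practice the stream would come from fopen("/proc/self/mountinfo", "r"), followed by a call such as mountinfo_has_target(fp, "/home/orion").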

Comment 9 Karel Zak 2012-05-17 10:20:03 UTC
Fixed by upstream commit fa705b5441bdb93c36702f7db6c54ec1133bd1cc; the Fedora package(s) will be updated ASAP.

The target path canonicalization is necessary only if you 

  umount /foo/bar-symlink

otherwise we can rely on the fact that all mountpoint paths in /proc/self/mountinfo are already canonicalized by the kernel.

Comment 10 Orion Poplawski 2012-05-17 21:10:40 UTC
Excellent! I built a local version with that patch and can unmount fine now. Thanks! It would be great to see this in F16 soon, if possible.

Comment 11 Fedora Update System 2012-05-25 11:48:22 UTC
util-linux-2.21.2-1.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/util-linux-2.21.2-1.fc17

Comment 12 Fedora Update System 2012-05-26 07:07:40 UTC
Package util-linux-2.21.2-1.fc17:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing util-linux-2.21.2-1.fc17'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-8440/util-linux-2.21.2-1.fc17
then log in and leave karma (feedback).

Comment 13 Fedora Update System 2012-05-31 00:54:19 UTC
util-linux-2.21.2-1.fc17 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.