Bug 820707 - Impossible to unmount nfsv4/krb5 mounts after network disconnect
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: util-linux
Version: 16
Hardware: All Linux
Priority: unspecified, Severity: medium
Assigned To: Karel Zak
QA Contact: Fedora Extras Quality Assurance
URL: http://thread.gmane.org/gmane.linux.n...
Reported: 2012-05-10 14:03 EDT by Orion Poplawski
Modified: 2012-05-30 20:54 EDT (History)
Doc Type: Bug Fix
Last Closed: 2012-05-30 20:54:19 EDT
Type: Bug
External Trackers: Debian BTS 642331

Description Orion Poplawski 2012-05-10 14:03:35 EDT
Description of problem:

We're using nfs4/krb5 mounts via autofs (although I get the same result without autofs and mounting the directory directly):

earth:/export/home/orion on /home/orion type nfs4 (rw,noatime,vers=4,rsize=32768,wsize=32768,namlen=255,acregmin=1,acregmax=1,acdirmin=1,acdirmax=1,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=krb5,clientaddr=10.10.30.4,minorversion=0,local_lock=none,addr=10.10.10.1)

If the network is disconnected, it is impossible to unmount, even if no processes are accessing the mount.  umount -f and umount -l both hang on readlink("/home/orion").  At this point it is impossible to shut down cleanly, and you must hold the power button down until the machine powers off.

Sometimes I get:

May 10 12:00:10 makani kernel: [ 2018.272071] nfs: server earth not responding, still trying

but that's it.

Version-Release number of selected component (if applicable):
nfs-utils-1.2.5-5.fc16.i686
3.3.4-3.fc16.i686

How reproducible:
Every time

Steps to Reproduce:
1. mount nfs4/krb5 mount
2. pull network
3. umount -l <mount>
  
Actual results:
umount -l hangs

Expected results:
umount succeeds, perhaps with delay.
Comment 1 Orion Poplawski 2012-05-10 16:09:28 EDT
Can reproduce on F17 as well with:

nfs-utils-1.2.5-15.fc17.x86_64
3.3.4-5.fc17.x86_64

I can unmount non-krb5 nfs4/nfs3 mounts just fine.  Just not nfs4 sec=krb5 ones.
Comment 2 Orion Poplawski 2012-05-15 16:49:53 EDT
The Debian reporter notes and I can confirm that a program like:

#include <sys/mount.h>
int main(int argc, char **argv) {
    umount2(argv[1], MNT_FORCE);
    return 0;
}

works to unmount the directory.  So can we fix readlink() or skip it?
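
For anyone who wants to repeat this test, here is a slightly more defensive sketch of the same idea. It is only an illustration: force_umount is a hypothetical helper name, not something from util-linux or nfs-utils. It passes the path straight to umount2() with MNT_FORCE, so no readlink() or other user-space canonicalization is done on the mountpoint, and it reports the errno if the call fails.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/*
 * Hypothetical helper (not part of util-linux): ask the kernel to
 * force-unmount `target` directly, without canonicalizing the path
 * in user space first.
 * Returns 0 on success or a negative errno value on failure.
 */
int force_umount(const char *target)
{
    if (umount2(target, MNT_FORCE) == 0)
        return 0;

    int err = errno;  /* save errno before fprintf() can change it */
    fprintf(stderr, "umount2(%s, MNT_FORCE): %s\n", target, strerror(err));
    return -err;
}
```

Calling force_umount(argv[1]) from main() gives the same behaviour as the program above, plus an error message when the unmount is refused (for example when not run as root).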
Comment 3 Steve Dickson 2012-05-15 17:54:29 EDT
(In reply to comment #2)
> The Debian reporter notes and I can confirm that a program like:
> 
> #include <sys/mount.h>
> int main(int argc, char **argv) {
>     umount2(argv[1], MNT_FORCE);
>     return 0;
> }
> 
> works to unmount the directory.  So can we fix readlink() or skip it?
So if this works... umount -f should work... at least in theory....
Comment 4 Orion Poplawski 2012-05-15 18:20:28 EDT
In practice umount needs to canonicalize the path:

Breakpoint 1, readlink () at ../sysdeps/unix/syscall-template.S:82
82      T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) bt
#0  readlink () at ../sysdeps/unix/syscall-template.S:82
#1  0x0065929f in readlink (__len=4096, __buf=0xbf98b0e9 ",indirect", __path=0xbf98c0ea "/home/orion")
    at /usr/include/bits/unistd.h:151
#2  myrealpath (resolved_path=0xbf98c0ea "/home/orion", path=0x2191ec5b "", maxreslth=<optimized out>)
    at ../../lib/canonicalize.c:96
#3  canonicalize_path (path=0x2191ec50 "/home/orion") at ../../lib/canonicalize.c:177
#4  0x006437e7 in mnt_resolve_path (path=path@entry=0x2191ec50 "/home/orion", cache=0x21920ec0) at cache.c:472
#5  0x00651084 in mnt_context_prepare_target (cxt=cxt@entry=0x2191e878) at context.c:1249
#6  0x006559f8 in mnt_context_prepare_umount (cxt=cxt@entry=0x2191e878) at context_umount.c:606
#7  0x006572a1 in mnt_context_umount (cxt=cxt@entry=0x2191e878) at context_umount.c:753
#8  0x0094a85c in umount_one (spec=<optimized out>, cxt=0x2191e878) at umount.c:273
#9  main (argc=<optimized out>, argv=0xbf98d400) at umount.c:392

It hangs after this.
Comment 5 Orion Poplawski 2012-05-16 16:51:55 EDT
Re-assigning to util-linux since that's where this is happening.  But then we probably need to address the hang in the kernel?
Comment 6 Orion Poplawski 2012-05-16 17:12:57 EDT
This is going to make it impossible to run krb5 nfs4 on laptops where the network can go away at any moment.
Comment 7 Orion Poplawski 2012-05-16 17:29:46 EDT
[94630.673017] umount.nfs      D 0000009c     0 14999  14882 0x00000080
[94630.673017]  c30f5c38 00000086 00000001 0000009c ed110004 1b928142 0000560e 00000000
[94630.673017]  c0c4b180 ed37c000 c0c4b180 f5007180 f6b37110 c32ef110 c30f5c28 f7fd6243
[94630.673017]  c2f9c580 c30f5c20 f7fd9ff2 f82520c0 00000246 c30f5c0c c0927c33 c30f5c30
[94630.673017] Call Trace:
[94630.673017]  [<f7fd6243>] ? xs_sendpages+0x63/0x1f0 [sunrpc]
[94630.673017]  [<f7fd9ff2>] ? __rpc_sleep_on_priority+0x122/0x210 [sunrpc]
[94630.673017]  [<c0927c33>] ? _raw_spin_unlock_bh+0x13/0x20
[94630.673017]  [<c0927c33>] ? _raw_spin_unlock_bh+0x13/0x20
[94630.673017]  [<c0926ed5>] schedule+0x35/0x50
[94630.673017]  [<f7fd96fd>] rpc_wait_bit_killable+0x2d/0x70 [sunrpc]
[94630.673017]  [<c09259a1>] __wait_on_bit+0x51/0x70
[94630.673017]  [<f7fd96d0>] ? __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
[94630.673017]  [<f7fd96d0>] ? __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
[94630.673017]  [<c0925a21>] out_of_line_wait_on_bit+0x61/0x70
[94630.673017]  [<c0455480>] ? autoremove_wake_function+0x50/0x50
[94630.673017]  [<f7fda2e7>] __rpc_execute+0x187/0x2a0 [sunrpc]
[94630.673017]  [<c0455423>] ? wake_up_bit+0x23/0x30
[94630.673017]  [<f7fda548>] rpc_execute+0x38/0x40 [sunrpc]
[94630.673017]  [<f7fd30a9>] rpc_run_task+0x59/0x70 [sunrpc]
[94630.673017]  [<f7fd31bc>] rpc_call_sync+0x3c/0x60 [sunrpc]
[94630.673017]  [<f84aff63>] _nfs4_call_sync+0x23/0x30 [nfs]
[94630.673017]  [<f84afc3e>] _nfs4_proc_getattr+0x8e/0xa0 [nfs]
[94630.673017]  [<f84b385b>] nfs4_proc_getattr+0x3b/0x60 [nfs]
[94630.673017]  [<f849d311>] __nfs_revalidate_inode+0x81/0x210 [nfs]
[94630.673017]  [<f849d5df>] nfs_revalidate_inode+0x2f/0x50 [nfs]
[94630.673017]  [<f8496b3f>] nfs_check_verifier+0x4f/0x80 [nfs]
[94630.673017]  [<f8498ca2>] nfs_lookup_revalidate+0x232/0x450 [nfs]
[94630.673017]  [<c05ead5e>] ? autofs4_d_manage+0x8e/0xf0
[94630.673017]  [<f8499811>] nfs_open_revalidate+0x41/0x220 [nfs]
[94630.673017]  [<c053e79b>] ? follow_managed+0x19b/0x1f0
[94630.673017]  [<c053ff00>] ? unlazy_walk+0xd0/0x180
[94630.673017]  [<c0540153>] ? do_lookup+0x1a3/0x350
[94630.673017]  [<c053f748>] complete_walk+0x88/0xc0
[94630.673017]  [<c0540cc3>] path_lookupat+0x63/0x620
[94630.673017]  [<c0523b89>] ? kmem_cache_alloc+0x29/0x120
[94630.673017]  [<c065a998>] ? strncpy_from_user+0x38/0x70
[94630.673017]  [<c05412aa>] do_path_lookup+0x2a/0xb0
[94630.673017]  [<c0542466>] user_path_at_empty+0x46/0x80
[94630.673017]  [<c092b557>] ? do_page_fault+0x1b7/0x450
[94630.673017]  [<c050c074>] ? remove_vma+0x44/0x60
[94630.673017]  [<c054e233>] ? mntput_no_expire+0x23/0x100
[94630.673017]  [<c0539313>] sys_readlinkat+0x43/0xb0
[94630.673017]  [<c05393ac>] sys_readlink+0x2c/0x30
[94630.673017]  [<c0927ed4>] syscall_call+0x7/0xb
Comment 8 Karel Zak 2012-05-16 18:52:19 EDT
You can try to use "umount --no-canonicalize", but it's only a workaround.

It seems that we don't have to call mnt_context_prepare_target() (canonicalize the mountpoint) if the mountpoint is already found in the mtab (or /proc/self/mountinfo) file. I'll try to optimize the code. Thanks.
Comment 9 Karel Zak 2012-05-17 06:20:03 EDT
Fixed by upstream commit fa705b5441bdb93c36702f7db6c54ec1133bd1cc, the Fedora package(s) will be updated ASAP.

Canonicalizing the target path is necessary only if you run

  umount /foo/bar-symlink

otherwise we can rely on the fact that all mountpoint paths in /proc/self/mountinfo are already canonicalized by the kernel.
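
The check described here can be sketched roughly as follows. This is not the code from the upstream commit, only a minimal illustration of the idea; is_listed_mountpoint is a hypothetical helper. It assumes the mountinfo format in which the fifth whitespace-separated field of each line is the mount point, and it ignores two details a real implementation has to handle: the kernel escapes spaces in paths as \040, and a line could be longer than the buffer used here.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: look for `target` as a mount point in a
 * mountinfo-format file (normally /proc/self/mountinfo).
 * Returns 1 if `target` matches a mount point verbatim, 0 if it does
 * not, and -1 if the file cannot be opened.
 * Reading this file does not touch the (possibly dead) NFS mount.
 */
int is_listed_mountpoint(const char *mountinfo_path, const char *target)
{
    FILE *f = fopen(mountinfo_path, "r");
    char line[4096];
    int found = 0;

    if (!f)
        return -1;

    while (!found && fgets(line, sizeof(line), f)) {
        char mnt[4096];

        /* Fields: mount-id parent-id major:minor root mount-point ... */
        if (sscanf(line, "%*d %*d %*s %*s %4095s", mnt) == 1 &&
            strcmp(mnt, target) == 0)
            found = 1;
    }

    fclose(f);
    return found;
}
```

A caller would fall back to canonicalizing the path (and so to readlink()) only when this returns 0, which is the symlink case above. When it returns 1 the path can be handed to the kernel as given.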
Comment 10 Orion Poplawski 2012-05-17 17:10:40 EDT
Excellent!  I built a local version with that patch and can now unmount fine.  Thanks!  It would be great to see this in F16 soon if possible.
Comment 11 Fedora Update System 2012-05-25 07:48:22 EDT
util-linux-2.21.2-1.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/util-linux-2.21.2-1.fc17
Comment 12 Fedora Update System 2012-05-26 03:07:40 EDT
Package util-linux-2.21.2-1.fc17:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing util-linux-2.21.2-1.fc17'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-8440/util-linux-2.21.2-1.fc17
then log in and leave karma (feedback).
Comment 13 Fedora Update System 2012-05-30 20:54:19 EDT
util-linux-2.21.2-1.fc17 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.
