Bug 888942

Summary: umount(,MNT_DETACH) for nfsv4 hangs when using sec=krb5 and network is down
Product: Fedora
Component: nfs-utils
Version: 18
Hardware: All
OS: Linux
Status: CLOSED WONTFIX
Severity: high
Priority: unspecified
Reporter: Orion Poplawski <orion>
Assignee: Steve Dickson <steved>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: bfields, dkholia, jlayton, rolek, steved
Type: Bug
Doc Type: Bug Fix
Last Closed: 2014-02-05 14:03:32 UTC

Description Orion Poplawski 2012-12-19 20:45:42 UTC
Description of problem:

NFSv4 mounts with sec=krb5 cannot be unmounted while the network is down, not even with umount -l, because the umount() syscall hangs even with MNT_DETACH set, presumably somewhere in the GSS stack.
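
For reference, "umount -l" comes down to the umount2() syscall with the MNT_DETACH flag, which is exactly the call that hangs here. A minimal sketch (run as root against such a mount; the mountpoint path is the only input):

#include <stdio.h>
#include <sys/mount.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <mountpoint>\n", argv[0]);
        return 1;
    }

    /* With the network down this call never returns: the kernel
     * revalidates the mountpoint path first, which for NFSv4 means
     * the GETATTR RPC visible in the trace below. */
    if (umount2(argv[1], MNT_DETACH) != 0) {
        perror("umount2");
        return 1;
    }
    return 0;
}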

A successful umount yields the following packet trace:

  1 0.000000000   10.10.20.2 -> 10.10.10.1   NFS 218 V4 Call GETATTR FH:0x3b470ee7
  2 0.000236000   10.10.10.1 -> 10.10.20.2   NFS 318 V4 Reply (Call In 1) GETATTR
  3 0.000282000   10.10.20.2 -> 10.10.10.1   TCP 66 943 > nfs [ACK] Seq=153 Ack=253 Win=331 Len=0 TSval=3468186 TSecr=878557922
  4 0.008761000   10.10.20.2 -> 10.10.10.1   TCP 66 943 > nfs [FIN, ACK] Seq=153 Ack=253 Win=331 Len=0 TSval=3468195 TSecr=878557922
  5 0.008923000   10.10.10.1 -> 10.10.20.2   TCP 66 nfs > 943 [FIN, ACK] Seq=253 Ack=154 Win=683 Len=0 TSval=878557930 TSecr=3468195
  6 0.008970000   10.10.20.2 -> 10.10.10.1   TCP 66 943 > nfs [ACK] Seq=154 Ack=254 Win=331 Len=0 TSval=3468195 TSecr=878557930

So my guess is that something in the GSS stack is preventing the GETATTR call from succeeding, since unmounting works without sec=krb5. However, running rpc.gssd and rpc.idmapd with -vvvv does not appear to produce any output during the hang. A successful unmount produces:

Dec 19 13:42:44 orca rpc.gssd[18495]: destroying client /var/lib/nfs/rpc_pipefs/nfs/clnt27
Dec 19 13:42:44 orca rpc.gssd[18495]: destroying client /var/lib/nfs/rpc_pipefs/nfs/clnt24

In any case, we need some way to drop mounts after the network connection has gone away. This behavior is causing severe problems for our laptop and VPN users.

Tested with:

3.6.11-3.fc18
nfs-utils-1.2.7-2.fc18

Comment 1 Orion Poplawski 2012-12-19 21:53:00 UTC
Stack traces of the hung umount.nfs4 and of rpc.gssd while the unmount is stuck. The unmount is blocked waiting on an RPC during path revalidation, before the detach itself even starts (sys_umount -> user_path_at -> path_lookupat -> nfs_lookup_revalidate -> nfs4_proc_getattr -> rpc_wait_bit_killable), while rpc.gssd itself is idle in poll():

[  131.832005] umount.nfs4     D f1585bc8     0  1959   1958 0x00000080
[  131.832005]  f1585c34 00000086 0000ea8a f1585bc8 c045a297 f705f110 644b6440 0000001c
[  131.832005]  f1585bd8 c0cd5080 c0cd5080 00000282 f1585c00 f7591080 f3a27110 f1585c24
[  131.832005]  00000000 c0d2e280 00000282 00000246 f1585c00 c097a273 f1585c2c f7ee11c5
[  131.832005] Call Trace:
[  131.832005]  [<c045a297>] ? __internal_add_timer+0x77/0xc0
[  131.832005]  [<c097a273>] ? _raw_spin_unlock_bh+0x13/0x20
[  131.832005]  [<f7ee11c5>] ? rpc_wake_up_first+0x65/0x180 [sunrpc]
[  131.832005]  [<f7eda240>] ? rpc_show_tasks+0x1b0/0x1b0 [sunrpc]
[  131.832005]  [<c09794d3>] schedule+0x23/0x60
[  131.832005]  [<f7ee064d>] rpc_wait_bit_killable+0x2d/0x70 [sunrpc]
[  131.832005]  [<c0977fc1>] __wait_on_bit+0x51/0x70
[  131.832005]  [<f7ee0620>] ? __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
[  131.832005]  [<f7ee0620>] ? __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
[  131.832005]  [<c0978041>] out_of_line_wait_on_bit+0x61/0x70
[  131.832005]  [<c046c100>] ? autoremove_wake_function+0x50/0x50
[  131.832005]  [<f7ee198f>] __rpc_execute+0x11f/0x340 [sunrpc]
[  131.832005]  [<c0507774>] ? mempool_alloc+0x44/0x120
[  131.832005]  [<f7ed8a50>] ? call_connect+0x90/0x90 [sunrpc]
[  131.832005]  [<f7ed8a50>] ? call_connect+0x90/0x90 [sunrpc]
[  131.832005]  [<c046c0a3>] ? wake_up_bit+0x23/0x30
[  131.832005]  [<f7ee1ec8>] rpc_execute+0x48/0x80 [sunrpc]
[  131.832005]  [<f7ed9929>] rpc_run_task+0x59/0x70 [sunrpc]
[  131.832005]  [<f7ed9a3c>] rpc_call_sync+0x3c/0x60 [sunrpc]
[  131.832005]  [<f8a402fc>] _nfs4_call_sync+0x3c/0x50 [nfsv4]
[  131.832005]  [<f8a403d5>] _nfs4_proc_getattr+0x95/0xa0 [nfsv4]
[  131.832005]  [<f8a41bab>] nfs4_proc_getattr+0x3b/0x60 [nfsv4]
[  131.832005]  [<f897f891>] __nfs_revalidate_inode+0x81/0x210 [nfs]
[  131.832005]  [<f897fbd2>] nfs_revalidate_inode+0x62/0x90 [nfs]
[  131.832005]  [<f89793ef>] nfs_check_verifier+0x4f/0x80 [nfs]
[  131.832005]  [<f897b4da>] nfs_lookup_revalidate+0x2ba/0x440 [nfs]
[  131.832005]  [<c055f8cb>] ? follow_managed+0x19b/0x200
[  131.832005]  [<c0560000>] ? unlazy_walk+0xf0/0x1a0
[  131.832005]  [<f897c184>] nfs4_lookup_revalidate+0x34/0xe0 [nfs]
[  131.832005]  [<c055fedc>] complete_walk+0x8c/0xc0
[  131.832005]  [<c05611b3>] path_lookupat+0x63/0x650
[  131.832005]  [<c05617ca>] do_path_lookup+0x2a/0xb0
[  131.832005]  [<c0563df6>] user_path_at_empty+0x46/0x80
[  131.832005]  [<c097d440>] ? vmalloc_fault+0x176/0x176
[  131.832005]  [<c097d5f7>] ? do_page_fault+0x1b7/0x450
[  131.832005]  [<c0563e4f>] user_path_at+0x1f/0x30
[  131.832005]  [<c05707b1>] sys_umount+0x41/0x340
[  131.832005]  [<c04bd59c>] ? __audit_syscall_entry+0xbc/0x290
[  131.832005]  [<c04bdac6>] ? __audit_syscall_exit+0x356/0x3b0
[  131.832005]  [<c0980fdf>] sysenter_do_call+0x12/0x28

[  131.832005] rpc.gssd        S c0c119c0     0   416      1 0x00000080
[  131.832005]  f302db34 00000086 c0c11de4 c0c119c0 8d14abd2 c0bb9220 7ca9fac9 00000017
[  131.832005]  00000000 c0cd5080 c0cd5080 00000000 f302dafc f7583080 f3249920 00000000
[  131.832005]  0000007e 00000000 00000000 f30f9000 f30dad00 f302db7c f7f38ca8 0000007e
[  131.832005] Call Trace:
[  131.832005]  [<f7f38ca8>] ? e1000_xmit_frame+0x848/0xda0 [e1000]
[  131.832005]  [<c09794d3>] schedule+0x23/0x60
[  131.832005]  [<c09787e5>] schedule_hrtimeout_range_clock+0x145/0x160
[  131.832005]  [<c045a297>] ? __internal_add_timer+0x77/0xc0
[  131.832005]  [<c045a9aa>] ? internal_add_timer+0x1a/0x40
[  131.832005]  [<c045b98e>] ? mod_timer_pending+0x5e/0x140
[  131.832005]  [<c046c409>] ? add_wait_queue+0x39/0x50
[  131.832005]  [<c0978817>] schedule_hrtimeout_range+0x17/0x20
[  131.832005]  [<c0566a87>] poll_schedule_timeout+0x37/0x60
[  131.832005]  [<c0567c80>] do_sys_poll+0x310/0x430
[  131.832005]  [<c0566b80>] ? __pollwait+0xd0/0xd0
[  131.832005]  [<c0566b80>] ? __pollwait+0xd0/0xd0
[  131.832005]  [<c08b9eee>] ? ip_finish_output+0x15e/0x390
[  131.832005]  [<c08b9d90>] ? ip_fragment+0x830/0x830
[  131.832005]  [<c08baa5a>] ? ip_output+0x8a/0xd0
[  131.832005]  [<c08b9d90>] ? ip_fragment+0x830/0x830
[  131.832005]  [<c08ba200>] ? ip_local_out+0x20/0x30
[  131.832005]  [<c08ba33f>] ? ip_queue_xmit+0x12f/0x3b0
[  131.832005]  [<c054ba7d>] ? lookup_page_cgroup_used+0xd/0x30
[  131.832005]  [<c0550dbd>] ? mem_cgroup_bad_page_check+0x1d/0x30
[  131.832005]  [<c050b5fa>] ? free_pages_prepare+0x7a/0x160
[  131.832005]  [<c0527093>] ? unmap_single_vma+0x363/0x750
[  131.832005]  [<c050baea>] ? free_hot_cold_page_list+0x3a/0x80
[  131.832005]  [<c050ff0a>] ? release_pages+0x19a/0x210
[  131.832005]  [<c0537499>] ? free_pages_and_swap_cache+0x99/0xb0
[  131.832005]  [<c0525beb>] ? tlb_flush_mmu+0x5b/0x90
[  131.832005]  [<c0525c37>] ? tlb_finish_mmu+0x17/0x40
[  131.832005]  [<c052c2bd>] ? unmap_region+0x8d/0xb0
[  131.832005]  [<c045d367>] ? recalc_sigpending+0x17/0x50
[  131.832005]  [<c045db85>] ? __set_task_blocked+0x35/0x80
[  131.832005]  [<c045fda3>] ? __set_current_blocked+0x33/0x50
[  131.832005]  [<c045fe1d>] ? sigprocmask+0x5d/0xc0
[  131.832005]  [<c0567f52>] sys_ppoll+0x92/0x150
[  131.832005]  [<c04bdac6>] ? __audit_syscall_exit+0x356/0x3b0
[  131.832005]  [<c0980fdf>] sysenter_do_call+0x12/0x28

Comment 2 Dhiru Kholia 2013-10-12 07:45:22 UTC
I am running Fedora 20 with "sec=sys" (instead of "sec=krb5"), but I am running into a similar situation.

$ sudo strace -f umount -f -l /mnt/stuff

...
[pid 13157] umount("/mnt/stuff", MNT_DETACH

The command hangs at this point.

Comment 3 Roel van Meer 2013-12-18 13:52:13 UTC
Although I'm not running Fedora, I think I had the same problem.

Applying the following commit to my kernel (3.10.24) fixed it for me.

commit 8033426e6bdb2690d302872ac1e1fadaec1a5581
Author: Jeff Layton <jlayton>
Date:   Fri Jul 26 06:23:25 2013 -0400

    vfs: allow umount to handle mountpoints without revalidating them

https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=8033426e6bdb2690d302872ac1e1fadaec1a5581
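
As I understand the commit, the problem is that sys_umount() resolved its path argument through the normal lookup, which calls the filesystem's ->d_revalidate() on the final component -- for NFS that is the GETATTR that blocks when the server is unreachable (see the stack trace in comment 1). The patch gives umount its own last-component lookup that takes the mountpoint straight from the dcache instead of revalidating it. A heavily simplified sketch of the idea (names, refcounting, and the slow-path fallback omitted; see the commit for the real code):

static int mountpoint_last(struct nameidata *nd, struct path *path)
{
    /* Accept the final dentry as cached, without calling
     * ->d_revalidate() on it: a mountpoint we are about to
     * detach does not need to be re-checked with the server. */
    struct dentry *dentry = d_lookup(nd->path.dentry, &nd->last);

    if (!dentry)
        return -ENOENT;  /* real code falls back to a full lookup */

    path->dentry = dentry;
    path->mnt = nd->path.mnt;
    return 0;
}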

Comment 4 Fedora End Of Life 2013-12-21 09:59:50 UTC
This message is a reminder that Fedora 18 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 18. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '18'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 18's end of life.

Thank you for reporting this issue and we are sorry that we may not be 
able to fix it before Fedora 18 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version prior to Fedora 18's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 5 Fedora End Of Life 2014-02-05 14:03:34 UTC
Fedora 18 changed to end-of-life (EOL) status on 2014-01-14. Fedora 18 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.