Bug 1007607

Summary: systemd fails to reboot/shutdown if the system has a stale NFS handle
Product: Fedora
Component: systemd
Version: 19
Status: CLOSED WORKSFORME
Reporter: John Schmitt <marmalodak>
Assignee: systemd-maint
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: d.bz-redhat, johannbg, lnykryn, msekleta, orion, plautrba, rdieter, samuel-rhbugs, systemd-maint, vpavlin, zbyszek
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Type: Bug
Doc Type: Bug Fix
Last Closed: 2014-02-12 07:45:53 UTC
Bug Depends On: 980088, 1007745

Description John Schmitt 2013-09-12 23:09:08 UTC
Some network event left NFS file handles "stale". No combination of killing processes, remounting, or restarting services cleared the stale handle.

When all else failed, I initiated a reboot. This also failed. On the console, the affected systems display the following indefinitely:

[...timestamp...] nfs: server my.ip.add.ress not responding, still trying
Unmounting /mnt/myMountPoint
Could not unmount /mnt/myMountPoint: Stale file handle

See also:

https://bugzilla.redhat.com/show_bug.cgi?id=851665
https://bugzilla.redhat.com/show_bug.cgi?id=750926

Comment 1 Lennart Poettering 2013-09-13 03:36:52 UTC
Well, what are we supposed to do with this? This hangs in the kernel...

Comment 2 John Schmitt 2013-09-13 09:04:11 UTC
How about limiting the time shutdown should take?  Limiting the number of attempts to unmount?

Comment 3 Lennart Poettering 2013-09-13 20:24:29 UTC
(In reply to John Schmitt from comment #2)
> How about limiting the time shutdown should take? 

Well, we just invoke umount(), and the kernel then blocks, which is something we cannot cancel.
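A userspace program can put a time limit on how long it *waits* for umount(), but not on the call itself. The hypothetical helper `call_with_deadline` below is a minimal sketch using plain Python threading (it is not anything systemd does) that makes the distinction concrete: when the deadline passes, the caller gets control back while the worker thread stays blocked.

```python
import threading


def call_with_deadline(fn, timeout_sec):
    """Run fn() in a daemon thread and wait at most timeout_sec for it.

    Returns (True, result) if fn finished in time, or (False, None) if not.
    In the second case fn is still running: a thread stuck inside a syscall
    such as umount() cannot be cancelled from userspace, only abandoned.
    """
    outcome = {}

    def runner():
        try:
            outcome["result"] = fn()
        except BaseException as exc:  # hand errors back to the caller
            outcome["error"] = exc

    worker = threading.Thread(target=runner, daemon=True)
    worker.start()
    worker.join(timeout_sec)
    if worker.is_alive():
        return False, None  # we stop waiting; the call itself keeps blocking
    if "error" in outcome:
        raise outcome["error"]
    return True, outcome.get("result")
```

If a caller passes a function that performs the unmount and gets `(False, None)` back, its only remaining options are to move on or to reset the machine.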

That said, we actually turn on the hardware watchdog when entering the shutdown phase (if you happen to have one, and almost all systems from the last few years do), so after a long timeout of 10 minutes the machine should simply reset. (This is configurable via ShutdownWatchdogSec= in system.conf. Note, however, that this is subject to hardware limitations, and a lot of hardware can't do such long watchdog timeouts...)
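The hardware watchdog described above is exposed on Linux as `/dev/watchdog` with the ioctl interface from `<linux/watchdog.h>`. The Python sketch below (systemd's own implementation is C and is not reproduced here) shows the detail behind the hardware caveat: `WDIOC_SETTIMEOUT` writes back the timeout the driver actually accepted, so a 600-second request can come back shorter, or be rejected with an error, on hardware that cannot count that long. The ioctl numbers assume the generic Linux encoding and a 4-byte `int`; `arm_watchdog` and `disarm_watchdog` are hypothetical helper names. Opening a real watchdog device arms it, so do not try this on a machine you are not prepared to have reset.

```python
import fcntl
import os
import struct

# Generic Linux ioctl encoding (x86, ARM and most others); a few
# architectures such as PowerPC and MIPS encode the direction bits differently.
_IOC_WRITE, _IOC_READ = 1, 2


def _ioc(direction, type_char, nr, size):
    """Python counterpart of the kernel's _IOC() macro."""
    return (direction << 30) | (size << 16) | (ord(type_char) << 8) | nr


_INT_SIZE = struct.calcsize("i")
WDIOC_KEEPALIVE = _ioc(_IOC_READ, "W", 5, _INT_SIZE)                # _IOR('W', 5, int)
WDIOC_SETTIMEOUT = _ioc(_IOC_READ | _IOC_WRITE, "W", 6, _INT_SIZE)  # _IOWR('W', 6, int)


def arm_watchdog(device, requested_sec):
    """Open the watchdog device and request a timeout; return (fd, granted_sec).

    Opening the device starts the timer: unless the fd is pinged with
    WDIOC_KEEPALIVE or closed via disarm_watchdog(), the machine resets.
    """
    fd = os.open(device, os.O_WRONLY)
    try:
        buf = bytearray(struct.pack("i", requested_sec))
        # The driver writes back the timeout it actually uses, which may be
        # shorter than requested if the hardware cannot count that long.
        fcntl.ioctl(fd, WDIOC_SETTIMEOUT, buf, True)
        (granted_sec,) = struct.unpack("i", buf)
    except OSError:
        disarm_watchdog(fd)
        raise
    return fd, granted_sec


def disarm_watchdog(fd):
    """Magic close: writing 'V' before close() stops the timer on drivers
    that support it, unless the driver was built with nowayout."""
    os.write(fd, b"V")
    os.close(fd)
```

Comparing `granted_sec` with the requested value is how a program can tell that the configured timeout was not honored as-is.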

> Limiting the number of
> attempts to unmount?

We do that.
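Bounding unmount attempts is simple to express in userspace. The sketch below is a hypothetical illustration, not systemd's shutdown code: `try_unmount` is an assumed callback returning whether one unmount succeeded. It retries in passes because unmounting a nested mount can make its parent unmountable, and it stops early when a pass makes no progress. No attempt bound helps, however, when a single umount() call blocks in the kernel, which is the situation in this report.

```python
def unmount_with_retries(try_unmount, mount_points, max_attempts):
    """Try to unmount every entry in passes until all succeed or attempts run out.

    try_unmount(path) -> bool is a hypothetical callback (e.g. wrapping umount2()).
    Returns the mount points that are still mounted afterwards.
    """
    remaining = list(mount_points)
    for _ in range(max_attempts):
        if not remaining:
            break
        still_mounted = [p for p in remaining if not try_unmount(p)]
        progressed = len(still_mounted) < len(remaining)
        remaining = still_mounted
        if not progressed:
            break  # nothing changed in this pass; another pass is unlikely to help
    return remaining
```

For example, with `/mnt/a` and `/mnt/a/b` both mounted, the first pass can only remove `/mnt/a/b`, and the second pass then removes `/mnt/a`.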

Comment 4 Orion Poplawski 2013-09-13 21:19:24 UTC
One problem with respect to netfs: netfs would call umount with '-f -l'. It does not appear that systemd uses MNT_FORCE|MNT_DETACH, which is going to be necessary in the case of an unreachable NFS server.
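At the system-call level, `umount -f -l` asks the kernel for umount2(2) with `MNT_FORCE | MNT_DETACH`: `MNT_FORCE` aborts pending requests (meaningful mainly for NFS) and `MNT_DETACH` performs a lazy unmount. A minimal Python/ctypes sketch, assuming Linux with glibc or musl and root privileges for an actual unmount; `force_lazy_umount` is a hypothetical name. Whether this flag combination would have avoided the hang in this report is not established in this thread.

```python
import ctypes
import os

# Flag values from <sys/mount.h> on Linux.
MNT_FORCE = 1   # "umount -f": force unmount, aborting pending NFS requests
MNT_DETACH = 2  # "umount -l": detach now, finish cleanup once no longer busy

_libc = ctypes.CDLL(None, use_errno=True)
_libc.umount2.argtypes = [ctypes.c_char_p, ctypes.c_int]
_libc.umount2.restype = ctypes.c_int


def force_lazy_umount(target):
    """Roughly what 'umount -f -l TARGET' requests; returns 0 or the errno value."""
    if _libc.umount2(os.fsencode(target), MNT_FORCE | MNT_DETACH) == 0:
        return 0
    return ctypes.get_errno()
```

Returning the errno rather than raising keeps the caller in control, for instance to log `os.strerror(err)` and continue with the next mount point during shutdown.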

Comment 5 John Schmitt 2013-09-16 22:50:02 UTC
I've been trying to take advantage of ShutdownWatchdogSec=. Sadly, my VMware VMs do not have a /dev/watchdog. I have been able to use

systemctl reboot --force

though.

Comment 6 John Schmitt 2014-02-12 07:45:53 UTC
I no longer see this with Fedora 20 with the 3.12 kernel.