Bug 1007607 - systemd fails to reboot/shutdown if the system has a stale NFS handle
Product: Fedora
Classification: Fedora
Component: systemd
Assigned To: systemd-maint
QA Contact: Fedora Extras Quality Assurance
Depends On: 980088 1007745
Reported: 2013-09-12 19:09 EDT by John Schmitt
Modified: 2014-02-12 02:45 EST (History)
Doc Type: Bug Fix
Last Closed: 2014-02-12 02:45:53 EST
Type: Bug

Description John Schmitt 2013-09-12 19:09:08 EDT
Some network event resulted in NFS handles that were "stale".  No combination of killing processes, remounting, restarting services fixed this stale handle.

When all else failed, I initiated a reboot.  This also failed.  On the console, the systems in question keep displaying the following forever:

[...timestamp...] nfs: server my.ip.add.ress not responding, still trying
Unmounting /mnt/myMountPoint
Could not unmount /mnt/myMountPoint: Stale file handle

Comment 1 Lennart Poettering 2013-09-12 23:36:52 EDT
Well, what are we supposed to do with this? This hangs in the kernel...
Comment 2 John Schmitt 2013-09-13 05:04:11 EDT
How about limiting the time shutdown should take?  Limiting the number of attempts to unmount?
Comment 3 Lennart Poettering 2013-09-13 16:24:29 EDT
(In reply to John Schmitt from comment #2)
> How about limiting the time shutdown should take? 

Well, we just invoke umount(), and the kernel then blocks, which is something we cannot cancel.

That said, we actually turn on the hardware watchdog when entering the shutdown phase (if you happen to have one, but almost all systems from the last few years do), so after a long timeout of 10 min the machine should simply reset. (This is configurable via ShutdownWatchdogSec= in system.conf. Note, however, that this is subject to hardware limitations, and a lot of hardware can't do such long watchdog timeouts...)

> Limiting the number of
> attempts to unmount?

We do that.
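
To illustrate why a blocking umount() is hard to put a time limit on, here is a minimal C sketch. It is not how systemd handles shutdown; the function name `umount_with_deadline`, the result codes, and the polling interval are all hypothetical. The idea is to make the call in a child process so the parent can stop waiting after a deadline. The blocked call itself is not cancelled: the child may stay stuck in the kernel, and whether SIGKILL gets through depends on how the filesystem waits.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/mount.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Result codes for this hypothetical helper. */
enum { UMOUNT_OK = 0, UMOUNT_FAILED = 1, UMOUNT_TIMED_OUT = 2 };

/*
 * Hypothetical helper (an assumption, not systemd code): run umount() in a
 * child process and give up waiting after timeout_sec seconds.
 */
int umount_with_deadline(const char *mount_point, int timeout_sec)
{
    assert(mount_point != NULL && timeout_sec >= 0);

    pid_t pid = fork();
    if (pid < 0)
        return UMOUNT_FAILED;

    if (pid == 0) {
        /* Child: this call may block in the kernel indefinitely. */
        _exit(umount(mount_point) == 0 ? 0 : 1);
    }

    /* Parent: poll every 100 ms instead of blocking in waitpid(). */
    const struct timespec tick = { .tv_sec = 0, .tv_nsec = 100 * 1000 * 1000 };
    for (long waited_ms = 0; waited_ms <= timeout_sec * 1000L; waited_ms += 100) {
        int status;
        pid_t done = waitpid(pid, &status, WNOHANG);
        if (done == pid)
            return (WIFEXITED(status) && WEXITSTATUS(status) == 0)
                       ? UMOUNT_OK : UMOUNT_FAILED;
        if (done < 0 && errno != EINTR)
            return UMOUNT_FAILED;
        nanosleep(&tick, NULL);
    }

    /*
     * Deadline passed. The signal is only a request: a child blocked in
     * the kernel may not act on it until the call returns, so we do not
     * wait for it here. The caller would still need to reap it later.
     */
    kill(pid, SIGKILL);
    return UMOUNT_TIMED_OUT;
}
```

A caller would treat `UMOUNT_TIMED_OUT` as "carry on without this mount" rather than as a successful unmount.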
Comment 4 Orion Poplawski 2013-09-13 17:19:24 EDT
One problem wrt netfs - netfs would call umount with '-f -l'.  It does not appear that systemd uses MNT_FORCE|MNT_DETACH, which is going to be necessary in the case of an unreachable NFS server.
Comment 5 John Schmitt 2013-09-16 18:50:02 EDT
I've been trying to take advantage of ShutdownWatchdogSec.  Sadly, my VMware VMs do not have a /dev/watchdog.  I have been able to use:

systemctl reboot --force

Comment 6 John Schmitt 2014-02-12 02:45:53 EST
I no longer see this with Fedora 20 with the 3.12 kernel.
