Red Hat Bugzilla – Bug 1007607
systemd fails to reboot/shutdown if the system has a stale NFS handle
Last modified: 2014-02-12 02:45:53 EST
Some network event left NFS handles "stale". No combination of killing processes, remounting, or restarting services cleared the stale handle.
When all else failed, I initiated a reboot. This also fails: on the console, the systems in question keep displaying the following forever:
[...timestamp...] nfs: server my.ip.add.ress not responding, still trying
Could not unmount /mnt/myMountPoint: Stale file handle
Well, what are we supposed to do with this? This hangs in the kernel...
How about limiting the time shutdown should take? Limiting the number of attempts to unmount?
(In reply to John Schmitt from comment #2)
> How about limiting the time shutdown should take?
Well, we just invoke umount(), and the kernel then blocks, which is something we cannot cancel.
That said, we actually turn on the hw watchdog when entering the shutdown phase (if you happen to have one, but almost all systems from the last few years do), so after a long timeout of 10min the machine should simply reset. (This is configurable via ShutdownWatchdogSec= in system.conf. Note however that this is subject to hw limitations, and a lot of hw can't do such long watchdog timeouts...)
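For reference, a minimal sketch of how that setting might look. The /etc/systemd/system.conf path and the [Manager] section are the usual systemd layout rather than something stated in this report, and 10min is simply the timeout quoted above; hardware that cannot count that long may need a smaller value.

```ini
# /etc/systemd/system.conf (assumed location)
[Manager]
# Hardware watchdog timeout armed when the shutdown phase begins
ShutdownWatchdogSec=10min
```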
> Limiting the number of attempts to unmount?
We do that.
One problem wrt netfs: netfs would call umount with '-f -l'. It does not appear that systemd uses MNT_FORCE|MNT_DETACH, which is going to be necessary in the case of an unreachable NFS server.
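To make the flag combination concrete, here is a rough C sketch of the system-call equivalent of 'umount -f -l'. It is not systemd's code; try_force_detach() is a hypothetical helper name used only for illustration, and the sketch assumes Linux with glibc's <sys/mount.h>.

```c
/* Sketch only: Linux-specific; actually unmounting requires CAP_SYS_ADMIN. */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/*
 * try_force_detach() is a hypothetical helper, not a systemd function.
 * MNT_FORCE asks the kernel to abort pending requests on the mount
 * (mainly meaningful for NFS); MNT_DETACH requests a lazy unmount that
 * detaches the mount point now and cleans up once it is no longer busy.
 * Returns 0 on success, or -1 with errno left as set by umount2().
 */
static int try_force_detach(const char *target)
{
    assert(target != NULL);

    if (umount2(target, MNT_FORCE | MNT_DETACH) == 0)
        return 0;

    int saved = errno; /* fprintf() may overwrite errno */
    fprintf(stderr, "umount2(%s) failed: %s\n", target, strerror(saved));
    errno = saved;
    return -1;
}
```

A caller running as root would invoke it as try_force_detach("/mnt/myMountPoint"). Whether this avoids the hang seen against an unreachable server is not something the sketch settles; it only shows the flags under discussion.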
I've been trying to take advantage of ShutdownWatchdogSec. Sadly, my VMware VMs do not have a /dev/watchdog. I have been able to use this instead:
systemctl reboot --force
I no longer see this with Fedora 20 with the 3.12 kernel.