Red Hat Bugzilla – Bug 851665
network mounts autofs leaves around on shutdown need to be unmounted before we get rid of the network
Last modified: 2016-06-26 10:35:39 EDT
If I ssh into an F17 machine (which mounts my home directory via NFS) and issue a reboot, the reboot hangs at "Unmounting file systems."
If I wait a very long time, I'll get "nfs: server foo not responding, still trying". The network does appear to be taken down at this point as the machine does not respond to pings. I have left machines like this for hours and they do not appear to ever reboot.
This all renders "reboot" rather pointless; I have to either be in front of the machine to hit the reset button, force a hard reboot using /proc/sysrq-trigger, or manually make sure that I'm logged in as root and that no NFS filesystems are mounted before rebooting.
This is a fully updated F17 system with systemd-44-17.fc17.x86_64 (kernel-3.5.2-3.fc17.x86_64 and nfs-utils-1.2.6-3.fc17.x86_64, if they matter).
I assume, when the FS is mounted, that it doesn't matter whether or not you're logged in?
(via ssh, that is)
This is a bit tough to test as I'm pretty sure it's a race. It doesn't happen every time so it's tough to know whether it just doesn't happen in a particular situation, or just happens with a lower frequency.
I tested by logging in as root, running df /home/tibbs so my directory gets mounted (but I have no running processes) and rebooting. The machine rebooted fine several times in a row. I then tried logging in but making sure I was logged out as quickly as possible with
sudo reboot; exit
Which seems to help; at least the machine rebooted correctly several times. But if the network were slower it might not; I'll need to test that as well. I'll use the "immediate exit" scheme and see how it works in practice.
We're seeing this too with KDE console logins and NFS-mounted home dirs: after clicking logout, the system gets stuck here. So, what dependency stuff is needed to get it to unmount before the network is down? And to forcibly unmount if that fails? Alas poor netfs init script, we miss you :)
This happens on shutdown for us. There, it's added for Bugzilla search :).
How did you configure your NFS mounts? Are they normal mounts in fstab?
Could you please do me the favour and verify the dependency chain of networking?
Normally the NFS mounts should be before remote-fs.target, so please check this with:
systemctl show -p Before remote-fs.target
Does this properly list your NFS mounts? Similarly, can you check that the mounts are after network.target? And then whether the networking service is before network.target?
I presume you use the networking scripts, not NM, right?
My NFS mounts all happen via autofs. (If it matters, the maps are in ldap.)
I am indeed not using NetworkManager; the machines in question just have static addresses and are always connected.
I'm not sure if that changes the info you need, but:
> systemctl show -p Before remote-fs.target
> systemctl show -p After network.target
> systemctl show -p Before network.target
Before=rc-local.service nss-user-lookup.target autofs.service ntpdate.service sshd.service nfs-lock.service remote-fs-pre.target nss-lookup.target rpcbind.service
Is there some sort of debugging I can turn on so that I can get some idea of what's waiting on what when the shutdown hangs? With F17 it seems to be harder to reproduce than before, although it happened to me just yesterday.
Ah, autofs? Maybe autofs is not ordered properly after network.target? Could you please check what it is ordered after?
Hopefully this is what you're after:
> systemctl show -p After autofs.service
After=network.target ypbind.service systemd-journald.socket basic.target
Still, stopping autofs doesn't umount anything; it only keeps new things from being automounted. The live NFS mounts created by autofs have to be unmounted as any other.
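For anyone who wants to check this kind of ordering output mechanically rather than by eye, here is a minimal sketch. It only parses the single "Name=value ..." line that `systemctl show -p` prints, using the After= line quoted above as sample input. The function name `parse_unit_property` is made up for illustration and is not part of systemd.

```python
from typing import List, Tuple


def parse_unit_property(show_output: str) -> Tuple[str, List[str]]:
    """Split one 'Name=unit unit ...' line as printed by `systemctl show -p`.

    Hypothetical helper for illustration; it does not call systemctl itself.
    """
    name, _, value = show_output.strip().partition("=")
    return name, value.split()


# Output quoted in this thread for `systemctl show -p After autofs.service`.
sample = "After=network.target ypbind.service systemd-journald.socket basic.target"

name, units = parse_unit_property(sample)
# True here means autofs.service itself is ordered after network.target.
print(name, "network.target" in units)
```

Note that this only confirms the ordering of autofs.service; it says nothing about the NFS mounts that autofs leaves behind, which is the actual problem in this bug.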
Ah, umm. So autofs doesn't clean up properly. Mounts from fstab are properly ordered, but foreign ones aren't. How should systemd know whether a mount that just appears requires the network, or doesn't? Hmpf...
I don't know what you mean by "doesn't clean up properly"; it is quite important that stopping the automounter leaves mounts in place. Maybe it could grow some means of being told that it needs to remove everything it created, if that's even possible.
What I'm not understanding is why the originator of the mount even matters; if there's an NFS mount in /proc/mounts, it has to come down before the network or it's not going to come down at all. If user processes have been killed at this point, why not just throw out a umount -a -t nfs before taking down the network? I mean, I could have typed "mount" from the command line.
Alternately, just don't wait forever for things to unmount. Anything has to be better than hanging the machine.
Jason is correct about the need for stopping automount leaving mounts in place - it doesn't know whether you are restarting to change options or shutting down.
In the past (F16) there was an init script called "netfs" that was in charge of unmounting all network filesystems (_netdev, nfs, cifs, ncp) on shutdown. It first tried to umount, then killed users of the mount, umount again, kill again, then finally umount -f -l. This is nominally what needs to happen on shutdown and needs to be replicated for systemd somehow. Try to unmount nicely, killing processes if necessary, then as a last resort umount -l so we can shutdown.
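The escalation described above (umount, kill, umount, kill, then umount -f -l) could be sketched roughly as follows. This is not the netfs script, just an illustration of the sequence. Assumptions: `fuser -k -m` is used to kill users of the mount (the comment above does not say how netfs did it), and the command runner is injected so the logic can be tried out without touching real mounts. The names `unmount_with_escalation` and `Runner` are hypothetical.

```python
from typing import Callable, List, Sequence

# A runner takes a command (argv list) and returns its exit status.
Runner = Callable[[Sequence[str]], int]


def unmount_with_escalation(mountpoint: str, run: Runner,
                            retries: int = 2) -> List[List[str]]:
    """Try a plain umount, kill users and retry, then force a lazy umount.

    Returns the commands that were attempted, in order.
    """
    attempted: List[List[str]] = []

    def attempt(cmd: Sequence[str]) -> bool:
        attempted.append(list(cmd))
        return run(cmd) == 0

    for _ in range(retries):
        if attempt(["umount", mountpoint]):
            return attempted  # unmounted cleanly, nothing more to do
        # Assumption: kill whatever still holds the mount busy.
        attempt(["fuser", "-k", "-m", mountpoint])

    # Last resort so shutdown can proceed: forced, lazy unmount.
    attempt(["umount", "-f", "-l", mountpoint])
    return attempted


# Dry run with a fake runner that always fails, to show the full sequence.
for cmd in unmount_with_escalation("/home", lambda cmd: 1):
    print(" ".join(cmd))
```

To run it for real, one could pass something like `lambda cmd: subprocess.run(cmd).returncode` as the runner, as root, and before the network goes away.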
netfs ran as the service to mount network mounts from fstab (and therefore unmount them, as it mounted them.)
Given that the mounting of network mounts moved to systemd, why wouldn't the umounting be done there?
To try to be clear, netfs would unmount *all* network filesystems, not just the ones it mounted. It looks like perhaps now systemd only unmounts the network filesystems it mounts directly. But systemd needs to unmount *all* network filesystems on shutdown.
systemd will actually unmount all network file systems. But it doesn't order mounts it didn't create itself against remote-fs.target. Maybe it should, dunno. But it's not that obvious to do that since a mount might well be special enough so that it should stick around during late shutdown or early bootup...
It's not a question of unmounting these mounts, it's a question of getting the point in time right when they are unmounted.
Arguably, any network mount that can be unmounted cleanly at remote-fs.target should be. Is it possible to trace back the fs-level dependencies that any service that ends after remote-fs.target has, and unmount anything not in that tree?
Any progress here?
Just to update, this is still an issue for me in F18. Tried to reboot a couple of different F18 VMs yesterday and they both hung at the umount stage.
This message is a reminder that Fedora 17 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 17. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora
'version' of '17'.
Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version prior to Fedora 17's end of life.
Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 17 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora, you are encouraged to change the
'version' to a later Fedora version prior to Fedora 17's end of life.
Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.
Very recent systemd versions get mount information from the util-linux "utab" file, which allows us to identify network mounts correctly as long as they either use a typical network file system or the "_netdev" mount option. Closing hence.
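The rule stated above (a typical network file system type, or the "_netdev" option) can be sketched roughly like this. It is not systemd's code. Assumptions: the input uses the /proc/mounts field layout (device, mount point, type, options), the set of file system types is only an illustrative subset, and the options already include userspace options such as _netdev, which as far as I understand are recorded in utab rather than shown by the kernel. The names `NETWORK_FSTYPES` and `network_mountpoints` are hypothetical.

```python
from typing import List

# Assumption: an illustrative subset of "typical" network file system types.
NETWORK_FSTYPES = {"nfs", "nfs4", "cifs", "ncpfs"}


def network_mountpoints(mounts_text: str) -> List[str]:
    """Return mount points that look like network mounts, deepest first.

    Expects lines in the /proc/mounts layout: device mountpoint fstype options ...
    Escaped characters in paths (such as \\040 for a space) are not decoded.
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mountpoint, fstype, options = fields[:4]
        if fstype in NETWORK_FSTYPES or "_netdev" in options.split(","):
            found.append(mountpoint)
    # Deeper paths first, so nested mounts come off before their parents.
    return sorted(found, key=lambda path: path.count("/"), reverse=True)
```

Whatever the exact detection rule, the list would have to be unmounted while the network is still up, which is the ordering question discussed earlier in this bug.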
Apologies for gravedigging, but this is the most accurate discussion about this problem on the interwebs. I just experienced a hung NFS shutdown with systemd-226 and autofs-5.1.1, because the network goes away inappropriately early after "systemctl reboot". I checked and there's the _netdev flag on the mount, but it did not help either.
Lennart, when you say "very recent systemd" in mid-2015, should systemd-226 already qualify?
gusto:/mnt/datapool/kernel/pf-kernel.git-e7440 on /mnt/kernel type nfs (rw,noatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=30,retrans=2,sec=sys,mountaddr=192.168.1.2,mountvers=3,mountport=32767,mountproto=tcp,local_lock=none,addr=192.168.1.2,_netdev)