Bug 851665

Summary: network mounts autofs leaves around on shutdown need to be unmounted before we get rid of the network
Product: [Fedora] Fedora Reporter: Jason Tibbitts <j>
Component: systemd Assignee: systemd-maint
Status: CLOSED RAWHIDE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: rawhide CC: bbuesker, d.bz-redhat, johannbg, leho, lnykryn, lpoetter, marmalodak, metherid, msekleta, orion, plautrba, rvokal, systemd-maint, vpavlin, zbyszek
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-06-17 23:39:39 UTC Type: Bug

Description Jason Tibbitts 2012-08-24 16:05:31 UTC
If I ssh into an F17 machine (which mounts my home directory via NFS) and issue a reboot, the reboot hangs at "Unmounting file systems."

If I wait a very long time, I'll get "nfs: server foo not responding, still trying".  The network does appear to be taken down at this point as the machine does not respond to pings.  I have left machines like this for hours and they do not appear to ever reboot.

This all renders "reboot" rather pointless; I have to either be in front of the machine to hit the reset button, hard reboot using /proc/sysrq-trigger, or manually make sure that I'm logged in as root and that no NFS filesystems are mounted before rebooting.

This is a fully updated F17 system with systemd-44-17.fc17.x86_64 (kernel-3.5.2-3.fc17.x86_64 and nfs-utils-1.2.6-3.fc17.x86_64, if they matter).

Comment 1 Bill Nottingham 2012-08-24 16:23:23 UTC
I assume, when the FS is mounted, that it doesn't matter whether or not you're logged in?

Comment 2 Bill Nottingham 2012-08-24 16:23:35 UTC
(via ssh, that is)

Comment 3 Jason Tibbitts 2012-08-24 17:51:06 UTC
This is a bit tough to test as I'm pretty sure it's a race.  It doesn't happen every time so it's tough to know whether it just doesn't happen in a particular situation, or just happens with a lower frequency.

I tested by logging in as root, running df /home/tibbs so my directory gets mounted (but I have no running processes) and rebooting.  The machine rebooted fine several times in a row.  I then tried logging in but making sure I was logged out as quickly as possible with
  sudo reboot; exit
which seems to help; at least the machine rebooted correctly several times.  But if the network were slower it might not; I'll need to test that as well.  I'll use the "immediate exit" scheme and see how it works in practice.

Comment 4 Orion Poplawski 2012-08-29 22:46:35 UTC
We're seeing this too with KDE console logins and NFS-mounted home dirs.  We click logout and then the system gets stuck here.  So, what dependency stuff is needed to get it to unmount before the network is down?  And to forcibly unmount if that fails?  Alas poor netfs init script, we miss you :)

Comment 5 Orion Poplawski 2012-08-29 23:02:31 UTC
This happens on shutdown for us.  There it's added for bugzilla search :).

Comment 6 Lennart Poettering 2012-09-14 09:23:01 UTC
How did you configure your NFS mounts? Are they normal mounts in fstab?

Could you please do me the favour and verify the dependency chain of networking?

Normally the NFS mounts should be before remote-fs.target, so please check this with:

systemctl show -p Before remote-fs.target

Does this properly list your NFS mounts? Similarly, can you check that the mounts are after network.target? And then whether the networking service is before network.target?

I presume you use the networking scripts, not NM, right?
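
The ordering checks requested above could be scripted roughly as follows. This is only a sketch: the helper names (parse_show_output, show_property, directly_ordered_after) are made up for illustration, it assumes that `systemctl show -p` prints `Property=value` lines, and it only looks at direct ordering entries, not at transitive dependencies.

```python
import subprocess


def parse_show_output(text):
    """Parse `Property=unit1 unit2 ...` lines into {property: [units]}."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines that contain no '='
            result[key.strip()] = value.split()
    return result


def show_property(unit, prop):
    """Run `systemctl show -p PROP UNIT` and return the listed units."""
    completed = subprocess.run(
        ["systemctl", "show", "-p", prop, unit],
        capture_output=True, text=True, check=True,
    )
    return parse_show_output(completed.stdout).get(prop, [])


def directly_ordered_after(after_units, target="network.target"):
    """True if `target` appears directly in a unit's After= list."""
    return target in after_units
```

For example, `directly_ordered_after(show_property("autofs.service", "After"))` would report whether autofs.service is directly ordered after network.target on the machine where it runs.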

Comment 7 Jason Tibbitts 2012-09-14 12:04:26 UTC
My NFS mounts all happen via autofs.  (If it matters, the maps are in ldap.)

I am indeed not using NetworkManager; the machines in question just have static addresses and are always connected.

I'm not sure if that changes the info you need, but:

> systemctl show -p Before remote-fs.target
Before=systemd-user-sessions.service multi-user.target

> systemctl show -p After network.target
After=arp-ethers.service network.service

> systemctl show -p Before network.target
Before=rc-local.service nss-user-lookup.target autofs.service ntpdate.service sshd.service nfs-lock.service remote-fs-pre.target nss-lookup.target rpcbind.service

Is there some sort of debugging I can turn on so that I can get some idea of what's waiting on what when the shutdown hangs?  With F17 it seems to be harder to reproduce than before, although it happened to me just yesterday.

Comment 8 Lennart Poettering 2012-09-14 13:15:32 UTC
Ah, autofs? Maybe autofs is not ordered properly after network.target? Could you please check what it is ordered after?

Comment 9 Jason Tibbitts 2012-09-14 13:24:14 UTC
Hopefully this is what you're after:

> systemctl show -p After autofs.service
After=network.target ypbind.service systemd-journald.socket basic.target

Still, stopping autofs doesn't umount anything; it only keeps new things from being automounted.  The live NFS mounts created by autofs have to be unmounted as any other.

Comment 10 Lennart Poettering 2012-09-14 13:38:17 UTC
Ah, umm. So autofs doesn't clean up properly. Mounts from fstab are properly ordered, but foreign ones aren't. How should systemd know whether a mount that just appears requires the network, or doesn't? Hmpf...

Comment 11 Jason Tibbitts 2012-09-14 14:18:41 UTC
I don't know what you mean by "doesn't clean up properly"; it is quite important that stopping the automounter leaves mounts in place.  Maybe it could grow some means of being told that it needs to remove everything it created, if that's even possible.

What I'm not understanding is why the originator of the mount even matters; if there's an NFS mount in /proc/mounts, it has to come down before the network or it's not going to come down at all.  If user processes have been killed at this point, why not just throw out a umount -a -t nfs before taking down the network?  I mean, I could have typed "mount" from the command line.

Alternately, just don't wait forever for things to unmount.  Anything has to be better than hanging the machine.

Comment 12 Orion Poplawski 2012-09-14 14:32:11 UTC
Jason is correct that stopping automount needs to leave mounts in place - it doesn't know whether you are restarting to change options or shutting down.

In the past (F16) there was an init script called "netfs" that was in charge of unmounting all network filesystems (_netdev, nfs, cifs, ncp) on shutdown.  It first tried to umount, then killed users of the mount, umount again, kill again, then finally umount -f -l.  This is nominally what needs to happen on shutdown and needs to be replicated for systemd somehow.  Try to unmount nicely, killing processes if necessary, then as a last resort umount -l so we can shutdown.
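
The escalation described above might look roughly like this. It is a sketch under assumptions, not the netfs script itself: the function names are hypothetical, and using `fuser -k -m` to kill users of the mount is an assumption about how that step could be done. The command runner is injectable so the sequence can be exercised without touching real mounts.

```python
import subprocess


def run_command(argv):
    """Run a command and return True if it exited with status 0."""
    return subprocess.run(argv).returncode == 0


def unmount_with_escalation(mount_point, run=run_command):
    """Try a clean umount, kill users, retry, kill again, then force lazily.

    Returns a label describing which step finally succeeded.
    """
    kill_users = ["fuser", "-k", "-m", mount_point]  # assumed kill step

    if run(["umount", mount_point]):
        return "clean"
    run(kill_users)  # result ignored: there may be nothing to kill
    if run(["umount", mount_point]):
        return "after-kill"
    run(kill_users)
    # Last resort so that shutdown can proceed.
    if run(["umount", "-f", "-l", mount_point]):
        return "forced-lazy"
    return "failed"
```

Calling `unmount_with_escalation("/home/tibbs")` as root would run the real commands; passing a fake `run` callable records the sequence instead.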

Comment 13 Bill Nottingham 2012-09-14 17:48:53 UTC
netfs ran as the service to mount network mounts from fstab (and therefore unmount them, as it mounted them.)

Given that the mounting of network mounts moved to systemd, why wouldn't the umounting be done there?

Comment 14 Orion Poplawski 2012-09-14 17:53:56 UTC
To try to be clear, netfs would unmount *all* network filesystems, not just the ones it mounted.  It looks like perhaps now systemd only unmounts the network filesystems it mounts directly.  But systemd needs to unmount *all* network filesystems on shutdown.

Comment 15 Lennart Poettering 2012-09-14 18:58:21 UTC
systemd will actually unmount all network file systems. But it doesn't order mounts it didn't create itself against remote-fs.target. Maybe it should, dunno. But it's not that obvious to do that since a mount might well be special enough so that it should stick around during late shutdown or early bootup...

It's not a question of unmounting these mounts, it's a question of getting the point in time right when they are unmounted.

Comment 16 Bill Nottingham 2012-09-14 19:12:11 UTC
Arguably, any network mount that can be unmounted cleanly at remote-fs.target should be. Is it possible to trace back the fs-level dependencies that any service that ends after remote-fs.target has, and unmount anything not in that tree?

Comment 17 Orion Poplawski 2012-10-06 21:20:50 UTC
Any progress here?

Comment 18 Jason Tibbitts 2013-02-28 18:38:53 UTC
Just to update, this is still an issue for me in F18.  Tried to reboot a couple of different F18 VMs yesterday and they both hung at the umount stage.

Comment 19 Fedora End Of Life 2013-07-04 06:32:57 UTC
This message is a reminder that Fedora 17 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 17. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '17'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 17's end of life.

Bug Reporter:  Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 17 is end of life. If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora, you are encouraged to change the 
'version' to a later Fedora version prior to Fedora 17's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 20 Lennart Poettering 2015-06-17 23:39:39 UTC
Very recent systemd versions get mount information from the util-linux "utab" file, which allows us to identify network mounts correctly as long as they either use a typical network file system or the "_netdev" mount option. Closing hence.
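
The rule stated above (a typical network file system type, or the "_netdev" option) can be illustrated with a small sketch. This is not systemd's code: the function names are hypothetical, the list of file system types is a short illustrative assumption, and the input is assumed to be fstab-style lines (device, mount point, type, options) rather than the utab file itself.

```python
# Illustrative assumption: a short list, not systemd's actual one.
NETWORK_FS_TYPES = frozenset({"nfs", "nfs4", "cifs", "ncpfs"})


def is_network_mount(fstype, options):
    """True for a known network fs type or when _netdev is among the options."""
    option_set = {opt.strip() for opt in options.split(",")}
    return fstype in NETWORK_FS_TYPES or "_netdev" in option_set


def network_mount_points(mount_table):
    """Return mount points of network mounts from fstab-style text."""
    points = []
    for line in mount_table.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0].startswith("#"):
            continue  # skip blank, short, and comment lines
        _device, mount_point, fstype, options = fields[:4]
        if is_network_mount(fstype, options):
            points.append(mount_point)
    return points
```

Under this rule, a mount of type nfs counts as a network mount whether or not "_netdev" is set, and a local file system type counts only if "_netdev" is present.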

Comment 21 Leho Kraav 2016-04-13 18:50:06 UTC
Apologies for gravedigging, but this is the most accurate discussion of this problem on the interwebs. I just experienced a hung NFS shutdown with systemd-226 and autofs-5.1.1, because the network goes down inappropriately early after "systemctl reboot". I checked and there's the _netdev flag on the mount, but it did not help either.

Lennart, when you say "very recent systemd" in mid-2015, should systemd-226 already qualify?

gusto:/mnt/datapool/kernel/pf-kernel.git-e7440 on /mnt/kernel type nfs (rw,noatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=30,retrans=2,sec=sys,mountaddr=192.168.1.2,mountvers=3,mountport=32767,mountproto=tcp,local_lock=none,addr=192.168.1.2,_netdev)