Bug 1893329 - Mounting NFS home dirs at boot fails
Summary: Mounting NFS home dirs at boot fails
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: systemd
Version: 37
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: systemd-maint
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-10-30 18:13 UTC by Colin.Simpson
Modified: 2024-02-28 09:39 UTC
CC List: 14 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2023-12-05 20:59:55 UTC
Type: Bug
Embargoed:


Description Colin.Simpson 2020-10-30 18:13:31 UTC
Description of problem:

I cannot get an NFS /home to mount from fstab at system boot.

The previous method recommended (I believe) was to add:

srv1.lan:/home				/home		nfs	auto,x-systemd.automount	0 0

I tried adding x-systemd.after=network-online.target, as recommended somewhere, but this made no difference, i.e.:

srv1.lan:/home				/home		nfs	auto,x-systemd.automount,x-systemd.after=network-online.target	0 0

This mount failure appears to cause other services to fail, and the whole system starts slowly:

25.370s udisks2.service                                                          
25.254s upower.service                                                           
25.185s systemd-homed.service                                                    
25.099s polkit.service                                                           
25.034s NetworkManager-wait-online.service                                       
25.023s systemd-hostnamed.service                                                
24.937s accounts-daemon.service                 

This seemed to start somewhere in F32, but back then the mount still happened.

I have tried mounting by IP address in the fstab, but this made no difference.

After boot, running mount /home mounts it straight away.

Version-Release number of selected component (if applicable):

systemd-246.6-3.fc33.x86_64

How reproducible:
Every time

Steps to Reproduce:
1. Add a mount as shown above
2. Reboot the system
3. Observe slow startup and no /home

Actual results:

Oct 30 17:58:16 keck systemd-tmpfiles[652]: Detected autofs mount point /home during canonicalization of /home.
Oct 30 17:58:16 keck systemd-tmpfiles[652]: Detected autofs mount point /home during canonicalization of /home.
Oct 30 17:58:16 keck systemd[1]: home.automount: Got automount request for /home, triggered by 728 (systemd-homed)
Oct 30 17:58:41 keck mount[916]: mount.nfs: Failed to resolve server srv1.lan: Name or service not known
Oct 30 17:58:41 keck systemd[1]: home.mount: Mount process exited, code=exited, status=32/n/a
Oct 30 17:58:41 keck systemd[1]: home.mount: Failed with result 'exit-code'.
Oct 30 17:58:41 keck systemd[1]: Failed to mount /home.
Oct 30 17:58:41 keck systemd[800]: systemd-hostnamed.service: Failed to set up mount namespacing: /run/systemd/unit-root/home: No such device
Oct 30 17:58:41 keck systemd[834]: systemd-logind.service: Failed to set up mount namespacing: /run/systemd/unit-root/home: No such device
Oct 30 17:58:45 keck systemd[1]: home.automount: Got automount request for /home, triggered by 1301 (gnome-shell)
Oct 30 17:58:45 keck mount[1656]: mount.nfs: Failed to resolve server srv1.lan: Name or service not known
Oct 30 17:58:45 keck systemd[1]: home.mount: Mount process exited, code=exited, status=32/n/a
Oct 30 17:58:45 keck systemd[1]: home.mount: Failed with result 'exit-code'.
Oct 30 17:58:45 keck systemd[1]: Failed to mount /home.

NetworkManager is up before this, but maybe not fully?



Oct 30 17:58:41 keck NetworkManager[704]: <info>  [1604080721.9937] policy: set 'Bridge 0' (br0) as default for IPv4 routing and DNS
Oct 30 17:58:41 keck NetworkManager[704]: <info>  [1604080721.9942] device (br0): Activation: successful, device activated.
Oct 30 17:58:42 keck NetworkManager[704]: <info>  [1604080722.2075] manager: (virbr0): new Bridge device (/org/freedesktop/NetworkManager/Devices/4)
Oct 30 17:58:42 keck NetworkManager[704]: <info>  [1604080722.2158] manager: (virbr0-nic): new Tun device (/org/freedesktop/NetworkManager/Devices/5)
Oct 30 17:58:42 keck NetworkManager[704]: <info>  [1604080722.2243] device (virbr0-nic): state change: unmanaged -> unavailable (reason 'connection-assumed', sys-iface-state: 'external')
Oct 30 17:58:42 keck NetworkManager[704]: <info>  [1604080722.2252] device (virbr0-nic): state change: unavailable -> disconnected (reason 'none', sys-iface-state: 'external')
Oct 30 17:58:42 keck NetworkManager[704]: <info>  [1604080722.3792] device (virbr0): state change: unmanaged -> unavailable (reason 'connection-assumed', sys-iface-state: 'external')
etc.

Later in boot, after the mount attempt:

Oct 30 17:58:46 keck NetworkManager[704]: <info>  [1604080726.8722] device (eno1): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed')
Oct 30 17:58:46 keck NetworkManager[704]: <info>  [1604080726.8730] policy: auto-activating connection 'Ethernet connection 1' (547082dd-5694-489c-aa1c-3aeb8c0ea195)
Oct 30 17:58:46 keck NetworkManager[704]: <info>  [1604080726.8734] device (eno1): Activation: starting connection 'Ethernet connection 1' (547082dd-5694-489c-aa1c-3aeb8c0ea195)
Oct 30 17:58:46 keck NetworkManager[704]: <info>  [1604080726.8735] device (eno1): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
Oct 30 17:58:46 keck NetworkManager[704]: <info>  [1604080726.8741] device (eno1): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
Oct 30 17:58:46 keck NetworkManager[704]: <info>  [1604080726.8745] device (eno1): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
Oct 30 17:58:46 keck NetworkManager[704]: <info>  [1604080726.8755] device (br0): attached bridge port eno1
Oct 30 17:58:46 keck NetworkManager[704]: <info>  [1604080726.8755] device (eno1): Activation: connection 'Ethernet connection 1' enslaved, continuing activation
Oct 30 17:58:46 keck NetworkManager[704]: <info>  [1604080726.8757] device (br0): IPv6 config waiting until carrier is on
Oct 30 17:58:46 keck NetworkManager[704]: <info>  [1604080726.8761] device (eno1): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'managed')
Oct 30 17:58:46 keck NetworkManager[704]: <info>  [1604080726.8770] device (br0): carrier: link connected
Oct 30 17:58:46 keck NetworkManager[704]: <info>  [1604080726.8788] device (eno1): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'managed')
Oct 30 17:58:46 keck NetworkManager[704]: <info>  [1604080726.8791] device (eno1): state change: secondaries -> activated (reason 'none', sys-iface-state: 'managed')
Oct 30 17:58:46 keck NetworkManager[704]: <info>  [1604080726.8800] device (eno1): Activation: successful, device activated.
Oct 30 17:58:46 keck NetworkManager[704]: <info>  [1604080726.8802] manager: startup complete
Oct 30 17:58:48 keck NetworkManager[704]: <info>  [1604080728.8441] policy: set 'Bridge 0' (br0) as default for IPv6 routing and DNS
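To make the "maybe not fully?" question concrete, here is a small Python sketch (standard library only) that lines up timestamps copied from the journal excerpts above. It is plain arithmetic on the quoted logs at one-second resolution, not a diagnosis: the failed mount comes 25 s after the automount request (in line with the roughly 25 s delays in the list of slow services), and NetworkManager only reports "startup complete" about 5 s after that.

```python
from datetime import datetime

# Events copied from the journal excerpts in this report (Oct 30, host keck).
EVENTS = [
    ("17:58:16", "home.automount request triggered by systemd-homed"),
    ("17:58:41", "mount.nfs: failed to resolve srv1.lan"),
    ("17:58:41", "br0 set as default for IPv4 routing and DNS"),
    ("17:58:46", "eno1 activated, NetworkManager startup complete"),
]


def offsets(events):
    """Return (seconds since the first event, description) for each event."""
    start = datetime.strptime(events[0][0], "%H:%M:%S")
    return [
        (int((datetime.strptime(ts, "%H:%M:%S") - start).total_seconds()), what)
        for ts, what in events
    ]


for secs, what in offsets(EVENTS):
    print(f"+{secs:2d}s  {what}")
```

The journal's one-second resolution cannot order the two 17:58:41 events; NetworkManager's own timestamp (1604080721.9937) only says the DNS change happened late in that second.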

Expected results:


Additional info:
Maybe the complexity comes from using a bridge device?

Comment 1 Colin.Simpson 2020-11-13 18:32:09 UTC
The failed services with this are:

home.automount                                                                           loaded failed failed    home.automount                                                               
● home.mount                                                                               loaded failed failed    /home                                                                                                                                
● systemd-hostnamed.service                                                                loaded failed failed    Hostname Service            

But my machine seems happy again if, after boot, I run:

systemctl restart home.automount
systemctl restart home.mount
systemctl restart systemd-hostnamed.service
systemctl restart chronyd.service
systemctl restart udisks2.service

but this is a nasty workaround.

It seems that if you don't restart udisks2, mounting drives from Nautilus fails.

Comment 2 Colin.Simpson 2021-02-12 14:25:27 UTC
Any update on this?

Comment 3 Steeve McCauley 2021-04-13 21:05:56 UTC
I think I'm seeing the same thing with F33.

This person provided a solution:

https://webby.land/2020/09/29/no-nfs-on-boot/

Editing this file as root, /lib/systemd/system/remote-fs-pre.target,

and adding the following under [Unit]:

Wants=network-online.target
After=network-online.target

fixed the issue for me. All of my fstab nfs4 entries had the following options:

nfs4	defaults,soft,_netdev,comment=systemd.automount 0 0

Comment 4 Colin.Simpson 2021-04-14 10:29:28 UTC
Fantastic. This seems to resolve the issue for me too!

It also resolves the associated failures of systemd-hostnamed.service, chronyd.service and udisks2.service.

Now, RH, can you look at this fix?

Being a RH customer, I dutifully try things out in Fedora so they hopefully don't make it into RHEL...

Comment 5 Steeve McCauley 2021-04-14 10:45:09 UTC
It does seem a no-brainer that a remote fs would want the network online (or telepathy :)

I wonder if this is a regression; I'm sure I've dealt with this before.

Comment 6 Steve 2021-04-27 16:48:51 UTC
F34 seems to have the same bug. Please update the Version to 34, since I can't do that.

Comment 7 Colin.Simpson 2021-04-27 21:07:04 UTC
Changed to F34, but the fix from before doesn't seem to be working for me :(

Comment 8 Steve 2021-04-28 03:02:51 UTC
It does not work for me either.

Comment 9 Steeve McCauley 2021-04-28 15:34:49 UTC
Strangely, I'm not seeing this as a problem on F34, but it's a different machine.

And that is with the defaults in /lib/systemd/system/remote-fs-pre.target

That is, no Wants= or After= lines.

Comment 10 Steve 2021-04-29 14:48:04 UTC
'systemctl status mnt-tmp.mount' says:

Apr 29 16:41:44 localhost.localdomain systemd[1]: Mounting /mnt/tmp...
Apr 29 16:41:44 localhost.localdomain mount[875]: mount.nfs: Network is unreachable
Apr 29 16:41:44 localhost.localdomain systemd[1]: mnt-tmp.mount: Mount process exited, code=exited, status=32/n/a
Apr 29 16:41:44 localhost.localdomain systemd[1]: mnt-tmp.mount: Failed with result 'exit-code'.
Apr 29 16:41:44 localhost.localdomain systemd[1]: Failed to mount /mnt/tmp.

It seems that systemd is not waiting for the network...

Comment 11 Steve 2021-05-02 18:13:25 UTC
OK: I see this behavior on only one system, after upgrading from F33. I have no idea why it happens after upgrading; before that, it worked without any problem. However, I worked it out: adding the following options to /etc/fstab makes mounting at startup work again: 'noauto,x-systemd.automount,x-systemd.mount-timeout=30,_netdev'

E.g.: 192.168.1.1:/xxxx/xxx         /mnt/xxx        nfs     noauto,x-systemd.automount,x-systemd.mount-timeout=30,_netdev   0 0


Found here: https://bbs.archlinux.org/viewtopic.php?id=254735
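A minimal Python sketch for checking an fstab entry against the option set suggested above. It assumes whitespace-separated fields (fstab(5) lets the fifth and sixth fields default to 0), expects at least four fields, and ignores octal escapes such as \040. The WANTED tuple is just the options from this comment, not an official recommendation, and the function names are made up for illustration.

```python
# Options suggested in this comment; purely illustrative, not a systemd default.
WANTED = ("noauto", "x-systemd.automount", "x-systemd.mount-timeout", "_netdev")


def parse_fstab_line(line):
    """Split an fstab entry into fields; the options field becomes a dict."""
    fields = line.split()
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 fields, got {len(fields)}")
    spec, target, fstype, options = fields[:4]
    opts = {}
    for opt in options.split(","):
        key, sep, value = opt.partition("=")
        # Flags like "noauto" have no value; "key=value" options keep theirs.
        opts[key] = value if sep else None
    return {
        "spec": spec,
        "target": target,
        "fstype": fstype,
        "options": opts,
        "dump": int(fields[4]) if len(fields) > 4 else 0,
        "passno": int(fields[5]) if len(fields) > 5 else 0,
    }


def missing_options(line, wanted=WANTED):
    """Return the wanted options that the entry does not set."""
    opts = parse_fstab_line(line)["options"]
    return [w for w in wanted if w not in opts]


entry = ("192.168.1.1:/xxxx/xxx /mnt/xxx nfs "
         "noauto,x-systemd.automount,x-systemd.mount-timeout=30,_netdev 0 0")
print(missing_options(entry))
print(missing_options("srv1.lan:/home\t/home\tnfs\tauto,x-systemd.automount\t0 0"))
```

The first call returns an empty list; the second, using the entry from the original description, returns ["noauto", "x-systemd.mount-timeout", "_netdev"].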

Comment 12 Colin.Simpson 2021-05-03 09:43:44 UTC
I now have in my fstab:

server.lan:/home				/home		nfs	noauto,x-systemd.automount,x-systemd.mount-timeout=30,_netdev	0 0

(it's so obvious when you say it  :) )


And I tried with and without /lib/systemd/system/remote-fs-pre.target containing:

Wants=network-online.target
After=network-online.target

Neither works for me sadly.

Comment 13 Steeve McCauley 2021-05-03 10:18:28 UTC
Maybe try using the server's IP address instead of its host name? Maybe it's a race condition related to name resolution? I just tested this on F34 and it works fine:

192.168.1.111:/data/household	/data/household	nfs	defaults,soft,_netdev,comment=x-systemd.automount 0 0
192.168.1.111:/data/audio	/data/audio	nfs	defaults,soft,_netdev,comment=x-systemd.automount 0 0
192.168.1.111:/data/photos	/data/photos	nfs	defaults,soft,_netdev,comment=x-systemd.automount 0 0
192.168.1.111:/data/osd		/data/osd	nfs	defaults,soft,_netdev,comment=x-systemd.automount 0 0

Comment 14 Colin.Simpson 2021-05-03 10:19:31 UTC
Did you need the change to the /lib/systemd/system/remote-fs-pre.target unit file?

Comment 15 Steve 2021-05-03 10:30:35 UTC
In my case I don't need it.

Comment 16 Steeve McCauley 2021-05-03 10:37:30 UTC
I didn't need to do it on my F34 setup; that was only on F33. This was the output from 'journalctl -b' on my fresh F34 install:

May 03 06:17:05 doont systemd[1]: Finished Network Manager Wait Online.
May 03 06:17:05 doont audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=NetworkManager-wait-online comm="systemd" exe="/usr/li>
May 03 06:17:05 doont systemd[1]: Reached target Network is Online.
May 03 06:17:05 doont systemd[1]: Reached target Remote File Systems (Pre).
May 03 06:17:05 doont systemd[1]: Mounting /data/audio...
May 03 06:17:05 doont systemd[1]: Mounting /data/household...
May 03 06:17:05 doont systemd[1]: Mounting /data/osd...
May 03 06:17:05 doont systemd[1]: Mounting /data/photos...
May 03 06:17:05 doont systemd[1]: Starting Notify NFS peers of a restart...
May 03 06:17:05 doont sm-notify[1306]: Version 2.5.3 starting
May 03 06:17:05 doont systemd[1]: Started Notify NFS peers of a restart.
May 03 06:17:05 doont audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=rpc-statd-notify comm="systemd" exe="/usr/lib/systemd/>
May 03 06:17:05 doont kernel: FS-Cache: Loaded
May 03 06:17:05 doont kernel: FS-Cache: Netfs 'nfs' registered for caching
May 03 06:17:05 doont kernel: Key type dns_resolver registered
May 03 06:17:06 doont kernel: NFS: Registering the id_resolver key type
May 03 06:17:06 doont kernel: Key type id_resolver registered
May 03 06:17:06 doont kernel: Key type id_legacy registered
May 03 06:17:06 doont systemd[1]: Mounted /data/osd.
May 03 06:17:06 doont systemd[1]: Mounted /data/photos.
May 03 06:17:06 doont systemd[1]: Mounted /data/household.
May 03 06:17:06 doont systemd[1]: Mounted /data/audio.
May 03 06:17:06 doont systemd[1]: Reached target Remote File Systems.
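To compare a working boot like this one with the failing ones, a small Python sketch can scan saved journal text (for example from journalctl -b --no-pager > boot.log) and list the "Mounting <path>..." lines for the mount points you name that appear before "Reached target Network is Online.". It relies only on the message wording seen in this thread and on line order, since many events share the same second; the function name is hypothetical.

```python
ONLINE = "Reached target Network is Online."


def mounts_before_network_online(journal_text, mount_points):
    """Return 'Mounting <path>...' lines for mount_points logged before ONLINE.

    Uses line order rather than timestamps, because many boot events share
    the same second. If ONLINE never appears, every matching line is returned.
    """
    wanted = tuple(f"systemd[1]: Mounting {path}..." for path in mount_points)
    early = []
    for line in journal_text.splitlines():
        if ONLINE in line:
            break
        if any(w in line for w in wanted):
            early.append(line.strip())
    return early


# Lines copied from the F34 log above.
GOOD = """\
May 03 06:17:05 doont systemd[1]: Reached target Network is Online.
May 03 06:17:05 doont systemd[1]: Reached target Remote File Systems (Pre).
May 03 06:17:05 doont systemd[1]: Mounting /data/audio...
"""
print(mounts_before_network_online(GOOD, ["/data/audio"]))
```

For the excerpt above this prints an empty list; an empty result means every named mount started after the target was reached.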

Comment 17 Colin.Simpson 2021-05-03 13:35:41 UTC
Still fails for me if using the IP.

May 03 14:30:56 tstf34 mount[962]: mount.nfs: Network is unreachable
May 03 14:30:56 tstf34 systemd[1]: home.mount: Mount process exited, code=exited, status=32/n/a
May 03 14:30:56 tstf34 systemd[1]: home.mount: Failed with result 'exit-code'.
May 03 14:30:56 tstf34 systemd[1]: Failed to mount /home.

Though I am using IPv6.

Comment 18 Steve 2021-11-09 09:11:16 UTC
Please change Version to 35, since I can't do that.

Comment 19 Ben Cotton 2022-11-29 16:50:02 UTC
This message is a reminder that Fedora Linux 35 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 35 on 2022-12-13.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '35'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 35 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 20 Colin.Simpson 2022-11-29 17:37:04 UTC
This is still broken in F37.
It still takes a long time to boot (with an NFS home directory mount in /etc/fstab), and chronyd fails to start. udisks2, despite claiming to load, doesn't work.

In case anyone else needs it, I'm working around this with the following nasty hack:

In /etc/systemd/system/fixbrokenhomedir.service:

[Unit]
Description=Fix broken mount of homedir on boot and restart services that this hurt
Requires=network-online.target
After=network-online.target

[Service]
Type=oneshot
ExecStart=/bin/systemctl restart home.automount
ExecStart=/bin/systemctl restart home.mount
ExecStart=/bin/systemctl restart chronyd.service
ExecStart=/bin/systemctl restart systemd-hostnamed.service
ExecStart=/bin/systemctl restart udisks2.service

And then a timer to launch this after boot:

In /etc/systemd/system/fixbrokenhomedir.timer:

[Unit]
Description=Fix broken mount of homedir on boot and restart services that this hurt

[Timer]
OnBootSec=200

[Install]
WantedBy=timers.target

Enabled with

systemctl enable fixbrokenhomedir.timer

Just to note, this is my current fstab entry (in case there is now a preferable option):
nfssrv.lan:/home				/home		nfs	noauto,x-systemd.automount,x-systemd.mount-timeout=30,_netdev	0 0

Comment 21 Aoife Moloney 2023-11-23 00:03:58 UTC
This message is a reminder that Fedora Linux 37 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 37 on 2023-12-05.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '37'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version. Note that the version field may be hidden.
Click the "Show advanced fields" button if you do not see it.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 37 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 22 Steve 2023-11-23 15:07:12 UTC
This bug is still present in Fedora 39.

Comment 23 Aoife Moloney 2023-12-05 20:59:55 UTC
Fedora Linux 37 entered end-of-life (EOL) status on None.

Fedora Linux 37 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora Linux
please feel free to reopen this bug against that version. Note that the version
field may be hidden. Click the "Show advanced fields" button if you do not see
the version field.

If you are unable to reopen this bug, please file a new report against an
active release.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 24 Johannes Pfau 2024-02-27 12:32:49 UTC
As already mentioned, the bug is still present in Fedora 39. I don't really know how to reopen the report, though.

The issue seems to be a kind of dependency loop involving home.mount, the remote-fs target, network-pre.target, systemd-homed, accounts-daemon.service and NetworkManager, and probably also network-online.target.
I'm not really sure how to obtain detailed information about the cycle, as the system does not boot at all when /home is specified as an NFS mount in fstab.

None of the workarounds mentioned here worked for me. However, in my case I'm also booting from an NFS root filesystem. This means that the initramfs has already set up the network, and the network is fully up when systemd starts the boot process. Because of that, waiting for network-pre.target in the NetworkManager unit seemed pointless. With that dependency removed, the system is at least able to boot.

Comment 25 Johannes Pfau 2024-02-28 09:39:31 UTC
Here's what I found with some debugging:

Polkit seems to start only after /home has been mounted, although there is no systemd dependency between them. This is the dependency chain:
home.automount (not a systemd dependency) => polkit.service => firewalld.service => network-pre.target => NetworkManager.service => network.target => gssproxy.service => rpc-gssd.service => nfs-client.target => remote-fs-pre.target => remote-fs.target => home.mount


So, as there's no real dependency in the systemd graph, systemd does not detect the cycle and can't break it. Various units in that cycle, however, simply do not start and time out. Systemd then increases the timeout and the boot never succeeds.
I guess the simplest workaround is to remove the dependency between firewalld and network-pre.target. I'm not sure what the real solution is, though. Maybe polkit should not require /home, or maybe it should somehow be possible to avoid triggering the automount early in the boot process.
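The cycle described above can be checked mechanically. The Python sketch below (3.9+, for graphlib) reads each "A => B" in the chain as "A must be ready before B can proceed", which is an interpretation of the comment rather than systemd output, and builds the graph twice: once with only the explicit edges, and once with the implicit edge that an access to /home through home.automount waits for home.mount. Only the second graph has a cycle, which matches the point that systemd cannot see the loop because it never sees that last edge.

```python
from graphlib import CycleError, TopologicalSorter

# Chain as listed in the comment; "A => B" is read as "A before B" (assumption).
CHAIN = [
    "home.automount",
    "polkit.service",
    "firewalld.service",
    "network-pre.target",
    "NetworkManager.service",
    "network.target",
    "gssproxy.service",
    "rpc-gssd.service",
    "nfs-client.target",
    "remote-fs-pre.target",
    "remote-fs.target",
    "home.mount",
]


def build_graph(include_automount_edge):
    """Map each unit to the set of units it waits for."""
    graph = {unit: set() for unit in CHAIN}
    for before, after in zip(CHAIN, CHAIN[1:]):
        graph[after].add(before)
    if include_automount_edge:
        # The edge systemd never sees: a process touching /home through
        # home.automount blocks until home.mount has finished.
        graph["home.automount"].add("home.mount")
    return graph


def find_cycle(graph):
    """Return the cycle reported by graphlib, or None if the graph is acyclic."""
    try:
        TopologicalSorter(graph).prepare()
    except CycleError as err:
        # args[1] is the cycle as a list whose first and last nodes are equal.
        return err.args[1]
    return None


print(find_cycle(build_graph(False)))  # only the explicit ordering edges
print(find_cycle(build_graph(True)))   # the loop closed by the automount edge
```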


As a workaround, using a normal mount for /home instead of an automount also fixes this. Compared to the workaround I suggested previously, it significantly speeds up the boot process (10 s vs 45 s in the critical chain), and it should also work if your network is not already set up by the initramfs.
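Switching to a normal mount amounts to dropping the automount options from the fstab entry, so that systemd-fstab-generator attaches home.mount to remote-fs.target at boot instead of creating home.automount. A small Python sketch of that edit, assuming only the two automount-related options used in this thread (noauto and x-systemd.automount) need to go; other options such as _netdev and x-systemd.mount-timeout are kept, and the helper name is hypothetical.

```python
# Options that make the entries in this thread on-demand automounts (assumption:
# other automount-only options such as x-systemd.idle-timeout are not handled).
AUTOMOUNT_ONLY = {"noauto", "x-systemd.automount"}


def to_plain_mount(line):
    """Return the fstab entry with the automount-related options removed."""
    fields = line.split()
    if len(fields) < 4:
        raise ValueError("expected at least 4 fstab fields")
    opts = [o for o in fields[3].split(",") if o not in AUTOMOUNT_ONLY]
    # An empty options field is not valid, so fall back to "defaults".
    fields[3] = ",".join(opts) or "defaults"
    return "\t".join(fields)


entry = ("server.lan:/home\t\t\t\t/home\t\tnfs\t"
         "noauto,x-systemd.automount,x-systemd.mount-timeout=30,_netdev\t0 0")
print(to_plain_mount(entry))
```

For the entry from Comment 12 this keeps x-systemd.mount-timeout=30,_netdev and drops the rest of the automount setup.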

