Bug 749940

Summary: NFS mount fails
Product: [Fedora] Fedora Reporter: Andrew McNabb <amcnabb>
Component: systemdAssignee: systemd-maint
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high Docs Contact:
Priority: unspecified    
Version: 16CC: erempel, johannbg, lpoetter, metherid, mschmidt, notting, plautrba, systemd-maint
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: systemd-37-3.fc16 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 750032 (view as bug list) Environment:
Last Closed: 2011-11-10 17:44:37 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
boot.log none

Description Andrew McNabb 2011-10-28 23:19:00 UTC
Description of problem:

I have an NFS mount in /etc/fstab called /aml. On my Fedora 16 Beta machine, the systemd unit for this mount fails at startup, and I can't find any indication of what's wrong. In /var/log/boot.log, I see the following:

Starting /aml ESC[1;31mfailedESC[0m, see 'systemctl status aml.mount' for details.

If I run "systemctl status aml.mount", it gives:

aml.mount - /aml
	  Loaded: loaded
	  Active: failed since Fri, 28 Oct 2011 16:42:45 -0600; 11min ago
	   Where: /aml
	    What: nfs:/
	 Process: 372 ExecMount=/bin/mount /aml (code=exited, status=32)
	  CGroup: name=systemd:/system/aml.mount

Unfortunately, I can't find any logs that show what the specific error message is. Additionally, if I mount the filesystem manually with "mount /aml", it succeeds.


Version-Release number of selected component (if applicable):

systemd-37-2.fc16.x86_64


How reproducible:
every boot


Steps to Reproduce:
1. Add an NFS mount to /etc/fstab.
2. Boot.

  
Actual results:

The mount unit fails.


Expected results:

The filesystem should be mounted.


Additional info:

Because of this problem, NFS home directories are unavailable without manual intervention. If it helps, the specific line in fstab is:

nfs:/ /aml nfs4 noatime,intr,actimeo=0 0 0

Comment 1 Andrew McNabb 2011-10-30 06:37:00 UTC
Created attachment 530817 [details]
boot.log

From boot.log, it's clear that systemd is trying to mount the NFS filesystem (/aml) just after udev starts. Since this is an NFS filesystem, there is no way that it can possibly succeed in mounting until after the "Bring up/down networking" service completes. The log also shows that it tries to start "Mount and unmount network filesystems" before starting "Bring up/down networking". It looks like there are some weird problems with the chain of dependencies.

There are some similarities with bug #692008. However, two differences are: 1) my machine does not have any problems starting the network; and 2) I have SELinux disabled. Thus I do not think this is a duplicate.

One possibly important thing to note is that I'm using the network service instead of NetworkManager.

Is there any other information that would be helpful? Thanks.

Comment 2 Michal Schmidt 2011-10-31 10:02:33 UTC
Probably it's the problem described here:
http://lists.freedesktop.org/archives/systemd-devel/2011-October/003661.html

Does "systemctl enable remote-fs-pre.target" help?

Comment 3 Andrew McNabb 2011-10-31 15:56:03 UTC
Running "systemctl enable remote-fs-pre.target" seems to help. Shouldn't this be enabled by default?

Comment 4 Andrew McNabb 2011-10-31 16:33:45 UTC
Also, I think it would be helpful to add this to the Fedora 16 Common Bugs. If the /etc/fstab has an entry for an NFS filesystem, this filesystem will not be mounted at boot. Home directories on NFS are a common use case, so many systems will not have home directories available at boot, and so users will not be able to log in. The command from comment #2 is a simple workaround.

Comment 5 Michal Schmidt 2011-10-31 16:52:59 UTC
(In reply to comment #3)
> Running "systemctl enable remote-fs-pre.target" seems to help. Shouldn't this
> be enabled by default?

Lennart said he needs to rethink it.

(In reply to comment #4)
> Also, I think it would be helpful to add this to the Fedora 16 Common Bugs

No. This is a regression in systemd-37 which is still only in updates-testing. We should fix it before pushing it to stable.

Comment 6 Lennart Poettering 2011-11-01 21:30:46 UTC
Fixed upstream.

Comment 7 Andrew McNabb 2011-11-01 22:04:20 UTC
(In reply to comment #6)
> Fixed upstream.

Is there an updated RPM that you would like me to test? The latest one I see in koji is systemd-37-2.

Comment 8 Fedora Update System 2011-11-01 23:52:08 UTC
systemd-37-3.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/FEDORA-2011-14239

Comment 9 Fedora Update System 2011-11-02 17:54:46 UTC
Package systemd-37-3.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing systemd-37-3.fc16'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2011-14239
then log in and leave karma (feedback).

Comment 10 Fedora Update System 2011-11-10 17:44:37 UTC
systemd-37-3.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 11 Evan Rempel 2012-01-07 04:35:09 UTC
I am still having problems with F16 and 

[erempel@speedy ~]$ rpm -qa | grep systemd
systemd-37-3.fc16.x86_64
systemd-sysv-37-3.fc16.x86_64
systemd-units-37-3.fc16.x86_64

Without the NFS filespaces automounting, I can actually log in and issue ifconfig and show that I don't even have an IP address from DHCP negotiation.

I'm not new to linux but I don't understand systemd yet.


$ sudo systemctl enable NetworkManager-wait-online.service


$ sudo systemctl status NetworkManager-wait-online.service
NetworkManager-wait-online.service - Network Manager Wait Online
	  Loaded: loaded (/lib/systemd/system/NetworkManager-wait-online.service; enabled)
	  Active: failed since Fri, 06 Jan 2012 20:26:55 -0800; 5min ago
	Main PID: 1032 (code=exited, status=1/FAILURE)
	  CGroup: name=systemd:/system/NetworkManager-wait-online.service

$ sudo systemctl restart  NetworkManager-wait-online.service
$ sudo systemctl status NetworkManager-wait-online.service
NetworkManager-wait-online.service - Network Manager Wait Online
	  Loaded: loaded (/lib/systemd/system/NetworkManager-wait-online.service; enabled)
	  Active: inactive (dead) since Fri, 06 Jan 2012 20:34:00 -0800; 5s ago
	 Process: 1991 ExecStart=/usr/bin/nm-online -q --timeout=30 (code=exited, status=0/SUCCESS)
	  CGroup: name=systemd:/system/NetworkManager-wait-online.service

suggestions

Comment 12 Michal Schmidt 2012-01-09 13:52:32 UTC
(In reply to comment #11)
>    Active: failed since Fri, 06 Jan 2012 20:26:55 -0800; 5min ago
>  Main PID: 1032 (code=exited, status=1/FAILURE)

Any hints in /var/log/messages or dmesg about why it failed?

Comment 13 Evan Rempel 2012-01-09 16:33:43 UTC
It looks NetworkManager-wait-online.service only waits for 30 seconds, but the DHCP client takes 36 seconds to acquire an address. NetworkManager seems to think that it has a 45 second timeout window, but if NetworkManager-wait-online.service only waits for 30 seconds, this seems inappropriate.

Isn't the purpose of NetworkManager-wait-online.service to wait until the network is actually up rather than just a timeout.


Jan  9 06:57:59 speedy NetworkManager[965]: NetworkManager[965]: <info> Activation (p5p1) Beginning DHCPv4 transaction (timeout in 45 seconds)
...
Jan  9 06:58:28 speedy systemd[1]: NetworkManager-wait-online.service: main process exited, code=exited, status=1
Jan  9 06:58:28 speedy systemd[1]: Unit NetworkManager-wait-online.service entered failed state.
Jan  9 06:58:29 speedy rpc.statd[1279]: Version 1.2.5 starting
Jan  9 06:58:29 speedy sm-notify[1280]: Version 1.2.5 starting
Jan  9 06:58:29 speedy systemd[1]: Failed to read PID file /run/sm-client.pid after start. The service might be broken.
Jan  9 06:58:29 speedy systemd[1]: Startup finished in 1s 83ms 150us (kernel) + 11s 537ms 257us (initrd) + 41s 64ms 222us (userspace) = 53s 684ms 629us.
Jan  9 06:58:30 speedy dbus-daemon[1001]: ** Message: No devices in use, exit
Jan  9 06:58:35 speedy dhclient[1080]: DHCPDISCOVER on p5p1 to 255.255.255.255 port 67 interval 16
Jan  9 06:58:35 speedy NetworkManager[965]: DHCPDISCOVER on p5p1 to 255.255.255.255 port 67 interval 16
Jan  9 06:58:35 speedy dhclient[1080]: DHCPREQUEST on p5p1 to 255.255.255.255 port 67
Jan  9 06:58:35 speedy dhclient[1080]: DHCPOFFER from 192.168.1.251
Jan  9 06:58:35 speedy NetworkManager[965]: DHCPREQUEST on p5p1 to 255.255.255.255 port 67
Jan  9 06:58:35 speedy NetworkManager[965]: DHCPOFFER from 192.168.1.251
Jan  9 06:58:35 speedy dhclient[1080]: DHCPACK from 192.168.1.251
Jan  9 06:58:35 speedy NetworkManager[965]: DHCPACK from 192.168.1.251
Jan  9 06:58:35 speedy dhclient[1080]: bound to 192.168.1.4 -- renewal in 5109 seconds.
Jan  9 06:58:35 speedy NetworkManager[965]: bound to 192.168.1.4 -- renewal in 5109 seconds.
Jan  9 06:58:35 speedy NetworkManager[965]: <info> (p5p1): DHCPv4 state changed preinit -> bound
Jan  9 06:58:35 speedy NetworkManager[965]: <info> Activation (p5p1) Stage 4 of 5 (IP4 Configure Get) scheduled...
Jan  9 06:58:35 speedy NetworkManager[965]: NetworkManager[965]: <info> (p5p1): DHCPv4 state changed preinit -> bound
Jan  9 06:58:35 speedy NetworkManager[965]: NetworkManager[965]: <info> Activation (p5p1) Stage 4 of 5 (IP4 Configure Get) scheduled...
Jan  9 06:58:35 speedy NetworkManager[965]: <info> Activation (p5p1) Stage 4 of 5 (IP4 Configure Get) started...
Jan  9 06:58:35 speedy NetworkManager[965]: NetworkManager[965]: <info> Activation (p5p1) Stage 4 of 5 (IP4 Configure Get) started...
Jan  9 06:58:35 speedy NetworkManager[965]: <info>   address 192.168.1.4
Jan  9 06:58:35 speedy NetworkManager[965]: <info>   prefix 24 (255.255.255.0)
Jan  9 06:58:35 speedy NetworkManager[965]: NetworkManager[965]: <info>   address 192.168.1.4
Jan  9 06:58:35 speedy NetworkManager[965]: <info>   gateway 192.168.1.251
Jan  9 06:58:35 speedy NetworkManager[965]: NetworkManager[965]: <info>   prefix 24 (255.255.255.0)
Jan  9 06:58:35 speedy NetworkManager[965]: <info>   hostname 'speedy'
Jan  9 06:58:35 speedy NetworkManager[965]: NetworkManager[965]: <info>   gateway 192.168.1.251
Jan  9 06:58:35 speedy NetworkManager[965]: <info>   nameserver '192.168.1.251'
Jan  9 06:58:35 speedy NetworkManager[965]: NetworkManager[965]: <info>   hostname 'speedy'
Jan  9 06:58:35 speedy NetworkManager[965]: <info>   domain name 'gv.shawcable.net'
Jan  9 06:58:35 speedy NetworkManager[965]: NetworkManager[965]: <info>   nameserver '192.168.1.251'
Jan  9 06:58:35 speedy NetworkManager[965]: NetworkManager[965]: <info>   domain name 'gv.shawcable.net'

Comment 14 Bill Nottingham 2012-01-09 21:45:10 UTC
(In reply to comment #13)
> It looks NetworkManager-wait-online.service only waits for 30 seconds, but the
> DHCP client takes 36 seconds to acquire an address. NetworkManager seems to
> think that it has a 45 second timeout window, but if
> NetworkManager-wait-online.service only waits for 30 seconds, this seems
> inappropriate.
> 
> Isn't the purpose of NetworkManager-wait-online.service to wait until the
> network is actually up rather than just a timeout.

I suppose it could be changed to have a user-controllable timeout. Waiting indefinitely seems like a bad idea if the cable was unplugged during boot. 30 seconds is somewhat close to what the old network initscript would wait for.