749940 – NFS mount fails

Bug 749940 - NFS mount fails

Summary: NFS mount fails

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	systemd
Sub Component:
Version:	16
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	---
Assignee:	systemd-maint
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2011-10-28 23:19 UTC by Andrew McNabb
Modified:	2012-01-09 21:45 UTC (History)
CC List:	8 users (show)
Fixed In Version:	systemd-37-3.fc16
Clone Of:
Clones:	750032 (view as bug list)
Environment:
Last Closed:	2011-11-10 17:44:37 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
boot.log (7.35 KB, text/plain) 2011-10-30 06:37 UTC, Andrew McNabb	no flags	Details
View All

Description Andrew McNabb 2011-10-28 23:19:00 UTC

Description of problem:

I have an NFS mount in /etc/fstab called /aml. On my Fedora 16 Beta machine, the systemd unit for this mount fails at startup, and I can't find any indication of what's wrong. In /var/log/boot.log, I see the following:

Starting /aml ESC[1;31mfailedESC[0m, see 'systemctl status aml.mount' for details.

If I run "systemctl status aml.mount", it gives:

aml.mount - /aml
	  Loaded: loaded
	  Active: failed since Fri, 28 Oct 2011 16:42:45 -0600; 11min ago
	   Where: /aml
	    What: nfs:/
	 Process: 372 ExecMount=/bin/mount /aml (code=exited, status=32)
	  CGroup: name=systemd:/system/aml.mount

Unfortunately, I can't find any logs that show what the specific error message is. Additionally, if I mount the filesystem manually with "mount /aml", it succeeds.


Version-Release number of selected component (if applicable):

systemd-37-2.fc16.x86_64


How reproducible:
every boot


Steps to Reproduce:
1. Add an NFS mount to /etc/fstab.
2. Boot.

  
Actual results:

The mount unit fails.


Expected results:

The filesystem should be mounted.


Additional info:

Because of this problem, NFS home directories are unavailable without manual intervention. If it helps, the specific line in fstab is:

nfs:/ /aml nfs4 noatime,intr,actimeo=0 0 0

Comment 1 Andrew McNabb 2011-10-30 06:37:00 UTC

Created attachment 530817 [details]
boot.log

From boot.log, it's clear that systemd is trying to mount the NFS filesystem (/aml) just after udev starts. Since this is an NFS filesystem, there is no way that it can possibly succeed in mounting until after the "Bring up/down networking" service completes. The log also shows that it tries to start "Mount and unmount network filesystems" before starting "Bring up/down networking". It looks like there are some weird problems with the chain of dependencies.

There are some similarities with bug #692008. However, two differences are: 1) my machine does not have any problems starting the network; and 2) I have SELinux disabled. Thus I do not think this is a duplicate.

One possibly important thing to note is that I'm using the network service instead of NetworkManager.

Is there any other information that would be helpful? Thanks.

Comment 2 Michal Schmidt 2011-10-31 10:02:33 UTC

Probably it's the problem described here:
http://lists.freedesktop.org/archives/systemd-devel/2011-October/003661.html

Does "systemctl enable remote-fs-pre.target" help?

Comment 3 Andrew McNabb 2011-10-31 15:56:03 UTC

Running "systemctl enable remote-fs-pre.target" seems to help. Shouldn't this be enabled by default?

Comment 4 Andrew McNabb 2011-10-31 16:33:45 UTC

Also, I think it would be helpful to add this to the Fedora 16 Common Bugs. If the /etc/fstab has an entry for an NFS filesystem, this filesystem will not be mounted at boot. Home directories on NFS are a common use case, so many systems will not have home directories available at boot, and so users will not be able to log in. The command from comment #2 is a simple workaround.

Comment 5 Michal Schmidt 2011-10-31 16:52:59 UTC

(In reply to comment #3)
> Running "systemctl enable remote-fs-pre.target" seems to help. Shouldn't this
> be enabled by default?

Lennart said he needs to rethink it.

(In reply to comment #4)
> Also, I think it would be helpful to add this to the Fedora 16 Common Bugs

No. This is a regression in systemd-37 which is still only in updates-testing. We should fix it before pushing it to stable.

Comment 6 Lennart Poettering 2011-11-01 21:30:46 UTC

Fixed upstream.

Comment 7 Andrew McNabb 2011-11-01 22:04:20 UTC

(In reply to comment #6)
> Fixed upstream.

Is there an updated RPM that you would like me to test? The latest one I see in koji is systemd-37-2.

Comment 8 Fedora Update System 2011-11-01 23:52:08 UTC

systemd-37-3.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/FEDORA-2011-14239

Comment 9 Fedora Update System 2011-11-02 17:54:46 UTC

Package systemd-37-3.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing systemd-37-3.fc16'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2011-14239
then log in and leave karma (feedback).

Comment 10 Fedora Update System 2011-11-10 17:44:37 UTC

systemd-37-3.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 11 Evan Rempel 2012-01-07 04:35:09 UTC

I am still having problems with F16 and 

[erempel@speedy ~]$ rpm -qa | grep systemd
systemd-37-3.fc16.x86_64
systemd-sysv-37-3.fc16.x86_64
systemd-units-37-3.fc16.x86_64

Without the NFS filespaces automounting, I can actually log in and issue ifconfig and show that I don't even have an IP address from DHCP negotiation.

I'm not new to linux but I don't understand systemd yet.


$ sudo systemctl enable NetworkManager-wait-online.service


$ sudo systemctl status NetworkManager-wait-online.service
NetworkManager-wait-online.service - Network Manager Wait Online
	  Loaded: loaded (/lib/systemd/system/NetworkManager-wait-online.service; enabled)
	  Active: failed since Fri, 06 Jan 2012 20:26:55 -0800; 5min ago
	Main PID: 1032 (code=exited, status=1/FAILURE)
	  CGroup: name=systemd:/system/NetworkManager-wait-online.service

$ sudo systemctl restart  NetworkManager-wait-online.service
$ sudo systemctl status NetworkManager-wait-online.service
NetworkManager-wait-online.service - Network Manager Wait Online
	  Loaded: loaded (/lib/systemd/system/NetworkManager-wait-online.service; enabled)
	  Active: inactive (dead) since Fri, 06 Jan 2012 20:34:00 -0800; 5s ago
	 Process: 1991 ExecStart=/usr/bin/nm-online -q --timeout=30 (code=exited, status=0/SUCCESS)
	  CGroup: name=systemd:/system/NetworkManager-wait-online.service

suggestions

Comment 12 Michal Schmidt 2012-01-09 13:52:32 UTC

(In reply to comment #11)
>    Active: failed since Fri, 06 Jan 2012 20:26:55 -0800; 5min ago
>  Main PID: 1032 (code=exited, status=1/FAILURE)

Any hints in /var/log/messages or dmesg about why it failed?

Comment 13 Evan Rempel 2012-01-09 16:33:43 UTC

It looks NetworkManager-wait-online.service only waits for 30 seconds, but the DHCP client takes 36 seconds to acquire an address. NetworkManager seems to think that it has a 45 second timeout window, but if NetworkManager-wait-online.service only waits for 30 seconds, this seems inappropriate.

Isn't the purpose of NetworkManager-wait-online.service to wait until the network is actually up rather than just a timeout.


Jan  9 06:57:59 speedy NetworkManager[965]: NetworkManager[965]: <info> Activation (p5p1) Beginning DHCPv4 transaction (timeout in 45 seconds)
...
Jan  9 06:58:28 speedy systemd[1]: NetworkManager-wait-online.service: main process exited, code=exited, status=1
Jan  9 06:58:28 speedy systemd[1]: Unit NetworkManager-wait-online.service entered failed state.
Jan  9 06:58:29 speedy rpc.statd[1279]: Version 1.2.5 starting
Jan  9 06:58:29 speedy sm-notify[1280]: Version 1.2.5 starting
Jan  9 06:58:29 speedy systemd[1]: Failed to read PID file /run/sm-client.pid after start. The service might be broken.
Jan  9 06:58:29 speedy systemd[1]: Startup finished in 1s 83ms 150us (kernel) + 11s 537ms 257us (initrd) + 41s 64ms 222us (userspace) = 53s 684ms 629us.
Jan  9 06:58:30 speedy dbus-daemon[1001]: ** Message: No devices in use, exit
Jan  9 06:58:35 speedy dhclient[1080]: DHCPDISCOVER on p5p1 to 255.255.255.255 port 67 interval 16
Jan  9 06:58:35 speedy NetworkManager[965]: DHCPDISCOVER on p5p1 to 255.255.255.255 port 67 interval 16
Jan  9 06:58:35 speedy dhclient[1080]: DHCPREQUEST on p5p1 to 255.255.255.255 port 67
Jan  9 06:58:35 speedy dhclient[1080]: DHCPOFFER from 192.168.1.251
Jan  9 06:58:35 speedy NetworkManager[965]: DHCPREQUEST on p5p1 to 255.255.255.255 port 67
Jan  9 06:58:35 speedy NetworkManager[965]: DHCPOFFER from 192.168.1.251
Jan  9 06:58:35 speedy dhclient[1080]: DHCPACK from 192.168.1.251
Jan  9 06:58:35 speedy NetworkManager[965]: DHCPACK from 192.168.1.251
Jan  9 06:58:35 speedy dhclient[1080]: bound to 192.168.1.4 -- renewal in 5109 seconds.
Jan  9 06:58:35 speedy NetworkManager[965]: bound to 192.168.1.4 -- renewal in 5109 seconds.
Jan  9 06:58:35 speedy NetworkManager[965]: <info> (p5p1): DHCPv4 state changed preinit -> bound
Jan  9 06:58:35 speedy NetworkManager[965]: <info> Activation (p5p1) Stage 4 of 5 (IP4 Configure Get) scheduled...
Jan  9 06:58:35 speedy NetworkManager[965]: NetworkManager[965]: <info> (p5p1): DHCPv4 state changed preinit -> bound
Jan  9 06:58:35 speedy NetworkManager[965]: NetworkManager[965]: <info> Activation (p5p1) Stage 4 of 5 (IP4 Configure Get) scheduled...
Jan  9 06:58:35 speedy NetworkManager[965]: <info> Activation (p5p1) Stage 4 of 5 (IP4 Configure Get) started...
Jan  9 06:58:35 speedy NetworkManager[965]: NetworkManager[965]: <info> Activation (p5p1) Stage 4 of 5 (IP4 Configure Get) started...
Jan  9 06:58:35 speedy NetworkManager[965]: <info>   address 192.168.1.4
Jan  9 06:58:35 speedy NetworkManager[965]: <info>   prefix 24 (255.255.255.0)
Jan  9 06:58:35 speedy NetworkManager[965]: NetworkManager[965]: <info>   address 192.168.1.4
Jan  9 06:58:35 speedy NetworkManager[965]: <info>   gateway 192.168.1.251
Jan  9 06:58:35 speedy NetworkManager[965]: NetworkManager[965]: <info>   prefix 24 (255.255.255.0)
Jan  9 06:58:35 speedy NetworkManager[965]: <info>   hostname 'speedy'
Jan  9 06:58:35 speedy NetworkManager[965]: NetworkManager[965]: <info>   gateway 192.168.1.251
Jan  9 06:58:35 speedy NetworkManager[965]: <info>   nameserver '192.168.1.251'
Jan  9 06:58:35 speedy NetworkManager[965]: NetworkManager[965]: <info>   hostname 'speedy'
Jan  9 06:58:35 speedy NetworkManager[965]: <info>   domain name 'gv.shawcable.net'
Jan  9 06:58:35 speedy NetworkManager[965]: NetworkManager[965]: <info>   nameserver '192.168.1.251'
Jan  9 06:58:35 speedy NetworkManager[965]: NetworkManager[965]: <info>   domain name 'gv.shawcable.net'

Comment 14 Bill Nottingham 2012-01-09 21:45:10 UTC

(In reply to comment #13)
> It looks NetworkManager-wait-online.service only waits for 30 seconds, but the
> DHCP client takes 36 seconds to acquire an address. NetworkManager seems to
> think that it has a 45 second timeout window, but if
> NetworkManager-wait-online.service only waits for 30 seconds, this seems
> inappropriate.
> 
> Isn't the purpose of NetworkManager-wait-online.service to wait until the
> network is actually up rather than just a timeout.

I suppose it could be changed to have a user-controllable timeout. Waiting indefinitely seems like a bad idea if the cable was unplugged during boot. 30 seconds is somewhat close to what the old network initscript would wait for.

Note You need to log in before you can comment on or make changes to this bug.