Bug 983915

Summary: rpc.statd (nfs-lock.service) fails to properly start causing NFS mounts at boot to fail
Product: Fedora
Reporter: Jason <teradon>
Component: nfs-utils
Assignee: Steve Dickson <steved>
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 19
CC: arancox, aschorr, bcostescu, bfields, chupaka, Colin.Simpson, iand, jeff, johannbg, lnykryn, mattia.verga, mschmidt, msekleta, plautrba, steeve.mccauley, steved, systemd-maint, vpavlin, zbyszek
Target Milestone: ---
Keywords: Reopened
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-01-17 14:01:36 UTC
Type: Bug
Regression: ---
Mount Type: ---
Attachments:
  remote-fs.target should require that remote-fs-pre.target start before it (flags: none)
  nfs-lock.service should pull in remote-fs-pre.target (flags: none)
  nfs-lock.service should pull in remote-fs-pre.target (flags: none)

Description Jason 2013-07-12 09:27:33 UTC
Description of problem:

This was working under Fedora 18 (F18): nfs-lock.service apparently used "WantedBy=multi-user.target", while Fedora 19 (F19) now uses "WantedBy=nfs.target" instead. So if nfs.target is disabled, such as on a client-only system, the service never gets started the way it did by default in F18. In addition, under F18 the NFS mounts waited for the service to come online before they tried to mount. Under F19, according to /var/log/messages, the NFS mounts try to mount before the service has even started rpc.statd.
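
For illustration, the change amounts to something like this in the [Install] section of nfs-lock.service (reconstructed from the behavior described above - the exact unit file contents may differ):

# Fedora 18 (reconstructed) - pulled in whenever multi-user.target is reached
[Install]
WantedBy=multi-user.target

# Fedora 19 (reconstructed) - only pulled in when nfs.target itself is enabled
[Install]
WantedBy=nfs.target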

If I only have a few NFS mounts, they usually each start an rpc.statd instance of their own and still work. However, with a large number of NFS mounts this quickly breaks down and mount failures occur every time. By a large number of NFS mounts, I mean 20+ NFS mounts using the "bg" mount option in fstab.

A workaround for the mount failures is to use "fg" mounts instead, which causes them all to mount reliably; it appears to be a timing issue where rpc.statd is detected as started but is not yet ready. However, this still creates a lot of rpc.statd processes.

Every NFS mount in fstab generates an attempt to launch rpc.statd in parallel. I end up with anywhere from 1 to 23+ rpc.statd processes running. Enabling nfs.target so nfs-lock.service will run causes nfs-lock.service to fail due to rpc.statd already running.

Version-Release number of selected component (if applicable):
nfs-utils-1.2.8-2.0.fc19.x86_64
kernel-3.9.9-301.fc19.x86_64
systemd-204-9.fc19.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Install Fedora 19 or upgrade Fedora 18->Fedora 19 (+systemctl enable NetworkManager-dispatcher.service)
2. Add several NFS(v3) mounts to /etc/fstab with "bg" (I currently have 39)
3. Reboot -> NFS mounts will randomly fail and a random number of rpc.statd processes will be running. Because of this, /var/run/rpc.statd.pid will often be empty

Actual results:

NFS mounts with "bg" specified can randomly fail at boot (typically takes a large number of mounts before this happens)
Random number of rpc.statd processes running and /var/run/rpc.statd.pid is usually blank but may sometimes list one of the pids

Expected results:

Only one rpc.statd should be started, with its pid recorded in /var/run/rpc.statd.pid

NFS mounts should not fail at boot because of a timing issue with rpc.statd (rpc.statd is being started, but the NFS mounts are attempted before it is fully up, so they fail)

Additional info:

nfs-lock.service should probably run regardless of whether nfs.target is enabled, as it did in Fedora 18, and NFS mounts should wait for it to finish so that rpc.statd is online before they attempt to mount.

Bug 980580 is about changing which rpc.statd is used, and about the fact that nfslock.service is (and always has been) a symlink to nfs-lock.service, but that reporter also mentions a failed nfs-lock.service, which may be the same issue I am having: NFS mounts in /etc/fstab start rpc.statd before nfs-lock.service can, and nfs-lock.service then fails to start.



--------

!! Trying with nfs.target enabled
# systemctl status nfs-lock.service
nfs-lock.service - NFS file locking service.
   Loaded: loaded (/usr/lib/systemd/system/nfs-lock.service; enabled)
   Active: failed (Result: exit-code) since Fri 2013-07-12 02:43:06 EDT; 2min 39s ago
  Process: 1372 ExecStart=/sbin/rpc.statd $STATDARG (code=exited, status=1/FAILURE)
  Process: 1369 ExecStartPre=/usr/lib/nfs-utils/scripts/nfs-lock.preconfig (code=exited, status=0/SUCCESS)

!! /var/log/messages
...
Jul 12 02:43:04 Dain systemd[1]: Started LSB: Bring up/down networking.
Jul 12 02:43:04 Dain systemd[1]: Started Load static arp entries.
Jul 12 02:43:04 Dain systemd[1]: Starting Network.
Jul 12 02:43:04 Dain systemd[1]: Reached target Network.
Jul 12 02:43:04 Dain systemd[1]: Mounting...
...
Jul 12 02:43:05 Dain systemd[1]: Starting RPC bind service...
Jul 12 02:43:05 Dain systemd[1]: Starting Sendmail Mail Transport Agent...
Jul 12 02:43:05 Dain systemd[1]: Starting OpenSSH server daemon...
Jul 12 02:43:05 Dain systemd[1]: Started OpenSSH server daemon.
Jul 12 02:43:05 Dain systemd[1]: Started RPC bind service.
Jul 12 02:43:05 Dain systemd[1]: Starting NFS file locking service....
Jul 12 02:43:06 Dain rpc.statd[1374]: Version 1.2.7 starting
Jul 12 02:43:06 Dain rpc.statd[1377]: Version 1.2.7 starting
Jul 12 02:43:06 Dain rpc.statd[1376]: Version 1.2.7 starting
Jul 12 02:43:06 Dain rpc.statd[1374]: Flags: TI-RPC
Jul 12 02:43:06 Dain rpc.statd[1377]: Flags: TI-RPC
Jul 12 02:43:06 Dain rpc.statd[1378]: Version 1.2.7 starting
Jul 12 02:43:06 Dain rpc.statd[1378]: Flags: TI-RPC
Jul 12 02:43:06 Dain rpc.statd[1379]: Version 1.2.7 starting
<cut repeating starting/Flags lines as a total of 40 of these happened>
Jul 12 02:43:06 Dain rpc.statd[1408]: Flags: TI-RPC
Jul 12 02:43:06 Dain rpc.statd[1386]: failed to create RPC listeners, exiting
Jul 12 02:43:06 Dain rpc.statd[1375]: failed to create RPC listeners, exiting
Jul 12 02:43:06 Dain rpc.statd[1374]: failed to create RPC listeners, exiting
Jul 12 02:43:06 Dain rpc.statd[1409]: Flags: TI-RPC
Jul 12 02:43:06 Dain rpc.statd[1394]: failed to create RPC listeners, exiting
Jul 12 02:43:06 Dain rpc.statd[1403]: Flags: TI-RPC
Jul 12 02:43:06 Dain rpc.statd[1391]: failed to create RPC listeners, exiting
Jul 12 02:43:06 Dain rpc.statd[1380]: failed to create RPC listeners, exiting
Jul 12 02:43:06 Dain rpc.statd[1410]: Flags: TI-RPC
Jul 12 02:43:06 Dain rpc.statd[1392]: Flags: TI-RPC
Jul 12 02:43:06 Dain rpc.statd[1381]: failed to create RPC listeners, exiting
Jul 12 02:43:06 Dain rpc.statd[1400]: failed to create RPC listeners, exiting
Jul 12 02:43:06 Dain sm-notify[1412]: Version 1.2.7 starting
Jul 12 02:43:06 Dain rpc.statd[1402]: failed to create RPC listeners, exiting
Jul 12 02:43:06 Dain rpc.statd[1411]: Flags: TI-RPC
Jul 12 02:43:06 Dain rpc.statd[1395]: Flags: TI-RPC
Jul 12 02:43:06 Dain rpc.statd[1396]: failed to create RPC listeners, exiting
Jul 12 02:43:06 Dain rpc.statd[1399]: failed to create RPC listeners, exiting
Jul 12 02:43:06 Dain rpc.statd[1406]: failed to create RPC listeners, exiting
Jul 12 02:43:06 Dain mount[1302]: mount.nfs: rpc.statd is not running but is required for remote locking.
Jul 12 02:43:06 Dain mount[1302]: mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
Jul 12 02:43:06 Dain mount[1302]: mount.nfs: an incorrect mount option was specified
Jul 12 02:43:06 Dain mount[1262]: mount.nfs: rpc.statd is not running but is required for remote locking.
Jul 12 02:43:06 Dain mount[1262]: mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
Jul 12 02:43:06 Dain mount[1262]: mount.nfs: an incorrect mount option was specified
Jul 12 02:43:06 Dain mount[1257]: mount.nfs: rpc.statd is not running but is required for remote locking.
Jul 12 02:43:06 Dain mount[1257]: mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
Jul 12 02:43:06 Dain mount[1257]: mount.nfs: an incorrect mount option was specified
<cut repeating NFS mount failures due to rpc.statd>
Jul 12 02:43:06 Dain systemd[1]: Started Trigger Flushing of Journal to Persistent Storage.
Jul 12 02:43:06 Dain systemd[1]: Starting Permit User Sessions...
Jul 12 02:43:06 Dain kernel: [   29.218225] FS-Cache: Loaded
Jul 12 02:43:06 Dain kernel: [   29.276117] Key type dns_resolver registered
Jul 12 02:43:06 Dain kernel: [   29.296606] FS-Cache: Netfs 'nfs' registered for caching
Jul 12 02:43:12 Dain rpc.statd[1395]: failed to create RPC listeners, exiting
Jul 12 02:43:12 Dain rpc.statd[1407]: failed to create RPC listeners, exiting
Jul 12 02:43:12 Dain rpc.statd[1392]: failed to create RPC listeners, exiting
Jul 12 02:43:12 Dain systemd[1]: nfs-lock.service: control process exited, code=exited status=1
Jul 12 02:43:12 Dain systemd[1]: Failed to start NFS file locking service..
Jul 12 02:43:12 Dain systemd[1]: Unit nfs-lock.service entered failed state.
Jul 12 02:43:12 Dain systemd[1]: Starting Network File System Server.
Jul 12 02:43:12 Dain systemd[1]: Reached target Network File System Server.
...
(Note: Only about half the NFS mounts failed)

# ps -ef | grep statd
rpcuser   1376     1  0 02:43 ?        00:00:00 rpc.statd --no-notify
rpcuser   1377     1  0 02:43 ?        00:00:00 rpc.statd --no-notify
rpcuser   1378     1  0 02:43 ?        00:00:00 rpc.statd --no-notify
rpcuser   1379     1  0 02:43 ?        00:00:00 rpc.statd --no-notify
rpcuser   1382     1  0 02:43 ?        00:00:00 rpc.statd --no-notify
rpcuser   1383     1  0 02:43 ?        00:00:00 rpc.statd --no-notify
rpcuser   1384     1  0 02:43 ?        00:00:00 rpc.statd --no-notify
rpcuser   1385     1  0 02:43 ?        00:00:00 rpc.statd --no-notify
rpcuser   1387     1  0 02:43 ?        00:00:00 rpc.statd --no-notify
rpcuser   1388     1  0 02:43 ?        00:00:00 rpc.statd --no-notify
rpcuser   1389     1  0 02:43 ?        00:00:00 rpc.statd --no-notify
rpcuser   1390     1  0 02:43 ?        00:00:00 rpc.statd --no-notify
rpcuser   1393     1  0 02:43 ?        00:00:00 rpc.statd --no-notify
rpcuser   1397     1  0 02:43 ?        00:00:00 rpc.statd --no-notify
rpcuser   1398     1  0 02:43 ?        00:00:00 rpc.statd --no-notify
rpcuser   1401     1  0 02:43 ?        00:00:00 rpc.statd --no-notify
rpcuser   1403     1  0 02:43 ?        00:00:00 rpc.statd --no-notify
rpcuser   1404     1  0 02:43 ?        00:00:00 rpc.statd --no-notify
rpcuser   1405     1  0 02:43 ?        00:00:00 rpc.statd --no-notify
rpcuser   1408     1  0 02:43 ?        00:00:00 rpc.statd --no-notify
rpcuser   1409     1  0 02:43 ?        00:00:00 rpc.statd --no-notify
rpcuser   1410     1  0 02:43 ?        00:00:00 rpc.statd --no-notify
rpcuser   1411     1  0 02:43 ?        00:00:00 rpc.statd --no-notify
root      1617  1585  0 02:45 pts/0    00:00:00 grep --color=auto statd

# cat /var/run/rpc.statd.pid
[root@Dain ~]#

!! NFS Mount options in /etc/fstab
nfs     tcp,vers=3,hard,bg,ro   0 0

!! Switching from bg to fg causes all mounts to work, but the multiple-rpc.statd and failing-nfs-lock.service issues still exist
nfs     tcp,vers=3,hard,fg,ro   0 0

!! I have this same issue on all my installs, but I only noticed it on this one because this machine runs backups for my site and has a large number of NFS mounts, which triggers the failures. My other machines have far fewer NFS mounts, so the problem went unnoticed there

!! If I mark the mounts noauto to prevent mounting at boot, disable nfs.target to keep nfs-lock.service from starting, and then remove the noauto and run a mount -a, everything mounts at once with "bg" without problems - I assume because mount -a is not forking and attempting every mount at the same time, whereas systemd is. (A short summary of the sequence appears after the output below.)

# ps -ef | grep statd
root      1329  1334  0 05:08 pts/0    00:00:00 grep --color=auto statd

# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        39G   11G   27G  29% /
devtmpfs        2.0G     0  2.0G   0% /dev
tmpfs           2.0G     0  2.0G   0% /dev/shm
tmpfs           2.0G  832K  2.0G   1% /run
tmpfs           2.0G     0  2.0G   0% /sys/fs/cgroup
tmpfs           2.0G     0  2.0G   0% /tmp
/dev/sda1       976M  243M  683M  27% /boot

# mount -a
# ps -ef | grep statd
rpcuser   1450     1  0 05:12 ?        00:00:00 rpc.statd --no-notify
root      1500  1334  0 05:12 pts/0    00:00:00 grep --color=auto statd

# df -h|grep 10\\.2\\.|wc -l
39



Jul 12 05:12:34 Dain rpc.statd[1450]: Version 1.2.7 starting
Jul 12 05:12:34 Dain rpc.statd[1450]: Flags: TI-RPC
Jul 12 05:12:34 Dain kernel: [  360.515848] FS-Cache: Loaded
Jul 12 05:12:34 Dain kernel: [  360.526567] Key type dns_resolver registered
Jul 12 05:12:34 Dain kernel: [  360.555184] RPC: Registered named UNIX socket transport module.
Jul 12 05:12:34 Dain kernel: [  360.555403] RPC: Registered udp transport module.
Jul 12 05:12:34 Dain kernel: [  360.555478] RPC: Registered tcp transport module.
Jul 12 05:12:34 Dain kernel: [  360.555530] RPC: Registered tcp NFSv4.1 backchannel transport module.
Jul 12 05:12:34 Dain kernel: [  360.666596] FS-Cache: Netfs 'nfs' registered for caching
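
!! To summarize, the workaround sequence above is roughly (fstab edits done by hand):
!! edit /etc/fstab and add "noauto" to each NFS mount, then:
# systemctl disable nfs.target
# reboot
!! edit /etc/fstab and remove "noauto" again, then:
# mount -a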

Comment 1 Jason 2013-07-12 21:50:57 UTC
Note: In testing I ended up using a Fedora 18 install, a Fedora 18 -> Fedora 19 upgrade via fedup followed by a full upgrade (distro-sync), and a clean Fedora 19 install with a full "yum update" on it. The Fedora 18 install was for comparison, since the NFS mounts worked reliably there; the rpc.statd and nfs-lock.service issue (many instances being started) appears to have existed there as well, but the mounts waited long enough that rpc.statd was always running and responsive.

To test, again, I set up an NFS export that allows sub-directory mounts, created directories 1-40, and mounted both the top level and directories 1-40 (41 NFS mounts). Mount options in fstab:
nfs     tcp,vers=3,hard,bg   0 0

On a clean install after updating and reboot then adding the mounts, "mount -a -t nfs" works as expected, exactly one rpc.statd started and all mounts are there.

On reboot, systemd runs a mount command for each mount in parallel and tries to start 41 rpc.statd's. Some of them fail, some start late enough to see one already running, and anywhere from 4-24 or so actually start and show up in "ps -ef | grep stat".

Trying "systemctl enable nfs-lock.service" and "systemctl enable nfs.target" and then rebooting, you can see from the logs that the system comes up, reaches the network, starts the mount processes (which spawn 41 rpc.statd's), then gets to the NFS units and starts the NFS file locking service's rpc.statd, and only then do the actual mount commands finish waiting on nfs-lock.service (even though rpc.statd was already started). The result is that nfs-lock.service sometimes works and goes active (its rpc.statd is one of the processes that survives) and sometimes fails (rpc.statd already running, or failed to create RPC listeners). Mounts still fail at random; some work and some don't. On every reboot nfs-lock.service may or may not work, which mounts succeed changes, and almost always more than one fails.

I tried "systemctl enable NetworkManager-dispatcher.service" and "systemctl enable NetworkManager-wait-online.service" just in case (wait-online already seems to have been in use even though it was disabled), but there was no change.

Changing the mount option from "bg" to "fg" did not change the results in this test setup. For some reason, in my Fedora 18 -> Fedora 19 fedup scenario that change made the mounts work reliably, although the rpc.statd and nfs-lock.service issue still remained; in this setup, a fresh and fully updated Fedora 19 install, it did not seem to change the results.

Comment 2 Fedora Update System 2013-07-23 19:36:29 UTC
nfs-utils-1.2.8-3.0.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/nfs-utils-1.2.8-3.0.fc19

Comment 3 Fedora Update System 2013-07-25 00:50:37 UTC
Package nfs-utils-1.2.8-3.0.fc19:
* should fix your issue,
* was pushed to the Fedora 19 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing nfs-utils-1.2.8-3.0.fc19'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-13591/nfs-utils-1.2.8-3.0.fc19
then log in and leave karma (feedback).

Comment 4 Jason 2013-07-26 05:37:50 UTC
Well, I went back to a fresh Fedora 19 install, fully upgraded, set up the 41 NFS mounts, and then tried the following ("systemctl enable NetworkManager-dispatcher.service" was also done on this install):

yum update --enablerepo=updates-testing nfs-utils-1.2.8-3.0.fc19

And rebooted - although I didn't expect this to work on its own (the bug really has two parts if approached this way; there is still the fact that the unit files/spec file seem to be misbehaving, as noted above and as I'll note again below).

-- I expected to see nfs.target and nfs-lock.service enabled...? They were still disabled for me, and this was a fresh install, freshly updated, after running the yum command given above to upgrade to nfs-utils-1.2.8-3.0 --

====
[root@localhost ~]# rpm -q nfs-utils
nfs-utils-1.2.8-3.0.fc19.x86_64

[root@localhost ~]# systemctl status nfs.target
nfs.target - Network File System Server
   Loaded: loaded (/usr/lib/systemd/system/nfs.target; disabled)
   Active: inactive (dead)

[root@localhost ~]# systemctl status nfs-lock.service
nfs-lock.service - NFS file locking service.
   Loaded: loaded (/usr/lib/systemd/system/nfs-lock.service; disabled)
   Active: inactive (dead)
====


Well, let's enable both and reboot again anyway. This should bring us to the state we expected to be in - based on my understanding.

====
[root@localhost ~]# systemctl enable nfs-lock.service
ln -s '/usr/lib/systemd/system/nfs-lock.service' '/etc/systemd/system/nfs.target.wants/nfs-lock.service'
[root@localhost ~]# systemctl enable nfs.target
ln -s '/usr/lib/systemd/system/nfs.target' '/etc/systemd/system/multi-user.target.wants/nfs.target'

[root@localhost ~]# systemctl status nfs.target
nfs.target - Network File System Server
   Loaded: loaded (/usr/lib/systemd/system/nfs.target; enabled)
   Active: inactive (dead)

[root@localhost ~]# systemctl status nfs-lock.service
nfs-lock.service - NFS file locking service.
   Loaded: loaded (/usr/lib/systemd/system/nfs-lock.service; enabled)
   Active: inactive (dead)
====

After reboot:

====
[root@localhost ~]# ps -ef | grep statd
rpcuser    877     1  0 00:58 ?        00:00:00 rpc.statd --no-notify
rpcuser    879     1  0 00:58 ?        00:00:00 rpc.statd --no-notify
rpcuser    880     1  0 00:58 ?        00:00:00 rpc.statd --no-notify
rpcuser    881     1  0 00:58 ?        00:00:00 rpc.statd --no-notify
rpcuser    883     1  0 00:58 ?        00:00:00 rpc.statd --no-notify
rpcuser    884     1  0 00:58 ?        00:00:00 rpc.statd --no-notify
rpcuser    885     1  0 00:58 ?        00:00:00 rpc.statd --no-notify
rpcuser    889     1  0 00:58 ?        00:00:00 rpc.statd --no-notify
rpcuser    891     1  0 00:58 ?        00:00:00 rpc.statd --no-notify
rpcuser    907     1  0 00:58 ?        00:00:00 rpc.statd --no-notify
rpcuser    908     1  0 00:58 ?        00:00:00 rpc.statd --no-notify
rpcuser    909     1  0 00:58 ?        00:00:00 rpc.statd --no-notify
rpcuser    910     1  0 00:58 ?        00:00:00 rpc.statd --no-notify
rpcuser    911     1  0 00:58 ?        00:00:00 rpc.statd --no-notify
rpcuser    913     1  0 00:58 ?        00:00:00 rpc.statd --no-notify
rpcuser    915     1  0 00:58 ?        00:00:00 rpc.statd --no-notify
rpcuser    919     1  0 00:58 ?        00:00:00 rpc.statd --no-notify
rpcuser    921     1  0 00:58 ?        00:00:00 rpc.statd --no-notify
rpcuser    926     1  0 00:58 ?        00:00:00 rpc.statd --no-notify
rpcuser    927     1  0 00:58 ?        00:00:00 rpc.statd --no-notify
root      1343  1275  0 01:03 pts/0    00:00:00 grep --color=auto statd

[root@localhost ~]# systemctl status nfs.target
nfs.target - Network File System Server
   Loaded: loaded (/usr/lib/systemd/system/nfs.target; enabled)
   Active: active since Fri 2013-07-26 00:58:40 EDT; 3min 26s ago

Jul 26 00:58:40 localhost.localdomain systemd[1]: Starting Network File System Server.
Jul 26 00:58:40 localhost.localdomain systemd[1]: Reached target Network File System Server.

[root@localhost ~]# systemctl status nfs-lock.service
nfs-lock.service - NFS file locking service.
   Loaded: loaded (/usr/lib/systemd/system/nfs-lock.service; enabled)
   Active: failed (Result: exit-code) since Fri 2013-07-26 00:58:40 EDT; 3min 38s ago
  Process: 900 ExecStart=/sbin/rpc.statd $STATDARG (code=exited, status=1/FAILURE)
  Process: 897 ExecStartPre=/usr/lib/nfs-utils/scripts/nfs-lock.preconfig (code=exited, status=0/SUCCESS)

Jul 26 00:58:33 localhost.localdomain rpc.statd[928]: Version 1.2.8 starting
Jul 26 00:58:33 localhost.localdomain sm-notify[929]: Version 1.2.8 starting
Jul 26 00:58:40 localhost.localdomain systemd[1]: nfs-lock.service: control process exited, code=exited status=1
Jul 26 00:58:40 localhost.localdomain systemd[1]: Failed to start NFS file locking service..
Jul 26 00:58:40 localhost.localdomain systemd[1]: Unit nfs-lock.service entered failed state.
====


And I'm back to the original issue: rpc.statd/nfs-lock.service fails when it is enabled AND any mount in fstab is set to mount at boot. Note: this is somewhat random and sometimes does not fail; it appears to be a timing issue where the winner of the race depends on cores and time slices. The root cause produces several symptoms: the nfs-lock.service unit randomly fails (although an rpc.statd is actually running, just not under systemd's control), NFS mounts set to mount at boot can randomly fail (more likely the more mounts are defined - this is the actual failure/problem), multiple rpc.statd daemons end up running, and the pid file may show no pid even though rpc.statd is running.

Looking at the unit files, it looks like nfs-lock.service should be defined to always start before remote-fs. Yet looking at the logs, remote-fs always begins first and starts the rpc.statd's, and THEN nfs-lock.service starts. This seems to be what gets us into trouble, is why I opened this bug, and is what causes my mounts to fail. remote-fs (because of the NFS mounts) really needs to wait for nfs-lock.service and not start any rpc.statd of its own.

I could be interpreting something wrong, but the unit files/spec file do not seem to be behaving properly here. In addition, nfs-lock.service is now disabled by default, and this is also a problem: if you have any NFS mounts, nfs-lock.service (and thus nfs.target) needs to be enabled, or you will get the same problem. systemd will try to mount every single NFS mount in parallel, and as soon as you have more than one, duplicate rpc.statd instances get started when only one should be. The changes in nfs-utils-1.2.8-3.0.fc19 from Bug 970595 to enable them by default should have taken care of that portion. (As noted originally, in Fedora 18 nfs-lock.service was enabled by default under multi-user.target; we got into trouble in Fedora 19 by moving it under nfs.target, which is disabled by default - and nfs-lock.service is also disabled by default, which is fine until you start adding NFS client mounts.)

Every boot means a random chance of NFS mount failures and of nfs-lock.service failing, because each and every NFS client mount starts up its own instance of rpc.statd before nfs-lock.service does. nfs-lock.service fails if its rpc.statd cannot start because it conflicts with the other rpc.statd's from the NFS client mounts. Mounts fail if they land in the window where there is no fully running rpc.statd yet, but their own rpc.statd also fails due to a conflicting rpc.statd from another client mount. Mounts before and after that window work.


==== /var/log/messages
Jul 26 00:57:50 localhost kernel: [    0.000000] Initializing cgroup subsys cpuset
Jul 26 00:57:50 localhost kernel: [    0.000000] Initializing cgroup subsys cpu
Jul 26 00:57:50 localhost systemd[1]: systemd 204 running in system mode. (+PAM +LIBWRAP +AUDIT +SELINUX +IMA +SYSVINIT +LIBCRYPTSETUP +GCRYPT +ACL +XZ)
Jul 26 00:57:50 localhost kernel: [    0.000000] Linux version 3.9.9-302.fc19.x86_64 (mockbuild.fedoraproject.org) (gcc version 4.8.1 20130603 (Red Hat 4.8.1-1) (GCC) ) #1 SMP Sat Jul 6 13:41:07 UTC 2013
Jul 26 00:57:50 localhost kernel: [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.9.9-302.fc19.x86_64 root=UUID=ab1eb936-0ece-49a1-a0d0-3d1c2c4459d9 ro rd.md=0 rd.lvm=0 rd.dm=0 vconsole.keymap=us rd.luks=0 vconsole.font=latarcyrheb-sun16 LANG=en_US.UTF-8 divider=10
... skipping ...
Jul 26 00:58:31 localhost systemd[1]: Started Network Manager Wait Online.
Jul 26 00:58:31 localhost systemd[1]: Starting Network is Online.
Jul 26 00:58:31 localhost systemd[1]: Reached target Network is Online.
Jul 26 00:58:31 localhost systemd[1]: Starting Network.
Jul 26 00:58:31 localhost systemd[1]: Reached target Network.
Jul 26 00:58:31 localhost systemd[1]: Starting Sendmail Mail Transport Agent...
Jul 26 00:58:31 localhost systemd[1]: Starting OpenSSH server daemon...
Jul 26 00:58:31 localhost NetworkManager[348]: <info> Activation (p2p1) successful, device activated.
Jul 26 00:58:31 localhost NetworkManager[348]: <info> Activation (p2p1) Stage 4 of 5 (IPv6 Configure Timeout) started...
Jul 26 00:58:31 localhost NetworkManager[348]: <info> Activation (p2p1) Stage 4 of 5 (IPv6 Configure Timeout) complete.
Jul 26 00:58:31 localhost dbus-daemon[303]: dbus[303]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatche
r.service'
Jul 26 00:58:31 localhost dbus[303]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service'
Jul 26 00:58:31 localhost systemd[1]: Started Login and scanning of iSCSI devices.
Jul 26 00:58:31 localhost systemd[1]: Mounting /loki/40...
Jul 26 00:58:31 localhost systemd[1]: Mounting /loki/39...
Jul 26 00:58:31 localhost systemd[1]: Mounting /loki/38...
Jul 26 00:58:31 localhost systemd[1]: Mounting /loki/37...
Jul 26 00:58:31 localhost systemd[1]: Mounting /loki/36...
Jul 26 00:58:31 localhost systemd[1]: Mounting /loki/35...
Jul 26 00:58:31 localhost systemd[1]: Mounting /loki/34...
Jul 26 00:58:31 localhost systemd[1]: Mounting /loki/33...
Jul 26 00:58:31 localhost systemd[1]: Mounting /loki/32...
Jul 26 00:58:31 localhost systemd[1]: Mounting /loki/31...
Jul 26 00:58:31 localhost systemd[1]: Mounting /loki/30...
Jul 26 00:58:31 localhost systemd[1]: Mounting /loki/29...
Jul 26 00:58:31 localhost systemd[1]: Mounting /loki/28...
Jul 26 00:58:31 localhost systemd[1]: Mounting /loki/27...
Jul 26 00:58:31 localhost systemd[1]: Mounting /loki/26...
Jul 26 00:58:31 localhost systemd[1]: Mounting /loki/25...
Jul 26 00:58:31 localhost systemd[1]: Mounting /loki/24...
Jul 26 00:58:31 localhost systemd[1]: Mounting /loki/23...
Jul 26 00:58:31 localhost systemd[1]: Mounting /loki/22...
Jul 26 00:58:31 localhost systemd[1]: Mounting /loki/21...
Jul 26 00:58:32 localhost systemd[1]: Mounting /loki/20...
Jul 26 00:58:32 localhost systemd[1]: Mounting /loki/19...
Jul 26 00:58:32 localhost systemd[1]: Mounting /loki/18...
Jul 26 00:58:32 localhost systemd[1]: Mounting /loki/17...
Jul 26 00:58:32 localhost systemd[1]: Mounting /loki/16...
Jul 26 00:58:32 localhost systemd[1]: Mounting /loki/15...
Jul 26 00:58:32 localhost systemd[1]: Mounting /loki/14...
Jul 26 00:58:32 localhost systemd[1]: Mounting /loki/13...
Jul 26 00:58:32 localhost systemd[1]: Mounting /loki/12...
Jul 26 00:58:32 localhost systemd[1]: Mounting /loki/11...
Jul 26 00:58:32 localhost systemd[1]: Mounting /loki/10...
Jul 26 00:58:32 localhost systemd[1]: Mounting /loki/9...
Jul 26 00:58:32 localhost systemd[1]: Mounting /loki/8...
Jul 26 00:58:32 localhost systemd[1]: Mounting /loki/7...
Jul 26 00:58:32 localhost systemd[1]: Mounting /loki/6...
Jul 26 00:58:32 localhost systemd[1]: Mounting /loki/5...
Jul 26 00:58:32 localhost systemd[1]: Mounting /loki/4...
Jul 26 00:58:32 localhost systemd[1]: Mounting /loki/3...
Jul 26 00:58:32 localhost systemd[1]: Mounting /loki/2...
Jul 26 00:58:32 localhost systemd[1]: Mounting /loki/1...
Jul 26 00:58:32 localhost systemd[1]: Starting RPC bind service...
Jul 26 00:58:32 localhost systemd[1]: Mounting /loki/tmp...
Jul 26 00:58:32 localhost systemd[1]: Started OpenSSH server daemon.
Jul 26 00:58:32 localhost systemd[1]: Starting Network Manager Script Dispatcher Service...
Jul 26 00:58:32 localhost rpc.statd[877]: Version 1.2.8 starting
Jul 26 00:58:32 localhost dbus-daemon[303]: dbus[303]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Jul 26 00:58:32 localhost dbus[303]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Jul 26 00:58:32 localhost systemd[1]: Started Network Manager Script Dispatcher Service.
Jul 26 00:58:32 localhost rpc.statd[883]: Version 1.2.8 starting
Jul 26 00:58:32 localhost rpc.statd[879]: Version 1.2.8 starting
Jul 26 00:58:32 localhost rpc.statd[884]: Version 1.2.8 starting
Jul 26 00:58:32 localhost rpc.statd[881]: Version 1.2.8 starting
Jul 26 00:58:32 localhost rpc.statd[877]: Flags: TI-RPC
Jul 26 00:58:32 localhost rpc.statd[881]: Flags: TI-RPC
Jul 26 00:58:32 localhost rpc.statd[879]: Flags: TI-RPC
Jul 26 00:58:32 localhost rpc.statd[885]: Version 1.2.8 starting
Jul 26 00:58:32 localhost rpc.statd[885]: Flags: TI-RPC
Jul 26 00:58:32 localhost rpc.statd[880]: Version 1.2.8 starting
Jul 26 00:58:32 localhost rpc.statd[880]: Flags: TI-RPC
Jul 26 00:58:32 localhost rpc.statd[891]: Version 1.2.8 starting
Jul 26 00:58:32 localhost rpc.statd[891]: Flags: TI-RPC
Jul 26 00:58:32 localhost rpc.statd[889]: Version 1.2.8 starting
Jul 26 00:58:32 localhost rpc.statd[889]: Flags: TI-RPC
Jul 26 00:58:32 localhost rpc.statd[883]: Flags: TI-RPC
Jul 26 00:58:32 localhost rpc.statd[884]: Flags: TI-RPC
Jul 26 00:58:32 localhost systemd[1]: Started RPC bind service.
Jul 26 00:58:32 localhost systemd[1]: Starting NFS file locking service....
Jul 26 00:58:33 localhost rpc.statd[903]: Version 1.2.8 starting
Jul 26 00:58:33 localhost rpc.statd[903]: Flags: TI-RPC
Jul 26 00:58:33 localhost rpc.statd[906]: Version 1.2.8 starting
Jul 26 00:58:33 localhost rpc.statd[906]: Flags: TI-RPC
... skipping ...
Jul 26 00:58:33 localhost sm-notify[929]: Version 1.2.8 starting
Jul 26 00:58:33 localhost systemd[1]: Stopping Sendmail Mail Transport Agent...
Jul 26 00:58:33 localhost systemd[1]: Starting Sendmail Mail Transport Agent...
Jul 26 00:58:33 localhost mount[813]: Statd service already running!
Jul 26 00:58:33 localhost mount[747]: Statd service already running!
Jul 26 00:58:33 localhost mount[771]: Statd service already running!
Jul 26 00:58:33 localhost mount[824]: Statd service already running!
Jul 26 00:58:33 localhost mount[810]: Statd service already running!
Jul 26 00:58:33 localhost systemd[1]: PID file /run/sendmail.pid not readable (yet?) after start.
Jul 26 00:58:33 localhost systemd[1]: Started Sendmail Mail Transport Agent.
Jul 26 00:58:33 localhost systemd[1]: Starting Sendmail Mail Transport Client...
Jul 26 00:58:33 localhost mount[792]: Statd service already running!
Jul 26 00:58:33 localhost mount[786]: mount.nfs: rpc.statd is not running but is required for remote locking.
Jul 26 00:58:33 localhost mount[786]: mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
Jul 26 00:58:33 localhost mount[786]: mount.nfs: an incorrect mount option was specified
Jul 26 00:58:33 localhost systemd[1]: loki-21.mount mount process exited, code=exited status=32
Jul 26 00:58:33 localhost systemd[1]: Failed to mount /loki/21.
Jul 26 00:58:33 localhost systemd[1]: Dependency failed for Remote File Systems.
Jul 26 00:58:33 localhost systemd[1]: Unit loki-21.mount entered failed state.
Jul 26 00:58:33 localhost systemd[1]: Starting Trigger Flushing of Journal to Persistent Storage...
Jul 26 00:58:33 localhost systemd[1]: loki-34.mount mount process exited, code=exited status=32
Jul 26 00:58:33 localhost systemd[1]: Failed to mount /loki/34.
Jul 26 00:58:33 localhost systemd[1]: Unit loki-34.mount entered failed state.
Jul 26 00:58:33 localhost systemd[1]: loki-17.mount mount process exited, code=exited status=32
Jul 26 00:58:33 localhost systemd[1]: Failed to mount /loki/17.
Jul 26 00:58:33 localhost systemd[1]: loki-15.mount mount process exited, code=exited status=32
... skipping ...
Jul 26 00:58:33 localhost systemd[1]: loki-29.mount mount process exited, code=exited status=32
Jul 26 00:58:33 localhost mount[798]: mount.nfs: rpc.statd is not running but is required for remote locking.
Jul 26 00:58:33 localhost mount[798]: mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
Jul 26 00:58:33 localhost mount[798]: mount.nfs: an incorrect mount option was specified
Jul 26 00:58:33 localhost mount[802]: mount.nfs: rpc.statd is not running but is required for remote locking.
Jul 26 00:58:33 localhost mount[802]: mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
Jul 26 00:58:33 localhost mount[802]: mount.nfs: an incorrect mount option was specified
Jul 26 00:58:34 localhost kernel: [   63.148622] FS-Cache: Loaded
Jul 26 00:58:34 localhost kernel: [   63.393935] Key type dns_resolver registered
Jul 26 00:58:34 localhost kernel: [   63.680649] FS-Cache: Netfs 'nfs' registered for caching
Jul 26 00:58:40 localhost mount[781]: Statd service already running!
...
Jul 26 00:58:40 localhost mount[769]: mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
Jul 26 00:58:40 localhost mount[769]: mount.nfs: an incorrect mount option was specified
Jul 26 00:58:40 localhost systemd[1]: Failed to mount /loki/29.
Jul 26 00:58:40 localhost systemd[1]: Unit loki-29.mount entered failed state.
Jul 26 00:58:40 localhost systemd[1]: loki-23.mount mount process exited, code=exited status=32
Jul 26 00:58:40 localhost systemd[1]: Failed to mount /loki/23.
Jul 26 00:58:40 localhost systemd[1]: Unit loki-23.mount entered failed state.
Jul 26 00:58:40 localhost systemd[1]: Started Permit User Sessions.
Jul 26 00:58:40 localhost systemd[1]: Starting Command Scheduler...
Jul 26 00:58:40 localhost systemd[1]: Started Command Scheduler.
Jul 26 00:58:40 localhost systemd[1]: Starting Job spooling tools...
Jul 26 00:58:40 localhost systemd[1]: Started Job spooling tools.
Jul 26 00:58:40 localhost systemd[1]: Starting Terminate Plymouth Boot Screen...
Jul 26 00:58:40 localhost mount[779]: mount.nfs: rpc.statd is not running but is required for remote locking.
Jul 26 00:58:40 localhost mount[779]: mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
Jul 26 00:58:40 localhost mount[779]: mount.nfs: an incorrect mount option was specified
Jul 26 00:58:40 localhost rpc.statd[912]: failed to create RPC listeners, exiting
Jul 26 00:58:40 localhost rpc.statd[906]: failed to create RPC listeners, exiting
...
Jul 26 00:58:40 localhost mount[783]: mount.nfs: an incorrect mount option was specified
Jul 26 00:58:40 localhost systemd[1]: Started Sendmail Mail Transport Client.
Jul 26 00:58:40 localhost systemd[1]: nfs-lock.service: control process exited, code=exited status=1
Jul 26 00:58:40 localhost systemd[1]: Failed to start NFS file locking service..
Jul 26 00:58:40 localhost systemd[1]: Unit nfs-lock.service entered failed state.
Jul 26 00:58:40 localhost systemd[1]: Starting Network File System Server.
Jul 26 00:58:40 localhost systemd[1]: Reached target Network File System Server.
Jul 26 00:58:40 localhost systemd[1]: Starting Multi-User System.
Jul 26 00:58:40 localhost systemd[1]: Reached target Multi-User System.
Jul 26 00:58:40 localhost systemd[1]: Starting Update UTMP about System Runlevel Changes...
Jul 26 00:58:40 localhost systemd[1]: Starting Stop Read-Ahead Data Collection 10s After Completed Startup.
Jul 26 00:58:40 localhost systemd[1]: Started Stop Read-Ahead Data Collection 10s After Completed Startup.
Jul 26 00:58:40 localhost systemd[1]: Mounted /loki/5.
...
...
====


Note: Even with nfs.target and nfs-lock.service enabled, this seems to be an issue (start-up order and remote-fs not waiting for nfs-lock.service) based on /var/log/messages:

...
Jul 26 00:58:31 localhost systemd[1]: Started Network Manager Wait Online.
Jul 26 00:58:31 localhost systemd[1]: Starting Network is Online.
Jul 26 00:58:31 localhost systemd[1]: Reached target Network is Online.
Jul 26 00:58:31 localhost systemd[1]: Starting Network.
Jul 26 00:58:31 localhost systemd[1]: Reached target Network.
...
Jul 26 00:58:31 localhost systemd[1]: Mounting /loki/40...
...   <---- Note: Why are we starting mounts here, NFS isn't on yet?
Jul 26 00:58:32 localhost systemd[1]: Mounting /loki/1...
Jul 26 00:58:32 localhost systemd[1]: Starting RPC bind service...
Jul 26 00:58:32 localhost systemd[1]: Mounting /loki/tmp...
Jul 26 00:58:32 localhost systemd[1]: Started OpenSSH server daemon.
Jul 26 00:58:32 localhost systemd[1]: Starting Network Manager Script Dispatcher Service...
Jul 26 00:58:32 localhost rpc.statd[877]: Version 1.2.8 starting
...
Jul 26 00:58:32 localhost rpc.statd[881]: Version 1.2.8 starting
Jul 26 00:58:32 localhost rpc.statd[877]: Flags: TI-RPC
...  <--- From the NFS mounts in fstab we see concurrent rpc.statd's
...  <--- being started... again, why - NFS isn't on yet, specifically
...  <--- nfs-lock.service didn't even try to start yet which is what
...  <--- 'should' be bringing rpc.statd online and prevent these from
...  <--- even trying to start
Jul 26 00:58:32 localhost rpc.statd[884]: Flags: TI-RPC
Jul 26 00:58:32 localhost systemd[1]: Started RPC bind service.
Jul 26 00:58:32 localhost systemd[1]: Starting NFS file locking service....
Jul 26 00:58:33 localhost rpc.statd[903]: Version 1.2.8 starting


Note: "Starting NFS file locking service..." shows up way too late. By then rpc.statd has already been attempted countless times, which causes the nfs-lock.service unit to fail and the mount flakiness. If you force nfs-lock.service to come online before remote-fs, and remote-fs to wait for it, the issue never happens and only one rpc.statd starts. From the unit files this appears to be what is defined - but it doesn't look like what is actually happening?



Hopefully that clarifies what I'm seeing. If you need anything specific, let me know. So far it has been incredibly reliable to reproduce.

Comment 5 Fedora Update System 2013-08-02 03:49:52 UTC
nfs-utils-1.2.8-3.0.fc19 has been pushed to the Fedora 19 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 6 Jason 2013-08-02 04:10:50 UTC
(In reply to Fedora Update System from comment #5)
> nfs-utils-1.2.8-3.0.fc19 has been pushed to the Fedora 19 stable repository.
> If problems still persist, please make note of it in this bug report.

Please see my comment #4, where I already tested the package "nfs-utils-1.2.8-3.0.fc19"; it does not address this bug.

Also, taking a look at the spec file diff, it appears to only enable nfs.target and nfs-lock.service on install, and to re-enable them on upgrade (but only if they were already enabled, which is appropriate). So on a fresh Fedora 19 where these are disabled by default, they would still be disabled after the yum update, which at least explains why there was no change.

This still leaves the root of this bug: rpc.statd / nfs-lock.service is not starting up properly, and nfs-lock.service can be seen failing as in the Description and comment #4. This can cause NFS mounts at boot to fail, as well as other symptoms such as multiple rpc.statd's showing up in the process list.

Comment 7 Jason 2013-08-23 05:59:19 UTC
I noticed that my Fedora 18 machine, which I had restored from a backup because of this issue, suddenly started hitting the same problem after a recent yum update, where it had been fine before. Based on /var/log/messages the behavior is the same: the mounts start before the NFS file locking service, so the mounts start their own rpc.statd before NFS file locking can start it. Oddly, I still seem to have a lower chance of failures under F18, whereas under F19 failure is essentially certain. The mounts do sometimes fail, but not always; if I reboot a few times everything eventually mounts fine on its own, although I still end up with twenty rpc.statd's in my process list and a bunch of "rpc.statd already running" errors in my logs - NFS file locking started after the mount attempts.

The same fix seems to apply: if you manually tweak things so that a single rpc.statd starts early in the boot cycle (for example by getting nfs-lock.service to start early and making sure rpc.statd is running before the NFS client mounts start), everything works properly.

As it stands now in Fedora 19, having nfs-lock.service enabled together with an NFS client mount in fstab seems to cause nfs-lock.service to fail (rpc.statd already running).

Comment 8 Andrew J. Schorr 2013-09-10 01:42:08 UTC
I have experienced what appears to be the same problem with rpc.statd getting started more than once.  I notice that the nfs-lock.service file says "Before=remote-fs-pre.target", but the "remote-fs-pre.target" is not actually being activated.  I patched the remote-fs.target by adding "Requires=remote-fs-pre.target", and that fixed the problem for me.  It causes nfs-lock.service to start before remote-fs-pre.target, then the nfs mounts occur, and finally remote-fs.target is reached.  So I think the problem was simply that the remote-fs-pre.target was not being activated.
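
For reference, a minimal sketch of the patched /usr/lib/systemd/system/remote-fs.target with that change; the surrounding lines are assumed stock content and may differ slightly from the shipped unit, only the Requires= line is the actual addition:

[Unit]
Description=Remote File Systems
Documentation=man:systemd.special(7)
After=remote-fs-pre.target
Requires=remote-fs-pre.target

[Install]
WantedBy=multi-user.target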

Comment 9 Andrew J. Schorr 2013-09-10 01:44:11 UTC
Created attachment 795785 [details]
remote-fs.target should require that remote-fs-pre.target start  before it

This fixes the problem for me -- it activates remote-fs-pre.target, so nfs-lock.service gets started before that target and before all NFS mounts.

Comment 10 Jason 2013-09-15 19:59:33 UTC
(In reply to Andrew J. Schorr from comment #9)
> Created attachment 795785 [details]
> remote-fs.target should require that remote-fs-pre.target start  before it
> 
> This fixes the problem for me -- it activate the remote-fs-pre.target, so
> the nfs-lock.service will get started before that and before all NFS mounts.

I can verify this. I was experiencing the same issue where remote-fs-pre.target did not seem to be working, but I had not noticed that it was not being activated; after checking, it was indeed inactive, even though from looking at the units I was under the impression it should have been activated.

I applied "Requires=remote-fs-pre.target" to remote-fs.target as suggested, using the supplied diff (attachment 795785 [details]), and rebooted. remote-fs-pre.target activated this time and things worked the way I have expected they should since I created this bug: nfs-lock.service started before any mounts, and the mounts then ran after it [rpc.statd] was up, with no issues or mount failures. I tried this in my test setup, which previously had a 100% failure rate, without any problem.

This seems to be the appropriate fix [along with the changes already made in nfs-utils-1.2.8-3.0.fc19 to make nfs-lock.service and nfs.target enabled by default].

Thanks for catching that - works for me.

Comment 11 Bogdan Costescu 2013-09-18 08:53:53 UTC
Had similar problems with F19 which didn't appear in earlier releases up to F17 (didn't use F18). The number of NFS mounts on the clients was smaller (4-5), but almost always one of them would fail at boot. Systems are up-to-date, now using 
nfs-utils-1.2.8-4.0.fc19.x86_64.

The patch supplied in comment 9 seems to fix the problem for me too. Please release it as a fix.

Comment 12 Andrew J. Schorr 2013-09-18 14:19:37 UTC
Yes, I do believe that my patch works.  However, I am not sure that this solves the problem in the "correct" way.  There appears to be some logic in src/core/mount.c that attempts to make this all happen automatically.  It could be that the proper fix is to tweak the logic in src/core/mount.c:mount_add_default_dependencies, but I'm not sufficiently familiar with systemd's internals to provide such a patch.  For example, this old (and subsequently obsoleted) git commit was attempting to address such issues, I think:

http://cgit.freedesktop.org/systemd/systemd/commit/?id=db1355b1c181a4b1ac277064918d7c794dfb6edd

But there have clearly been a lot of changes to the code since then.  It looks as if somebody previously tried a patch similar to mine here:

http://cgit.freedesktop.org/systemd/systemd/commit/?id=6bde0b3220e95a08cefb70846f73b2cf24b7734a

But it was reverted here:

http://cgit.freedesktop.org/systemd/systemd/commit/?id=a69fe051b85865cad5df74f9f58915e36c4b1418

So this seems to be an area of conceptual difficulty.  I hope somebody can find the "right" fix.  Until then, my patch in Comment 9 should work.

Comment 13 Steve Dickson 2013-09-18 19:42:42 UTC
Reassigning to systemd since that is the owner of
    /usr/lib/systemd/system/remote-fs.target

Comment 14 Michal Schmidt 2013-09-19 15:36:13 UTC
If I understand correctly, the issue is that nothing pulled remote-fs-pre.target into the boot transaction, so none of the ordering related to that unit came into effect.

The patch in comment #9 helps, but it is not the proper way to fix this.

(In reply to Andrew J. Schorr from comment #12)
[...]
> But it was reverted here:
> 
> http://cgit.freedesktop.org/systemd/systemd/commit/
> ?id=a69fe051b85865cad5df74f9f58915e36c4b1418

Right. Let's paste the relevant commit message here:
  We should not pull in remote-fs-pre.target unconditionally. It's supposed to
  be pulled in by the implementors of it, rather then its users.

The "implementor" in this case is nfs-lock.service (it is a service that implements functionality to allow remote mounts to work).
It should be nfs-lock.service who pulls remote-fs-pre.target in (using "Wants=remote-fs-pre.target"), not remote-fs.target.
See also "man systemd.special", where remote-fs-pre.target is described.

Comment 15 Jason 2013-09-19 17:00:35 UTC
Created attachment 800040 [details]
nfs-lock.service should pull in remote-fs-pre.target

Patched nfs-lock.service; if I understood correctly, the [Unit] section should Want remote-fs-pre.target. I can verify that without this patch things break as I originally reported (and remote-fs-pre.target does not activate), but after the patch and a reboot things work as expected (and remote-fs-pre.target is activated). Based on my current understanding, along with the previous update in this bug, this seems to be the appropriate fix.

Patch adds to:
/usr/lib/systemd/system/nfs-lock.service

Under '[Unit]' section to include:
Wants=remote-fs-pre.target
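
For clarity, the resulting [Unit] section would look roughly like this (other lines in the unit are unchanged and omitted here; the Before= ordering was already present per comment #8):

[Unit]
Description=NFS file locking service.
Before=remote-fs-pre.target
Wants=remote-fs-pre.target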

Comment 16 Jason 2013-09-19 17:20:32 UTC
Created attachment 800042 [details]
nfs-lock.service should pull in remote-fs-pre.target

Corrected how I generated the diff - didn't include proper path

Comment 17 Bogdan Costescu 2013-09-23 13:28:28 UTC
With the fix suggested in comments #14 and #15, I again got NFS volumes that were not mounted at boot. So far, the fix in comment #9 is the one that actually seems to resolve the issue for me.

Comment 18 Andrew J. Schorr 2013-09-23 14:37:15 UTC
That is strange.  I think both patches should have an equivalent effect.  The patch in comment #16 seems to work for me.

If you run "systemctl list-dependencies nfs-lock.service", do you see remote-fs-pre.target in the output?  And if you run "systemctl --before list-dependencies nfs-lock.service", do you see it there as well?

Also, what do "systemctl status nfs-lock.service" and "systemctl status remote-fs-pre.target" show?
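
For convenience, the four commands above in one place:

# systemctl list-dependencies nfs-lock.service
# systemctl --before list-dependencies nfs-lock.service
# systemctl status nfs-lock.service
# systemctl status remote-fs-pre.target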

Regards,
Andy

Comment 19 Bogdan Costescu 2013-09-23 16:34:21 UTC
With the change to remote-fs.target (as in comment #9), I don't see remote-fs-pre.target for 'systemctl list-dependencies nfs-lock.service', but I do see it for 'systemctl --before list-dependencies nfs-lock.service'. For both nfs-lock.service and remote-fs-pre.target I see no errors in the status output.

With the change to nfs-lock.service (as in comment #14), I see remote-fs-pre.target listed in both cases.

While doing these tests, including multiple reboots, I have observed my initial problem (some NFS volumes not mounted at boot) in both cases. It's the first time I have seen it happen with the change to remote-fs.target, so it's likely that I'm encountering a different issue...

Comment 20 Steve Dickson 2013-12-02 15:36:08 UTC
My apologies for taking so long to get back to this... but...

With https://bugzilla.redhat.com/show_bug.cgi?id=786050
I added both
    +Wants=remote-fs-pre.target
    +Before=remote-fs-pre.target

(see commit e1db7048)

For bz https://bugzilla.redhat.com/show_bug.cgi?id=817895 I was told
the Wants= was "superfluous" so I removed it
(see commit 996074322)

Now with Comment 14 you're saying I should add the Wants= back??? 

I'm a bit confused....

Comment 21 Michal Schmidt 2014-06-26 14:52:25 UTC
Well, the semantics of the targets may have changed a bit. I haven't been paying close attention to upstream development in this regard lately, but I know some new network related targets have been introduced.
Michal, could you check if the current NFS units have sane dependencies?

Comment 22 Mattia Verga 2014-11-06 18:26:33 UTC
Any news on this?
I've run into the same problem on F21 and opened a bug before seeing this one. I think there are several other possible duplicates, see #1007982 #1046015 #1055201

Comment 23 Mattia Verga 2014-11-06 18:26:52 UTC
*** Bug 1158164 has been marked as a duplicate of this bug. ***

Comment 24 Fedora End Of Life 2015-01-09 22:11:49 UTC
This message is a notice that Fedora 19 is now at end of life. Fedora 
has stopped maintaining and issuing updates for Fedora 19. It is 
Fedora's policy to close all bug reports from releases that are no 
longer maintained. Approximately 4 (four) weeks from now this bug will
be closed as EOL if it remains open with a Fedora 'version' of '19'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 19 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed, as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 25 Michal Sekletar 2015-01-15 15:14:02 UTC
Since this was reported against F19, which is now EOL, I didn't investigate further. However, I did a quick test on F21, and mounting NFS shares on boot seems to work fine.

Comment 26 Bogdan Costescu 2015-01-16 05:35:41 UTC
Later updates to F19 as well as F20 have both been working fine for some time. I believe the problem was solved along the way, although I cannot point out exactly when.

Comment 27 Steve Dickson 2015-01-17 14:01:36 UTC
(In reply to Bogdan Costescu from comment #26)
> Later updates to F19 as well as F20 have both been working fine for some
> time. I believe the problem was solved along the way, although I cannot
> point out exactly when.

Due to this comment, I'm going to close this out.