Created attachment 951481 [details]
/var/log/messages

I have two directories on my NAS mounted via NFS on my Fedora workstation. With F19 I had no problem mounting them from /etc/fstab with:

'192.168.1.200:/Mattia/ /mnt/qnap nfs defaults 0 0'

After formatting and upgrading the workstation to F21 I get a very slow startup, with a wait of over 60 seconds just before entering the graphical interface, caused by these NFS mounts. I tried modifying my fstab entries to:

'192.168.1.200:/Mattia/ /mnt/qnap nfs rw 0 0'

or

'192.168.1.200:/Mattia/ /mnt/qnap nfs rsize=8192,wsize=8192,timeo=14,soft 0 0'

I also tried enabling nfs-client.service, but I always get the delay at startup. If I comment out the NFS lines in fstab, the delay disappears. After a boot without the NFS shares mounted, if I re-enable the lines in fstab and manually mount a share with 'mount /mnt/qnap', I can see a long delay before the command completes. Again, I did not have this problem on F19, so it's not NAS related.

I attach the output of /var/log/messages: the delay is clearly visible between 19:12:58 and 19:13:58, when I get the message 'rpc.statd[1240]: failed to create RPC listeners, exiting'. With the NFS mounts in fstab disabled, I don't get the RPC listeners error.
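For anyone trying to pin down which unit is eating the time at boot, something along these lines should help (nothing here is specific to this setup):

  # which units took longest, and the chain leading to the graphical target
  systemd-analyze blame
  systemd-analyze critical-chain graphical.target

  # messages from this boot only
  journalctl -b | grep -i -e nfs -e statd -e rpcbind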
It seems mount.nfs tries to start rpc-statd.service but fails. During boot I can see "A start job is running for NFS status monitor for NFSv2/3", which causes a 60 seconds timeout; after that, rpc-statd.service is in a failed state. I then tried enabling nfs-server.service, and with that I can see rpc-statd.service is started correctly, but I still get the timeout on boot. It seems mount.nfs tries to start the service even if it's already running...
*** This bug has been marked as a duplicate of bug 983915 ***
I wrongly marked this bug as a duplicate of bug 983915, but it's not. On a clean F21 installation, with no NFS mount points in /etc/fstab, simply try to start rpc-statd.service and you will see it takes 60 seconds for the service to start.
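A minimal way to reproduce and measure the delay on such an install (assuming a stock F21 system) could be:

  time systemctl start rpc-statd.service
  journalctl -b -u rpc-statd.service --no-pager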
Is that true even if you start rpc.statd by hand instead of from the unit file? I wonder if it would be useful in that case to, say, run it under strace, and see if we can figure out what system call it's hanging on. Or it might be worth watching network traffic in wireshark while rpc.statd is starting--maybe it's waiting for a DNS query or something.
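Something along these lines should capture per-syscall timings (the output path is just an example):

  # -f follows forks, -tt prints timestamps, -T shows time spent in each syscall
  strace -f -tt -T -o /tmp/statd.strace /usr/sbin/rpc.statd --no-notify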
Created attachment 955897 [details]
Strace output of '/usr/sbin/rpc.statd --no-notify $STATDARGS'

I'm no expert with strace, but I gave it a try and made a trace (see attached). The problem seems to be at the end of the trace, where the call "read(3, "\0", 1)" takes 60 seconds to complete. I have no idea what that means; I hope you find it useful in some way. If you need a wireshark trace too, let me know and I will figure out how it works...
(In reply to Mattia Verga from comment #5)
> Created attachment 955897 [details]
> Strace output of '/usr/sbin/rpc.statd --no-notify $STATDARGS'
>
> I'm no expert with strace, but I gave it a try and made a trace (see
> attached).
>
> The problem seems to be at the end of the trace, where the call
> "read(3, "\0", 1)" takes 60 seconds to complete.

The end of the trace looks like:

pipe([3, 4]) = 0 <0.000017>
clone(...) = 1723 <0.000186>
close(4) = 0 <0.000015>
read(3, "\0", 1) = 1 <60.075102>

So I think what happened is it forked a child and just used the read() on its end of the pipe as a way to wait for that child to report that it's ready. So what would really be interesting to know is what happened in the child. I think the simplest would be to just add the "--foreground" option to statd to tell it not to do that.
Created attachment 955925 [details]
Strace of forked process

The --foreground option has no effect: I always get the delay when mounting NFS filesystems (only on the first filesystem I mount; subsequent ones have no delay since statd has already been started). Without the --foreground option I captured an strace of the forked process, which shows a 60 seconds timeout in the output.
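For reference, one way to get the child's trace into its own file (the output prefix is arbitrary) would be:

  # with -ff and -o, strace writes one file per process: /tmp/statd.trace.<pid>
  strace -f -ff -tt -T -o /tmp/statd.trace /usr/sbin/rpc.statd --no-notify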
(In reply to Mattia Verga from comment #7)
> The --foreground option has no effect: I always get the delay when mounting
> NFS filesystems (only on the first filesystem I mount

OK, sounds expected, assuming by "has no effect" you just mean "doesn't fix the delay". (I was suggesting --foreground just as a way to make statd easier to trace.)

> Without the --foreground option I captured an strace of the forked process,
> which shows a 60 seconds timeout in the output.

Searching for "<6" and then working backwards, I see:

19:16:54 socket(PF_LOCAL, SOCK_STREAM, 0) = 5 <0.000060>
...
19:16:54 connect(5, {sa_family=AF_LOCAL, sun_path="/var/run/rpcbind.sock"}, 23) = 0 <0.000390>
19:16:54 write(5, "\200\0\0@\3\275q\306\0\0\0\0\0\0\0\2\0\1\206\240\0\0\0\3\0\0\0\2\0\0\0\0"..., 68) = 68 <0.000022>
19:16:54 poll([{fd=5, events=POLLIN}], 1, 60000) = 0 (Timeout) <60.060231>

So, it looks like it sent a request to rpcbind and didn't get a response. I wonder what's going on with rpcbind? Is it even running at this point?
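A few quick checks that might answer that (unit and socket names as they appear on Fedora; adjust if yours differ):

  systemctl status rpcbind.service rpcbind.socket
  ss -xlp | grep rpcbind.sock     # is anything listening on /var/run/rpcbind.sock?
  rpcinfo -p localhost            # does rpcbind answer at all?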
Good suggestion. Looking at rpcbind I found another strange behaviour: if I manually start rpcbind.service and then start rpc-statd.service, I get no lag in starting rpc-statd. But if I start rpc-statd.service first, I can see that rpcbind.service is started as a requirement, yet I get the 60 seconds timeout... and I end up with rpc-statd as PID 1719 and rpcbind as PID 1721.

# systemctl status rpc-statd.service
● rpc-statd.service - NFS status monitor for NFSv2/3 locking.
   Loaded: loaded (/usr/lib/systemd/system/rpc-statd.service; static)
   Active: active (running) since mar 2014-11-11 18:35:21 CET; 5min ago
  Process: 1718 ExecStart=/usr/sbin/rpc.statd --no-notify $STATDARGS (code=exited, status=0/SUCCESS)
 Main PID: 1719 (rpc.statd)
   CGroup: /system.slice/rpc-statd.service
           └─1719 /usr/sbin/rpc.statd --no-notify

# systemctl status rpcbind.service
● rpcbind.service - RPC bind service
   Loaded: loaded (/usr/lib/systemd/system/rpcbind.service; static)
   Active: active (running) since mar 2014-11-11 18:34:21 CET; 6min ago
  Process: 1720 ExecStart=/sbin/rpcbind -w ${RPCBIND_ARGS} (code=exited, status=0/SUCCESS)
 Main PID: 1721 (rpcbind)
   CGroup: /system.slice/rpcbind.service
           └─1721 /sbin/rpcbind -w

Shouldn't rpcbind have a lower PID, since it must be started before statd? It seems to me rpcbind is started after rpc-statd, even though the rpc-statd.service file clearly states that it requires rpcbind.target... and that generates the 60 seconds timeout we saw in strace.
I've noticed that in /etc/systemd/system there was no rpcbind.target.wants directory, and doing 'systemctl enable/disable rpcbind.service' only enabled/disabled rpcbind.socket. I don't know if I'm doing things right, but I've edited the rpcbind.service file, adding the line 'WantedBy=rpcbind.target' under the [Install] section. After that I did a 'systemctl daemon-reload' and a 'systemctl enable rpcbind.service', which created the rpcbind.target.wants directory and put a symlink to rpcbind.service in it.

Now mounting NFS filesystems at boot gives me decent times (about 10 seconds; I think that's normal, though with Fedora on an SSD it still seems like a lot). At least I no longer have the 60 seconds wait.

Did I do things right, or can my workaround lead to other problems?
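In case it helps anyone else, the change described above amounts to roughly this (a drop-in under /etc/systemd/system/rpcbind.service.d/ would be a cleaner way to carry it than editing the packaged file):

  # appended to rpcbind.service
  [Install]
  WantedBy=rpcbind.target

  # then:
  systemctl daemon-reload
  systemctl enable rpcbind.service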
AIUI, rpcbind is one of those services that's supposed to be started on demand by systemd when a request comes in on the socket. Perhaps there's some delay in that startup?
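Schematically, socket activation pairs a .socket unit owned by systemd with the .service it starts on demand; something like this (a paraphrase, not the literal Fedora unit):

  # rpcbind.socket (sketch)
  [Socket]
  ListenStream=/var/run/rpcbind.sock

  [Install]
  WantedBy=sockets.target

systemd listens on /var/run/rpcbind.sock itself and only starts rpcbind.service when a client connects.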
Yes, there are both .socket and .service unit files. There was a patch back in 2012 that added socket activation support, which still doesn't seem to have been applied:

http://lists.freedesktop.org/archives/systemd-devel/2012-February/004352.html

I wonder if the systemd setup assumes rpcbind supports socket activation when it actually doesn't? But I don't understand well how this is meant to work.
That may very well be the case. I'll note that on one of my client VMs, rpcbind doesn't get started at boot time, but systemd is listening on the unix socket:

systemd 1 root 41u unix 0xffff880035d64a40 0t0 21133 /var/run/rpcbind.sock type=STREAM

Perhaps we should transition this to an rpcbind bug and pursue the problem there?
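The same check can be reproduced on any machine with something like:

  lsof /var/run/rpcbind.sock    # who currently owns the socket, systemd or rpcbind?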
I wonder if there's anything else we could check quickly. Is there some systemd debugging that would show it trying to hand the socket off to rpcbind and failing? Maybe the simplest thing is just to apply the old patch and see what happens.
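One generic way to get more detail out of systemd itself (nothing rpcbind-specific) might be booting with systemd.log_level=debug on the kernel command line, or at runtime:

  systemd-analyze set-log-level debug
  journalctl -b -u rpcbind.socket -u rpcbind.service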
I will build a patched version of rpcbind to try that as soon as I have enough spare time.

Meanwhile I noticed that the bug (bug #747363) with the request to apply that old patch has been closed as "wontfix" at F16 EOL without any activity from the package maintainer. Should I reopen that bug and move it to Rawhide? A word from the maintainer would be appreciated.
(In reply to Mattia Verga from comment #15)
> I will build a patched version of rpcbind to try that as soon as I have
> enough spare time.
>
> Meanwhile I noticed that the bug (bug #747363) with the request to apply
> that old patch has been closed as "wontfix" at F16 EOL without any activity
> from the package maintainer. Should I reopen that bug and move it to Rawhide?

Good grief, thanks for spotting that. So yes, I think it makes sense to update that one, and then maybe once you've confirmed this is the problem we could close this bug as a dup of that one.
With rpcbind rebuilt with the patch, I can confirm that rpcbind.service, rpcbind.socket and rpc-statd.service are started correctly. However, something else must still be wrong, because I'm now unable to mount NFS directories: mount says that rpc-statd.service is not running, but systemctl says it is...
Thanks for your persistence! That sounds like progress, at least...

I took a quick look at nfs-utils/systemd/ to try to figure out how it ensures things get started on mount. I guess it's the "Before=remote-fs-pre.target" in nfs-client.target. Except that there's a special comment about statd: "we don't "Wants=rpc-statd.service" as "mount.nfs" will arrange to start that on demand if needed." I wonder how that works.

The unit files here all refer to "rpcbind.target"--is there even such a thing? Looks to me like those should all be rpcbind.socket.
(In reply to J. Bruce Fields from comment #18)
> Except that there's a special comment about statd: "we don't
> "Wants=rpc-statd.service" as "mount.nfs" will arrange to start that on
> demand if needed." I wonder how that works.

From systemd/README: "the rpc-statd.service unit is started by /usr/sbin/start-statd which mount.nfs will run if statd is needed."

I guess that's so we can skip starting statd if we're doing v4 or mounting with "nolock". I can't see any evidence that we're actually installing such a thing. Looks like we need to: something that just does "systemctl start rpc-statd.service" should be enough, I think?
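If that's the route, a minimal start-statd could be as simple as this (purely a sketch of the idea; whatever nfs-utils eventually ships may differ):

  #!/bin/sh
  # /usr/sbin/start-statd - run by mount.nfs when statd is needed
  exec systemctl start rpc-statd.service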
(In reply to J. Bruce Fields from comment #18)
>
> The unit files here all refer to "rpcbind.target"--is there even such a
> thing? Looks to me like those should all be rpcbind.socket.

No, there's no "rpcbind.target", at least I can't find any evidence of it, and I think this is the real problem. Mounting an NFS share manually or at startup makes rpc-statd start before rpcbind, and this leads to the 60 seconds timeout (see comment #9).

After testing the patched rpcbind version I went back to the official one and, as expected, the problem came back. I then modified rpc-statd.service by adding a "Wants=nfs-config.service rpcbind.service" and an "After=nfs-config.service rpcbind.service". This way everything goes well at startup, because we are sure to start rpcbind (service and socket) BEFORE rpc-statd.

I think the current "Wants=rpcbind.target" and "After=rpcbind.target" do nothing, since there's no rpcbind.target at all...
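Concretely, the change amounts to something like this in the [Unit] section of rpc-statd.service (exact placement is mine, but the directives are as described above):

  [Unit]
  Wants=nfs-config.service rpcbind.service
  After=nfs-config.service rpcbind.service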
(In reply to Mattia Verga from comment #20)
> No, there's no "rpcbind.target", at least I can't find any evidence of it,
> and I think this is the real problem. Mounting an NFS share manually or at
> startup makes rpc-statd start before rpcbind, and this leads to the 60
> seconds timeout (see comment #9).
>
> After testing the patched rpcbind version I went back to the official one
> and, as expected, the problem came back. I then modified rpc-statd.service
> by adding a "Wants=nfs-config.service rpcbind.service" and an
> "After=nfs-config.service rpcbind.service". This way everything goes well
> at startup, because we are sure to start rpcbind (service and socket)
> BEFORE rpc-statd.
>
> I think the current "Wants=rpcbind.target" and "After=rpcbind.target" do
> nothing, since there's no rpcbind.target at all...

OK, so just replacing "rpcbind.target" with "rpcbind.service" everywhere in nfs-utils/systemd works for you?

I'm still curious how statd is getting started despite the missing /usr/sbin/start-statd. I guess that's because you explicitly enabled statd and/or nfs-server earlier? So there may still be something to fix there.

I'd also be happier if we could get this working with socket activation. I'm not sure how to debug that.
Yes, replacing rpcbind.target with rpcbind.service makes it work. I did not explicitly enable either statd or nfs-server; simply mounting an NFS share starts statd automatically, both from fstab and from the command line.

I don't know anything about rpcbind, but when I built the patched version I saw that both the service and the socket were started at boot. Even so, statd did not find them. Maybe that's just because the patch is many years old and is no longer valid?
Steved's latest version of the socket-activation patch would be worth trying: http://marc.info/?l=linux-nfs&m=141693514115346&w=2
rpcbind-0.2.2-0.0.fc21 has been submitted as an update for Fedora 21. https://admin.fedoraproject.org/updates/rpcbind-0.2.2-0.0.fc21
rpcbind-0.2.2-0.0.fc20 has been submitted as an update for Fedora 20. https://admin.fedoraproject.org/updates/rpcbind-0.2.2-0.0.fc20
Package rpcbind-0.2.2-0.0.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing rpcbind-0.2.2-0.0.fc20'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2014-15835/rpcbind-0.2.2-0.0.fc20
then log in and leave karma (feedback).
rpcbind-0.2.2-0.0.fc21 has been pushed to the Fedora 21 stable repository. If problems still persist, please make note of it in this bug report.
rpcbind-0.2.2-1.0.fc20 has been submitted as an update for Fedora 20. https://admin.fedoraproject.org/updates/rpcbind-0.2.2-1.0.fc20