Description of problem: systemctl start nfs hang up long time and fail. by debug, it always hang up on "systemd-tty-ask-password-agent --watch" 12484 ? S 0:00 | \_ sshd: root@pts/0 12493 pts/0 Ss 0:00 | \_ -bash 13709 pts/0 S+ 0:00 | \_ systemctl start nfs 13710 pts/0 S+ 0:00 | \_ /usr/bin/systemd-tty-ask-password-agent --watch # systemctl status nfs ● nfs-server.service - NFS server and services Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; enabled; vendor preset: disabled) Active: failed (Result: exit-code) since Mon 2017-02-20 01:23:36 EST; 1min 58s ago Process: 13657 ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS (code=exited, status=1/FAILURE) Process: 13656 ExecStartPre=/usr/sbin/exportfs -r (code=exited, status=0/SUCCESS) Main PID: 13657 (code=exited, status=1/FAILURE) Feb 20 01:14:50 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: Starting NFS server and services... Feb 20 01:23:36 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com rpc.nfsd[13657]: rpc.nfsd: writing fd to kernel failed: errno 110 (Connection timed out) Feb 20 01:23:36 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com rpc.nfsd[13657]: rpc.nfsd: unable to set any sockets for nfsd Feb 20 01:23:36 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: nfs-server.service: Main process exited, code=exited, status=1/FAILURE Feb 20 01:23:36 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: Failed to start NFS server and services. Feb 20 01:23:36 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: nfs-server.service: Unit entered failed state. Feb 20 01:23:36 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: nfs-server.service: Failed with result 'exit-code'. Version-Release number of selected component (if applicable): [root@bkr-hv03-guest40 ~]# lsb_release -sir Fedora 25 [root@bkr-hv03-guest40 ~]# rpm -q nfs-utils nfs-utils-2.1.1-1.fc25.x86_64 How reproducible: always Steps to Reproduce: 1. mkdir /home/test 2. echo "/home/test *(rw,no_root_squash)" >/etc/exports 3. systemctl stop nfs && systemctl start nfs Actual results: systemctl start nfs hang up long time and fail. [root@bkr-hv03-guest40 nfstest]# systemctl status nfs ● nfs-server.service - NFS server and services Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; enabled; vendor preset: disabled) Active: failed (Result: exit-code) since Mon 2017-02-20 01:23:36 EST; 1min 58s ago Process: 13657 ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS (code=exited, status=1/FAILURE) Process: 13656 ExecStartPre=/usr/sbin/exportfs -r (code=exited, status=0/SUCCESS) Main PID: 13657 (code=exited, status=1/FAILURE) Feb 20 01:14:50 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: Starting NFS server and services... Feb 20 01:23:36 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com rpc.nfsd[13657]: rpc.nfsd: writing fd to kernel failed: errno 110 (Connection timed out) Feb 20 01:23:36 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com rpc.nfsd[13657]: rpc.nfsd: unable to set any sockets for nfsd Feb 20 01:23:36 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: nfs-server.service: Main process exited, code=exited, status=1/FAILURE Feb 20 01:23:36 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: Failed to start NFS server and services. Feb 20 01:23:36 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: nfs-server.service: Unit entered failed state. Feb 20 01:23:36 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: nfs-server.service: Failed with result 'exit-code'. Expected results: works fine Additional info:
How reproducible: always ''' [root@bkr-hv03-guest40 ~]# echo "/usr/share *(ro)" >/etc/exports [root@bkr-hv03-guest40 ~]# systemctl stop nfs [root@bkr-hv03-guest40 ~]# time systemctl start nfs Job for nfs-server.service failed because the control process exited with error code. See "systemctl status nfs-server.service" and "journalctl -xe" for details. real 8m47.408s user 0m0.003s sys 0m0.005s [root@bkr-hv03-guest40 ~]# systemctl status nfs ● nfs-server.service - NFS server and services Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; enabled; vendor preset: disabled) Active: failed (Result: exit-code) since Mon 2017-02-20 01:46:04 EST; 2min 23s ago Process: 13858 ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS (code=exited, status=1/FAILURE) Process: 13857 ExecStartPre=/usr/sbin/exportfs -r (code=exited, status=0/SUCCESS) Main PID: 13858 (code=exited, status=1/FAILURE) Feb 20 01:37:17 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: Starting NFS server and services... Feb 20 01:46:04 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com rpc.nfsd[13858]: rpc.nfsd: writing fd to kernel failed: errno 110 (Connection timed out) Feb 20 01:46:04 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com rpc.nfsd[13858]: rpc.nfsd: unable to set any sockets for nfsd Feb 20 01:46:04 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: nfs-server.service: Main process exited, code=exited, status=1/FAILURE Feb 20 01:46:04 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: Failed to start NFS server and services. Feb 20 01:46:04 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: nfs-server.service: Unit entered failed state. Feb 20 01:46:04 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: nfs-server.service: Failed with result 'exit-code'. '''
I think the problem may be with the latest rpcbind update. What happens if you run this before step 3 in your reproducer?: systemd-tmpfiles --create
Or better yet, try this: systemd-tmpfiles --create /usr/lib/tmpfiles.d/rpcbind.conf I think that probably needs to be added to the postinstall scriptlet of rpcbind.
(In reply to Scott Mayhew from comment #4) > Or better yet, try this: > > systemd-tmpfiles --create /usr/lib/tmpfiles.d/rpcbind.conf > > I think that probably needs to be added to the postinstall scriptlet of > rpcbind. seems systemd-tmpfiles can not solve the problem: ''' [root@bkr-hv03-guest40 ~]# cat /etc/exports /usr/share *(ro) [root@bkr-hv03-guest40 ~]# systemd-tmpfiles --create /usr/lib/tmpfiles.d/rpcbind.conf [root@bkr-hv03-guest40 ~]# time systemctl start nfs Job for nfs-server.service failed because the control process exited with error code. See "systemctl status nfs-server.service" and "journalctl -xe" for details. real 8m47.313s user 0m0.004s sys 0m0.005s [root@bkr-hv03-guest40 ~]# time systemctl status nfs ● nfs-server.service - NFS server and services Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; enabled; vendor preset: disabled) Active: failed (Result: exit-code) since Mon 2017-02-20 21:48:47 EST; 1min 7s ago Process: 15662 ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS (code=exited, status=1/FAILURE) Process: 15661 ExecStartPre=/usr/sbin/exportfs -r (code=exited, status=0/SUCCESS) Main PID: 15662 (code=exited, status=1/FAILURE) Feb 20 21:40:00 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: Starting NFS server and services... Feb 20 21:48:47 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com rpc.nfsd[15662]: rpc.nfsd: writing fd to kernel failed: errno 110 (Connection timed out) Feb 20 21:48:47 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com rpc.nfsd[15662]: rpc.nfsd: unable to set any sockets for nfsd Feb 20 21:48:47 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: nfs-server.service: Main process exited, code=exited, status=1/FAILURE Feb 20 21:48:47 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: Failed to start NFS server and services. Feb 20 21:48:47 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: nfs-server.service: Unit entered failed state. Feb 20 21:48:47 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: nfs-server.service: Failed with result 'exit-code'. real 0m0.009s user 0m0.005s sys 0m0.002s ''' ''' 15564 pts/1 Ss 0:00 | \_ -bash 15651 pts/1 S+ 0:00 | \_ systemctl start nfs 15652 pts/1 S+ 0:00 | \_ /usr/bin/systemd-tty-ask-password '''
(In reply to Scott Mayhew from comment #4) > Or better yet, try this: > > systemd-tmpfiles --create /usr/lib/tmpfiles.d/rpcbind.conf > > I think that probably needs to be added to the postinstall scriptlet of > rpcbind. I think this commit took care of the problem you are talking about: commit 0df1187eb1cde0df1a2cd1ffb09ab7782a9b5471 Author: Steve Dickson <steved> Date: Sat Jan 28 15:46:33 2017 -0500 Corrected boot dependency in systemd files (bz 1401561) Signed-off-by: Steve Dickson <steved>
(In reply to Yin.JianHong from comment #5) > ''' > 15564 pts/1 Ss 0:00 | \_ -bash > 15651 pts/1 S+ 0:00 | \_ systemctl start nfs > 15652 pts/1 S+ 0:00 | \_ > /usr/bin/systemd-tty-ask-password > ''' The question I have is why is systemd-tty-ask-password being called?? I'm thinking that's where the time is being spent...
(In reply to Steve Dickson from comment #6) > (In reply to Scott Mayhew from comment #4) > > Or better yet, try this: > > > > systemd-tmpfiles --create /usr/lib/tmpfiles.d/rpcbind.conf > > > > I think that probably needs to be added to the postinstall scriptlet of > > rpcbind. > > I think this commit took care of the problem you are talking about: > > commit 0df1187eb1cde0df1a2cd1ffb09ab7782a9b5471 > Author: Steve Dickson <steved> > Date: Sat Jan 28 15:46:33 2017 -0500 > > Corrected boot dependency in systemd files (bz 1401561) > > Signed-off-by: Steve Dickson <steved> I see... I can reproduce the problem easily though, and in my case the problem is that nfsd can't register with rpcbind because rpcbind's not running, and rpcbind's not running because /run/rpcbind doesn't exist and it can't create the lock file. [root@localhost ~]# systemctl status nfs ● nfs-server.service - NFS server and services Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; disabled; vendor preset: disabled) Drop-In: /run/systemd/generator/nfs-server.service.d └─order-with-mounts.conf Active: failed (Result: exit-code) since Tue 2017-02-21 08:38:49 EST; 15s ago Process: 28004 ExecStopPost=/usr/sbin/exportfs -f (code=exited, status=0/SUCCESS) Process: 28002 ExecStopPost=/usr/sbin/exportfs -au (code=exited, status=0/SUCCESS) Process: 27998 ExecStop=/usr/sbin/rpc.nfsd 0 (code=exited, status=0/SUCCESS) Process: 28035 ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS (code=exited, status=1/FAILURE) Process: 28034 ExecStartPre=/usr/sbin/exportfs -r (code=exited, status=0/SUCCESS) Main PID: 28035 (code=exited, status=1/FAILURE) Feb 20 11:36:38 localhost.localdomain systemd[1]: Starting NFS server and services... Feb 21 08:38:49 localhost.localdomain rpc.nfsd[28035]: rpc.nfsd: writing fd to kernel failed: errno 110 (Connection timed out) Feb 21 08:38:49 localhost.localdomain rpc.nfsd[28035]: rpc.nfsd: unable to set any sockets for nfsd Feb 21 08:38:49 localhost.localdomain systemd[1]: nfs-server.service: Main process exited, code=exited, status=1/FAILURE Feb 21 08:38:49 localhost.localdomain systemd[1]: Failed to start NFS server and services. Feb 21 08:38:49 localhost.localdomain systemd[1]: nfs-server.service: Unit entered failed state. Feb 21 08:38:49 localhost.localdomain systemd[1]: nfs-server.service: Failed with result 'exit-code'. [root@localhost ~]# systemctl status rpcbind ● rpcbind.service - RPC Bind Loaded: loaded (/usr/lib/systemd/system/rpcbind.service; disabled; vendor preset: disabled) Active: failed (Result: exit-code) since Mon 2017-02-20 11:24:50 EST; 21h ago Docs: man:rpcbind(8) Main PID: 2879 (code=exited, status=1/FAILURE) Feb 20 11:24:50 localhost.localdomain systemd[1]: Starting RPC Bind... Feb 20 11:24:50 localhost.localdomain rpcbind[2879]: rpcbind: /run/rpcbind/rpcbind.lock: No such file or directory Feb 20 11:24:50 localhost.localdomain systemd[1]: rpcbind.service: Main process exited, code=exited, status=1/FAILURE Feb 20 11:24:50 localhost.localdomain systemd[1]: Failed to start RPC Bind. Feb 20 11:24:50 localhost.localdomain systemd[1]: rpcbind.service: Unit entered failed state. Feb 20 11:24:50 localhost.localdomain systemd[1]: rpcbind.service: Failed with result 'exit-code'.
(In reply to Steve Dickson from comment #7) > (In reply to Yin.JianHong from comment #5) > > ''' > > 15564 pts/1 Ss 0:00 | \_ -bash > > 15651 pts/1 S+ 0:00 | \_ systemctl start nfs > > 15652 pts/1 S+ 0:00 | \_ > > /usr/bin/systemd-tty-ask-password > > ''' > The question I have is why is systemd-tty-ask-password being called?? > I'm thinking that's where the time is being spent... Just a thought... since rpcbind runs as the rpc user instead of root, maybe that's preventing it from using systemd-tmpfile-setup?
As a side note... Maybe we should consider having rpc.nfsd print a more helpful error message when it sees -ETIMEDOUT here? That's almost always occurring because it can't talk to rpcbind. It might also not hurt to consider allowing knfsd to start up when rpcbind registration fails, but throw a printk to warn about missing registration.
I think I finally fix this problem.... Please see https://bugzilla.redhat.com/show_bug.cgi?id=1401561
*** This bug has been marked as a duplicate of bug 1401561 ***