Bug 1424940 - [systemd] rpc.nfsd: writing fd to kernel failed: errno 110 (Connection timed out)
Summary: [systemd] rpc.nfsd: writing fd to kernel failed: errno 110 (Connection timed ...
Keywords:
Status: CLOSED DUPLICATE of bug 1401561
Alias: None
Product: Fedora
Classification: Fedora
Component: nfs-utils
Version: 25
Hardware: x86_64
OS: Linux
unspecified
urgent
Target Milestone: ---
Assignee: Steve Dickson
QA Contact: JianHong Yin
URL:
Whiteboard:
Depends On:
Blocks: 1422947
TreeView+ depends on / blocked
 
Reported: 2017-02-20 06:33 UTC by JianHong Yin
Modified: 2017-04-07 15:26 UTC (History)
9 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-04-07 15:26:19 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)

Description JianHong Yin 2017-02-20 06:33:12 UTC
Description of problem:

systemctl start nfs hang up long time and fail.

by debug, it always hang up on "systemd-tty-ask-password-agent --watch"
12484 ?        S      0:00  |   \_ sshd: root@pts/0
12493 pts/0    Ss     0:00  |       \_ -bash
13709 pts/0    S+     0:00  |           \_ systemctl start nfs
13710 pts/0    S+     0:00  |               \_ /usr/bin/systemd-tty-ask-password-agent --watch

# systemctl status nfs
● nfs-server.service - NFS server and services
   Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2017-02-20 01:23:36 EST; 1min 58s ago
  Process: 13657 ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS (code=exited, status=1/FAILURE)
  Process: 13656 ExecStartPre=/usr/sbin/exportfs -r (code=exited, status=0/SUCCESS)
 Main PID: 13657 (code=exited, status=1/FAILURE)

Feb 20 01:14:50 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: Starting NFS server and services...
Feb 20 01:23:36 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com rpc.nfsd[13657]: rpc.nfsd: writing fd to kernel failed: errno 110 (Connection timed out)
Feb 20 01:23:36 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com rpc.nfsd[13657]: rpc.nfsd: unable to set any sockets for nfsd
Feb 20 01:23:36 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: nfs-server.service: Main process exited, code=exited, status=1/FAILURE
Feb 20 01:23:36 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: Failed to start NFS server and services.
Feb 20 01:23:36 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: nfs-server.service: Unit entered failed state.
Feb 20 01:23:36 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: nfs-server.service: Failed with result 'exit-code'.


Version-Release number of selected component (if applicable):
[root@bkr-hv03-guest40 ~]# lsb_release -sir
Fedora 25
[root@bkr-hv03-guest40 ~]# rpm -q nfs-utils
nfs-utils-2.1.1-1.fc25.x86_64

How reproducible:
always

Steps to Reproduce:
1. mkdir /home/test
2. echo "/home/test *(rw,no_root_squash)" >/etc/exports
3. systemctl stop nfs && systemctl start nfs

Actual results:

systemctl start nfs hang up long time and fail.

[root@bkr-hv03-guest40 nfstest]# systemctl status nfs
● nfs-server.service - NFS server and services
   Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2017-02-20 01:23:36 EST; 1min 58s ago
  Process: 13657 ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS (code=exited, status=1/FAILURE)
  Process: 13656 ExecStartPre=/usr/sbin/exportfs -r (code=exited, status=0/SUCCESS)
 Main PID: 13657 (code=exited, status=1/FAILURE)

Feb 20 01:14:50 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: Starting NFS server and services...
Feb 20 01:23:36 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com rpc.nfsd[13657]: rpc.nfsd: writing fd to kernel failed: errno 110 (Connection timed out)
Feb 20 01:23:36 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com rpc.nfsd[13657]: rpc.nfsd: unable to set any sockets for nfsd
Feb 20 01:23:36 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: nfs-server.service: Main process exited, code=exited, status=1/FAILURE
Feb 20 01:23:36 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: Failed to start NFS server and services.
Feb 20 01:23:36 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: nfs-server.service: Unit entered failed state.
Feb 20 01:23:36 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: nfs-server.service: Failed with result 'exit-code'.


Expected results:
works fine

Additional info:

Comment 1 JianHong Yin 2017-02-20 06:50:46 UTC
How reproducible: always

'''
[root@bkr-hv03-guest40 ~]# echo "/usr/share *(ro)" >/etc/exports
[root@bkr-hv03-guest40 ~]# systemctl stop nfs
[root@bkr-hv03-guest40 ~]# time systemctl start nfs
Job for nfs-server.service failed because the control process exited with error code.
See "systemctl status nfs-server.service" and "journalctl -xe" for details.

real	8m47.408s
user	0m0.003s
sys	0m0.005s
[root@bkr-hv03-guest40 ~]# systemctl status nfs
● nfs-server.service - NFS server and services
   Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2017-02-20 01:46:04 EST; 2min 23s ago
  Process: 13858 ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS (code=exited, status=1/FAILURE)
  Process: 13857 ExecStartPre=/usr/sbin/exportfs -r (code=exited, status=0/SUCCESS)
 Main PID: 13858 (code=exited, status=1/FAILURE)

Feb 20 01:37:17 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: Starting NFS server and services...
Feb 20 01:46:04 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com rpc.nfsd[13858]: rpc.nfsd: writing fd to kernel failed: errno 110 (Connection timed out)
Feb 20 01:46:04 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com rpc.nfsd[13858]: rpc.nfsd: unable to set any sockets for nfsd
Feb 20 01:46:04 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: nfs-server.service: Main process exited, code=exited, status=1/FAILURE
Feb 20 01:46:04 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: Failed to start NFS server and services.
Feb 20 01:46:04 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: nfs-server.service: Unit entered failed state.
Feb 20 01:46:04 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: nfs-server.service: Failed with result 'exit-code'.
'''

Comment 3 Scott Mayhew 2017-02-20 17:07:30 UTC
I think the problem may be with the latest rpcbind update.

What happens if you run this before step 3 in your reproducer?:

systemd-tmpfiles --create

Comment 4 Scott Mayhew 2017-02-20 17:28:51 UTC
Or better yet, try this:

systemd-tmpfiles --create /usr/lib/tmpfiles.d/rpcbind.conf

I think that probably needs to be added to the postinstall scriptlet of rpcbind.

Comment 5 JianHong Yin 2017-02-21 02:51:57 UTC
(In reply to Scott Mayhew from comment #4)
> Or better yet, try this:
> 
> systemd-tmpfiles --create /usr/lib/tmpfiles.d/rpcbind.conf
> 
> I think that probably needs to be added to the postinstall scriptlet of
> rpcbind.

seems systemd-tmpfiles can not solve the problem:

'''
[root@bkr-hv03-guest40 ~]# cat /etc/exports
/usr/share *(ro)
[root@bkr-hv03-guest40 ~]# systemd-tmpfiles --create /usr/lib/tmpfiles.d/rpcbind.conf
[root@bkr-hv03-guest40 ~]# time systemctl start  nfs
Job for nfs-server.service failed because the control process exited with error code.
See "systemctl status nfs-server.service" and "journalctl -xe" for details.

real	8m47.313s
user	0m0.004s
sys	0m0.005s
[root@bkr-hv03-guest40 ~]# time systemctl status  nfs
● nfs-server.service - NFS server and services
   Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2017-02-20 21:48:47 EST; 1min 7s ago
  Process: 15662 ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS (code=exited, status=1/FAILURE)
  Process: 15661 ExecStartPre=/usr/sbin/exportfs -r (code=exited, status=0/SUCCESS)
 Main PID: 15662 (code=exited, status=1/FAILURE)

Feb 20 21:40:00 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: Starting NFS server and services...
Feb 20 21:48:47 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com rpc.nfsd[15662]: rpc.nfsd: writing fd to kernel failed: errno 110 (Connection timed out)
Feb 20 21:48:47 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com rpc.nfsd[15662]: rpc.nfsd: unable to set any sockets for nfsd
Feb 20 21:48:47 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: nfs-server.service: Main process exited, code=exited, status=1/FAILURE
Feb 20 21:48:47 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: Failed to start NFS server and services.
Feb 20 21:48:47 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: nfs-server.service: Unit entered failed state.
Feb 20 21:48:47 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: nfs-server.service: Failed with result 'exit-code'.

real	0m0.009s
user	0m0.005s
sys	0m0.002s
'''

'''
15564 pts/1    Ss     0:00  |       \_ -bash
15651 pts/1    S+     0:00  |           \_ systemctl start nfs
15652 pts/1    S+     0:00  |               \_ /usr/bin/systemd-tty-ask-password
'''

Comment 6 Steve Dickson 2017-02-21 13:46:43 UTC
(In reply to Scott Mayhew from comment #4)
> Or better yet, try this:
> 
> systemd-tmpfiles --create /usr/lib/tmpfiles.d/rpcbind.conf
> 
> I think that probably needs to be added to the postinstall scriptlet of
> rpcbind.

I think this commit took care of the problem you are talking about:

commit 0df1187eb1cde0df1a2cd1ffb09ab7782a9b5471
Author: Steve Dickson <steved>
Date:   Sat Jan 28 15:46:33 2017 -0500

    Corrected boot dependency in systemd files (bz 1401561)
    
    Signed-off-by: Steve Dickson <steved>

Comment 7 Steve Dickson 2017-02-21 13:48:42 UTC
(In reply to Yin.JianHong from comment #5)
> '''
> 15564 pts/1    Ss     0:00  |       \_ -bash
> 15651 pts/1    S+     0:00  |           \_ systemctl start nfs
> 15652 pts/1    S+     0:00  |               \_
> /usr/bin/systemd-tty-ask-password
> '''
The question I have is why is systemd-tty-ask-password being called?? 
I'm thinking that's where the time is being spent...

Comment 8 Scott Mayhew 2017-02-21 13:56:14 UTC
(In reply to Steve Dickson from comment #6)
> (In reply to Scott Mayhew from comment #4)
> > Or better yet, try this:
> > 
> > systemd-tmpfiles --create /usr/lib/tmpfiles.d/rpcbind.conf
> > 
> > I think that probably needs to be added to the postinstall scriptlet of
> > rpcbind.
> 
> I think this commit took care of the problem you are talking about:
> 
> commit 0df1187eb1cde0df1a2cd1ffb09ab7782a9b5471
> Author: Steve Dickson <steved>
> Date:   Sat Jan 28 15:46:33 2017 -0500
> 
>     Corrected boot dependency in systemd files (bz 1401561)
>     
>     Signed-off-by: Steve Dickson <steved>

I see... I can reproduce the problem easily though, and in my case the problem is that nfsd can't register with rpcbind because rpcbind's not running, and rpcbind's not running because /run/rpcbind doesn't exist and it can't create the lock file.

[root@localhost ~]# systemctl status nfs
● nfs-server.service - NFS server and services
   Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; disabled; vendor preset: disabled)
  Drop-In: /run/systemd/generator/nfs-server.service.d
           └─order-with-mounts.conf
   Active: failed (Result: exit-code) since Tue 2017-02-21 08:38:49 EST; 15s ago
  Process: 28004 ExecStopPost=/usr/sbin/exportfs -f (code=exited, status=0/SUCCESS)
  Process: 28002 ExecStopPost=/usr/sbin/exportfs -au (code=exited, status=0/SUCCESS)
  Process: 27998 ExecStop=/usr/sbin/rpc.nfsd 0 (code=exited, status=0/SUCCESS)
  Process: 28035 ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS (code=exited, status=1/FAILURE)
  Process: 28034 ExecStartPre=/usr/sbin/exportfs -r (code=exited, status=0/SUCCESS)
 Main PID: 28035 (code=exited, status=1/FAILURE)

Feb 20 11:36:38 localhost.localdomain systemd[1]: Starting NFS server and services...
Feb 21 08:38:49 localhost.localdomain rpc.nfsd[28035]: rpc.nfsd: writing fd to kernel failed: errno 110 (Connection timed out)
Feb 21 08:38:49 localhost.localdomain rpc.nfsd[28035]: rpc.nfsd: unable to set any sockets for nfsd
Feb 21 08:38:49 localhost.localdomain systemd[1]: nfs-server.service: Main process exited, code=exited, status=1/FAILURE
Feb 21 08:38:49 localhost.localdomain systemd[1]: Failed to start NFS server and services.
Feb 21 08:38:49 localhost.localdomain systemd[1]: nfs-server.service: Unit entered failed state.
Feb 21 08:38:49 localhost.localdomain systemd[1]: nfs-server.service: Failed with result 'exit-code'.

[root@localhost ~]# systemctl status rpcbind
● rpcbind.service - RPC Bind
   Loaded: loaded (/usr/lib/systemd/system/rpcbind.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2017-02-20 11:24:50 EST; 21h ago
     Docs: man:rpcbind(8)
 Main PID: 2879 (code=exited, status=1/FAILURE)

Feb 20 11:24:50 localhost.localdomain systemd[1]: Starting RPC Bind...
Feb 20 11:24:50 localhost.localdomain rpcbind[2879]: rpcbind: /run/rpcbind/rpcbind.lock: No such file or directory
Feb 20 11:24:50 localhost.localdomain systemd[1]: rpcbind.service: Main process exited, code=exited, status=1/FAILURE
Feb 20 11:24:50 localhost.localdomain systemd[1]: Failed to start RPC Bind.
Feb 20 11:24:50 localhost.localdomain systemd[1]: rpcbind.service: Unit entered failed state.
Feb 20 11:24:50 localhost.localdomain systemd[1]: rpcbind.service: Failed with result 'exit-code'.

Comment 9 Scott Mayhew 2017-02-21 13:59:00 UTC
(In reply to Steve Dickson from comment #7)
> (In reply to Yin.JianHong from comment #5)
> > '''
> > 15564 pts/1    Ss     0:00  |       \_ -bash
> > 15651 pts/1    S+     0:00  |           \_ systemctl start nfs
> > 15652 pts/1    S+     0:00  |               \_
> > /usr/bin/systemd-tty-ask-password
> > '''
> The question I have is why is systemd-tty-ask-password being called?? 
> I'm thinking that's where the time is being spent...

Just a thought... since rpcbind runs as the rpc user instead of root, maybe that's preventing it from using systemd-tmpfile-setup?

Comment 10 Jeff Layton 2017-02-21 14:03:15 UTC
As a side note...

Maybe we should consider having rpc.nfsd print a more helpful error message when it sees -ETIMEDOUT here? That's almost always occurring because it can't talk to rpcbind.

It might also not hurt to consider allowing knfsd to start up when rpcbind registration fails, but throw a printk to warn about missing registration.

Comment 11 Steve Dickson 2017-03-24 18:58:36 UTC
I think I finally fix this problem.... Please see
   https://bugzilla.redhat.com/show_bug.cgi?id=1401561

Comment 12 Steve Dickson 2017-04-07 15:26:19 UTC

*** This bug has been marked as a duplicate of bug 1401561 ***


Note You need to log in before you can comment on or make changes to this bug.