Bug 772090

Summary: NFS mount fails.
Product: [Fedora] Fedora
Reporter: David Woodhouse <dwmw2>
Component: nfs-utils
Assignee: Steve Dickson <steved>
Status: CLOSED DUPLICATE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 16
CC: bfields, jlayton, steved
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-03-23 14:43:34 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description David Woodhouse 2012-01-05 22:33:27 UTC
On a client machine, 'mount.nfs' silently hangs:

[root@shinybook dwmw2]# mount twosheds:/twosheds /twosheds 

strace(1) shows that it repeatedly tries the mount(2) system call, gets ECONNREFUSED for both the IPv6 and the IPv4 address, sleeps for an exponentially increasing amount of time, and tries again:

[pid  3558] socket(PF_INET6, SOCK_DGRAM, IPPROTO_UDP) = 3
[pid  3558] bind(3, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "::", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
[pid  3558] connect(3, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "2001:8b0:10b:1:21d:7dff:fe04:dbe2", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
[pid  3558] getsockname(3, {sa_family=AF_INET6, sin6_port=htons(54344), inet_pton(AF_INET6, "2001:8b0:10b:1:e6ce:8fff:fe1f:f2c0", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
[pid  3558] close(3)                    = 0
[pid  3558] mount("twosheds:/twosheds", "/twosheds", "nfs", 0, "vers=4,addr=2001:8b0:10b:1:21d:7"...) = -1 ECONNREFUSED (Connection refused)
[pid  3558] socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP) = 3
[pid  3558] bind(3, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
[pid  3558] connect(3, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("90.155.92.209")}, 16) = 0
[pid  3558] getsockname(3, {sa_family=AF_INET, sin_port=htons(45435), sin_addr=inet_addr("90.155.92.247")}, [16]) = 0
[pid  3558] close(3)                    = 0
[pid  3558] mount("twosheds:/twosheds", "/twosheds", "nfs", 0, "vers=4,addr=90.155.92.209,client"...) = -1 ECONNREFUSED (Connection refused)
[pid  3558] rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
[pid  3558] rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
[pid  3558] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid  3558] nanosleep({4, 0}, 0x7ffff3f57d00) = 0
[pid  3558] socket(PF_INET6, SOCK_DGRAM, IPPROTO_UDP) = 3
[pid  3558] bind(3, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "::", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
[pid  3558] connect(3, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "2001:8b0:10b:1:21d:7dff:fe04:dbe2", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
[pid  3558] getsockname(3, {sa_family=AF_INET6, sin6_port=htons(60008), inet_pton(AF_INET6, "2001:8b0:10b:1:e6ce:8fff:fe1f:f2c0", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
[pid  3558] close(3)                    = 0
[pid  3558] mount("twosheds:/twosheds", "/twosheds", "nfs", 0, "vers=4,addr=2001:8b0:10b:1:21d:7"...) = -1 ECONNREFUSED (Connection refused)
[pid  3558] socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP) = 3
[pid  3558] bind(3, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
[pid  3558] connect(3, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("90.155.92.209")}, 16) = 0
[pid  3558] getsockname(3, {sa_family=AF_INET, sin_port=htons(41220), sin_addr=inet_addr("90.155.92.247")}, [16]) = 0
[pid  3558] close(3)                    = 0
[pid  3558] mount("twosheds:/twosheds", "/twosheds", "nfs", 0, "vers=4,addr=90.155.92.209,client"...) = -1 ECONNREFUSED (Connection refused)
[pid  3558] rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
[pid  3558] rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
[pid  3558] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid  3558] nanosleep({8, 0}, ^C <unfinished ...>
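
For illustration only, the pattern visible in the trace (try, fail, sleep, double the sleep, try again) can be sketched in shell. This is a hypothetical sketch and not the mount.nfs implementation; the names retry_with_backoff, START_DELAY, MAX_TRIES and SLEEP_CMD are made up for the example, and the starting delay of 4 seconds is simply the first sleep seen above.

```shell
#!/bin/sh
# Hypothetical sketch: run a command until it succeeds, doubling the
# delay after every failure. Not taken from the nfs-utils sources.
retry_with_backoff() {
    delay=${START_DELAY:-4}        # assumed starting delay, in seconds
    tries=0
    while [ "$tries" -lt "${MAX_TRIES:-5}" ]; do
        if "$@"; then
            return 0               # command succeeded, stop retrying
        fi
        tries=$((tries + 1))
        echo "attempt $tries failed; sleeping ${delay}s" >&2
        ${SLEEP_CMD:-sleep} "$delay"
        delay=$((delay * 2))       # 4, 8, 16, ...
    done
    return 1                       # gave up after MAX_TRIES attempts
}
```

A call such as 'retry_with_backoff mount twosheds:/twosheds /twosheds' would behave like the trace, except that it reports each failed attempt on stderr instead of staying silent.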


The underlying problem appears to be that rpc.nfsd isn't actually running on the server, although silently hanging doesn't seem to be the appropriate response for the client.

[root@shinybook dwmw2]# rpcinfo -p twosheds
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp  52362  status
    100024    1   tcp  49044  status
    100011    1   udp    615  rquotad
    100011    2   udp    615  rquotad
    100011    1   tcp    618  rquotad
    100011    2   tcp    618  rquotad
    100005    1   udp  52313  mountd
    100005    1   tcp  60619  mountd
    100005    2   udp  52637  mountd
    100005    2   tcp  53702  mountd
    100005    3   udp  55119  mountd
    100005    3   tcp  55909  mountd
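
The listing above has no entry for RPC program 100003 (nfs), which is what points at rpc.nfsd not running. A minimal sketch of that check, assuming the server normally registers nfs with the portmapper; the function name has_nfs_program is made up for the example:

```shell
#!/bin/sh
# Reads `rpcinfo -p` output on stdin and succeeds only if RPC
# program 100003 (nfs) appears in the first column.
has_nfs_program() {
    awk '$1 == 100003 { found = 1 } END { exit !found }'
}
```

Possible use from the client: 'rpcinfo -p twosheds | has_nfs_program || echo "nfs is not registered on twosheds"'.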




What is supposed to start the rpc.nfsd process on the server? On the *server* now:

[root@twosheds ~]# systemctl status nfs-server.service
nfs-server.service - NFS Server
	  Loaded: loaded (/lib/systemd/system/nfs-server.service; enabled)
	  Active: active (running) since Thu, 05 Jan 2012 01:22:44 +0000; 21h ago
	Main PID: 16550 (rpc.rquotad)
	  CGroup: name=systemd:/system/nfs-server.service
		  ├ 16550 /usr/sbin/rpc.rquotad
		  └ 16554 /usr/sbin/rpc.mountd

Restarting nfs-server.service with 'systemctl restart nfs-server.service' doesn't seem to make any difference. If I manually run 'rpc.nfsd' that works, but I seem to need to do this manually every time the server reboots. Not optimal.
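
A quick way to see whether any nfsd threads exist on the server is to read the kernel's thread count. A small sketch, assuming the nfsd filesystem is mounted at /proc/fs/nfsd; the function name nfsd_thread_count is made up for the example:

```shell
#!/bin/sh
# Prints the number of nfsd threads, or 0 if the count file is not
# readable (for example because the nfsd filesystem is not mounted).
nfsd_thread_count() {
    threads_file=${1:-/proc/fs/nfsd/threads}
    if [ -r "$threads_file" ]; then
        cat "$threads_file"
    else
        echo 0
    fi
}
```

A result of 0 right after 'systemctl restart nfs-server.service' would match the symptom described here.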

Both machines are F16 x86_64 with nfs-utils-1.2.5-3.fc16.

Comment 1 David Woodhouse 2012-01-05 22:39:46 UTC
[root@twosheds ~]# systemctl show nfs-server.service | grep rpc.nfsd
ExecStart={ path=/usr/sbin/rpc.nfsd ; argv[]=/usr/sbin/rpc.nfsd $RPCNFSDARGS ${RPCNFSDCOUNT} ; ignore_errors=no ; start_time=[n/a] ; stop_time=[Thu, 05 Jan 2012 22:31:40 +0000] ; pid=11132 ; code=exited ; status=0 }
ExecStop={ path=/usr/sbin/rpc.nfsd ; argv[]=/usr/sbin/rpc.nfsd 0 ; ignore_errors=no ; start_time=[n/a] ; stop_time=[Thu, 05 Jan 2012 22:31:40 +0000] ; pid=11122 ; code=exited ; status=0 }

[root@twosheds ~]# grep RPCNFSD /etc/sysconfig/nfs
#RPCNFSDARGS="-N 2 -N 3"
RPCNFSDARGS="-N 4"
#RPCNFSDCOUNT=8

I tried editing that file so that RPCNFSDCOUNT is set, and then restarting nfs-server.service. But still it appears the same.

Comment 2 David Woodhouse 2012-01-05 22:56:11 UTC
I lie; setting RPCNFSDCOUNT explicitly does fix it. Do we need to make the unit file cope? Something like ${RPCNFSDCOUNT:-8} would suffice if it were being interpreted by bash, but I don't think it is. Perhaps we should just set RPCNFSDCOUNT to 8 before invoking /etc/sysconfig/nfs? 
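
To show what the ${RPCNFSDCOUNT:-8} idea would look like if a shell were involved, here is a hypothetical wrapper sketch. It is not part of nfs-utils; the function name nfsd_command is made up, and it only prints the command line it would run rather than starting anything.

```shell
#!/bin/sh
# Hypothetical sketch: source the sysconfig file, then fall back to
# 8 threads when RPCNFSDCOUNT is unset or empty.
nfsd_command() {
    conf=${1:-/etc/sysconfig/nfs}
    RPCNFSDARGS=
    RPCNFSDCOUNT=
    [ -r "$conf" ] && . "$conf"
    # $RPCNFSDARGS is left unquoted on purpose so "-N 4" splits into
    # two arguments, as it would on a real command line.
    echo /usr/sbin/rpc.nfsd $RPCNFSDARGS "${RPCNFSDCOUNT:-8}"
}
```

With the /etc/sysconfig/nfs shown in comment 1 this prints '/usr/sbin/rpc.nfsd -N 4 8'; once RPCNFSDCOUNT is set in the file, that value is used instead.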

We also need to fix the client behaviour.

Comment 3 Steve Dickson 2012-01-06 13:51:07 UTC
(In reply to comment #2)
> I lie; setting RPCNFSDCOUNT explicitly does fix it. Do we need to make the unit
> file cope? Something like ${RPCNFSDCOUNT:-8} would suffice if it were being
> interpreted by bash, but I don't think it is. Perhaps we should just set
> RPCNFSDCOUNT to 8 before invoking /etc/sysconfig/nfs?
The problem stems from the fact that when the systemd-enabled nfs-utils is
installed, the original /etc/sysconfig/nfs is not overwritten. Instead, an
/etc/sysconfig/nfs.rpmnew is created, so the RPCNFSDCOUNT variable is
probably not set in the original file (as it is in the .rpmnew version).
In the systemd world order RPCNFSDCOUNT has to be set, since there is
no bash-like thing to make sure it's always set.
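
One way to spot this situation is to list the variables that the .rpmnew file assigns but the file in use does not. A rough sketch; the function names assigned_vars and missing_vars are made up for the example, and only plain uncommented NAME=value lines are recognised:

```shell
#!/bin/sh
# Print the names of variables assigned on uncommented lines of a file.
assigned_vars() {
    sed -n 's/^[[:space:]]*\([A-Za-z_][A-Za-z0-9_]*\)=.*/\1/p' "$1" | sort -u
}

# Print names assigned in the second file but not in the first.
missing_vars() {
    old_list=$(mktemp)
    new_list=$(mktemp)
    assigned_vars "$1" > "$old_list"
    assigned_vars "$2" > "$new_list"
    comm -13 "$old_list" "$new_list"   # lines unique to the second list
    rm -f "$old_list" "$new_list"
}
```

For example: 'missing_vars /etc/sysconfig/nfs /etc/sysconfig/nfs.rpmnew'.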

Long term the answer is probably to have rpc.* daemons all read from
one configuration file when there are no command line arguments, similar
to what mount.nfs does.
 
> 
> We also need to fix the client behaviour.
Hanging on ECONNREFUSED is by design. The assumption is the server
is on its way up...

Comment 4 Steve Dickson 2012-03-23 14:43:34 UTC

*** This bug has been marked as a duplicate of bug 757452 ***