Bug 1570567

Summary: [rpc.statd] fails to start if used to mask the service rpc-statd and failed to mount nfsv3
Product: Red Hat Enterprise Linux 7
Reporter: Yongcheng Yang <yoyang>
Component: nfs-utils
Assignee: Steve Dickson <steved>
Status: CLOSED WONTFIX
QA Contact: Yongcheng Yang <yoyang>
Severity: low
Docs Contact:
Priority: low
Version: 7.5
CC: xzhou
Target Milestone: rc
Keywords: Reproducer
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-02-15 07:38:34 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
- reproducer1: start/stop the rpcbind/rpcbind.socket (flags: none)
- reproducer2: only stop rpc-statd (flags: none)

Description Yongcheng Yang 2018-04-23 09:09:35 UTC
Created attachment 1425594 [details]
reproducer1 start/stop the rpcbind/rpcbind.socket

Description of problem:
During testing of the NFSv4-only server configuration procedure (Bug 1387694), I found that rpc-statd.service sometimes fails to start. The key to reproducing this scenario is to attempt an NFS version 3 (or v2) mount while rpc-statd is masked; that mount fails as expected, but after unmasking the service, starting rpc-statd then always fails.

I have simplified the reproducer as follows.


Version-Release number of selected component (if applicable):
# latest rhel7 nfs-utils
nfs-utils-1.3.0-0.54.el7
# upstream also has the same issue
nfs-utils-2.2.1-4.rc2.fc27


How reproducible:
100%, easily


Steps to Reproduce:
1. systemctl mask --now rpc-statd
2. mount localhost:/tmp /mnt -o vers=3 <<< should fail
3. systemctl unmask rpc-statd.service
4. systemctl start rpc-statd           <<< fails


Workarounds for the failure:
a. If rpcbind/rpcbind.socket were also stopped and started along the way (reproducer1), simply starting rpc-statd a second time works around the problem.
b. Without the extra rpcbind/rpcbind.socket operations (reproducer2), a stray "rpc.statd --no-notify" daemon starts immediately when the service is unmasked. Killing that daemon and then starting the service works around the problem.
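The two workarounds can be combined into a small recovery sketch (run as root on the reproducer host after unmasking; the pgrep/pkill pattern is illustrative, taken from the reproducer2 transcript below):

```shell
# Workaround b: kill any stray statd left over from the unmask
# (reproducer2 shows "rpc.statd --no-notify" running outside the unit).
if pgrep -f 'rpc.statd --no-notify' >/dev/null; then
    pkill -f 'rpc.statd --no-notify'
fi
# Workaround a: a second start attempt succeeds when the first one fails,
# so simply retry once.
systemctl start rpc-statd || systemctl start rpc-statd
```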


Actual results:
[root@hp-dl360g9-01 ~]# rpm -q nfs-utils
nfs-utils-1.3.0-0.54.el7.x86_64
[root@hp-dl360g9-01 ~]# ./repro1.sh 
# systemctl stop rpc-statd
# systemctl stop rpcbind
Warning: Stopping rpcbind.service, but it can still be activated by:
  rpcbind.socket
# systemctl stop rpcbind.socket
# systemctl mask --now rpc-statd
Created symlink from /etc/systemd/system/rpc-statd.service to /dev/null.
# mount localhost:/tmp /mnt -o vers=3  # should fail
Failed to start rpc-statd.service: Unit is masked.
mount.nfs: rpc.statd is not running but is required for remote locking.
mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
mount.nfs: an incorrect mount option was specified
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# systemctl start rpcbind
# systemctl unmask rpc-statd.service
Removed symlink /etc/systemd/system/rpc-statd.service.
# systemctl start rpc-statd  # should not fail
Job for rpc-statd.service failed because the control process exited with error code. See "systemctl status rpc-statd.service" and "journalctl -xe" for details.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ *reproduced*
# systemctl start rpc-statd  # workaround
rpcuser   2885  0.0  0.0  42420  1740 ?        Ss   04:47   0:00 /usr/sbin/rpc.statd
[root@hp-dl360g9-01 ~]#

[root@hp-dl360g9-01 ~]# ./repro2.sh 
# systemctl stop rpc-statd
# systemctl mask --now rpc-statd
Created symlink from /etc/systemd/system/rpc-statd.service to /dev/null.
# mount localhost:/tmp /mnt -o vers=3  # should fail
Failed to start rpc-statd.service: Unit is masked.
mount.nfs: access denied by server while mounting localhost:/tmp
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# systemctl unmask rpc-statd.service
Removed symlink /etc/systemd/system/rpc-statd.service.
rpcuser   2933  0.0  0.0  42420  1740 ?        Ss   04:49   0:00 rpc.statd --no-notify
# systemctl start rpc-statd  # should not fail
Job for rpc-statd.service failed because the control process exited with error code. See "systemctl status rpc-statd.service" and "journalctl -xe" for details.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ *reproduced*
rpcuser   2933  0.0  0.0  42420  1740 ?        Ss   04:49   0:00 rpc.statd --no-notify
# pkill rpc.statd  # workaround
# systemctl start rpc-statd  # workaround
rpcuser   2982  0.0  0.0  42420  1744 ?        Ss   04:49   0:00 /usr/sbin/rpc.statd
[root@hp-dl360g9-01 ~]# 


Expected results:
The rpc-statd service starts successfully after being unmasked.


Additional info:
Setting priority to "low" for now, as there is a workaround for this problem.
However, as the NFSv4-only server configuration becomes more and more popular, customers may sometimes encounter this problem too.

Comment 1 Yongcheng Yang 2018-04-23 09:10:48 UTC
Created attachment 1425595 [details]
reproducer2 only stop rpc-statd

Comment 4 RHEL Program Management 2021-02-15 07:38:34 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.