Bug 1570567

Summary: [rpc.statd] fails to start if used to mask the service rpc-statd and failed to mount nfsv3
Product: Red Hat Enterprise Linux 7
Reporter: Yongcheng Yang <yoyang>
Component: nfs-utils
Assignee: Steve Dickson <steved>
Status: CLOSED WONTFIX
QA Contact: Yongcheng Yang <yoyang>
Severity: low
Docs Contact:
Priority: low
Version: 7.5
CC: xzhou
Target Milestone: rc
Keywords: Reproducer
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-02-15 07:38:34 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
- reproducer1: start/stop the rpcbind/rpcbind.socket (flags: none)
- reproducer2: only stop rpc-statd (flags: none)

Description Yongcheng Yang 2018-04-23 09:09:35 UTC
Created attachment 1425594 [details]
reproducer1 start/stop the rpcbind/rpcbind.socket

Description of problem:
During testing of the NFSv4-only server configuration procedure (Bug 1387694), I found that rpc-statd.service sometimes fails to start. The key to reproducing this scenario is to attempt an NFS version 3 (or v2) mount while rpc-statd is masked; that mount fails as expected, but after unmasking the service, starting rpc-statd then always fails.

I have simplified the reproducer as follows.


Version-Release number of selected component (if applicable):
# latest rhel7 nfs-utils
nfs-utils-1.3.0-0.54.el7
# upstream also has the same issue
nfs-utils-2.2.1-4.rc2.fc27


How reproducible:
100%, easily


Steps to Reproduce:
1. systemctl mask --now rpc-statd
2. mount localhost:/tmp /mnt -o vers=3 <<< should fail
3. systemctl unmask rpc-statd.service
4. systemctl start rpc-statd           <<< fails


Workarounds for the failure:
a. If rpcbind/rpcbind.socket were also stopped and started along the way (reproducer1), simply starting rpc-statd a second time works around the problem.
b. Without the extra rpcbind/rpcbind.socket operations (reproducer2), a stray "rpc.statd --no-notify" daemon starts immediately when the service is unmasked. Killing that daemon and then starting the service works around the problem.
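The two workarounds can be combined into a small recovery sketch (run as root on the reproducer host after unmasking; the pgrep/pkill pattern is illustrative, taken from the reproducer2 transcript below):

```shell
# Workaround b: kill any stray statd left over from the unmask
# (reproducer2 shows "rpc.statd --no-notify" running outside the unit).
if pgrep -f 'rpc.statd --no-notify' >/dev/null; then
    pkill -f 'rpc.statd --no-notify'
fi
# Workaround a: a second start attempt succeeds when the first one fails,
# so simply retry once.
systemctl start rpc-statd || systemctl start rpc-statd
```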


Actual results:
[root@hp-dl360g9-01 ~]# rpm -q nfs-utils
nfs-utils-1.3.0-0.54.el7.x86_64
[root@hp-dl360g9-01 ~]# ./repro1.sh 
# systemctl stop rpc-statd
# systemctl stop rpcbind
Warning: Stopping rpcbind.service, but it can still be activated by:
  rpcbind.socket
# systemctl stop rpcbind.socket
# systemctl mask --now rpc-statd
Created symlink from /etc/systemd/system/rpc-statd.service to /dev/null.
# mount localhost:/tmp /mnt -o vers=3  # should fail
Failed to start rpc-statd.service: Unit is masked.
mount.nfs: rpc.statd is not running but is required for remote locking.
mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
mount.nfs: an incorrect mount option was specified
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# systemctl start rpcbind
# systemctl unmask rpc-statd.service
Removed symlink /etc/systemd/system/rpc-statd.service.
# systemctl start rpc-statd  # should not fail
Job for rpc-statd.service failed because the control process exited with error code. See "systemctl status rpc-statd.service" and "journalctl -xe" for details.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ *reproduced*
# systemctl start rpc-statd  # workaround
rpcuser   2885  0.0  0.0  42420  1740 ?        Ss   04:47   0:00 /usr/sbin/rpc.statd
[root@hp-dl360g9-01 ~]#

[root@hp-dl360g9-01 ~]# ./repro2.sh 
# systemctl stop rpc-statd
# systemctl mask --now rpc-statd
Created symlink from /etc/systemd/system/rpc-statd.service to /dev/null.
# mount localhost:/tmp /mnt -o vers=3  # should fail
Failed to start rpc-statd.service: Unit is masked.
mount.nfs: access denied by server while mounting localhost:/tmp
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# systemctl unmask rpc-statd.service
Removed symlink /etc/systemd/system/rpc-statd.service.
rpcuser   2933  0.0  0.0  42420  1740 ?        Ss   04:49   0:00 rpc.statd --no-notify
# systemctl start rpc-statd  # should not fail
Job for rpc-statd.service failed because the control process exited with error code. See "systemctl status rpc-statd.service" and "journalctl -xe" for details.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ *reproduced*
rpcuser   2933  0.0  0.0  42420  1740 ?        Ss   04:49   0:00 rpc.statd --no-notify
# pkill rpc.statd  # workaround
# systemctl start rpc-statd  # workaround
rpcuser   2982  0.0  0.0  42420  1744 ?        Ss   04:49   0:00 /usr/sbin/rpc.statd
[root@hp-dl360g9-01 ~]# 


Expected results:
The rpc-statd service starts successfully after being unmasked.


Additional info:
Setting priority to "low" for now, as there is a workaround for this problem.
However, as the NFSv4-only server configuration becomes more and more popular, customers may sometimes encounter this problem too.

Comment 1 Yongcheng Yang 2018-04-23 09:10:48 UTC
Created attachment 1425595 [details]
reproducer2 only stop rpc-statd

Comment 4 RHEL Program Management 2021-02-15 07:38:34 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.