Bug 1275082 - [vdsm] NFS mount fails sometimes with "rpc.statd is not running but is required for remote locking"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: nfs-utils
Version: 7.2
Hardware: x86_64
OS: Unspecified
Priority: high
Severity: high
Target Milestone: pre-dev-freeze
Target Release: ---
Assignee: Steve Dickson
QA Contact: Yongcheng Yang
URL:
Whiteboard: storage
Depends On:
Blocks:
 
Reported: 2015-10-25 16:36 UTC by Elad
Modified: 2019-12-16 05:02 UTC (History)
14 users

Fixed In Version: nfs-utils-1.3.0-0.26.el7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-04 05:00:54 UTC
Target Upstream Version:


Attachments
/var/log/ from host and engine.log (10.71 MB, application/x-gzip)
2015-10-25 16:36 UTC, Elad
no flags


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2016:2383 0 normal SHIPPED_LIVE nfs-utils bug fix and enhancement update 2016-11-03 13:53:02 UTC

Description Elad 2015-10-25 16:36:51 UTC
Created attachment 1086258 [details]
/var/log/ from host and engine.log

Description of problem:
NFS mount fails with "rpc.statd is not running" error.

Reproduced while using nfs-utils-1.3.0-0.21.el7.x86_64.
On older nfs-utils versions (1.3.0-0.17 for example) the issue does not occur.


Version-Release number of selected component (if applicable):
vdsm-4.17.10-5.el7ev.noarch
nfs-utils-1.3.0-0.21.el7.x86_64

How reproducible:
Always


Steps to Reproduce:
This occurred randomly on my hosted-engine setup, but it can likely be reproduced as follows:
1. Host connected to storage pool with NFS domain.
2. Block connectivity to the NFS domain from the host using iptables.

Actual results:
Host cannot resume its connectivity to the storage server:

Thread-774::ERROR::2015-10-25 18:20:11,568::hsm::2465::Storage.HSM::(connectStorageServer) Could not connect to storageServer
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 2462, in connectStorageServer
    conObj.connect()
  File "/usr/share/vdsm/storage/storageServer.py", line 419, in connect
    return self._mountCon.connect()
  File "/usr/share/vdsm/storage/storageServer.py", line 232, in connect
    six.reraise(t, v, tb)
  File "/usr/share/vdsm/storage/storageServer.py", line 224, in connect
    self._mount.mount(self.options, self._vfsType, cgroup=self.CGROUP)
  File "/usr/share/vdsm/storage/mount.py", line 225, in mount
    return self._runcmd(cmd, timeout)
  File "/usr/share/vdsm/storage/mount.py", line 241, in _runcmd
    raise MountError(rc, ";".join((out, err)))
MountError: (32, ";mount.nfs: rpc.statd is not running but is required for remote locking.\nmount.nfs: Either use '-o nolock' to keep locks local, or start statd.\nmount.nfs: an incorrect mount option was specified\n")

Host moves to 'Non operational'


Expected results:
NFS mount should succeed.

Additional info: 

[root@green-vdsc /]# systemctl -a |grep nfs
  proc-fs-nfsd.mount            loaded    inactive dead      NFSD configuration filesystem
  var-lib-nfs-rpc_pipefs.mount  loaded    inactive dead      RPC Pipe File System
  nfs-config.service            loaded    active   exited    Preprocess NFS configuration
  nfs-idmapd.service            loaded    inactive dead      NFSv4 ID-name mapping service
  nfs-mountd.service            loaded    inactive dead      NFS Mount Daemon
  nfs-server.service            loaded    inactive dead      NFS server and services
  nfs-utils.service             loaded    inactive dead      NFS server and client services
  nfs-client.target             loaded    inactive dead      NFS client services

[root@green-vdsc /]# systemctl -a |grep rpc
  var-lib-nfs-rpc_pipefs.mount  loaded    inactive dead      RPC Pipe File System
  auth-rpcgss-module.service    loaded    inactive dead      Kernel Module supporting RPCSEC_GSS
  rpc-gssd.service              loaded    inactive dead      RPC security service for NFS client and server
  rpc-statd-notify.service      loaded    inactive dead      Notify NFS peers of a restart
  rpc-statd.service             loaded    active   running   NFS status monitor for NFSv2/3 locking.
  rpc-svcgssd.service           loaded    inactive dead      RPC security service for NFS server
  rpcbind.service               loaded    active   running   RPC bind service
  rpcbind.socket                loaded    active   running   RPCbind Server Activation Socket
  rpcbind.target                loaded    active   active    RPC Port Mapper

Comment 1 Meni Yakove 2015-10-25 16:44:30 UTC
Elad, I think rpc-statd is running when the mount fails, and only after restarting the rpc-statd service can vdsm mount the NFS share.

Comment 2 Yaniv Kaul 2015-10-26 07:04:27 UTC
Do we suspect a vdsm issue or a platform issue with the nfs-utils package? If the latter, please move the bug to platform.

Comment 3 Dan Kenigsberg 2015-10-26 07:29:42 UTC
and in particular - hasn't it worked OK on a former (el7.1) platform?

Comment 4 Elad 2015-10-26 12:55:16 UTC
(In reply to Dan Kenigsberg from comment #3)
> and in particular - hasn't it worked OK on a former (el7.1) platform?

Indeed, using nfs-utils-1.3.0-0.21.el7.x86_64 over RHEL7.1 we see the issue reproduced.

Comment 6 Steve Dickson 2015-10-26 16:11:41 UTC
Make sure you are using the latest version of rpcbind; that version
fixed a similar start-up issue.

Comment 7 Fabian Deutsch 2016-02-09 07:35:29 UTC
nfs-utils-1.3.0-0.21.el7_2 is the latest one (actually not the latest released one).

I see that nfs-utils-1.3.0-0.21.el7_2 is pending to be released. But considering this bug (which is also surfacing in bug 1251827), I'd suggest that nfs-utils-1.3.0-0.22.el7_2 be shipped instead if possible.

Comment 10 Fabian Deutsch 2016-02-09 09:31:30 UTC
Some corrections:

The build with the bug and currently released is: nfs-utils-1.3.0-0.21.el7.x86_64

Currently staged is: nfs-utils-1.3.0-0.21.el7_2

According to bug 1251827 this build fixes the issue: nfs-utils-1.3.0-0.22.el7

The question is now if nfs-utils-1.3.0-0.21.el7_2 will also fix the issue.

Comment 11 Fabian Deutsch 2016-02-09 15:15:54 UTC
According to bug 1251827 comment 28 this bug does not reproduce reliably. Thus reducing the priority for now.

Comment 18 Steve Dickson 2016-04-27 18:27:27 UTC
I wonder if this could help:

commit 31ca7d4f6aaa799fce013ea1d6ab3a44bf4baa9e
Author: NeilBrown <neilb@suse.com>
Date:   Wed Apr 27 13:06:55 2016 -0400

    mount: run START_STATD fully as root
    
    If a "user" mount is the first NFSv3 mount, mount.nfs will be running
    setuid to root (with non-root as the real-uid) when it executes
    START_STATD.
    
    start-statd is a shell script and many shells refuse to run setuid,
    dropping privileges immediately.  This results in start-statd running
    as an unprivileged user and so statd fails to start.
    
    To fix this, call "setuid(0)" to set real uid to zero.  Also call
    "setgid(0)"
    for consistency.
    
    The behaviour of a shell can often be affected by the environment,
    such as the "shell functions" that bash includes from the environment.
    To avoid the user being able to pass such environment to the shell,
    explicitly pass an empty environment.  The start-statd script explicitly
    sets the PATH which is all it really needs.
    
    Signed-off-by: NeilBrown <neilb@suse.com>
    Signed-off-by: Steve Dickson <steved@redhat.com>

Comment 19 Elad 2016-04-28 10:50:48 UTC
Steve, is the info asked in https://bugzilla.redhat.com/show_bug.cgi?id=1275082#c17 still required?

Comment 20 Steve Dickson 2016-04-29 13:13:47 UTC
(In reply to Elad from comment #19)
> Steve, is the info asked in
> https://bugzilla.redhat.com/show_bug.cgi?id=1275082#c17 still required?

Yes... I would like to know if there are any error messages as
to why statd is not being started or is failing...

Comment 21 Elad 2016-05-01 08:52:34 UTC
Hi Steve, 

/var/log/messages from the time of the bug occurrence is provided in the attachment.

These are the relevant events from /var/log/messages :


Oct 25 18:19:55 green-vdsc systemd: Started NFS status monitor for NFSv2/3 locking..
Oct 25 18:20:01 green-vdsc systemd: Started Session 1069 of user root.
Oct 25 18:20:01 green-vdsc systemd: Starting Session 1069 of user root.
Oct 25 18:20:01 green-vdsc systemd: Started NFS status monitor for NFSv2/3 locking..

Comment 22 Steve Dickson 2016-05-03 10:26:24 UTC
(In reply to Elad from comment #21)
> Hi Steve, 
> 
> /var/log/messages from the time of the bug occurrence is provided in the
> attachment.
> 
> These are the relevant events from /var/log/messages :
> 
> 
> Oct 25 18:19:55 green-vdsc systemd: Started NFS status monitor for NFSv2/3
> locking..
> Oct 25 18:20:01 green-vdsc systemd: Started Session 1069 of user root.
> Oct 25 18:20:01 green-vdsc systemd: Starting Session 1069 of user root.
> Oct 25 18:20:01 green-vdsc systemd: Started NFS status monitor for NFSv2/3
> locking..

Hmm... this is saying statd is being started successfully...

Comment 24 Steve Dickson 2016-05-16 13:32:09 UTC
Adding this upstream commit as well

commit 37cd45cb913403b9f3b0c2aaa705e06cd70cc1d7
Author: NeilBrown <neilb@suse.com>
Date:   Sat Jan 16 12:06:32 2016 -0500

    mount.nfs: trust the exit status of "start_statd".

Comment 26 Yongcheng Yang 2016-07-05 04:21:38 UTC
I have checked that the patches mentioned in comment 18 and comment 24 are both merged into nfs-utils-1.3.0-0.26.el7.

Hi Elad, could you please help verify whether this issue is solved in the latest nfs-utils version? We in QE don't have a storage-pool test environment. Thanks in advance.

Comment 27 Elad 2016-07-10 10:57:13 UTC
Hi Yongcheng, the nfs-utils version that the latest vdsm requires is nfs-utils-1.3.0-0.21.el7_2.1.x86_64. Does it contain the fix?

Comment 28 Yongcheng Yang 2016-07-11 01:48:24 UTC
(In reply to Elad from comment #27)
> Hi Yongcheng, the nfs-utils the latest vdsm requires is
> nfs-utils-1.3.0-0.21.el7_2.1.x86_64 . Does it contain the fix?

nfs-utils-1.3.0-0.21.el7_2.1 is meant to fix Bug 1309625; I'm afraid it does not address this bug's issue.

nfs-utils-1.3.0-0.26.el7 is the intended version; we may have to wait for that version or a later one.

Comment 30 Elad 2016-09-07 10:35:11 UTC
I'm afraid not:

[root@blond-vdsf ~]# yum deplist vdsm-4.18.12-1.el7ev.x86_64 |grep nfs
  dependency: nfs-utils
   provider: nfs-utils.x86_64 1:1.3.0-0.21.el7_2.1

Comment 34 Elad 2016-09-13 10:17:30 UTC
Using the following [1], I wasn't able to reproduce.
The NFS mount succeeds after connectivity to the storage is resumed.

[1]:
RHEL7.3
nfs-utils-1.3.0-0.32.el7.x86_64
vdsm-4.18.13-1.el7ev.x86_64

Comment 35 Yongcheng Yang 2016-09-14 02:30:35 UTC
(In reply to Elad from comment #34)
> Using the following [1], I wasn't able to reproduce.
> NFS mount succeeds after the connectivity to the storage is resumed.
> 
> [1]:
> RHEL7.3
> nfs-utils-1.3.0-0.32.el7.x86_64
> vdsm-4.18.13-1.el7ev.x86_64

Thanks a lot for your verification.

I have also checked that the patch from comment 26 is already merged.

Moving to VERIFIED with SanityOnly now.

Comment 37 errata-xmlrpc 2016-11-04 05:00:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2383.html

