Bug 1275082 - [vdsm] NFS mount fails sometimes with "rpc.statd is not running but is required for remote locking"
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: nfs-utils
Version: 7.2
Hardware: x86_64 Unspecified
Priority: high
Severity: high
Target Milestone: pre-dev-freeze
Target Release: ---
Assigned To: Steve Dickson
QA Contact: Yongcheng Yang
Whiteboard: storage
Keywords: OtherQA
Reported: 2015-10-25 12:36 EDT by Elad
Modified: 2016-11-04 01:00 EDT
Fixed In Version: nfs-utils-1.3.0-0.26.el7
Doc Type: Bug Fix
Last Closed: 2016-11-04 01:00:54 EDT
Type: Bug


Attachments:
/var/log/ from host and engine.log (10.71 MB, application/x-gzip)
2015-10-25 12:36 EDT, Elad
Description Elad 2015-10-25 12:36:51 EDT
Created attachment 1086258
/var/log/ from host and engine.log

Description of problem:
NFS mount fails with "rpc.statd is not running" error.

Reproduced while using nfs-utils-1.3.0-0.21.el7.x86_64.
On older nfs-utils versions (1.3.0-0.17 for example) the issue does not occur.


Version-Release number of selected component (if applicable):
vdsm-4.17.10-5.el7ev.noarch
nfs-utils-1.3.0-0.21.el7.x86_64

How reproducible:
Always


Steps to Reproduce:
It occurred randomly on my hosted-engine setup, but I guess it could be reproduced as follows:
1. Connect a host to a storage pool with an NFS domain.
2. Block connectivity from the host to the NFS domain using iptables.
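
The steps above could be scripted roughly as follows. This is a minimal sketch, not the reporter's actual procedure: the helper names and the example address 192.0.2.10 are hypothetical, and dropping all outbound traffic to the storage server is only an assumption about how connectivity was blocked. The functions just build the iptables command lines; running them requires root.

```python
def block_cmd(server_ip):
    """Build the command that drops all outbound packets to the NFS server."""
    return ["iptables", "-A", "OUTPUT", "-d", server_ip, "-j", "DROP"]


def unblock_cmd(server_ip):
    """Build the command that deletes the rule added by block_cmd()."""
    return ["iptables", "-D", "OUTPUT", "-d", server_ip, "-j", "DROP"]


# Hypothetical storage server address (documentation range).
server = "192.0.2.10"
print(" ".join(block_cmd(server)))
print(" ".join(unblock_cmd(server)))
```

Deleting the rule afterwards restores connectivity, which is the point at which the host is expected to remount the domain.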

Actual results:
Host cannot resume its connectivity to the storage server:

Thread-774::ERROR::2015-10-25 18:20:11,568::hsm::2465::Storage.HSM::(connectStorageServer) Could not connect to storageServer
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 2462, in connectStorageServer
    conObj.connect()
  File "/usr/share/vdsm/storage/storageServer.py", line 419, in connect
    return self._mountCon.connect()
  File "/usr/share/vdsm/storage/storageServer.py", line 232, in connect
    six.reraise(t, v, tb)
  File "/usr/share/vdsm/storage/storageServer.py", line 224, in connect
    self._mount.mount(self.options, self._vfsType, cgroup=self.CGROUP)
  File "/usr/share/vdsm/storage/mount.py", line 225, in mount
    return self._runcmd(cmd, timeout)
  File "/usr/share/vdsm/storage/mount.py", line 241, in _runcmd
    raise MountError(rc, ";".join((out, err)))
MountError: (32, ";mount.nfs: rpc.statd is not running but is required for remote locking.\nmount.nfs: Either use '-o nolock' to keep locks local, or start statd.\nmount.nfs: an incorrect mount option was specified\n")
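
For triage, the failure above can be told apart from other mount errors by its exit status and message text. The following is a minimal sketch; `is_statd_failure` and `STATD_HINT` are hypothetical names and are not part of vdsm.

```python
# Text that mount.nfs prints in the MountError shown above.
STATD_HINT = "rpc.statd is not running but is required for remote locking"


def is_statd_failure(rc, output):
    """True when the mount exited with status 32 and complained about rpc.statd."""
    return rc == 32 and STATD_HINT in output


# The (rc, message) pair copied from the traceback in this report.
sample = (
    ";mount.nfs: rpc.statd is not running but is required for remote locking.\n"
    "mount.nfs: Either use '-o nolock' to keep locks local, or start statd.\n"
    "mount.nfs: an incorrect mount option was specified\n"
)
print(is_statd_failure(32, sample))
```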

Host moves to 'Non operational'


Expected results:
NFS mount should succeed.

Additional info: 

[root@green-vdsc /]# systemctl -a |grep nfs
  proc-fs-nfsd.mount            loaded inactive dead    NFSD configuration filesystem
  var-lib-nfs-rpc_pipefs.mount  loaded inactive dead    RPC Pipe File System
  nfs-config.service            loaded active   exited  Preprocess NFS configuration
  nfs-idmapd.service            loaded inactive dead    NFSv4 ID-name mapping service
  nfs-mountd.service            loaded inactive dead    NFS Mount Daemon
  nfs-server.service            loaded inactive dead    NFS server and services
  nfs-utils.service             loaded inactive dead    NFS server and client services
  nfs-client.target             loaded inactive dead    NFS client services

[root@green-vdsc /]# systemctl -a |grep rpc
  var-lib-nfs-rpc_pipefs.mount  loaded inactive dead    RPC Pipe File System
  auth-rpcgss-module.service    loaded inactive dead    Kernel Module supporting RPCSEC_GSS
  rpc-gssd.service              loaded inactive dead    RPC security service for NFS client and server
  rpc-statd-notify.service      loaded inactive dead    Notify NFS peers of a restart
  rpc-statd.service             loaded active   running NFS status monitor for NFSv2/3 locking.
  rpc-svcgssd.service           loaded inactive dead    RPC security service for NFS server
  rpcbind.service               loaded active   running RPC bind service
  rpcbind.socket                loaded active   running RPCbind Server Activation Socket
  rpcbind.target                loaded active   active  RPC Port Mapper
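
Note that systemd reports rpc-statd.service as active and running even though mount.nfs says statd is not running. A minimal sketch for extracting unit states from listings like the ones above, so the two views can be compared automatically; `parse_units` is a hypothetical helper and the sample is a subset of the output in this report.

```python
def parse_units(text):
    """Map unit name -> (load, active, sub) from `systemctl -a` style lines."""
    units = {}
    for line in text.splitlines():
        fields = line.split(None, 4)
        # Skip shell prompts and anything that does not start with a unit name.
        if len(fields) < 4 or "." not in fields[0]:
            continue
        unit, load, active, sub = fields[:4]
        units[unit] = (load, active, sub)
    return units


sample = """\
[root@green-vdsc /]# systemctl -a |grep rpc
  rpc-statd-notify.service  loaded inactive dead    Notify NFS peers of a restart
  rpc-statd.service         loaded active   running NFS status monitor for NFSv2/3 locking.
  rpcbind.service           loaded active   running RPC bind service
"""
units = parse_units(sample)
print(units["rpc-statd.service"])
```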
Comment 1 Meni Yakove 2015-10-25 12:44:30 EDT
Elad, I think rpc-statd is running when the mount fails, and vdsm can mount the NFS share only after the rpc-statd service is restarted.
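
The workaround described here (restart rpc-statd, then mount again) could be sketched as below. This is an illustration under assumptions, not vdsm code: `mount_with_statd_restart` is a hypothetical helper, and the `run` parameter exists only so the logic can be exercised without root.

```python
import subprocess


def mount_with_statd_restart(mount_argv, run=subprocess.run):
    """Try the mount once; on the statd error, restart rpc-statd and retry once."""
    first = run(mount_argv, capture_output=True, text=True)
    if first.returncode == 0 or "rpc.statd is not running" not in first.stderr:
        return first
    # Workaround from this comment, not a fix for the underlying bug.
    run(["systemctl", "restart", "rpc-statd.service"],
        capture_output=True, text=True)
    return run(mount_argv, capture_output=True, text=True)
```

A caller would pass the full mount command line, for example `["mount", "-t", "nfs", "server:/export", "/mnt"]`, where the export and mount point are placeholders.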
Comment 2 Yaniv Kaul 2015-10-26 03:04:27 EDT
Do we suspect a vdsm issue or a platform issue with the nfs-utils package? If the latter, please move the bug to platform.
Comment 3 Dan Kenigsberg 2015-10-26 03:29:42 EDT
and in particular - hasn't it worked OK on a former (el7.1) platform?
Comment 4 Elad 2015-10-26 08:55:16 EDT
(In reply to Dan Kenigsberg from comment #3)
> and in particular - hasn't it worked OK on a former (el7.1) platform?

Indeed, with nfs-utils-1.3.0-0.21.el7.x86_64 installed on RHEL 7.1 the issue reproduces.
Comment 6 Steve Dickson 2015-10-26 12:11:41 EDT
Make sure you are using the latest version of rpcbind. That version
fixed a similar start up issue.
Comment 7 Fabian Deutsch 2016-02-09 02:35:29 EST
nfs-utils-1.3.0-0.21.el7_2 is the latest one (actually not the latest released one).

I see that nfs-utils-1.3.0-0.21.el7_2 is pending release. But considering this bug (which is also surfacing in bug 1251827), I'd suggest shipping nfs-utils-1.3.0-0.22.el7_2 instead, if possible.
Comment 10 Fabian Deutsch 2016-02-09 04:31:30 EST
Some corrections:

The build with the bug and currently released is: nfs-utils-1.3.0-0.21.el7.x86_64

Currently staged is: nfs-utils-1.3.0-0.21.el7_2

According to bug 1251827 this build fixes the issue: nfs-utils-1.3.0-0.22.el7

The question now is whether nfs-utils-1.3.0-0.21.el7_2 also fixes the issue.
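
Because the builds discussed in this bug differ only in their release strings, it helps to order them explicitly. The sketch below is a simplified approximation of RPM's version comparison (it ignores epochs and the tilde and caret rules); `compare_labels` is a hypothetical helper, not an rpm API.

```python
import re
from functools import cmp_to_key


def _segments(label):
    # Runs of digits or letters; separators such as '.' and '_' are dropped.
    return re.findall(r"\d+|[A-Za-z]+", label)


def compare_labels(a, b):
    """Return -1, 0 or 1 for a < b, a == b, a > b."""
    sa, sb = _segments(a), _segments(b)
    for x, y in zip(sa, sb):
        if x.isdigit() and y.isdigit():
            x, y = int(x), int(y)
        elif x.isdigit() != y.isdigit():
            # A numeric segment is treated as newer than an alphabetic one.
            return 1 if x.isdigit() else -1
        if x != y:
            return 1 if x > y else -1
    # All shared segments are equal: the label with segments left over is newer.
    return (len(sa) > len(sb)) - (len(sa) < len(sb))


releases = ["0.26.el7", "0.21.el7_2", "0.22.el7", "0.21.el7", "0.32.el7", "0.21.el7_2.1"]
for release in sorted(releases, key=cmp_to_key(compare_labels)):
    print("nfs-utils-1.3.0-" + release)
```

Under these rules 0.21.el7_2 sorts after 0.21.el7 but before 0.22.el7, so ordering alone does not say whether a given 0.21 build carries the fix; that depends on which patches were backported.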
Comment 11 Fabian Deutsch 2016-02-09 10:15:54 EST
According to bug 1251827 comment 28 this bug does not reproduce reliably. Thus reducing the priority for now.
Comment 18 Steve Dickson 2016-04-27 14:27:27 EDT
I wonder if this could help:

commit 31ca7d4f6aaa799fce013ea1d6ab3a44bf4baa9e
Author: NeilBrown <neilb@suse.com>
Date:   Wed Apr 27 13:06:55 2016 -0400

    mount: run START_STATD fully as root
    
    If a "user" mount is the first NFSv3 mount, mount.nfs will be running
    setuid to root (with non-root as the real-uid) when it executes
    START_STATD.
    
    start-statd is a shell script and many shells refuse to run setuid,
    dropping privileges immediately.  This results in start-statd running
    as an unprivileged user and so statd fails to start.
    
    To fix this, call "setuid(0)" to set real uid to zero.  Also call
    "setgid(0)" for consistency.
    
    The behaviour of a shell can often be affected by the environment,
    such as the "shell functions" that bash includes from the environment.
    To avoid the user being able to pass such environment to the shell,
    explicitly pass an empty environment.  The start-statd script explicitly
    sets the PATH which is all it really needs.
    
    Signed-off-by: NeilBrown <neilb@suse.com>
    Signed-off-by: Steve Dickson <steved@redhat.com>
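
The technique described in this commit message (become fully root before running the helper, and pass it an empty environment) can be illustrated in Python. The real change is in the C source of mount.nfs; the code below is only a sketch of the idea, with hypothetical names `become_full_root` and `run_helper`.

```python
import os
import subprocess


def become_full_root():
    """Runs in the child before exec: if effective uid is 0, make real uid/gid 0 too."""
    if os.geteuid() == 0:
        os.setgid(0)  # change the group first, while still privileged
        os.setuid(0)


def run_helper(argv):
    """Run a helper with an empty environment so nothing leaks in from the caller."""
    return subprocess.run(
        argv,
        env={},                       # no inherited variables, no shell functions
        preexec_fn=become_full_root,
        capture_output=True,
        text=True,
    )


# /usr/bin/env prints its environment, which should be empty here.
result = run_helper(["/usr/bin/env"])
print(repr(result.stdout))
```

When run as an ordinary user the uid change is skipped, so the sketch only demonstrates the empty environment in that case.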
Comment 19 Elad 2016-04-28 06:50:48 EDT
Steve, is the info asked in https://bugzilla.redhat.com/show_bug.cgi?id=1275082#c17 still required?
Comment 20 Steve Dickson 2016-04-29 09:13:47 EDT
(In reply to Elad from comment #19)
> Steve, is the info asked in
> https://bugzilla.redhat.com/show_bug.cgi?id=1275082#c17 still required?

Yes... I would like to know if there are any error messages as
to why statd is not being started or is failing...
Comment 21 Elad 2016-05-01 04:52:34 EDT
Hi Steve, 

/var/log/messages from the time of the bug occurrence is provided in the attachment.

These are the relevant events from /var/log/messages :


Oct 25 18:19:55 green-vdsc systemd: Started NFS status monitor for NFSv2/3 locking..
Oct 25 18:20:01 green-vdsc systemd: Started Session 1069 of user root.
Oct 25 18:20:01 green-vdsc systemd: Starting Session 1069 of user root.
Oct 25 18:20:01 green-vdsc systemd: Started NFS status monitor for NFSv2/3 locking..
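
The excerpt shows the statd unit reported as started twice within a few seconds. A minimal sketch for pulling those events out of a longer /var/log/messages; `statd_start_times` is a hypothetical helper, and the year is an assumption taken from the report date because syslog timestamps omit it.

```python
from datetime import datetime

MARKER = "Started NFS status monitor for NFSv2/3 locking."


def statd_start_times(log_text, year=2015):
    """Return the timestamps of every 'Started NFS status monitor' line."""
    times = []
    for line in log_text.splitlines():
        if MARKER in line:
            stamp = " ".join(line.split()[:3])  # e.g. "Oct 25 18:19:55"
            times.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    return times


sample = """\
Oct 25 18:19:55 green-vdsc systemd: Started NFS status monitor for NFSv2/3 locking..
Oct 25 18:20:01 green-vdsc systemd: Started Session 1069 of user root.
Oct 25 18:20:01 green-vdsc systemd: Starting Session 1069 of user root.
Oct 25 18:20:01 green-vdsc systemd: Started NFS status monitor for NFSv2/3 locking..
"""
times = statd_start_times(sample)
print(len(times), (times[1] - times[0]).total_seconds())
```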
Comment 22 Steve Dickson 2016-05-03 06:26:24 EDT
(In reply to Elad from comment #21)
> Hi Steve, 
> 
> /var/log/messages from the time of the bug occurrence is provided in the
> attachment.
> 
> These are the relevant events from /var/log/messages :
> 
> 
> Oct 25 18:19:55 green-vdsc systemd: Started NFS status monitor for NFSv2/3
> locking..
> Oct 25 18:20:01 green-vdsc systemd: Started Session 1069 of user root.
> Oct 25 18:20:01 green-vdsc systemd: Starting Session 1069 of user root.
> Oct 25 18:20:01 green-vdsc systemd: Started NFS status monitor for NFSv2/3
> locking..

Hmm... this is saying statd is being started successfully...
Comment 24 Steve Dickson 2016-05-16 09:32:09 EDT
Adding this upstream commit as well

commit 37cd45cb913403b9f3b0c2aaa705e06cd70cc1d7
Author: NeilBrown <neilb@suse.com>
Date:   Sat Jan 16 12:06:32 2016 -0500

    mount.nfs: trust the exit status of "start_statd".
Comment 26 Yongcheng Yang 2016-07-05 00:21:38 EDT
I have checked that the patches mentioned in comment 18 and comment 24 are both merged into nfs-utils-1.3.0-0.26.el7.

Hi Elad, could you please help verify whether this issue is resolved in the latest nfs-utils version? We in QE don't have a storage pool test environment. Thanks in advance.
Comment 27 Elad 2016-07-10 06:57:13 EDT
Hi Yongcheng, the nfs-utils the latest vdsm requires is nfs-utils-1.3.0-0.21.el7_2.1.x86_64 . Does it contain the fix?
Comment 28 Yongcheng Yang 2016-07-10 21:48:24 EDT
(In reply to Elad from comment #27)
> Hi Yongcheng, the nfs-utils the latest vdsm requires is
> nfs-utils-1.3.0-0.21.el7_2.1.x86_64 . Does it contain the fix?

nfs-utils-1.3.0-0.21.el7_2.1 was built to fix Bug 1309625.

I'm afraid it does not address this bug's issue.
nfs-utils-1.3.0-0.26.el7 is the version expected to contain the fix, so we may need to wait for that version or a later one.
Comment 30 Elad 2016-09-07 06:35:11 EDT
I'm afraid not:

[root@blond-vdsf ~]# yum deplist vdsm-4.18.12-1.el7ev.x86_64 |grep nfs
  dependency: nfs-utils
   provider: nfs-utils.x86_64 1:1.3.0-0.21.el7_2.1
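
The provider line above packs name, architecture, epoch, version and release into one string. A minimal sketch for splitting it apart when checking many hosts; `parse_provider` and the regular expression are hypothetical and only cover the line format shown here.

```python
import re

PROVIDER_RE = re.compile(
    r"provider:\s+(?P<name>\S+)\.(?P<arch>[^.\s]+)\s+"
    r"(?:(?P<epoch>\d+):)?(?P<version>[^-\s]+)-(?P<release>\S+)"
)


def parse_provider(line):
    """Split a `yum deplist` provider line into its package fields."""
    match = PROVIDER_RE.search(line)
    return match.groupdict() if match else None


info = parse_provider("   provider: nfs-utils.x86_64 1:1.3.0-0.21.el7_2.1")
print(info["name"], info["version"], info["release"])
```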
Comment 34 Elad 2016-09-13 06:17:30 EDT
Using the following [1], I wasn't able to reproduce.
NFS mount succeeds after the connectivity to the storage is resumed.

[1]:
RHEL7.3
nfs-utils-1.3.0-0.32.el7.x86_64
vdsm-4.18.13-1.el7ev.x86_64
Comment 35 Yongcheng Yang 2016-09-13 22:30:35 EDT
(In reply to Elad from comment #34)
> Using the following [1], I wasn't able to reproduce.
> NFS mount succeeds after the connectivity to the storage is resumed.
> 
> [1]:
> RHEL7.3
> nfs-utils-1.3.0-0.32.el7.x86_64
> vdsm-4.18.13-1.el7ev.x86_64

Thanks a lot for your verification.

I have also checked that the patches referenced in comment 26 are already merged.

Moving to VERIFIED with SanityOnly now.
Comment 37 errata-xmlrpc 2016-11-04 01:00:54 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2383.html
