Bug 1121451

Summary: [start-statd] start-statd start fail, when "rpc.statd dead but pid file exists"
Product: Red Hat Enterprise Linux 6 Reporter: JianHong Yin <jiyin>
Component: nfs-utilsAssignee: Steve Dickson <steved>
Status: CLOSED WONTFIX QA Contact: Yongcheng Yang <yoyang>
Severity: high Docs Contact:
Priority: high    
Version: 6.6CC: eguan, fs-qe, jiyin, steved, swhiteho, yoyang
Target Milestone: rcKeywords: Reopened, TestBlocker
Target Release: ---Flags: jiyin: needinfo-
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-04-21 20:54:27 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1164899    

Description JianHong Yin 2014-07-21 03:09:32 UTC
Description of problem:
[10:41:23 root@ ~~]# service nfslock status
rpc.statd dead but pid file exists
--------------------------------------------------------------------------------
[10:41:23 root@ ~~]# mount -t nfs -o vers=3 127.0.0.1:/tmp /bz712438
mount.nfs: rpc.statd is not running but is required for remote locking.
mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
mount.nfs: an incorrect mount option was specified
:: [   FAIL   ] :: Mounting with nfsv3 (Expected 0, got 32)

Version-Release number of selected component (if applicable):
RHEL-6.x

How reproducible:
always

Steps to Reproduce:
1. kill rpc.statd, and the status is "rpc.statd dead but pid file exists"
2. mount -t nfs -o vers=3  $serv:$expdir  $nfsmp
3.

Actual results:
[10:41:23 root@ ~~]# mount -t nfs -o vers=3 127.0.0.1:/tmp /bz712438
mount.nfs: rpc.statd is not running but is required for remote locking.
mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
mount.nfs: an incorrect mount option was specified
:: [   FAIL   ] :: Mounting with nfsv3 (Expected 0, got 32)

Expected results:
mount success

Additional info:

Comment 4 JianHong Yin 2014-07-22 07:31:38 UTC
Got the reason: Jul 22 03:06:21 ibm-z10-67 rpc.statd[42160]: Opening /var/run/rpc.statd.pid failed: Permission denied

but do not known why Permission denied, and how to reproduce step by step,,

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [   LOG    ] :: /CoreOS/nfs-utils/Regression/bz712438-nfsstat-m-does-not-display-nfs4-mounts/Test
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

[03:06:21 root@ ~~]# service nfslock status
rpc.statd dead but pid file exists
--------------------------------------------------------------------------------
[03:06:21 root@ ~~]# rpcinfo -s
   program version(s) netid(s)                         service     owner
    100000  2,3,4     local,udp,tcp,udp6,tcp6          portmapper  superuser
    100011  2,1       tcp,udp                          rquotad     superuser
    100005  3,2,1     tcp6,udp6,tcp,udp                mountd      superuser
    100003  4,3,2     udp6,tcp6,udp,tcp                nfs         superuser
    100227  3,2       udp6,tcp6,udp,tcp                nfs_acl     superuser
    100021  4,3,1     tcp6,udp6,tcp,udp                nlockmgr    superuser
--------------------------------------------------------------------------------
[03:06:21 root@ ~~]# mount -t nfs -o vers=3 127.0.0.1:/tmp /bz712438
mount.nfs: rpc.statd is not running but is required for remote locking.
mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
mount.nfs: an incorrect mount option was specified
:: [   FAIL   ] :: Mounting with nfsv3 (Expected 0, got 32)
--------------------------------------------------------------------------------
[03:06:21 root@ ~~]# tail /var/log/messages
Jul 22 03:06:01 ibm-z10-67 logger: /usr/bin/rhts-test-runner.sh report_finish 
Jul 22 03:06:17 ibm-z10-67 logger: /usr/bin/rhts-test-runner.sh rhts-test-update 127.0.0.1:7092 22968827 start  nfs-utils-CoreOS-nfs-utils-Regression-bz712438-nfsstat-m-does-not-display-nfs4-mounts
Jul 22 03:06:17 ibm-z10-67 python: 07/22/14 03:06:17  testID:22968827 start:
Jul 22 03:06:18 ibm-z10-67 rpc.mountd[41997]: Version 1.2.3 starting
Jul 22 03:06:18 ibm-z10-67 kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
Jul 22 03:06:18 ibm-z10-67 kernel: NFSD: starting 90-second grace period
Jul 22 03:06:21 ibm-z10-67 rpc.statd[42160]: Version 1.2.3 starting
Jul 22 03:06:21 ibm-z10-67 rpc.statd[42160]: Flags: TI-RPC 
Jul 22 03:06:21 ibm-z10-67 rpc.statd[42160]: Opening /var/run/rpc.statd.pid failed: Permission denied

Comment 5 Steve Dickson 2014-08-07 14:27:16 UTC
How do I get access to these tests? 

/CoreOS/nfs-utils/Regression/...

Comment 6 JianHong Yin 2014-08-08 01:52:55 UTC
(In reply to Steve Dickson from comment #5)
> How do I get access to these tests? 
> 
> /CoreOS/nfs-utils/Regression/...

I am investigating the real reproduce condition..

Comment 8 JianHong Yin 2014-08-10 11:21:55 UTC
Now the reproducer /CoreOS/nfs-utils/Regression/bz1121451-start-statd-test can work fine.


After follow step, will reproduce the issue

[07:15:06 root@ ~~]# rpc.statd; sleep 0.5; rpc.mountd
--------------------------------------------------------------------------------
[07:15:07 root@ ~~]# rpcinfos
   program version(s) netid(s)                         service     owner
    100000  2,3,4     local,udp,tcp,udp6,tcp6          portmapper  superuser
    100024  1         tcp6,udp6,tcp,udp                status      29
    100005  3,2,1     tcp6,udp6,tcp,udp                mountd      superuser
:: [   PASS   ] :: Running 'rpcinfos' (Expected 0, got 0)
--------------------------------------------------------------------------------
[07:15:07 root@ ~~]# killall rpc.statd
--------------------------------------------------------------------------------
[07:15:07 root@ ~~]# killall rpc.mountd


mount.nfs  will  fail  with  follow  msg:
mount.nfs: rpc.statd is not running but is required for remote locking.
mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
mount.nfs: an incorrect mount option was specified

Comment 14 Steve Dickson 2015-01-02 21:57:44 UTC
(In reply to Yin.JianHong from comment #0)
> Description of problem:
> [10:41:23 root@ ~~]# service nfslock status
> rpc.statd dead but pid file exists
> -----------------------------------------------------------------------------
> ---
> [10:41:23 root@ ~~]# mount -t nfs -o vers=3 127.0.0.1:/tmp /bz712438
> mount.nfs: rpc.statd is not running but is required for remote locking.
> mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
> mount.nfs: an incorrect mount option was specified
> :: [   FAIL   ] :: Mounting with nfsv3 (Expected 0, got 32)
> 
> Version-Release number of selected component (if applicable):
> RHEL-6.x
> 
> How reproducible:
> always
> 
> Steps to Reproduce:
> 1. kill rpc.statd, and the status is "rpc.statd dead but pid file exists"
> 2. mount -t nfs -o vers=3  $serv:$expdir  $nfsmp
> 3.
> 
> Actual results:
> [10:41:23 root@ ~~]# mount -t nfs -o vers=3 127.0.0.1:/tmp /bz712438
> mount.nfs: rpc.statd is not running but is required for remote locking.
> mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
> mount.nfs: an incorrect mount option was specified
> :: [   FAIL   ] :: Mounting with nfsv3 (Expected 0, got 32)

This seems to work with the latest nfs-utils

 rhel66# service nfslock start
Starting NFS statd:                                        [  OK  ]
rhel66# ps ax | grep statd
 3844 ?        Ss     0:00 rpc.statd
 3849 pts/1    S+     0:00 grep statd
rhel66# kill 3844
rhel66# service nfslock status
rpc.statd dead but pid file exists
rhel66# mount redhat:/home /mnt/home -o v3
rhel66# ps ax | grep statd
 3868 ?        Ss     0:00 rpc.statd --no-notify
 3870 pts/1    S+     0:00 grep statd

Please check your script...

Comment 16 Yongcheng Yang 2015-03-23 09:23:18 UTC
This has not been fixed in the latest RHEL-6
(In reply to Steve Dickson from comment #14)
> (In reply to Yin.JianHong from comment #0)
> > Description of problem:
> > [10:41:23 root@ ~~]# service nfslock status
> > rpc.statd dead but pid file exists
> > -----------------------------------------------------------------------------
> > ---
> > [10:41:23 root@ ~~]# mount -t nfs -o vers=3 127.0.0.1:/tmp /bz712438
> > mount.nfs: rpc.statd is not running but is required for remote locking.
> > mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
> > mount.nfs: an incorrect mount option was specified
> > :: [   FAIL   ] :: Mounting with nfsv3 (Expected 0, got 32)
> > 
> > Version-Release number of selected component (if applicable):
> > RHEL-6.x
> > 
> > How reproducible:
> > always
> > 
> > Steps to Reproduce:
> > 1. kill rpc.statd, and the status is "rpc.statd dead but pid file exists"
> > 2. mount -t nfs -o vers=3  $serv:$expdir  $nfsmp
> > 3.
> > 
> > Actual results:
> > [10:41:23 root@ ~~]# mount -t nfs -o vers=3 127.0.0.1:/tmp /bz712438
> > mount.nfs: rpc.statd is not running but is required for remote locking.
> > mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
> > mount.nfs: an incorrect mount option was specified
> > :: [   FAIL   ] :: Mounting with nfsv3 (Expected 0, got 32)
> 
> This seems to work with the latest nfs-utils
> 
>  rhel66# service nfslock start
> Starting NFS statd:                                        [  OK  ]
> rhel66# ps ax | grep statd
>  3844 ?        Ss     0:00 rpc.statd
>  3849 pts/1    S+     0:00 grep statd
> rhel66# kill 3844
> rhel66# service nfslock status
> rpc.statd dead but pid file exists
> rhel66# mount redhat:/home /mnt/home -o v3
> rhel66# ps ax | grep statd
>  3868 ?        Ss     0:00 rpc.statd --no-notify
>  3870 pts/1    S+     0:00 grep statd
> 
> Please check your script...
Need to kill both rpc.statd and rpc.mountd

[root@hp-dl385pg8-10 ~]# service nfslock restart
Stopping NFS locking: [  OK  ]
Stopping NFS statd: [  OK  ]
Starting NFS statd: [  OK  ]
[root@hp-dl385pg8-10 ~]# ps ax | grep statd
24738 ?        Ss     0:00 rpc.statd
24742 pts/2    S+     0:00 grep statd
[root@hp-dl385pg8-10 ~]# kill 24738
[root@hp-dl385pg8-10 ~]# ps ax | grep mountd
24634 ?        Ss     0:00 rpc.mountd
24744 pts/2    S+     0:00 grep mountd
[root@hp-dl385pg8-10 ~]# kill 24634
[root@hp-dl385pg8-10 ~]# service nfslock status
rpc.statd dead but pid file exists
[root@hp-dl385pg8-10 ~]# mount -t nfs -o vers=3 127.0.0.1:/export/ /mnt/mnt_test/
mount.nfs: requested NFS version or transport protocol is not supported
[root@hp-dl385pg8-10 ~]# echo $?
32
[root@hp-dl385pg8-10 ~]# ps ax | grep statd
24768 ?        Ss     0:00 rpc.statd --no-notify
24770 pts/2    S+     0:00 grep statd
[root@hp-dl385pg8-10 ~]# ps ax | grep mountd
24772 pts/2    S+     0:00 grep mountd
[root@hp-dl385pg8-10 ~]# mount -vvv -t nfs -o vers=3 127.0.0.1:/export/ /mnt/mnt_test/
mount: fstab path: "/etc/fstab"
mount: mtab path:  "/etc/mtab"
mount: lock path:  "/etc/mtab~"
mount: temp path:  "/etc/mtab.tmp"
mount: UID:        0
mount: eUID:       0
mount: spec:  "127.0.0.1:/export/"
mount: node:  "/mnt/mnt_test/"
mount: types: "nfs"
mount: opts:  "vers=3"
final mount options: 'vers=3'
mount: external mount: argv[0] = "/sbin/mount.nfs"
mount: external mount: argv[1] = "127.0.0.1:/export/"
mount: external mount: argv[2] = "/mnt/mnt_test/"
mount: external mount: argv[3] = "-v"
mount: external mount: argv[4] = "-o"
mount: external mount: argv[5] = "rw,vers=3"
mount.nfs: timeout set for Mon Mar 23 17:21:59 2015
mount.nfs: trying text-based options 'vers=3,addr=127.0.0.1'
mount.nfs: prog 100003, trying vers=3, prot=6
mount.nfs: trying 127.0.0.1 prog 100003 vers 3 prot TCP port 2049
mount.nfs: prog 100005, trying vers=3, prot=17
mount.nfs: portmap query retrying: RPC: Program not registered
mount.nfs: prog 100005, trying vers=3, prot=6
mount.nfs: portmap query failed: RPC: Program not registered
mount.nfs: requested NFS version or transport protocol is not supported
[root@hp-dl385pg8-10 ~]#

Comment 17 Steve Dickson 2015-03-23 12:21:16 UTC
(In reply to Yongcheng Yang from comment #16)> > > Description of problem:
> > Please check your script...
> Need to kill both rpc.statd and rpc.mountd
> [root@hp-dl385pg8-10 ~]# mount -t nfs -o vers=3 127.0.0.1:/export/
> /mnt/mnt_test/
> mount.nfs: requested NFS version or transport protocol is not supported
Its is expected behaviour. Killing rpc.statd and rpc.mountd is
going to cause a loopback mount to fail. I'm not sure what the 
expectations are but there is no problem here...

Comment 18 Yongcheng Yang 2015-03-24 02:44:24 UTC
(In reply to Steve Dickson from comment #17)
> (In reply to Yongcheng Yang from comment #16)> > > Description of problem:
> > > Please check your script...
> > Need to kill both rpc.statd and rpc.mountd
> > [root@hp-dl385pg8-10 ~]# mount -t nfs -o vers=3 127.0.0.1:/export/
> > /mnt/mnt_test/
> > mount.nfs: requested NFS version or transport protocol is not supported
> Its is expected behaviour. Killing rpc.statd and rpc.mountd is
> going to cause a loopback mount to fail. I'm not sure what the 
> expectations are but there is no problem here...
Sorry I mistake the reproducer, need to restart the nfs service after killing both rpc.statd and rpc.mountd, then try mount, it should success but failed.

I edit a simple reproducer, and it can 100% reproduced in rhe6, and can't reproduced at rhel7:
[root@hp-dl385pg8-10 ~]# cat 11214512.sh 
service nfs stop > /dev/null
service nfslock stop > /dev/null
service rpcbind restart > /dev/null
rpc.statd
kill `ps ax | grep [r]pc.statd | awk '{print $1}'` 2>/dev/null
kill `ps ax | grep [r]pc.mountd | awk '{print $1}'` 2>/dev/null
service nfs restart > /dev/null
service nfslock status
mount -t nfs -o vers=3 127.0.0.1:/export/ /mnt/mnt_test/
ps ax | grep statd
umount /mnt/mnt_test/ 2>/dev/null
tail /var/log/messages
[root@hp-dl385pg8-10 ~]# 

Following is the reproduce logs by hand:
[root@hp-dl385pg8-10 ~]# service rpcbind restart
Stopping rpcbind: [  OK  ]
Starting rpcbind: [  OK  ]
[root@hp-dl385pg8-10 ~]# rpc.statd
[root@hp-dl385pg8-10 ~]# rpc.mountd 
[root@hp-dl385pg8-10 ~]# ps ax | grep rpc.statd
 5720 ?        Ss     0:00 rpc.statd
 5725 pts/2    S+     0:00 grep rpc.statd
[root@hp-dl385pg8-10 ~]# kill 5720
[root@hp-dl385pg8-10 ~]# ps ax | grep rpc.mountd
 5608 ?        Ss     0:00 rpc.mountd
 5723 ?        Ss     0:00 rpc.mountd
 5727 pts/2    S+     0:00 grep rpc.mountd
[root@hp-dl385pg8-10 ~]# kill 5608 5723
[root@hp-dl385pg8-10 ~]# service nfs restart
Shutting down NFS daemon: [  OK  ]
Shutting down NFS mountd: [FAILED]
Shutting down NFS quotas: [  OK  ]
Shutting down NFS services:  [  OK  ]
Shutting down RPC idmapd: [  OK  ]
Starting NFS services:  [  OK  ]
Starting NFS quotas: [  OK  ]
Starting NFS mountd: [  OK  ]
Starting NFS daemon: [  OK  ]
Starting RPC idmapd: [  OK  ]
[root@hp-dl385pg8-10 ~]# service nfslock status
rpc.statd dead but pid file exists
[root@hp-dl385pg8-10 ~]# mount -t nfs -o vers=3 127.0.0.1:/export/ /mnt/mnt_test/
mount.nfs: rpc.statd is not running but is required for remote locking.
mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
mount.nfs: an incorrect mount option was specified
[root@hp-dl385pg8-10 ~]# echo $?
32
[root@hp-dl385pg8-10 ~]# mount -vvv -t nfs -o vers=3 127.0.0.1:/export/ /mnt/mnt_test/
mount: fstab path: "/etc/fstab"
mount: mtab path:  "/etc/mtab"
mount: lock path:  "/etc/mtab~"
mount: temp path:  "/etc/mtab.tmp"
mount: UID:        0
mount: eUID:       0
mount: spec:  "127.0.0.1:/export/"
mount: node:  "/mnt/mnt_test/"
mount: types: "nfs"
mount: opts:  "vers=3"
final mount options: 'vers=3'
mount: external mount: argv[0] = "/sbin/mount.nfs"
mount: external mount: argv[1] = "127.0.0.1:/export/"
mount: external mount: argv[2] = "/mnt/mnt_test/"
mount: external mount: argv[3] = "-v"
mount: external mount: argv[4] = "-o"
mount: external mount: argv[5] = "rw,vers=3"
mount.nfs: timeout set for Tue Mar 24 10:22:22 2015
mount.nfs: rpc.statd is not running but is required for remote locking.
mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
mount.nfs: an incorrect mount option was specified
[root@hp-dl385pg8-10 ~]# echo $?
32
[root@hp-dl385pg8-10 ~]# tail /var/log/messages
Mar 24 10:34:38 hp-dl385pg8-10 rpcbind: rpcbind terminating on signal. Restart with "rpcbind -w"
Mar 24 10:34:38 hp-dl385pg8-10 rpc.statd[10471]: Version 1.2.3 starting
Mar 24 10:34:38 hp-dl385pg8-10 sm-notify[10472]: Version 1.2.3 starting
Mar 24 10:34:38 hp-dl385pg8-10 rpc.statd[10471]: Caught signal 15, un-registering and exiting
Mar 24 10:34:39 hp-dl385pg8-10 rpc.mountd[10588]: Version 1.2.3 starting
Mar 24 10:34:39 hp-dl385pg8-10 kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
Mar 24 10:34:39 hp-dl385pg8-10 kernel: NFSD: starting 90-second grace period
Mar 24 10:34:39 hp-dl385pg8-10 rpc.statd[10661]: Version 1.2.3 starting
Mar 24 10:34:39 hp-dl385pg8-10 rpc.statd[10661]: Flags: TI-RPC 
Mar 24 10:34:39 hp-dl385pg8-10 rpc.statd[10661]: Opening /var/run/rpc.statd.pid failed: Permission denied