Bug 1324064

Summary: Missing ganesha folder from shared storage after reboot of few nodes.
Product: Red Hat Gluster Storage Reporter: Shashank Raj <sraj>
Component: nfs-ganesha Assignee: Soumya Koduri <skoduri>
Status: CLOSED WORKSFORME QA Contact: storage-qa-internal <storage-qa-internal>
Severity: high Docs Contact:
Priority: unspecified    
Version: rhgs-3.1 CC: jthottan, kkeithle, ndevos, nlevinki, sashinde, skoduri, sraj
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-04-28 13:31:24 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Shashank Raj 2016-04-05 12:46:57 UTC
Description of problem:
The ganesha folder is missing from shared storage after a reboot of a few nodes.

Version-Release number of selected component (if applicable):
glusterfs-3.7.9-1

How reproducible:
once

Steps to Reproduce:
1. Configure ganesha on a 4-node gluster cluster.
2. Create a volume and enable ganesha on the volume.
3. Take down 2 of the nodes and bring them back after some time.
4. Make sure that, after the nodes come back, the shared volume is mounted on all the nodes.
5. Start the pcs, pacemaker and nfs-ganesha services on the nodes which came up.
6. Observe that the nfs-ganesha folder is missing from the shared volume, because of which the statd service failed on the rebooted node with the messages below in the logs:

Apr  5 07:13:30 dhcp37-127 systemd: Starting NFS status monitor for NFSv2/3 locking....
Apr  5 07:13:30 dhcp37-127 rpc.statd[31060]: Version 1.3.0 starting
Apr  5 07:13:30 dhcp37-127 rpc.statd[31060]: Flags: TI-RPC
Apr  5 07:13:30 dhcp37-127 rpc.statd[31060]: Failed to open directory sm: No such file or directory
Apr  5 07:13:30 dhcp37-127 rpc.statd[31060]: Initializing NSM state
Apr  5 07:13:30 dhcp37-127 rpc.statd[31060]: Failed to create /var/lib/nfs/statd/state.new: No such file or directory
Apr  5 07:13:30 dhcp37-127 systemd: nfs-ganesha-lock.service: control process exited, code=exited status=1
Apr  5 07:13:30 dhcp37-127 systemd: Failed to start NFS status monitor for NFSv2/3 locking..
Apr  5 07:13:30 dhcp37-127 systemd: Unit nfs-ganesha-lock.service entered failed state.
Apr  5 07:13:30 dhcp37-127 systemd: nfs-ganesha-lock.service failed.


nfs-ganesha lock service status from the 2 nodes:

[root@dhcp37-127 ~]# service nfs-ganesha-lock status
Redirecting to /bin/systemctl status  nfs-ganesha-lock.service
● nfs-ganesha-lock.service - NFS status monitor for NFSv2/3 locking.
   Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha-lock.service; static; vendor preset: disabled)
   Active: failed (Result: exit-code) since Tue 2016-04-05 07:13:30 IST; 5min ago
  Process: 31059 ExecStart=/usr/sbin/rpc.statd --no-notify $STATDARGS (code=exited, status=1/FAILURE)

Apr 05 07:13:30 dhcp37-127.lab.eng.blr.redhat.com systemd[1]: Starting NFS status monitor for NFSv2/3 locking....
Apr 05 07:13:30 dhcp37-127.lab.eng.blr.redhat.com rpc.statd[31060]: Version 1.3.0 starting
Apr 05 07:13:30 dhcp37-127.lab.eng.blr.redhat.com rpc.statd[31060]: Flags: TI-RPC
Apr 05 07:13:30 dhcp37-127.lab.eng.blr.redhat.com rpc.statd[31060]: Failed to open directory sm: No such file or directory
Apr 05 07:13:30 dhcp37-127.lab.eng.blr.redhat.com rpc.statd[31060]: Initializing NSM state
Apr 05 07:13:30 dhcp37-127.lab.eng.blr.redhat.com rpc.statd[31060]: Failed to create /var/lib/nfs/statd/state.new: No such file or directory
Apr 05 07:13:30 dhcp37-127.lab.eng.blr.redhat.com systemd[1]: nfs-ganesha-lock.service: control process exited, code=exited status=1
Apr 05 07:13:30 dhcp37-127.lab.eng.blr.redhat.com systemd[1]: Failed to start NFS status monitor for NFSv2/3 locking..
Apr 05 07:13:30 dhcp37-127.lab.eng.blr.redhat.com systemd[1]: Unit nfs-ganesha-lock.service entered failed state.
Apr 05 07:13:30 dhcp37-127.lab.eng.blr.redhat.com systemd[1]: nfs-ganesha-lock.service failed.


[root@dhcp37-174 ~]# service nfs-ganesha-lock status
Redirecting to /bin/systemctl status  nfs-ganesha-lock.service
● nfs-ganesha-lock.service - NFS status monitor for NFSv2/3 locking.
   Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha-lock.service; static; vendor preset: disabled)
   Active: failed (Result: exit-code) since Tue 2016-04-05 06:35:45 IST; 49min ago
  Process: 12973 ExecStart=/usr/sbin/rpc.statd --no-notify $STATDARGS (code=exited, status=1/FAILURE)

Apr 05 06:35:45 dhcp37-174.lab.eng.blr.redhat.com systemd[1]: Starting NFS status monitor for NFSv2/3 locking....
Apr 05 06:35:45 dhcp37-174.lab.eng.blr.redhat.com rpc.statd[12974]: Version 1.3.0 starting
Apr 05 06:35:45 dhcp37-174.lab.eng.blr.redhat.com rpc.statd[12974]: Flags: TI-RPC
Apr 05 06:35:45 dhcp37-174.lab.eng.blr.redhat.com rpc.statd[12974]: Failed to open directory sm: No such file or directory
Apr 05 06:35:45 dhcp37-174.lab.eng.blr.redhat.com rpc.statd[12974]: Initializing NSM state
Apr 05 06:35:45 dhcp37-174.lab.eng.blr.redhat.com rpc.statd[12974]: Failed to create /var/lib/nfs/statd/state.new: No such file or directory
Apr 05 06:35:45 dhcp37-174.lab.eng.blr.redhat.com systemd[1]: nfs-ganesha-lock.service: control process exited, code=exited status=1
Apr 05 06:35:45 dhcp37-174.lab.eng.blr.redhat.com systemd[1]: Failed to start NFS status monitor for NFSv2/3 locking..
Apr 05 06:35:45 dhcp37-174.lab.eng.blr.redhat.com systemd[1]: Unit nfs-ganesha-lock.service entered failed state.
Apr 05 06:35:45 dhcp37-174.lab.eng.blr.redhat.com systemd[1]: nfs-ganesha-lock.service failed.


[root@dhcp37-127 ~]# cd /var/run/gluster/shared_storage/
[root@dhcp37-127 shared_storage]# ls
[root@dhcp37-127 shared_storage]# pwd
/var/run/gluster/shared_storage
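
The statd failures in the logs above ("Failed to open directory sm", "Failed to create /var/lib/nfs/statd/state.new") suggest that the statd directories were not reachable under /var/lib/nfs. A minimal sketch of a check for that condition follows; the function name missing_statd_dirs is hypothetical, and the directory list is only what can be inferred from the log messages, not a complete list of what rpc.statd requires.

```shell
#!/bin/bash
# Hypothetical helper (not part of nfs-ganesha or glusterfs).
# Reports which of the directories named in the rpc.statd errors are missing
# beneath the NFS state root given as $1 (for example /var/lib/nfs).
# Returns 0 if both exist, 1 if any is missing.
missing_statd_dirs() {
    local root="$1" d rc=0
    # "statd" and "statd/sm" are inferred from the log messages only.
    for d in statd statd/sm; do
        # -d follows symbolic links, so a dangling link counts as missing.
        if [ ! -d "$root/$d" ]; then
            echo "missing: $root/$d"
            rc=1
        fi
    done
    return "$rc"
}

# Example: missing_statd_dirs /var/lib/nfs
```

Running the example on a rebooted node before starting nfs-ganesha-lock would show whether the directories are reachable at that point.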


Actual results:

The ganesha folder is missing from shared storage after a reboot of a few nodes.

Expected results:

The nfs-ganesha folder should not be deleted from the shared volume.

Additional info:

Comment 1 Shashank Raj 2016-04-05 12:54:55 UTC
sosreports are placed under http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1324064

Comment 2 Soumya Koduri 2016-04-11 10:23:03 UTC
I am unable to reproduce the issue on our cluster. Could you please reproduce the issue and provide us with the setup? Also, before rebooting any node, please verify that '/var/lib/nfs' is a symbolic link to the right location under the gluster_shared_storage volume, on all the nodes.
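
The symbolic-link verification requested above could be scripted roughly as follows. This is only a sketch: the function name check_symlink_under is hypothetical, and the shared storage path in the example is the mount point shown in the description (/var/run/gluster/shared_storage). The exact subdirectory the link should point to is not stated in this bug, so the sketch only checks that the resolved target lies somewhere under the shared storage mount.

```shell
#!/bin/bash
# Hypothetical helper (not part of nfs-ganesha or glusterfs).
# Succeeds only if $1 is a symbolic link whose resolved target lies under
# the directory $2. Returns 1 if $1 is not a symlink, 2 if it points elsewhere.
check_symlink_under() {
    local link="$1" base="$2" target
    if [ ! -L "$link" ]; then
        echo "$link is not a symbolic link"
        return 1
    fi
    # readlink -f resolves the link (and any intermediate links) to a full path.
    target=$(readlink -f "$link") || return 1
    base=$(readlink -f "$base") || return 1
    case "$target" in
        "$base"/*) echo "$link -> $target"; return 0 ;;
        *) echo "$link points outside $base: $target"; return 2 ;;
    esac
}

# Example, to be run on every node before a reboot:
# check_symlink_under /var/lib/nfs /var/run/gluster/shared_storage
```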

Comment 3 Shashank Raj 2016-04-25 11:17:56 UTC
Haven't seen this issue with the latest ganesha builds. Will keep an eye on this and update the bug accordingly.

Comment 4 Soumya Koduri 2016-04-28 13:31:24 UTC
Based on the comments above, closing this bug. Please re-open if the issue still exists.