Bug 1466007

Summary: [Ganesha] : Unable to bring up a Ganesha HA cluster on latest bits.
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Ambarish <asoman>
Component: nfs-ganesha
Assignee: Jiffin <jthottan>
Status: CLOSED ERRATA
QA Contact: Ambarish <asoman>
Severity: high
Priority: unspecified
Version: rhgs-3.3
CC: amukherj, bturner, dang, ffilz, jthottan, kkeithle, mbenjamin, rcyriac, rhinduja, rhs-bugs, skoduri, storage-qa-internal
Keywords: Regression
Target Release: RHGS 3.3.0
Hardware: x86_64
OS: Linux
Fixed In Version: nfs-ganesha-2.4.4-16
Last Closed: 2017-09-21 04:47:57 UTC
Type: Bug
Bug Blocks: 1417151
Attachments:
Description                                                   Flags
Remove dependency on glusterfssharedstorage for ganesha       none
Second patch, first patch missed some of my changes           none
Start-nfs-ganesha-only-if-share-storage-mount-got-su.patch    none

Description Ambarish 2017-06-28 17:00:58 UTC
Description of problem:
----------------------

A regression appears to have been introduced in the latest Gluster/Ganesha bits: I cannot bring up a Ganesha HA cluster on my nodes.


*CLI Output* :

[root@gqas013 /]# gluster nfs-ganesha enable
Enabling NFS-Ganesha requires Gluster-NFS to be disabled across the trusted pool. Do you still want to continue?
 (y/n) y
This will take a few minutes to complete. Please wait ..
nfs-ganesha: failed: NFS-Ganesha failed to start.Please see log file for details
[root@gqas013 /]# 


Unfortunately, there is no ganesha*.log.

**From the master node, where I was enabling Ganesha** :

[2017-06-28 15:10:05.054673] E [MSGID: 106469] [glusterd-ganesha.c:305:glusterd_op_stage_set_ganesha] 0-management: Could not start NFS-Ganesha
[2017-06-28 15:10:05.054731] E [MSGID: 106301] [glusterd-syncop.c:1315:gd_stage_op_phase] 0-management: Staging of operation 'Volume (null)' failed on localhost : NFS-Ganesha failed to start.Please see log file for details


**On other nodes** :

[2017-06-28 15:09:26.337829] I [run.c:190:runner_log] (-->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0xd530a) [0x7f7a1f5b230a] -->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0xd4dc5) [0x7f7a1f5b1dc5] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7f7a2aa9f545] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/start/post/S31ganesha-start.sh --volname=gluster_shared_storage --first=no --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd
[2017-06-28 15:10:04.968212] E [MSGID: 106062] [glusterd-op-sm.c:4034:glusterd_op_ac_lock] 0-management: Unable to acquire volname
[2017-06-28 15:10:05.055274] E [MSGID: 106062] [glusterd-op-sm.c:4097:glusterd_op_ac_unlock] 0-management: Unable to acquire volname



This works fine :

nfs-ganesha-gluster-2.4.4-8.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-29.el7rhgs.x86_64

This doesn't work :

[root@gqas013 Ansible]# rpm -qa|grep ganesha
glusterfs-ganesha-3.8.4-31.el7rhgs.x86_64
nfs-ganesha-2.4.4-11.el7rhgs.x86_64


The underlying OS is the same in both cases: 7.4.


Version-Release number of selected component (if applicable):
--------------------------------------------------------------

glusterfs-ganesha-3.8.4-31.el7rhgs.x86_64
nfs-ganesha-2.4.4-11.el7rhgs.x86_64


How reproducible:
-----------------

Every time, whichever way I try.

Comment 2 Ambarish 2017-06-28 17:02:28 UTC
This is what I see in /var/log/messages when I try to enable Ganesha:

Jun 28 13:00:08 gqas013 systemd: Starting Process NFS-Ganesha configuration...
Jun 28 13:00:08 gqas013 systemd: var-run-gluster-shared_storage.mount: Directory /var/run/gluster/shared_storage to mount over is not empty, mounting anyway.
Jun 28 13:00:08 gqas013 systemd: Mounting /var/run/gluster/shared_storage...
Jun 28 13:00:08 gqas013 mount: /sbin/mount.glusterfs: according to mtab, GlusterFS is already mounted on /run/gluster/shared_storage
Jun 28 13:00:08 gqas013 systemd: Started Process NFS-Ganesha configuration.
Jun 28 13:00:08 gqas013 systemd: var-run-gluster-shared_storage.mount mount process exited, code=exited status=32
Jun 28 13:00:08 gqas013 systemd: Failed to mount /var/run/gluster/shared_storage.
Jun 28 13:00:08 gqas013 systemd: Dependency failed for NFS-Ganesha file server.
Jun 28 13:00:08 gqas013 systemd: Job nfs-ganesha.service/start failed with result 'dependency'.
Jun 28 13:00:08 gqas013 systemd: Unit var-run-gluster-shared_storage.mount entered failed state.

Comment 4 Ambarish 2017-06-28 17:06:17 UTC
(In reply to Ambarish from comment #2)

So is mounting the shared storage (SS) failing on the latest bits?

Comment 5 Ambarish 2017-06-28 17:08:01 UTC
(In reply to Ambarish from comment #4)
> So is mounting SS failing on latest bits?

[root@gqas013 ganesha]# mount |grep shared
gqas013.sbu.lab.eng.bos.redhat.com:/gluster_shared_storage on /run/gluster/shared_storage type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
[root@gqas013 ganesha]#

Comment 7 Jiffin 2017-06-29 07:40:18 UTC
RCA :
We added a dependency on var-run-gluster-shared_storage.mount to the nfs-ganesha systemd unit file. That mount unit is generated automatically by systemd from the /etc/fstab entry at boot. However, when shared storage is enabled for the first time, that unit does not exist yet, so nfs-ganesha fails to start because of the dependency.

Workaround:
For the time being, remove that dependency from the nfs-ganesha.service unit file. Edit /usr/lib/systemd/system/nfs-ganesha.service and remove var-run-gluster-shared_storage.mount from both the Requires= line (delete that line completely) and the After= line, then run systemctl daemon-reload so that systemd picks up the edited unit.
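The manual edit in the workaround above can be scripted. This is a minimal sketch, not part of any shipped fix: the helper name drop_ss_dependency is hypothetical, and it assumes the unit file names var-run-gluster-shared_storage.mount on its own Requires= line and inside an After= list (the exact contents of the shipped nfs-ganesha.service are not shown in this report). It keeps a backup copy with a .orig suffix.

```shell
#!/bin/bash
# Hypothetical helper (not from the shipped fix): remove the
# var-run-gluster-shared_storage.mount dependency from an nfs-ganesha unit file.
# Assumes the mount unit sits on its own Requires= line (deleted outright, as the
# workaround says) and appears in an After= list (only that entry is removed).
drop_ss_dependency() {
    local unit_file="${1:-/usr/lib/systemd/system/nfs-ganesha.service}"
    # Regex for the unit name; the dot is escaped so it matches literally.
    local dep='var-run-gluster-shared_storage\.mount'

    # Keep a backup next to the original before editing in place.
    cp -p "$unit_file" "$unit_file.orig" || return 1

    # 1) delete Requires= lines naming the mount unit
    # 2) strip the mount unit from After= lists, keeping the other entries
    # 3) drop any After= line left empty by step 2
    sed -i \
        -e "/^Requires=.*${dep}/d" \
        -e "/^After=/s/[[:space:]]*${dep}//" \
        -e '/^After=[[:space:]]*$/d' \
        "$unit_file"
}
```

Usage, as root: drop_ss_dependency && systemctl daemon-reload, then retry gluster nfs-ganesha enable. Because the file lives under /usr/lib/systemd/system and belongs to the package, an nfs-ganesha upgrade will likely replace it, which fits the temporary nature of this workaround.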

Comment 9 Soumya Koduri 2017-06-29 11:42:08 UTC
(In reply to Jiffin from comment #7)

Another workaround that could be applied, IMO:
# umount /run/gluster/shared_storage
# systemctl restart var-run-gluster-shared_storage.mount
  (make sure this mount unit comes up successfully; systemctl status var-run-gluster-shared_storage.mount should report it as active)
Then go ahead with the cluster setup.
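The steps from comment 9 can be wrapped in a function that refuses to continue unless systemd reports the mount unit as active. This is a hypothetical sketch: the function name is invented, it assumes it runs as root, and it uses systemctl is-active as the success check.

```shell
#!/bin/bash
# Hypothetical wrapper around the manual steps in comment 9; run as root.
# Remounts the shared storage through systemd's mount unit and fails
# unless systemd reports that unit as active afterwards.
remount_shared_storage_via_systemd() {
    local mnt=/run/gluster/shared_storage
    local unit=var-run-gluster-shared_storage.mount

    # Unmount the existing shared-storage mount, if present.
    if mountpoint -q "$mnt"; then
        umount "$mnt" || return 1
    fi

    # Let systemd perform the mount through its own unit.
    systemctl restart "$unit" || return 1

    # Only report success when the mount unit is active (mounted).
    if ! systemctl is-active --quiet "$unit"; then
        systemctl status --no-pager "$unit"
        return 1
    fi
}
```

Usage: run remount_shared_storage_via_systemd on each node that will run NFS-Ganesha (which nodes need it is an assumption; the report does not say), then run gluster nfs-ganesha enable once from the node driving the setup.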

Comment 15 Jiffin 2017-07-12 13:37:13 UTC
Created attachment 1297015
Remove dependency on glusterfssharedstorage for ganesha

Comment 16 Jiffin 2017-07-13 05:52:54 UTC
Created attachment 1297425
Second patch, first patch missed some of my changes

Comment 17 Jiffin 2017-07-17 12:06:08 UTC
Created attachment 1299823
Start-nfs-ganesha-only-if-share-storage-mount-got-su.patch

Comment 18 Ambarish 2017-07-19 17:35:10 UTC
Works fine on nfs-ganesha-2.4.4-16.

I am able to bring up a Ganesha cluster, disable Ganesha, and re-enable it.

Moving to Verified.

Comment 20 errata-xmlrpc 2017-09-21 04:47:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:2779