Description of problem:
-----------------------
A regression appears to have been introduced in the latest Gluster/Ganesha bits: I can no longer bring up a Ganesha HA cluster on my nodes.

*CLI Output* :

[root@gqas013 /]# gluster nfs-ganesha enable
Enabling NFS-Ganesha requires Gluster-NFS to be disabled across the trusted pool. Do you still want to continue?
 (y/n) y
This will take a few minutes to complete. Please wait ..
nfs-ganesha: failed: NFS-Ganesha failed to start.Please see log file for details
[root@gqas013 /]#

There is no ganesha*.log, unfortunately.

**From the master node, where I was enabling Ganesha from** :

[2017-06-28 15:10:05.054673] E [MSGID: 106469] [glusterd-ganesha.c:305:glusterd_op_stage_set_ganesha] 0-management: Could not start NFS-Ganesha
[2017-06-28 15:10:05.054731] E [MSGID: 106301] [glusterd-syncop.c:1315:gd_stage_op_phase] 0-management: Staging of operation 'Volume (null)' failed on localhost : NFS-Ganesha failed to start.Please see log file for details

**On the other nodes** :

[2017-06-28 15:09:26.337829] I [run.c:190:runner_log] (-->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0xd530a) [0x7f7a1f5b230a] -->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0xd4dc5) [0x7f7a1f5b1dc5] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7f7a2aa9f545] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/start/post/S31ganesha-start.sh --volname=gluster_shared_storage --first=no --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd
[2017-06-28 15:10:04.968212] E [MSGID: 106062] [glusterd-op-sm.c:4034:glusterd_op_ac_lock] 0-management: Unable to acquire volname
[2017-06-28 15:10:05.055274] E [MSGID: 106062] [glusterd-op-sm.c:4097:glusterd_op_ac_unlock] 0-management: Unable to acquire volname

This works fine:
nfs-ganesha-gluster-2.4.4-8.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-29.el7rhgs.x86_64

This doesn't work:
[root@gqas013 Ansible]# rpm -qa|grep ganesha
glusterfs-ganesha-3.8.4-31.el7rhgs.x86_64
nfs-ganesha-2.4.4-11.el7rhgs.x86_64

The underlying OS is the same in both cases - RHEL 7.4.

Version-Release number of selected component (if applicable):
--------------------------------------------------------------
glusterfs-ganesha-3.8.4-31.el7rhgs.x86_64
nfs-ganesha-2.4.4-11.el7rhgs.x86_64

How reproducible:
-----------------
Every time I try.
This is what I see in messages when I try to enable Ganesha :

Jun 28 13:00:08 gqas013 systemd: Starting Process NFS-Ganesha configuration...
Jun 28 13:00:08 gqas013 systemd: var-run-gluster-shared_storage.mount: Directory /var/run/gluster/shared_storage to mount over is not empty, mounting anyway.
Jun 28 13:00:08 gqas013 systemd: Mounting /var/run/gluster/shared_storage...
Jun 28 13:00:08 gqas013 mount: /sbin/mount.glusterfs: according to mtab, GlusterFS is already mounted on /run/gluster/shared_storage
Jun 28 13:00:08 gqas013 systemd: Started Process NFS-Ganesha configuration.
Jun 28 13:00:08 gqas013 systemd: var-run-gluster-shared_storage.mount mount process exited, code=exited status=32
Jun 28 13:00:08 gqas013 systemd: Failed to mount /var/run/gluster/shared_storage.
Jun 28 13:00:08 gqas013 systemd: Dependency failed for NFS-Ganesha file server.
Jun 28 13:00:08 gqas013 systemd: Job nfs-ganesha.service/start failed with result 'dependency'.
Jun 28 13:00:08 gqas013 systemd: Unit var-run-gluster-shared_storage.mount entered failed state.
(In reply to Ambarish from comment #2)
> This is what I see in messages when I try to enable Ganesha :
>
> Jun 28 13:00:08 gqas013 systemd: Starting Process NFS-Ganesha configuration...
> Jun 28 13:00:08 gqas013 systemd: var-run-gluster-shared_storage.mount: Directory /var/run/gluster/shared_storage to mount over is not empty, mounting anyway.
> Jun 28 13:00:08 gqas013 systemd: Mounting /var/run/gluster/shared_storage...
> Jun 28 13:00:08 gqas013 mount: /sbin/mount.glusterfs: according to mtab, GlusterFS is already mounted on /run/gluster/shared_storage
> Jun 28 13:00:08 gqas013 systemd: Started Process NFS-Ganesha configuration.
> Jun 28 13:00:08 gqas013 systemd: var-run-gluster-shared_storage.mount mount process exited, code=exited status=32
> Jun 28 13:00:08 gqas013 systemd: Failed to mount /var/run/gluster/shared_storage.
> Jun 28 13:00:08 gqas013 systemd: Dependency failed for NFS-Ganesha file server.
> Jun 28 13:00:08 gqas013 systemd: Job nfs-ganesha.service/start failed with result 'dependency'.
> Jun 28 13:00:08 gqas013 systemd: Unit var-run-gluster-shared_storage.mount entered failed state.

So is mounting the shared storage failing on the latest bits?
(In reply to Ambarish from comment #4)
> (In reply to Ambarish from comment #2)
> > [systemd / mount.glusterfs log from comment #2 snipped]
>
> So is mounting the shared storage failing on the latest bits?

The shared storage does appear to be mounted:

[root@gqas013 ganesha]# mount |grep shared
gqas013.sbu.lab.eng.bos.redhat.com:/gluster_shared_storage on /run/gluster/shared_storage type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
[root@gqas013 ganesha]#
RCA:
We had added a dependency on var-run-gluster-shared_storage.mount in the nfs-ganesha systemd unit file. That mount unit is created by systemd automatically from the /etc/fstab entry during reboot, but when shared storage is enabled for the first time there is no such unit yet, so nfs-ganesha fails to start because of the unsatisfied dependency.

Workaround:
For the time being, remove that dependency from the nfs-ganesha.service file:
- edit /usr/lib/systemd/system/nfs-ganesha.service
- remove the dependency on var-run-gluster-shared_storage.mount from Requires (remove that line completely) and from After
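For illustration, the relevant part of nfs-ganesha.service would look roughly like the snippet below (a sketch - the exact unit file shipped in these builds may differ). The workaround amounts to deleting the var-run-gluster-shared_storage.mount references from both directives:

  [Unit]
  Description=NFS-Ganesha file server
  # delete this line entirely:
  Requires=var-run-gluster-shared_storage.mount
  # and drop the mount unit from the ordering list, keeping any other entries:
  After=network.target var-run-gluster-shared_storage.mount

After editing the unit file, run "systemctl daemon-reload" so systemd picks up the change before retrying "gluster nfs-ganesha enable".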
(In reply to Jiffin from comment #7)
> [RCA and nfs-ganesha.service workaround from comment #7 snipped]

Another workaround which could be applied, IMO:

# umount /run/gluster/shared_storage
# systemctl restart var-run-gluster-shared_storage.mount
(make sure the status of this mount unit is success)

Then go ahead with the cluster setup.
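Before re-running the cluster setup, it is worth confirming that the mount unit actually came back healthy; a minimal check would be something like this (a correctly mounted systemd mount unit reports "active"):

  # systemctl is-active var-run-gluster-shared_storage.mount
  active
  # systemctl status var-run-gluster-shared_storage.mount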
Created attachment 1297015 [details] Remove dependency on glusterfssharedstorage for ganesha
Created attachment 1297425 [details] Second patch; the first patch missed some of my changes
Created attachment 1299823 [details] Start-nfs-ganesha-only-if-share-storage-mount-got-su.patch
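Going by the patch title, the fix makes nfs-ganesha start only after the shared-storage mount has actually succeeded. As a purely hypothetical sketch of that kind of gating in a systemd unit (not the contents of the attached patch), one way to express it is a start condition or pre-start check on the mount point:

  [Unit]
  Description=NFS-Ganesha file server
  # refuse to start unless the shared storage is really mounted
  ConditionPathIsMountPoint=/run/gluster/shared_storage

  [Service]
  # alternatively, fail fast from the service itself if the mount is missing
  ExecStartPre=/usr/bin/mountpoint -q /run/gluster/shared_storage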
Works fine on nfs-ganesha-2.4.4-16. I am able to bring up a Ganesha cluster, disable Ganesha, and re-enable it. Moving to Verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:2779