Description of problem:
-----------------------
A regression appears to have been introduced in the latest Gluster/Ganesha bits: I can no longer bring up a Ganesha HA cluster on my nodes.

*CLI Output* :

[root@gqas013 /]# gluster nfs-ganesha enable
Enabling NFS-Ganesha requires Gluster-NFS to be disabled across the trusted pool. Do you still want to continue?
 (y/n) y
This will take a few minutes to complete. Please wait ..
nfs-ganesha: failed: NFS-Ganesha failed to start.Please see log file for details
[root@gqas013 /]#

There is no ganesha*.log, unfortunately.

**From the master node, where I was enabling Ganesha from** :

[2017-06-28 15:10:05.054673] E [MSGID: 106469] [glusterd-ganesha.c:305:glusterd_op_stage_set_ganesha] 0-management: Could not start NFS-Ganesha
[2017-06-28 15:10:05.054731] E [MSGID: 106301] [glusterd-syncop.c:1315:gd_stage_op_phase] 0-management: Staging of operation 'Volume (null)' failed on localhost : NFS-Ganesha failed to start.Please see log file for details

**On the other nodes** :

[2017-06-28 15:09:26.337829] I [run.c:190:runner_log] (-->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0xd530a) [0x7f7a1f5b230a] -->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0xd4dc5) [0x7f7a1f5b1dc5] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7f7a2aa9f545] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/start/post/S31ganesha-start.sh --volname=gluster_shared_storage --first=no --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd
[2017-06-28 15:10:04.968212] E [MSGID: 106062] [glusterd-op-sm.c:4034:glusterd_op_ac_lock] 0-management: Unable to acquire volname
[2017-06-28 15:10:05.055274] E [MSGID: 106062] [glusterd-op-sm.c:4097:glusterd_op_ac_unlock] 0-management: Unable to acquire volname

This works fine:
nfs-ganesha-gluster-2.4.4-8.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-29.el7rhgs.x86_64

This doesn't work:
[root@gqas013 Ansible]# rpm -qa|grep ganesha
glusterfs-ganesha-3.8.4-31.el7rhgs.x86_64
nfs-ganesha-2.4.4-11.el7rhgs.x86_64

The underlying OS is the same in both cases - RHEL 7.4.

Version-Release number of selected component (if applicable):
--------------------------------------------------------------
glusterfs-ganesha-3.8.4-31.el7rhgs.x86_64
nfs-ganesha-2.4.4-11.el7rhgs.x86_64

How reproducible:
-----------------
Every time I try.
This is what I see in messages when I try to enable Ganesha :

Jun 28 13:00:08 gqas013 systemd: Starting Process NFS-Ganesha configuration...
Jun 28 13:00:08 gqas013 systemd: var-run-gluster-shared_storage.mount: Directory /var/run/gluster/shared_storage to mount over is not empty, mounting anyway.
Jun 28 13:00:08 gqas013 systemd: Mounting /var/run/gluster/shared_storage...
Jun 28 13:00:08 gqas013 mount: /sbin/mount.glusterfs: according to mtab, GlusterFS is already mounted on /run/gluster/shared_storage
Jun 28 13:00:08 gqas013 systemd: Started Process NFS-Ganesha configuration.
Jun 28 13:00:08 gqas013 systemd: var-run-gluster-shared_storage.mount mount process exited, code=exited status=32
Jun 28 13:00:08 gqas013 systemd: Failed to mount /var/run/gluster/shared_storage.
Jun 28 13:00:08 gqas013 systemd: Dependency failed for NFS-Ganesha file server.
Jun 28 13:00:08 gqas013 systemd: Job nfs-ganesha.service/start failed with result 'dependency'.
Jun 28 13:00:08 gqas013 systemd: Unit var-run-gluster-shared_storage.mount entered failed state.
(In reply to Ambarish from comment #2)
> This is what I see in messages when I try to enable Ganesha :
>
> Jun 28 13:00:08 gqas013 systemd: Starting Process NFS-Ganesha configuration...
> Jun 28 13:00:08 gqas013 systemd: var-run-gluster-shared_storage.mount: Directory /var/run/gluster/shared_storage to mount over is not empty, mounting anyway.
> Jun 28 13:00:08 gqas013 systemd: Mounting /var/run/gluster/shared_storage...
> Jun 28 13:00:08 gqas013 mount: /sbin/mount.glusterfs: according to mtab, GlusterFS is already mounted on /run/gluster/shared_storage
> Jun 28 13:00:08 gqas013 systemd: Started Process NFS-Ganesha configuration.
> Jun 28 13:00:08 gqas013 systemd: var-run-gluster-shared_storage.mount mount process exited, code=exited status=32
> Jun 28 13:00:08 gqas013 systemd: Failed to mount /var/run/gluster/shared_storage.
> Jun 28 13:00:08 gqas013 systemd: Dependency failed for NFS-Ganesha file server.
> Jun 28 13:00:08 gqas013 systemd: Job nfs-ganesha.service/start failed with result 'dependency'.
> Jun 28 13:00:08 gqas013 systemd: Unit var-run-gluster-shared_storage.mount entered failed state.

So is mounting the shared storage failing on the latest bits?
(In reply to Ambarish from comment #4)
> (In reply to Ambarish from comment #2)
> > [systemd / mount.glusterfs log from comment #2 snipped]
>
> So is mounting the shared storage failing on the latest bits?

The shared storage does appear to be mounted:

[root@gqas013 ganesha]# mount |grep shared
gqas013.sbu.lab.eng.bos.redhat.com:/gluster_shared_storage on /run/gluster/shared_storage type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
[root@gqas013 ganesha]#
RCA:
We had added a dependency on var-run-gluster-shared_storage.mount in the nfs-ganesha systemd unit file. That mount unit is created by systemd automatically from the /etc/fstab entry during reboot, but when shared storage is enabled for the first time there is no such unit yet, so nfs-ganesha fails to start because of the unsatisfied dependency.

Workaround:
For the time being, remove that dependency from the nfs-ganesha.service file:
- edit /usr/lib/systemd/system/nfs-ganesha.service
- remove the dependency on var-run-gluster-shared_storage.mount from Requires (remove that line completely) and from After
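For illustration, the relevant part of nfs-ganesha.service would look roughly like the snippet below (a sketch - the exact unit file shipped in these builds may differ). The workaround amounts to deleting the var-run-gluster-shared_storage.mount references from both directives:

  [Unit]
  Description=NFS-Ganesha file server
  # delete this line entirely:
  Requires=var-run-gluster-shared_storage.mount
  # and drop the mount unit from the ordering list, keeping any other entries:
  After=network.target var-run-gluster-shared_storage.mount

After editing the unit file, run "systemctl daemon-reload" so systemd picks up the change before retrying "gluster nfs-ganesha enable".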
(In reply to Jiffin from comment #7)
> [RCA and nfs-ganesha.service workaround from comment #7 snipped]

Another workaround which could be applied, IMO:

# umount /run/gluster/shared_storage
# systemctl restart var-run-gluster-shared_storage.mount
(make sure the status of this mount unit is success)

Then go ahead with the cluster setup.
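Before re-running the cluster setup, it is worth confirming that the mount unit actually came back healthy; a minimal check would be something like this (a correctly mounted systemd mount unit reports "active"):

  # systemctl is-active var-run-gluster-shared_storage.mount
  active
  # systemctl status var-run-gluster-shared_storage.mount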
Created attachment 1297015 [details] Remove dependency on glusterfssharedstorage for ganesha
Created attachment 1297425 [details] Second patch; the first patch missed some of my changes
Created attachment 1299823 [details] Start-nfs-ganesha-only-if-share-storage-mount-got-su.patch
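Going by the patch title, the fix makes nfs-ganesha start only after the shared-storage mount has actually succeeded. As a purely hypothetical sketch of that kind of gating in a systemd unit (not the contents of the attached patch), one way to express it is a start condition or pre-start check on the mount point:

  [Unit]
  Description=NFS-Ganesha file server
  # refuse to start unless the shared storage is really mounted
  ConditionPathIsMountPoint=/run/gluster/shared_storage

  [Service]
  # alternatively, fail fast from the service itself if the mount is missing
  ExecStartPre=/usr/bin/mountpoint -q /run/gluster/shared_storage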
Works fine on nfs-ganesha-2.4.4-16. I am able to bring up a Ganesha cluster, disable Ganesha, and re-enable it. Moving to Verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:2779