Description of problem:
The shared volume doesn't get mounted on one (sometimes two) nodes after rebooting all nodes in the cluster, resulting in a missing symlink (/var/lib/nfs -> /var/run/gluster/shared_storage/nfs-ganesha/dhcp42-239.lab.eng.blr.redhat.com/nfs).

Version-Release number of selected component (if applicable):
glusterfs-3.7.9-4 and nfs-ganesha-2.3.1-6

How reproducible:
Always

Steps to Reproduce:
1. Create a 4 node ganesha cluster.
2. Make sure the shared volume is created and mounted on all the nodes of the cluster and the symlink is created as below.

[root@dhcp42-20 ~]# gluster volume status gluster_shared_storage
Status of volume: gluster_shared_storage
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick dhcp42-239.lab.eng.blr.redhat.com:/va
r/lib/glusterd/ss_brick                     49155     0          Y       2293
Brick dhcp43-175.lab.eng.blr.redhat.com:/va
r/lib/glusterd/ss_brick                     49155     0          Y       2281
Brick dhcp42-20.lab.eng.blr.redhat.com:/var
/lib/glusterd/ss_brick                      49155     0          Y       2266
Self-heal Daemon on localhost               N/A       N/A        Y       2257
Self-heal Daemon on dhcp42-239.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       2287
Self-heal Daemon on dhcp43-175.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       2253
Self-heal Daemon on dhcp42-196.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       2258

Task Status of Volume gluster_shared_storage
------------------------------------------------------------------------------
There are no active volume tasks

dhcp42-20.lab.eng.blr.redhat.com:/gluster_shared_storage   27740928 1697152  26043776   7% /run/gluster/shared_storage
dhcp42-239.lab.eng.blr.redhat.com:/gluster_shared_storage  27740928 1697152  26043776   7% /run/gluster/shared_storage
dhcp43-175.lab.eng.blr.redhat.com:/gluster_shared_storage  27740928 1697152  26043776   7% /run/gluster/shared_storage
dhcp42-196.lab.eng.blr.redhat.com:/gluster_shared_storage  27740928 1697152  26043776   7% /run/gluster/shared_storage

[root@dhcp42-20 ~]# ls -ld /var/lib/nfs
lrwxrwxrwx. 1 root root 80 May 11 21:26 /var/lib/nfs -> /var/run/gluster/shared_storage/nfs-ganesha/dhcp42-20.lab.eng.blr.redhat.com/nfs

[root@dhcp42-239 ~]# ls -ld /var/lib/nfs
lrwxrwxrwx. 1 root root 81 May 11 21:26 /var/lib/nfs -> /var/run/gluster/shared_storage/nfs-ganesha/dhcp42-239.lab.eng.blr.redhat.com/nfs

[root@dhcp43-175 ~]# ls -ld /var/lib/nfs
lrwxrwxrwx. 1 root root 81 May 11 21:26 /var/lib/nfs -> /var/run/gluster/shared_storage/nfs-ganesha/dhcp43-175.lab.eng.blr.redhat.com/nfs

[root@dhcp42-196 ~]# ls -ld /var/lib/nfs
lrwxrwxrwx. 1 root root 81 May 11 21:19 /var/lib/nfs -> /var/run/gluster/shared_storage/nfs-ganesha/dhcp42-196.lab.eng.blr.redhat.com/nfs

3. Reboot all the nodes of the cluster.
4. Observe that on 2 of the 4 nodes, shared storage is not mounted (most of the time it fails to mount on one of the nodes).
5. Because of this, the symlink from /var/lib/nfs doesn't get created on these 2 nodes.
6. Both of these nodes have the entries in /etc/fstab, and manually mounting the shared storage on these nodes works.

Actual results:
The shared volume doesn't get mounted on a few nodes after rebooting all nodes in the cluster.

Expected results:
The shared volume should get mounted on all the nodes after reboot.

Additional info:
sosreports can be found at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1335090
This is expected behaviour. We need to understand that the shared volume itself is hosted on these nodes, and all nodes mount it using one particular node. When all nodes are down, the shared storage volume is essentially down as well. When the nodes come back up, none of them will be able to connect to the shared storage until the node whose entry is mentioned in /etc/fstab is up and serving. That node itself will never connect to the shared storage on reboot, because the volume is not yet being served by the time its /etc/fstab entry is replayed.
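The ordering problem described above (the fstab entry is replayed before the volume is served) can be worked around by retrying the mount for a while after boot. The following is only a minimal sketch of that idea, not the content of any shipped fix: the function name retry_mount and the MOUNT_CMD / IS_MOUNTED_CMD override variables are hypothetical, and it assumes the mount point already has an /etc/fstab entry so that "mount <dir>" is enough.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only.
# Commands are overridable so the loop can be exercised without a real volume.
MOUNT_CMD=${MOUNT_CMD:-mount}
IS_MOUNTED_CMD=${IS_MOUNTED_CMD:-mountpoint -q}

# retry_mount <mount-point> <attempts> <delay-seconds>
# Tries to mount the fstab entry for <mount-point> up to <attempts> times,
# sleeping <delay-seconds> after each failed attempt.
# Returns 0 once the mount point is mounted, 1 if every attempt failed.
retry_mount() {
    target=$1
    attempts=$2
    delay=$3
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Already mounted (for example by the normal fstab replay): done.
        if $IS_MOUNTED_CMD "$target"; then
            return 0
        fi
        # Relies on the existing /etc/fstab entry for the source and options.
        if $MOUNT_CMD "$target"; then
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}
```

A call such as "retry_mount /run/gluster/shared_storage 30 10" would keep trying for roughly five minutes; the attempt count and delay here are arbitrary example values.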
Why can't we close this bug then?
Upstream patch: https://review.gluster.org/17339
For the above change to work, the following service needs to be enabled:
systemctl enable glusterfssharedstorage.service
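Since the service has to be enabled on every node, a small loop may help. This is a sketch under assumptions: the function name enable_on_nodes and the RUN override variable are hypothetical, and it assumes passwordless root ssh between the nodes. Only the unit name comes from this bug.

```shell
#!/bin/sh
# Unit name as given in this bug.
UNIT=glusterfssharedstorage.service
# Hypothetical: how to reach a node; overridable for dry runs.
RUN=${RUN:-ssh}

# enable_on_nodes <node>...
# Runs "systemctl enable" for the unit on each node and prints one
# status line per node. Returns 1 if any node failed, 0 otherwise.
enable_on_nodes() {
    failed=0
    for node in "$@"; do
        if $RUN "$node" systemctl enable "$UNIT"; then
            echo "$node: enabled"
        else
            echo "$node: FAILED"
            failed=1
        fi
    done
    return "$failed"
}
```

For the cluster in this report, it would be called with the four node names, for example "enable_on_nodes dhcp42-20.lab.eng.blr.redhat.com dhcp42-239.lab.eng.blr.redhat.com dhcp43-175.lab.eng.blr.redhat.com dhcp42-196.lab.eng.blr.redhat.com".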
Verified this bug on

# rpm -qa | grep ganesha
glusterfs-ganesha-3.8.4-29.el7rhgs.x86_64
nfs-ganesha-gluster-2.4.4-10.el7rhgs.x86_64
nfs-ganesha-debuginfo-2.4.4-10.el7rhgs.x86_64
nfs-ganesha-2.4.4-10.el7rhgs.x86_64

Steps:
1. Create a 4 node cluster
2. Run systemctl enable glusterfssharedstorage.service on all the nodes
3. Reboot all the nodes

Shared_storage is mounted on all the nodes post reboot.

Will move this bug to verified once jiffin opens the doc bug for enabling the sharedstorage service post ganesha setup creation.
(In reply to Manisha Saini from comment #13)
> Verified this bug on
>
> # rpm -qa | grep ganesha
> glusterfs-ganesha-3.8.4-29.el7rhgs.x86_64
> nfs-ganesha-gluster-2.4.4-10.el7rhgs.x86_64
> nfs-ganesha-debuginfo-2.4.4-10.el7rhgs.x86_64
> nfs-ganesha-2.4.4-10.el7rhgs.x86_64
>
>
> Steps:
> 1.Create a 4 node cluster
> 2.run systemctl enable glusterfssharedstorage.service on all the nodes
> 3.reboot all the nodes
>
> Shared_storage is mounted on all the nodes post reboot.
>
> Will move this bug to verified once jiffin open the doc bug
> for enabling sharedstorage service post ganesha setup creation

Thanks Manisha. I have opened a doc bug for this issue: https://bugzilla.redhat.com/show_bug.cgi?id=1464342
Opened a bug to add this to Gdeploy as well, so that it is done while setting up the ganesha cluster: https://bugzilla.redhat.com/show_bug.cgi?id=1464375
Verified this bug on

# rpm -qa | grep ganesha
glusterfs-ganesha-3.8.4-34.el7rhgs.x86_64
nfs-ganesha-2.4.4-16.el7rhgs.x86_64
nfs-ganesha-gluster-2.4.4-16.el7rhgs.x86_64

Tested on a 6 node ganesha cluster. After rebooting all 6 nodes, shared-storage is mounted on all the nodes.
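The post-reboot check used in the verification above can be scripted per node. This is a minimal sketch, not the procedure QE actually ran: the function name shared_storage_mounted is hypothetical, and it assumes the shared volume is a FUSE mount, which the kernel lists with filesystem type fuse.glusterfs. The mount point is the one shown in the original description.

```shell
#!/bin/sh
# Mount point of the shared volume, as shown in the bug description.
SHARED_MNT=/run/gluster/shared_storage

# shared_storage_mounted [mounts-table]
# Reads a mounts table in /proc/mounts format (default: /proc/mounts).
# Returns 0 if a fuse.glusterfs filesystem is mounted on SHARED_MNT,
# non-zero otherwise.
shared_storage_mounted() {
    table=${1:-/proc/mounts}
    awk -v mnt="$SHARED_MNT" '
        $2 == mnt && $3 == "fuse.glusterfs" { found = 1 }
        END { exit !found }
    ' "$table"
}
```

Running "shared_storage_mounted && echo mounted || echo missing" on each node after the reboot gives a one-word answer per node.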
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:2774