1335090 – Shared volume doesn't get mounted on few nodes after rebooting all nodes in cluster.

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	glusterfs
Sub Component:
Version:	rhgs-3.1
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	RHGS 3.3.0
Assignee:	Jiffin
QA Contact:	Manisha Saini
Docs Contact:
URL:
Whiteboard:
Depends On:	1452527
Blocks:	1417147 1451981

Reported:	2016-05-11 10:45 UTC by Shashank Raj
Modified:	2017-09-21 04:54 UTC
CC List:	13 users
Fixed In Version:	glusterfs-3.8.4-34
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Clones:	1452527
Environment:
Last Closed:	2017-09-21 04:28:23 UTC
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2017:2774	0	normal	SHIPPED_LIVE	glusterfs bug fix and enhancement update	2017-09-21 08:16:29 UTC

Description Shashank Raj 2016-05-11 10:45:12 UTC

Description of problem:

The shared volume doesn't get mounted on one (maybe two) of the nodes after rebooting all nodes in the cluster, which results in a missing symlink (/var/lib/nfs -> /var/run/gluster/shared_storage/nfs-ganesha/dhcp42-239.lab.eng.blr.redhat.com/nfs).

Version-Release number of selected component (if applicable):

glusterfs-3.7.9-4 and nfs-ganesha-2.3.1-6

How reproducible:
Always

Steps to Reproduce:
1. Create a 4 node ganesha cluster.
2. Make sure the shared volume is created and mounted on all the nodes of the cluster and that the symlink is created, as shown below.

[root@dhcp42-20 ~]# gluster volume status gluster_shared_storage
Status of volume: gluster_shared_storage
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick dhcp42-239.lab.eng.blr.redhat.com:/va
r/lib/glusterd/ss_brick                     49155     0          Y       2293 
Brick dhcp43-175.lab.eng.blr.redhat.com:/va
r/lib/glusterd/ss_brick                     49155     0          Y       2281 
Brick dhcp42-20.lab.eng.blr.redhat.com:/var
/lib/glusterd/ss_brick                      49155     0          Y       2266 
Self-heal Daemon on localhost               N/A       N/A        Y       2257 
Self-heal Daemon on dhcp42-239.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       2287 
Self-heal Daemon on dhcp43-175.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       2253 
Self-heal Daemon on dhcp42-196.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       2258 
 
Task Status of Volume gluster_shared_storage
------------------------------------------------------------------------------
There are no active volume tasks

dhcp42-20.lab.eng.blr.redhat.com:/gluster_shared_storage  27740928 1697152  26043776   7% /run/gluster/shared_storage

dhcp42-239.lab.eng.blr.redhat.com:/gluster_shared_storage  27740928 1697152  26043776   7% /run/gluster/shared_storage

dhcp43-175.lab.eng.blr.redhat.com:/gluster_shared_storage  27740928 1697152  26043776   7% /run/gluster/shared_storage

dhcp42-196.lab.eng.blr.redhat.com:/gluster_shared_storage  27740928 1697152  26043776   7% /run/gluster/shared_storage

[root@dhcp42-20 ~]# ls -ld /var/lib/nfs
lrwxrwxrwx. 1 root root 80 May 11 21:26 /var/lib/nfs -> /var/run/gluster/shared_storage/nfs-ganesha/dhcp42-20.lab.eng.blr.redhat.com/nfs

[root@dhcp42-239 ~]# ls -ld /var/lib/nfs
lrwxrwxrwx. 1 root root 81 May 11 21:26 /var/lib/nfs -> /var/run/gluster/shared_storage/nfs-ganesha/dhcp42-239.lab.eng.blr.redhat.com/nfs

[root@dhcp43-175 ~]# ls -ld /var/lib/nfs
lrwxrwxrwx. 1 root root 81 May 11 21:26 /var/lib/nfs -> /var/run/gluster/shared_storage/nfs-ganesha/dhcp43-175.lab.eng.blr.redhat.com/nfs

[root@dhcp42-196 ~]# ls -ld /var/lib/nfs
lrwxrwxrwx. 1 root root 81 May 11 21:19 /var/lib/nfs -> /var/run/gluster/shared_storage/nfs-ganesha/dhcp42-196.lab.eng.blr.redhat.com/nfs

3. Reboot all the nodes of the cluster.
4. Observe that on 2 of the 4 nodes, shared storage is not mounted (most of the time, it fails to mount on one node).
5. Because of this, the symlink from /var/lib/nfs doesn't get created on these 2 nodes.
6. Both of these nodes have the entries in /etc/fstab, and manually mounting the shared storage on these nodes works.


Actual results:

The shared volume doesn't get mounted on a few nodes after rebooting all nodes in the cluster.

Expected results:

The shared volume should get mounted on all the nodes after reboot.

Additional info:
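
A quick per-node check after the reboot can be sketched as below. This is only an illustration, not part of the original report: the function names and the optional mounts-file argument are made up for the example, and it assumes a Linux node where /proc/mounts lists the mount point as the second field.

```shell
#!/bin/sh
# Sketch only: function names and the optional MOUNTS_FILE argument are
# illustrative assumptions, not something taken from the report.

# is_mounted MOUNTPOINT [MOUNTS_FILE]
# Succeeds if MOUNTPOINT is listed as a mount target (second field)
# in MOUNTS_FILE, which defaults to /proc/mounts.
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# check_node MOUNTPOINT SYMLINK [MOUNTS_FILE]
# Prints one line for the mount state and one line for the symlink state.
check_node() {
    if is_mounted "$1" "$3"; then
        echo "mounted: $1"
    else
        echo "NOT mounted: $1"
    fi
    if [ -L "$2" ]; then
        echo "symlink: $2 -> $(readlink "$2")"
    else
        echo "NO symlink: $2"
    fi
}
```

With the paths from this report, it could be run on each node as: check_node /run/gluster/shared_storage /var/lib/nfs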

Comment 2 Shashank Raj 2016-05-11 10:49:35 UTC

sosreports can be found at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1335090

Comment 4 Avra Sengupta 2016-05-13 05:30:09 UTC

This is expected behaviour. The shared volume itself is hosted on these nodes, and every node mounts it through one particular node. When all nodes are down, the shared storage volume is effectively down as well. When the nodes come back up, none of them can connect to the shared storage until the node named in the /etc/fstab entry is up and serving. That node itself will never mount the shared storage on reboot, because the volume is not yet being served at the time its /etc/fstab entry is processed.
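
The ordering problem described above suggests retrying the mount until the volume is actually being served. The helper below is a minimal sketch of that idea only; it is not the fix that was eventually shipped, and the function name, attempt count and delay are assumptions chosen for illustration.

```shell
#!/bin/sh
# Sketch only: "retry" is a hypothetical helper, not taken from glusterfs.

# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS runs have been used up.
# Returns 0 on the first success, 1 if every attempt failed.
retry() {
    retry_attempts=$1
    retry_delay=$2
    shift 2
    retry_i=1
    while [ "$retry_i" -le "$retry_attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Sleep between attempts, but not after the last one.
        if [ "$retry_i" -lt "$retry_attempts" ]; then
            sleep "$retry_delay"
        fi
        retry_i=$((retry_i + 1))
    done
    return 1
}
```

Since the affected nodes already have the /etc/fstab entry, a call such as "retry 30 10 mount /run/gluster/shared_storage" would keep trying the same mount that works manually; the values 30 and 10 are arbitrary.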

Comment 5 Atin Mukherjee 2016-05-13 15:44:53 UTC

Why can't we close this bug then?

Comment 7 Atin Mukherjee 2017-05-25 12:41:44 UTC

Upstream patch: https://review.gluster.org/17339

Comment 12 Jiffin 2017-06-22 13:05:22 UTC

For the above change to work, the following service needs to be enabled:
systemctl enable glusterfssharedstorage.service
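
Because the service has to be enabled on every node, a loop over the cluster members may be convenient. The following is a sketch under assumptions: it presumes root SSH access to each node, and the function name and the RUN dry-run hook are hypothetical additions for illustration.

```shell
#!/bin/sh
# Sketch only: the function name and the RUN hook are illustrative assumptions.

# enable_shared_storage_unit NODE...
# Enables glusterfssharedstorage.service on each NODE over ssh and then
# asks systemd whether the unit is enabled.
# Set RUN=echo to print the commands instead of running them.
enable_shared_storage_unit() {
    for node in "$@"; do
        ${RUN:-} ssh "root@$node" systemctl enable glusterfssharedstorage.service
        ${RUN:-} ssh "root@$node" systemctl is-enabled glusterfssharedstorage.service
    done
}
```

For the cluster in this report, the arguments would be the four node hostnames, for example dhcp42-20.lab.eng.blr.redhat.com and dhcp42-239.lab.eng.blr.redhat.com.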

Comment 13 Manisha Saini 2017-06-23 06:51:08 UTC

Verified this bug on

# rpm -qa | grep ganesha
glusterfs-ganesha-3.8.4-29.el7rhgs.x86_64
nfs-ganesha-gluster-2.4.4-10.el7rhgs.x86_64
nfs-ganesha-debuginfo-2.4.4-10.el7rhgs.x86_64
nfs-ganesha-2.4.4-10.el7rhgs.x86_64


Steps:
1. Create a 4 node cluster.
2. Run systemctl enable glusterfssharedstorage.service on all the nodes.
3. Reboot all the nodes.

Shared_storage is mounted on all the nodes post reboot.

Will move this bug to verified once Jiffin opens the doc bug
for enabling the sharedstorage service after ganesha setup creation.

Comment 14 Jiffin 2017-06-23 07:26:03 UTC

(In reply to Manisha Saini from comment #13)
> Verified this bug on
> 
> # rpm -qa | grep ganesha
> glusterfs-ganesha-3.8.4-29.el7rhgs.x86_64
> nfs-ganesha-gluster-2.4.4-10.el7rhgs.x86_64
> nfs-ganesha-debuginfo-2.4.4-10.el7rhgs.x86_64
> nfs-ganesha-2.4.4-10.el7rhgs.x86_64
> 
> 
> Steps:
> 1. Create a 4 node cluster.
> 2. Run systemctl enable glusterfssharedstorage.service on all the nodes.
> 3. Reboot all the nodes.
> 
> Shared_storage is mounted on all the nodes post reboot.
> 
> Will move this bug to verified once Jiffin opens the doc bug
> for enabling the sharedstorage service after ganesha setup creation.

Thanks, Manisha. I have opened a doc bug for this issue: https://bugzilla.redhat.com/show_bug.cgi?id=1464342

Comment 15 Manisha Saini 2017-06-23 09:41:12 UTC

Opened a bug to add this in Gdeploy as well, for when the ganesha cluster is set up:

https://bugzilla.redhat.com/show_bug.cgi?id=1464375

Comment 19 Manisha Saini 2017-07-22 11:34:09 UTC

Verified this bug on

# rpm -qa | grep ganesha
glusterfs-ganesha-3.8.4-34.el7rhgs.x86_64
nfs-ganesha-2.4.4-16.el7rhgs.x86_64
nfs-ganesha-gluster-2.4.4-16.el7rhgs.x86_64

Tested on a 6 node ganesha cluster. After rebooting all 6 nodes, shared storage is mounted on all the nodes.

Comment 21 errata-xmlrpc 2017-09-21 04:28:23 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774
