Bug 1351949 - management connection loss when volfile-server goes down
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterd
Version: rhgs-3.1
Hardware: All
OS: All
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: RHGS 3.2.0
Assignee: Prasanna Kumar Kalever
QA Contact: Byreddy
Depends On: 1289916
Blocks: 1351522 1351530
 
Reported: 2016-07-01 08:43 UTC by Prasanna Kumar Kalever
Modified: 2017-03-23 05:38 UTC
CC: 9 users

Fixed In Version: glusterfs-3.8.4-1
Doc Type: Bug Fix
Doc Text:
Previously, when the nominated volfile server in a Red Hat Gluster Storage cluster became unavailable, any configuration changes that would result in changes to the volfile were not passed to clients until the volfile server became available again. This update ensures that when a volfile server becomes unavailable, another server takes over the role of volfile server, so that clients do not need to wait to receive updates from the original volfile server.
Clone Of: 1289916
Environment:
Last Closed: 2017-03-23 05:38:44 UTC
prasanna.kalever: needinfo+




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2017:0486 0 normal SHIPPED_LIVE Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update 2017-03-23 09:18:45 UTC

Description Prasanna Kumar Kalever 2016-07-01 08:43:06 UTC
+++ This bug was initially created as a clone of Bug #1289916 +++

Description of problem:
The client is not notified of volume changes if the node used at mount time (the volfile server) goes down.

Version-Release number of selected component (if applicable):
3.1.3

How reproducible:

Steps to Reproduce:
1. Create a volume with 2 nodes
2. mount with first node IP (as volfile server)
3. Kill the first node
4. Add a new brick to the volume
5. Notice that client will not be notified about the changes done for volume
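The steps above can be sketched as the following command sequence. This is a minimal sketch: the host names N1/N2, the volume name, and the brick paths are placeholders, and a plain distributed (non-replicated) volume is assumed.

```shell
# On N1: create and start a 2-node distributed volume
# (hypothetical hosts and brick paths)
gluster volume create vol N1:/bricks/b1 N2:/bricks/b2
gluster volume start vol

# On the client: mount using N1 as the volfile server,
# with N2 listed as a backup volfile server
mount -t glusterfs -o backup-volfile-servers=N2 N1:/vol /mnt

# Take N1 down (e.g. stop glusterd or power off the node),
# then on N2: add a new brick while N1 is unreachable
gluster volume add-brick vol N2:/bricks/b3

# Before the fix, the client never receives the updated volfile,
# so files written under /mnt do not land on the new brick.
```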


Actual results:
As a result, changes to the volume may not take effect; for example, files cannot be stored on a newly added brick.

Expected results:
The client should switch to the next available remote host and communicate with its glusterd.

--- Additional comment from Vijay Bellur on 2016-04-12 08:14:27 EDT ---

COMMIT: http://review.gluster.org/13002 committed in master by Jeff Darcy (jdarcy) 
------
commit 05bc8bfd2a11d280fe0aaac6c7ae86ea5ff08164
Author: Prasanna Kumar Kalever <prasanna.kalever>
Date:   Thu Mar 17 13:50:31 2016 +0530

    glusterd-client: switch volfile server incase existing connection breaks
    
    Problem:
    Currently, say we have a 10-node gluster volume, mounted using
    node 1 (N1) as the volfile server and the rest as backup volfile
    servers:
    
    $ mount -t glusterfs -obackup-volfile-servers=<N2>:<N3>:...:<N10> <N1>:/vol /mnt
    
    If N1 goes down we can still access the same mount point, but if
    we add or remove bricks on the volume whose volfile server is down
    (N1 in our case), that information is not passed to the client,
    because the connection between glusterfs and glusterd (of N1) is
    broken; as a result we cannot store files on the newly added
    bricks until N1 comes back.
    
    Solution:
    If N1 goes down, iterate through the nodes specified in the
    backup-volfile-servers list and try to establish a connection
    between glusterfs and glusterd, so we do not have to wait until
    N1 comes back to store files on bricks that were successfully
    added while N1 was down.
    
    Change-Id: I653c9f081a84667630608091bc243ffc3859d5cd
    BUG: 1289916
    Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever>
    Reviewed-on: http://review.gluster.org/13002
    Tested-by: Prasanna Kumar Kalever <pkalever>
    Smoke: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>
    Reviewed-by: Poornima G <pgurusid>
    Reviewed-by: Jeff Darcy <jdarcy>

--- Additional comment from Niels de Vos on 2016-06-16 09:49:41 EDT ---

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

Comment 2 Atin Mukherjee 2016-07-01 09:08:20 UTC
An upstream patch http://review.gluster.org/#/c/13002/ is already merged.

Comment 4 Atin Mukherjee 2016-09-17 13:23:35 UTC
Upstream mainline : http://review.gluster.org/13002
Upstream 3.8 : available as part of branching

And the fix is available in rhgs-3.2.0 as part of rebase to GlusterFS 3.8.4.

Comment 9 Byreddy 2016-10-05 06:37:21 UTC
Verified this bug using the build - glusterfs-3.8.4-2.

I see that when the primary volfile server is down, the client switches to one of the backup servers specified via backup-volfile-servers in the mount command below:

mount -t glusterfs -o backup-volfile-servers=<N2>:<N3>:...:<N10> <N1>:/vol /mnt


I performed volume add-brick, remove-brick, and volume set operations, and the changes were reflected in the mount log and in the mounted volume size accordingly.

Moving to verified state.

Comment 13 errata-xmlrpc 2017-03-23 05:38:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html

