Bug 1486320
Summary: | client graph does not reload if the server node participating in the mount is unreachable | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Rahul Hinduja <rhinduja> |
Component: | glusterd | Assignee: | Atin Mukherjee <amukherj> |
Status: | CLOSED UPSTREAM | QA Contact: | Bala Konda Reddy M <bmekala> |
Severity: | high | Docs Contact: | |
Priority: | medium | ||
Version: | rhgs-3.3 | CC: | kaushal, nchilaka, rhinduja, rhs-bugs, storage-qa-internal, vbellur |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2018-11-06 14:12:00 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | |
Description
Rahul Hinduja
2017-08-29 13:09:44 UTC
RCA: This is a design issue from day one. Clients are notified of graph changes through their local glusterd. If the volume topology changes while that glusterd instance is down (in this case, the client's local glusterd), the client never receives the notification and so is unaware of the brick change; as a result, I/O does not reach the replaced brick until the client is remounted. Unfortunately, replace-brick exposes this problem because its "commit force" variant allows one of the nodes to be completely offline.

Kaushal - any thoughts about how this can be addressed in GD2?

This would be solved if clients (re)connect to other volfile servers when the initial connection is lost. Most of the changes would be needed in glusterfsd, with very minimal changes in GD2 (or GD1). Glusterfsd would need changes to do the following:

- First connect to the given volfile server, fetch the volfile, and start.
- Get a list of the other servers in the TSP (trusted storage pool), and keep this list updated regularly.
- When the volfile-server connection is lost, connect to one of the other servers (a sketch of this reconnect loop follows below).

GD2 would need one change: a new RPC for getting the list of servers in the pool. Alternatively, if glusterfsd were to implement a ReST client, it could simply call the existing peer-list API (see the second sketch below).

*** Bug 1639566 has been marked as a duplicate of this bug. ***
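Below is a minimal Go sketch of the reconnect flow described above. It is illustrative only: the real change would land in glusterfsd's C code, and `fetchVolfile`, `listPeers`, and `waitForDisconnect` are hypothetical stand-ins for the actual volfile-fetch RPC, the proposed peer-list RPC, and the transport layer.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

type volfileClient struct {
	mu      sync.Mutex
	servers []string // known volfile servers in the trusted storage pool
}

// refreshPeers asks the current server for pool membership (the new
// peer-list RPC proposed above) and caches the result.
func (c *volfileClient) refreshPeers(current string) {
	peers, err := listPeers(current)
	if err != nil {
		return // keep the stale list; better than nothing
	}
	c.mu.Lock()
	c.servers = peers
	c.mu.Unlock()
}

// pickOther returns any known server other than the one that just failed.
func (c *volfileClient) pickOther(current string) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for _, s := range c.servers {
		if s != current {
			return s, nil
		}
	}
	return "", errors.New("no alternate volfile server known")
}

// run connects to the initial volfile server, fetches the volfile, and
// on disconnect fails over to another server in the pool. It loops
// until no alternate server is known (with the stubs below, forever).
func (c *volfileClient) run(initial string) error {
	current := initial
	for {
		if err := fetchVolfile(current); err == nil {
			c.refreshPeers(current)
			waitForDisconnect(current)
		}
		next, err := c.pickOther(current)
		if err != nil {
			return err
		}
		fmt.Printf("volfile server %s lost, failing over to %s\n", current, next)
		current = next
		time.Sleep(time.Second) // simple backoff before reconnecting
	}
}

// The three functions below are stand-ins for the real RPC plumbing;
// they exist only so the sketch runs.
func fetchVolfile(server string) error {
	fmt.Println("fetched volfile from", server)
	return nil
}

func listPeers(server string) ([]string, error) {
	return []string{"server1:24007", "server2:24007", "server3:24007"}, nil
}

func waitForDisconnect(server string) {
	time.Sleep(2 * time.Second) // pretend the connection dropped
}

func main() {
	c := &volfileClient{servers: []string{"server1:24007"}}
	if err := c.run("server1:24007"); err != nil {
		fmt.Println("giving up:", err)
	}
}
```

The point of the cached list is that failover remains possible even when the server that supplied it is the one that just went down.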
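And a sketch of the ReST alternative: glusterfsd acting as an HTTP client against GD2's peer-list API. The endpoint path (`/v1/peers`), the response shape, and the port are assumptions about GD2's ReST interface, not verified against it.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// peer mirrors the fields we would care about from the assumed response.
type peer struct {
	ID        string   `json:"id"`
	Name      string   `json:"name"`
	Addresses []string `json:"addresses"`
}

// fetchPool asks one GD2 endpoint for the pool membership.
func fetchPool(endpoint string) ([]peer, error) {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(endpoint + "/v1/peers") // assumed GD2 peer-list route
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
	var peers []peer
	if err := json.NewDecoder(resp.Body).Decode(&peers); err != nil {
		return nil, err
	}
	return peers, nil
}

func main() {
	// The host and port here are placeholders, not GD2 defaults.
	peers, err := fetchPool("http://server1:24007")
	if err != nil {
		fmt.Println("peer list fetch failed:", err)
		return
	}
	for _, p := range peers {
		fmt.Println(p.Name, p.Addresses)
	}
}
```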