Description of problem:
=======================
Had a one-node (node-1) cluster with a Distribute volume containing one brick. Expanded the cluster and the volume by adding a brick hosted on a newly added node (node-2), then peer probed a third node (node-3) and tried to peer detach node-2 from node-3: the detach succeeded and removed node-2 from the cluster even though it hosts a brick.

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.7.5-12

How reproducible:
=================
Always

Steps to Reproduce:
===================
1. Have a one-node (node-1) cluster with a Distribute volume
2. Peer probe one more node (node-2) into the cluster
3. Add a brick hosted on node-2 to the volume
4. Peer probe one more node (node-3)
5. From node-3, peer detach node-2 // the detach succeeds

Actual results:
===============
Peer detach succeeds on a node hosting bricks of the volume.

Expected results:
=================
Peer detach should be refused while the node hosts bricks.

Additional info:
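A minimal CLI sketch of the reproduction steps above (the volume name "dist", the hostnames node-1/node-2/node-3, and the brick paths /bricks/b1, /bricks/b2 are placeholders):

    # On node-1: one-node cluster with a one-brick Distribute volume
    gluster volume create dist node-1:/bricks/b1
    gluster volume start dist

    # Steps 2/3: expand the cluster, then the volume
    gluster peer probe node-2
    gluster volume add-brick dist node-2:/bricks/b2

    # Step 4: probe a third node
    gluster peer probe node-3

    # Step 5: on node-3, detach node-2 -- this wrongly succeeds even
    # though node-2 hosts a brick of "dist"
    gluster peer detach node-2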
Looks like this is a day-zero bug, and here is why: when step 4 is executed, the newly probed node (say N3) imports the volumes from the probing node (N1), but at that point it has no information yet about the other node's (N2's) membership, since the peer update happens after the volume update, and it therefore fails to resolve the uuid of N2's brick. Even after N2 later updates N3 about its membership, the brick's uuid is never filled in. As a consequence, when N3 initiates a detach of N2, it checks whether the node to be detached has any bricks configured by comparing against its uuid, which is NULL in this case, so the check finds nothing and the peer is removed when it ideally shouldn't be. One way to fix this class of inconsistency would be to update the peer list before the volume data, but that is a significant effort in itself.
Another way of fixing it would be to import the brick's uuid along with the volume data and simply update it, instead of resolving it locally. This still needs validation, though.
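To make the failure mode and the proposed fix concrete, here is a minimal, self-contained C sketch (illustrative stand-ins only: struct brickinfo and peer_hosts_bricks are hypothetical names, not glusterd's actual structures or functions; uses libuuid, link with -luuid):

    #include <stdio.h>
    #include <string.h>
    #include <uuid/uuid.h>

    /* Illustrative stand-in for glusterd's per-brick record. */
    struct brickinfo {
        char   path[256];
        uuid_t uuid;   /* UUID of the node hosting the brick */
    };

    /* Stand-in for the detach-time check: does this peer host a brick? */
    static int
    peer_hosts_bricks(const uuid_t peer, struct brickinfo *bricks, int n)
    {
        for (int i = 0; i < n; i++)
            if (uuid_compare(bricks[i].uuid, peer) == 0)
                return 1;
        return 0;
    }

    int
    main(void)
    {
        uuid_t n2_uuid;              /* N2's UUID, as known to N1 */
        uuid_generate(n2_uuid);

        struct brickinfo brick;
        memset(&brick, 0, sizeof(brick));
        snprintf(brick.path, sizeof(brick.path), "node-2:/bricks/b2");

        /* The bug: N3 imported the volume before it learnt about N2,
         * so the brick's uuid was never resolved and stays all-zero;
         * N2 therefore looks brick-free and the detach is allowed. */
        printf("before fix: hosts bricks = %d\n",
               peer_hosts_bricks(n2_uuid, &brick, 1));   /* prints 0 */

        /* The proposed fix: ship the uuid inside the volume-import
         * payload and copy it, instead of resolving it against the
         * (still incomplete) local peer list. */
        uuid_copy(brick.uuid, n2_uuid);
        printf("after fix:  hosts bricks = %d\n",
               peer_hosts_bricks(n2_uuid, &brick, 1));   /* prints 1 */
        return 0;
    }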
http://review.gluster.org/13047 is posted for review upstream
The fix is now available in the rhgs-3.1.3 branch, hence moving the state to Modified.
Verified this issue using the build "glusterfs-3.7.9-1". Repeated the reproduction steps mentioned in the description section; the fix works properly, and detaching a node that is hosting bricks is no longer allowed. Moving to Verified with the above details.
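For reference, with the fixed build the detach attempt from step 5 is refused; the failure looks roughly like this (exact message text may vary between builds):

    [root@node-3 ~]# gluster peer detach node-2
    peer detach: failed: Brick(s) with the peer node-2 exist in cluster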
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1240