Bug 1293273 - [GlusterD]: Peer detach happening with a node which is hosting volume bricks
Summary: [GlusterD]: Peer detach happening with a node which is hosting volume bricks
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterd
Version: rhgs-3.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.1.3
Assignee: Atin Mukherjee
QA Contact: Byreddy
URL:
Whiteboard: GlusterD
Depends On:
Blocks: 1293414 1297305 1299184
 
Reported: 2015-12-21 09:38 UTC by Byreddy
Modified: 2016-06-30 06:12 UTC
CC List: 6 users

Fixed In Version: glusterfs-3.7.9-1
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1293414
Environment:
Last Closed: 2016-06-23 04:59:48 UTC
Embargoed:


Attachments: none


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1233213 0 unspecified CLOSED [New] - volume info --xml gives host UUID as zeros 2021-02-22 00:41:40 UTC
Red Hat Product Errata RHBA-2016:1240 0 normal SHIPPED_LIVE Red Hat Gluster Storage 3.1 Update 3 2016-06-23 08:51:28 UTC

Internal Links: 1233213

Description Byreddy 2015-12-21 09:38:43 UTC
Description of problem:
=======================
Had a one-node (Node-1) cluster with a Distributed volume consisting of one brick. Expanded the cluster and the volume by adding a brick from a newly added node (Node-2), then peer probed a third node (Node-3) and tried to peer detach the second node (Node-2) from the third node; the detach succeeded and removed the second node from the cluster even though it hosts a brick.

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.7.5-12


How reproducible:
=================
Always


Steps to Reproduce:
===================
1. Have a one-node (Node-1) cluster with a Distributed volume
2. Add one more node (Node-2) to the cluster
3. Add a brick hosted on Node-2 to the volume
4. Peer probe one more node (Node-3)
5. From Node-3, detach the second node (Node-2) // the detach succeeds (see the command sketch below)
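For reference, a minimal CLI sketch of the steps above. The hostnames (node1, node2, node3), the volume name (distvol), and the brick paths are illustrative and not taken from this report:

    # On Node-1: create and start a one-brick Distributed volume
    gluster volume create distvol node1:/bricks/brick1
    gluster volume start distvol

    # Steps 2-3: probe Node-2 and expand the volume with a brick on it
    gluster peer probe node2
    gluster volume add-brick distvol node2:/bricks/brick1

    # Step 4: probe a third node
    gluster peer probe node3

    # Step 5: run on Node-3; on the affected build this succeeds even
    # though node2 still hosts a brick of distvol
    gluster peer detach node2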


Actual results:
===============
Peer detach succeeds for a node that is hosting volume bricks.


Expected results:
==================
Peer detach should not be allowed if the node is hosting bricks.



Additional info:

Comment 2 Atin Mukherjee 2015-12-21 13:38:57 UTC
Looks like this is a day-zero bug, and here is why:

When step 4 was executed, the newly probed node (say N3) imports the volumes from the probing node (N1), but at that point it has no information about the other node's (N2) membership, since the peer update happens after the volume update, and hence it fails to resolve the brick's uuid. Even after N2 later updates N3 about its membership, the brick's uuid is never filled in. As a consequence, when N3 initiates a detach of N2, it checks whether the node to be detached has any bricks configured against its uuid, which is NULL in this case, so it goes ahead and removes the peer when ideally it shouldn't have.

I think we'd need to consider updating the peer list before the volume data to fix these types of inconsistencies, which is itself a significant effort.
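A quick way to observe the inconsistency described above on N3 is via the linked bug 1233213 (volume info --xml reporting the host UUID as zeros); a hedged sketch, assuming the illustrative volume name distvol from the earlier example:

    # Run on the freshly probed node (N3). On an affected build the brick's
    # host UUID shows up as all zeros because it was never resolved.
    gluster volume info distvol --xml | grep -i uuid
    # Illustrative output: <hostUuid>00000000-0000-0000-0000-000000000000</hostUuid>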

Comment 4 Atin Mukherjee 2015-12-21 13:47:51 UTC
Another way of fixing it would be to import the uuid and just update it instead of resolving it. Need to validate it though.

Comment 5 Atin Mukherjee 2015-12-21 17:50:48 UTC
http://review.gluster.org/13047 is posted for review upstream

Comment 7 Atin Mukherjee 2016-03-22 12:08:23 UTC
The fix is now available in the rhgs-3.1.3 branch, hence moving the state to Modified.

Comment 9 Byreddy 2016-04-04 05:33:16 UTC
Verified this issue using the build "glusterfs-3.7.9-1".

Repeated the reproduction steps mentioned in the description section; the fix is working properly, and it no longer allows detaching a node which is hosting bricks.


Moving to verified state with above details.
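For reference, a sketch of the expected behaviour on the fixed build, using the same illustrative hostnames as above (the exact error string is illustrative and may differ between releases):

    # Run on Node-3 against a peer that still hosts bricks; with the fix
    # in place glusterd must reject the detach.
    gluster peer detach node2
    # Illustrative output: peer detach: failed: Brick(s) with the peer node2 exist in cluster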

Comment 12 errata-xmlrpc 2016-06-23 04:59:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1240

