Bug 1293273 - [GlusterD]: Peer detach happening with a node which is hosting volume bricks
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterd
Hardware: x86_64 Linux
Priority: unspecified
Severity: high
Target Release: RHGS 3.1.3
Assigned To: Atin Mukherjee
Keywords: ZStream
Blocks: 1293414 1297305 1299184
Reported: 2015-12-21 04:38 EST by Byreddy
Modified: 2016-06-30 02:12 EDT
Fixed In Version: glusterfs-3.7.9-1
Doc Type: Bug Fix
Clones: 1293414
Last Closed: 2016-06-23 00:59:48 EDT
Type: Bug
Description Byreddy 2015-12-21 04:38:43 EST
Description of problem:
Started with a one-node cluster (node-1) hosting a Distributed volume with one brick. Expanded the cluster by adding a second node (node-2) and expanded the volume by adding a brick from node-2. Then peer probed a third node (node-3) and, from node-3, tried to peer detach node-2. The detach succeeded and node-2 was removed from the cluster, even though it hosts a brick of the volume.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Have a one-node cluster (node-1) with a Distributed volume.
2. Add one more node (node-2) to the cluster.
3. Add a brick hosted on node-2 to the volume.
4. Peer probe one more node (node-3).
5. Go to node-3 and detach the second node (node-2). The detach succeeds.
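
The steps above can be sketched as a sequence of gluster CLI commands. This is only an illustration: the volume name `distvol`, the brick directory `/bricks/b1`, and the host names are assumptions that do not appear in the report, and the script only prints the commands, it does not run them.

```python
from typing import List, Tuple


def repro_commands(
    volume: str = "distvol",          # hypothetical volume name
    brick_dir: str = "/bricks/b1",    # hypothetical brick directory
    nodes: Tuple[str, str, str] = ("node-1", "node-2", "node-3"),
) -> List[Tuple[str, str]]:
    """Return (host to run on, command) pairs mirroring steps 1-5."""
    n1, n2, n3 = nodes
    return [
        # Step 1: one-node cluster with a single-brick Distributed volume.
        (n1, f"gluster volume create {volume} {n1}:{brick_dir}"),
        # Step 2: add node-2 to the cluster.
        (n1, f"gluster peer probe {n2}"),
        # Step 3: add a brick hosted on node-2 to the volume.
        (n1, f"gluster volume add-brick {volume} {n2}:{brick_dir}"),
        # Step 4: add node-3 to the cluster.
        (n1, f"gluster peer probe {n3}"),
        # Step 5: from node-3, try to detach node-2 (should be refused).
        (n3, f"gluster peer detach {n2}"),
    ]


if __name__ == "__main__":
    for host, cmd in repro_commands():
        print(f"[{host}] {cmd}")
```

On an affected build the last command is accepted; the expected behaviour is that it is rejected because node-2 hosts a brick.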

Actual results:
Peer detach succeeds for a node that hosts volume bricks.

Expected results:
Peer detach should not be allowed if the node is hosting bricks.

Additional info:
Comment 2 Atin Mukherjee 2015-12-21 08:38:57 EST
This looks like a day-zero bug, and here is why:

When step 4 is executed, the probed node (say N3) imports volumes from the probing node (N1). At that point N3 has no membership information about the other node (N2), since the peer update happens after the volume update, so it fails to set the UUID of N2's brick. Even after N2 later updates N3 about its membership, the brick's UUID is never populated. As a consequence, when N3 initiates a detach of N2, it checks whether the node to be detached has any bricks configured by comparing against the brick's UUID, which is NULL in this case, so it goes ahead and removes the peer, which it should not.
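
To make the ordering problem concrete, here is a toy Python model of the sequence described above. It is not glusterd code: the `Node` and `Brick` classes, the method names, and the `uuid-N` strings are all made up for illustration, and it assumes N3 already knows N1 when it imports the volume.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Brick:
    hostname: str
    uuid: Optional[str] = None  # owning peer's UUID; None models the NULL uuid


@dataclass
class Node:
    peers: Dict[str, str] = field(default_factory=dict)  # hostname -> UUID
    bricks: List[Brick] = field(default_factory=list)

    def import_volume(self, brick_hosts: List[str]) -> None:
        # Resolve each brick's owner against the peers known right now.
        for host in brick_hosts:
            self.bricks.append(Brick(host, self.peers.get(host)))

    def add_peer(self, hostname: str, uuid: str) -> None:
        # The membership update does not revisit already imported bricks.
        self.peers[hostname] = uuid

    def detach(self, hostname: str) -> bool:
        # Refuse the detach only if some brick carries the peer's UUID.
        uuid = self.peers[hostname]
        if any(b.uuid == uuid for b in self.bricks):
            return False
        del self.peers[hostname]
        return True


if __name__ == "__main__":
    n3 = Node(peers={"node-1": "uuid-1"})
    n3.import_volume(["node-1", "node-2"])  # node-2 is unknown: uuid stays None
    n3.add_peer("node-2", "uuid-2")         # membership arrives too late
    print(n3.detach("node-2"))              # True: the detach is wrongly allowed
```

In this model, calling `add_peer` before `import_volume` makes `detach` return `False`, which is the behaviour the reporter expected.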

I think we'd need to consider doing the peer list update before the volume data update to fix these types of inconsistencies, which is an effort in itself.
Comment 4 Atin Mukherjee 2015-12-21 08:47:51 EST
Another way of fixing it would be to import the UUID and just update it instead of resolving it. This needs to be validated, though.
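
A rough sketch of that idea, again as a toy Python model rather than the actual patch: the function names and the `(hostname, uuid)` tuple layout are hypothetical. It contrasts resolving a brick's owner locally with taking the UUID carried in the imported volume data.

```python
from typing import Dict, List, Optional, Tuple

BrickInfo = Tuple[str, Optional[str]]  # (hostname, owning peer UUID or None)


def import_by_resolving(
    brick_hosts: List[str], known_peers: Dict[str, str]
) -> List[BrickInfo]:
    """Look up each brick's owner locally; unknown hosts end up with None."""
    return [(host, known_peers.get(host)) for host in brick_hosts]


def import_with_uuid(sent_bricks: List[BrickInfo]) -> List[BrickInfo]:
    """Trust the UUIDs carried in the imported volume data as they are."""
    return list(sent_bricks)


def can_detach(peer_uuid: str, bricks: List[BrickInfo]) -> bool:
    """A peer may be detached only if no brick is owned by its UUID."""
    return all(uuid != peer_uuid for _, uuid in bricks)


if __name__ == "__main__":
    known = {"node-1": "uuid-1"}  # node-3 does not know node-2 yet
    resolved = import_by_resolving(["node-1", "node-2"], known)
    carried = import_with_uuid([("node-1", "uuid-1"), ("node-2", "uuid-2")])
    print(can_detach("uuid-2", resolved))  # True: brick uuid was never set
    print(can_detach("uuid-2", carried))   # False: the detach is refused
```

With the UUID carried in the import, the check no longer depends on whether the peer list or the volume data arrives first.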
Comment 5 Atin Mukherjee 2015-12-21 12:50:48 EST
http://review.gluster.org/13047 is posted for review upstream
Comment 7 Atin Mukherjee 2016-03-22 08:08:23 EDT
The fix is now available in rhgs-3.1.3 branch, hence moving the state to Modified.
Comment 9 Byreddy 2016-04-04 01:33:16 EDT
Verified this issue using the build "glusterfs-3.7.9-1".

Repeated the reproduction steps from the description. The fix works properly: detaching a node that hosts bricks is no longer allowed.

Moving to the verified state based on the above details.
Comment 12 errata-xmlrpc 2016-06-23 00:59:48 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

