Bug 1293414 - [GlusterD]: Peer detach happening with a node which is hosting volume bricks
Summary: [GlusterD]: Peer detach happening with a node which is hosting volume bricks
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: mainline
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Atin Mukherjee
QA Contact:
URL:
Whiteboard:
Depends On: 1293273
Blocks: 1297305
 
Reported: 2015-12-21 17:42 UTC by Atin Mukherjee
Modified: 2016-06-16 13:51 UTC
CC List: 6 users

Fixed In Version: glusterfs-3.8rc2
Doc Type: Bug Fix
Doc Text:
Clone Of: 1293273
: 1297305
Environment:
Last Closed: 2016-06-16 13:51:54 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Atin Mukherjee 2015-12-21 17:42:30 UTC
+++ This bug was initially created as a clone of Bug #1293273 +++

Description of problem:
=======================
Had a one-node (Node-1) cluster with a Distributed volume containing one brick. Expanded the cluster and the volume by adding a brick from a newly added node (Node-2), then peer probed a third node (Node-3) and tried to peer detach the second node (Node-2) from the third node. The detach succeeded and removed Node-2 from the cluster even though it hosts a brick of the volume.

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.7.5-12


How reproducible:
=================
Always


Steps to Reproduce:
===================
1. Have a one-node (Node-1) cluster with a Distributed volume
2. Add one more node (Node-2) to the cluster
3. Add a brick hosted on Node-2 to the volume
4. Peer probe one more node (Node-3)
5. From Node-3, detach the second node (Node-2) // the detach succeeds


Actual results:
===============
Peer detach succeeds for a node that is hosting volume bricks.


Expected results:
==================
Peer detach should be rejected if the node hosts bricks of any volume.



Additional info:

--- Additional comment from Red Hat Bugzilla Rules Engine on 2015-12-21 04:38:48 EST ---

This bug is automatically being proposed for the current z-stream release of Red Hat Gluster Storage 3 by setting the release flag 'rhgs‑3.1.z' to '?'. 

If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from Atin Mukherjee on 2015-12-21 08:38:57 EST ---

Looks like this is a day zero bug and here is why:

When step 4 was executed, the probed node (say N3) goes for importing volumes from the probing node (N1), but it still doesn't have any information about the other node (N2) and its membership (since the peer update happens after the volume updates), and hence it fails to resolve and update the brick's uuid. Even after N2 subsequently updates N3 about its membership, the brick's uuid is never populated. As a consequence, when N3 initiates a detach of N2, it checks whether the node to be detached has any bricks configured by matching their uuids; the brick's uuid is NULL (all-zero) in this case, so N3 goes ahead and removes the peer, which ideally it shouldn't have.
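
To make the failure mode concrete, below is a minimal, self-contained C sketch of this kind of check. It is not the actual glusterd code (the real logic lives in glusterd_friend_contains_vol_bricks()); the struct layout, uuid values, and brick path are illustrative assumptions. It only shows that a brick whose uuid was never resolved (all zeros) can never match the uuid of the peer being detached, so the "does this peer host any bricks?" check wrongly reports none and the detach is allowed.

/* Simplified illustration only -- not the actual glusterd source. */
#include <stdio.h>
#include <string.h>

typedef unsigned char node_uuid_t[16];

struct brickinfo {
    node_uuid_t uuid;   /* uuid of the node hosting the brick; all-zero if never resolved */
    const char *path;   /* hypothetical brick path */
};

/* Returns the number of bricks hosted by the peer identified by friend_uuid. */
static int friend_contains_vol_bricks(const struct brickinfo *bricks, int brick_count,
                                      const node_uuid_t friend_uuid)
{
    int count = 0;

    for (int i = 0; i < brick_count; i++) {
        if (memcmp(bricks[i].uuid, friend_uuid, sizeof(node_uuid_t)) == 0)
            count++;
    }
    return count;
}

int main(void)
{
    node_uuid_t n2_uuid = { 0xaa, 0xbb, 0xcc };   /* hypothetical uuid of N2 */

    /* On N3, N2's brick was imported before N2 was known as a peer, so its
     * uuid could not be resolved and is still all-zero. */
    struct brickinfo bricks[] = {
        { .uuid = { 0 }, .path = "node-2:/bricks/b2" },
    };

    if (friend_contains_vol_bricks(bricks, 1, n2_uuid) == 0)
        printf("check finds no bricks for N2 -> detach allowed (the bug)\n");
    else
        printf("check finds bricks for N2 -> detach rejected\n");

    return 0;
}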

I think we'd need to consider updating the peer list before the volume data to fix these kinds of inconsistencies, which is itself a non-trivial effort.

--- Additional comment from Atin Mukherjee on 2015-12-21 08:39:26 EST ---

Given that it's a complex fix, I'd prefer to mark it as a known issue for 3.1.2.

--- Additional comment from Atin Mukherjee on 2015-12-21 08:47:51 EST ---

Another way of fixing it would be to import the uuid and simply update it, instead of resolving it. This needs to be validated, though.
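
For reference, here is a rough, self-contained C sketch of that idea: carry the brick owner's uuid in the volume payload exported to the probed node and copy it on import, rather than resolving it from the possibly incomplete peer list. This is not the actual patch; the hex serialization, function names, and field names are assumptions for illustration only -- the real change is in the glusterd volume import/export code (see the review referenced in the comments below).

/* Rough sketch of the import/export idea -- not the actual glusterd patch. */
#include <stdio.h>
#include <string.h>

typedef unsigned char node_uuid_t[16];

struct brickinfo {
    char hostname[64];
    node_uuid_t uuid;
};

/* Exporting side (probing node N1): serialize the brick owner's uuid as hex
 * so it travels with the rest of the volume data. */
static void export_brick_uuid(const struct brickinfo *b, char *buf, size_t len)
{
    size_t off = 0;

    for (int i = 0; i < 16 && off + 3 <= len; i++)
        off += snprintf(buf + off, len - off, "%02x", (unsigned int)b->uuid[i]);
}

/* Importing side (probed node N3): copy the uuid from the payload instead of
 * resolving b->hostname against the local peer list (which may not know N2 yet). */
static void import_brick_uuid(struct brickinfo *b, const char *hex)
{
    for (int i = 0; i < 16; i++) {
        unsigned int byte = 0;

        sscanf(hex + 2 * i, "%2x", &byte);
        b->uuid[i] = (unsigned char)byte;
    }
}

int main(void)
{
    struct brickinfo on_n1 = { .hostname = "node-2", .uuid = { 0xaa, 0xbb, 0xcc } };
    struct brickinfo on_n3 = { .hostname = "node-2" };  /* uuid not yet known on N3 */
    char payload[64] = "";

    export_brick_uuid(&on_n1, payload, sizeof(payload));
    import_brick_uuid(&on_n3, payload);

    /* With the uuid imported, the detach check on N3 can now match N2's uuid
     * against this brick and refuse to detach N2. */
    printf("imported uuid matches exporter's: %s\n",
           memcmp(on_n1.uuid, on_n3.uuid, sizeof(node_uuid_t)) == 0 ? "yes" : "no");
    return 0;
}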

Comment 1 Vijay Bellur 2015-12-21 17:49:52 UTC
REVIEW: http://review.gluster.org/13047 (glusterd: import/export brickinfo->uuid) posted (#1) for review on master by Atin Mukherjee (amukherj)

Comment 2 Vijay Bellur 2015-12-22 15:12:12 UTC
REVIEW: http://review.gluster.org/13047 (glusterd: import/export brickinfo->uuid) posted (#2) for review on master by Atin Mukherjee (amukherj)

Comment 3 Vijay Bellur 2015-12-28 17:10:35 UTC
REVIEW: http://review.gluster.org/13047 (glusterd: import/export brickinfo->uuid) posted (#3) for review on master by Atin Mukherjee (amukherj)

Comment 4 Vijay Bellur 2015-12-30 05:28:39 UTC
REVIEW: http://review.gluster.org/13047 (glusterd: import/export brickinfo->uuid) posted (#4) for review on master by Atin Mukherjee (amukherj)

Comment 5 Vijay Bellur 2016-01-05 07:21:32 UTC
REVIEW: http://review.gluster.org/13047 (glusterd: import/export brickinfo->uuid) posted (#5) for review on master by Atin Mukherjee (amukherj)

Comment 6 Vijay Bellur 2016-01-08 03:51:49 UTC
REVIEW: http://review.gluster.org/13047 (glusterd: import/export brickinfo->uuid) posted (#6) for review on master by Atin Mukherjee (amukherj)

Comment 7 Vijay Bellur 2016-01-11 06:46:53 UTC
COMMIT: http://review.gluster.org/13047 committed in master by Atin Mukherjee (amukherj) 
------
commit c449b7520c6f1ac6ea1bc4119dbbbe9ebb80bf93
Author: Atin Mukherjee <amukherj>
Date:   Mon Dec 21 23:13:43 2015 +0530

    glusterd: import/export brickinfo->uuid
    
    Given a two-node cluster with nodes N1 and N2, if a dummy node N3 is peer
    probed, the probed node N3 goes for importing volumes from the probing node
    (N1), but it still doesn't have information about the other node (N2) and
    its membership (since the peer update happens after the volume updates), and
    hence it fails to update the brick's uuid. Even after N2 updates N3 about
    its membership, the brick's uuid is never populated. As a consequence, when
    N3 initiates a detach of N2, it checks whether the node to be detached has
    any bricks configured by matching their uuids, which are NULL in this case,
    so it goes ahead and removes the peer, which ideally it shouldn't have
    (refer to glusterd_friend_contains_vol_bricks() for the logic).
    
    The fix is to export the brick's uuid and import it on the probed node
    instead of resolving it.
    
    Change-Id: I2d88c72175347550a45ab12aff0ae248e56baa87
    BUG: 1293414
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: http://review.gluster.org/13047
    Tested-by: Gluster Build System <jenkins.com>
    Tested-by: NetBSD Build System <jenkins.org>
    Reviewed-by: Gaurav Kumar Garg <ggarg>
    Reviewed-by: Avra Sengupta <asengupt>

Comment 8 Niels de Vos 2016-06-16 13:51:54 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

