Bug 1297305

Summary: [GlusterD]: Peer detach happening with a node which is hosting volume bricks
Product: [Community] GlusterFS
Reporter: Atin Mukherjee <amukherj>
Component: glusterd
Assignee: Atin Mukherjee <amukherj>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.7.7
CC: amukherj, bsrirama, bugs, nlevinki, rhs-bugs, vbellur
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: glusterfs-3.7.7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1293414
Environment:
Last Closed: 2016-02-14 03:23:54 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1293273, 1293414
Bug Blocks:

Description Atin Mukherjee 2016-01-11 07:01:27 UTC
+++ This bug was initially created as a clone of Bug #1293414 +++

+++ This bug was initially created as a clone of Bug #1293273 +++

Description of problem:
=======================
Had a one-node (Node-1) cluster with a Distributed volume containing one brick, expanded the cluster and the volume by adding a brick from the newly added node (Node-2), then peer probed a third node and tried to peer detach the second node (Node-2) from the third node. The detach succeeded and removed the second node from the cluster even though it was hosting a brick.

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.7.5-12


How reproducible:
=================
Always


Steps to Reproduce:
===================
1. Have a one-node (Node-1) cluster with a Distributed volume
2. Add one more node (Node-2) to the cluster
3. Add a brick hosted on Node-2 to the volume
4. Peer probe one more node (Node-3)
5. From Node-3, detach the second node (Node-2) // the detach succeeds (see the command sketch below)
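
A command-line sketch of the above. The hostnames node-1/node-2/node-3, the volume name distvol and the brick paths are placeholders; the original report does not give the exact values:

# On node-1: create and start a one-brick distributed volume (step 1)
gluster volume create distvol node-1:/bricks/b1 force
gluster volume start distvol

# On node-1: add node-2 to the cluster and expand the volume with its brick (steps 2-3)
gluster peer probe node-2
gluster volume add-brick distvol node-2:/bricks/b2 force

# On node-1: probe a third node (step 4)
gluster peer probe node-3

# On node-3: detach node-2 (step 5) -- this wrongly succeeds even though
# node-2 hosts a brick of distvol; the expected behaviour is a rejection
# along the lines of "Brick(s) with the peer node-2 exist in cluster"
gluster peer detach node-2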


Actual results:
===============
Peer detach succeeds for a node that is hosting volume bricks.


Expected results:
==================
Peer detach should be rejected if the node is hosting bricks of a volume.



Additional info:

--- Additional comment from Red Hat Bugzilla Rules Engine on 2015-12-21 04:38:48 EST ---

This bug is automatically being proposed for the current z-stream release of Red Hat Gluster Storage 3 by setting the release flag 'rhgs-3.1.z' to '?'.

If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from Atin Mukherjee on 2015-12-21 08:38:57 EST ---

Looks like this is a day zero bug and here is why:

When step 4 was executed, the probed node (say N3) went to import volumes from the probing node (N1), but at that point it still had no information about the other node's (N2) membership (since the peer update happens after the volume updates) and hence failed to resolve and store the uuid of N2's brick. Even after N2 later updated N3 about its membership, the brick's uuid was never filled in. As a consequence, when N3 initiates a detach of N2, it checks whether the node to be detached has any bricks configured against its uuid; that uuid is NULL in this case, so the check passes and the peer is removed when it ideally shouldn't have been.
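
A minimal, self-contained C sketch of that detach-time check. It only models the logic of glusterd_friend_contains_vol_bricks() described above; the struct layout and function names are illustrative, not the actual glusterd source:

#include <stdio.h>
#include <string.h>

typedef unsigned char uuid16[16];

struct brickinfo {
    char   hostname[64];
    uuid16 uuid;            /* stays all zeroes when never resolved/imported */
};

/* Returns 1 if the peer identified by peer_uuid owns any brick of the
 * volume, 0 otherwise -- a model of the check that gates peer detach. */
static int friend_contains_vol_bricks(const struct brickinfo *bricks,
                                      int nbricks, const uuid16 peer_uuid)
{
    for (int i = 0; i < nbricks; i++) {
        if (memcmp(bricks[i].uuid, peer_uuid, sizeof(uuid16)) == 0)
            return 1;
    }
    return 0;
}

int main(void)
{
    /* N2's real uuid as known to the rest of the cluster */
    uuid16 n2_uuid = { 0xaa, 0xbb, 0xcc, 0xdd };

    /* On N3, the imported brick of N2 never had its uuid filled in,
     * because N3 imported the volume before it learnt about N2. */
    struct brickinfo bricks[1] = { { "node-2", { 0 } } };

    if (!friend_contains_vol_bricks(bricks, 1, n2_uuid))
        printf("check says N2 owns no bricks -> detach allowed (the bug)\n");
    return 0;
}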

I think we'd need to consider updating the peer list before the volume data to fix this type of inconsistency, which is itself a sizeable effort.

--- Additional comment from Atin Mukherjee on 2015-12-21 08:39:26 EST ---

Given that it's a complex fix, I'd prefer to mark it as a known issue for 3.1.2.

--- Additional comment from Atin Mukherjee on 2015-12-21 08:47:51 EST ---

Another way of fixing it would be to import the brick's uuid and simply update it instead of resolving it on the probed node. This needs to be validated, though.
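
A self-contained sketch of that idea, i.e. carrying the brick's uuid inside the volume import payload and taking it as-is on the probed node instead of resolving it locally. The key name and helper functions below are hypothetical illustrations, not the actual glusterd dictionary keys or APIs:

#include <stdio.h>
#include <string.h>

/* Hypothetical key/value entry carried with the volume import payload. */
struct kv { char key[64]; char value[64]; };

/* Exporting side (N1): serialise the brick's uuid into the payload. */
static void export_brick_uuid(struct kv *entry, const char *uuid_str)
{
    snprintf(entry->key, sizeof(entry->key), "volume1.brick1.uuid");
    snprintf(entry->value, sizeof(entry->value), "%s", uuid_str);
}

/* Importing side (N3): take the uuid straight from the payload instead of
 * trying to resolve the brick's hostname against the incomplete peer list. */
static void import_brick_uuid(const struct kv *entry, char *brick_uuid_out,
                              size_t outlen)
{
    snprintf(brick_uuid_out, outlen, "%s", entry->value);
}

int main(void)
{
    struct kv payload;
    char imported[64] = "";

    export_brick_uuid(&payload, "aabbccdd-0000-0000-0000-000000000001");
    import_brick_uuid(&payload, imported, sizeof(imported));

    /* The probed node now holds a valid brick uuid even before it learns
     * about N2's membership, so the detach-time check can work correctly. */
    printf("imported brick uuid: %s\n", imported);
    return 0;
}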

--- Additional comment from Vijay Bellur on 2015-12-21 12:49:52 EST ---

REVIEW: http://review.gluster.org/13047 (glusterd: import/export brickinfo->uuid) posted (#1) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Vijay Bellur on 2015-12-22 10:12:12 EST ---

REVIEW: http://review.gluster.org/13047 (glusterd: import/export brickinfo->uuid) posted (#2) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Vijay Bellur on 2015-12-28 12:10:35 EST ---

REVIEW: http://review.gluster.org/13047 (glusterd: import/export brickinfo->uuid) posted (#3) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Vijay Bellur on 2015-12-30 00:28:39 EST ---

REVIEW: http://review.gluster.org/13047 (glusterd: import/export brickinfo->uuid) posted (#4) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Vijay Bellur on 2016-01-05 02:21:32 EST ---

REVIEW: http://review.gluster.org/13047 (glusterd: import/export brickinfo->uuid) posted (#5) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Vijay Bellur on 2016-01-07 22:51:49 EST ---

REVIEW: http://review.gluster.org/13047 (glusterd: import/export brickinfo->uuid) posted (#6) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Vijay Bellur on 2016-01-11 01:46:53 EST ---

COMMIT: http://review.gluster.org/13047 committed in master by Atin Mukherjee (amukherj) 
------
commit c449b7520c6f1ac6ea1bc4119dbbbe9ebb80bf93
Author: Atin Mukherjee <amukherj>
Date:   Mon Dec 21 23:13:43 2015 +0530

    glusterd: import/export brickinfo->uuid
    
    Given a two node cluster with node N1 & N2, if a dummy node N3 is peer probed, the
    probed node N3  goes for importing volumes from the probing node (N1), but
    it still doesn't have information about the other node (N2) about its membership
    (since peer update happens post volume updates) and hence fail to update its
    brick's uuid. Post that even though N2 updates N3 about its membership the
    brick's uuid was never generated. Now as a consequence when N3 initiates a
    detach of N2, it checks whether the node to be detached has any bricks
    configured by its respective uuid which is NULL in this case and hence it goes
    ahead and removes the peer which ideally it shouldn't have (refer to
    glusterd_friend_contains_vol_bricks () for the logic)
    
    Fix is to export brick's uuid and import it at the probed node instead of
    resolving it.
    
    Change-Id: I2d88c72175347550a45ab12aff0ae248e56baa87
    BUG: 1293414
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: http://review.gluster.org/13047
    Tested-by: Gluster Build System <jenkins.com>
    Tested-by: NetBSD Build System <jenkins.org>
    Reviewed-by: Gaurav Kumar Garg <ggarg>
    Reviewed-by: Avra Sengupta <asengupt>

Comment 1 Vijay Bellur 2016-01-11 07:02:30 UTC
REVIEW: http://review.gluster.org/13210 (glusterd: import/export brickinfo->uuid) posted (#1) for review on release-3.7 by Atin Mukherjee (amukherj)

Comment 2 Vijay Bellur 2016-01-12 11:30:14 UTC
REVIEW: http://review.gluster.org/13210 (glusterd: import/export brickinfo->uuid) posted (#2) for review on release-3.7 by Atin Mukherjee (amukherj)

Comment 3 Vijay Bellur 2016-01-12 11:31:49 UTC
REVIEW: http://review.gluster.org/13210 (glusterd: import/export brickinfo->uuid) posted (#3) for review on release-3.7 by Atin Mukherjee (amukherj)

Comment 4 Vijay Bellur 2016-01-12 11:32:52 UTC
REVIEW: http://review.gluster.org/13210 (glusterd: import/export brickinfo->uuid) posted (#4) for review on release-3.7 by Atin Mukherjee (amukherj)

Comment 5 Vijay Bellur 2016-01-14 09:35:09 UTC
COMMIT: http://review.gluster.org/13210 committed in release-3.7 by Atin Mukherjee (amukherj) 
------
commit a7b399fd0ef928c2cca4092b00edb21e70c59f62
Author: Atin Mukherjee <amukherj>
Date:   Mon Dec 21 23:13:43 2015 +0530

    glusterd: import/export brickinfo->uuid
    
    Backport of http://review.gluster.org/#/c/13047/
    
        Given a two node cluster with node N1 & N2, if a dummy node N3 is peer probed, the
        probed node N3  goes for importing volumes from the probing node (N1), but
        it still doesn't have information about the other node (N2) about its membership
        (since peer update happens post volume updates) and hence fail to update its
        brick's uuid. Post that even though N2 updates N3 about its membership the
        brick's uuid was never generated. Now as a consequence when N3 initiates a
        detach of N2, it checks whether the node to be detached has any bricks
        configured by its respective uuid which is NULL in this case and hence it goes
        ahead and removes the peer which ideally it shouldn't have (refer to
        glusterd_friend_contains_vol_bricks () for the logic)
    
        Fix is to export brick's uuid and import it at the probed node instead of
        resolving it.
    
    Change-Id: I2d88c72175347550a45ab12aff0ae248e56baa87
    BUG: 1297305
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: http://review.gluster.org/13047
    Tested-by: Gluster Build System <jenkins.com>
    Tested-by: NetBSD Build System <jenkins.org>
    Reviewed-by: Gaurav Kumar Garg <ggarg>
    Reviewed-by: Avra Sengupta <asengupt>
    Reviewed-on: http://review.gluster.org/13210

Comment 6 Atin Mukherjee 2016-02-14 03:23:54 UTC
The fix is already available in the latest 3.7.x release.

Comment 7 Kaushal 2016-04-19 07:52:37 UTC
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still present with glusterfs-3.7.7, please open a new bug report.

glusterfs-3.7.7 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-users/2016-February/025292.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user