Description of problem:
=======================
Had a one-node (node-1) cluster with a Distribute volume containing one brick. Expanded the cluster and the volume by adding a brick hosted on a newly added node (node-2), then peer probed a third node (node-3) and tried to peer detach node-2 from node-3: the detach succeeded and removed node-2 from the cluster even though it hosts a brick.

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.7.5-12

How reproducible:
=================
Always

Steps to Reproduce:
===================
1. Have a one-node (node-1) cluster with a Distribute volume
2. Peer probe one more node (node-2) into the cluster
3. Add a brick hosted on node-2 to the volume
4. Peer probe one more node (node-3)
5. From node-3, peer detach node-2 // the detach succeeds

Actual results:
===============
Peer detach succeeds on a node hosting bricks of the volume.

Expected results:
=================
Peer detach should be refused while the node hosts bricks.

Additional info:
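A minimal CLI sketch of the reproduction steps above (the volume name "dist", the hostnames node-1/node-2/node-3, and the brick paths /bricks/b1, /bricks/b2 are placeholders):

    # On node-1: one-node cluster with a one-brick Distribute volume
    gluster volume create dist node-1:/bricks/b1
    gluster volume start dist

    # Steps 2/3: expand the cluster, then the volume
    gluster peer probe node-2
    gluster volume add-brick dist node-2:/bricks/b2

    # Step 4: probe a third node
    gluster peer probe node-3

    # Step 5: on node-3, detach node-2 -- this wrongly succeeds even
    # though node-2 hosts a brick of "dist"
    gluster peer detach node-2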
Looks like this is a day-zero bug, and here is why: when step 4 is executed, the newly probed node (say N3) imports the volumes from the probing node (N1), but at that point it has no information yet about the other node's (N2's) membership, since the peer update happens after the volume update, and it therefore fails to resolve the uuid of N2's brick. Even after N2 later updates N3 about its membership, the brick's uuid is never filled in. As a consequence, when N3 initiates a detach of N2, it checks whether the node to be detached has any bricks configured by comparing against its uuid, which is NULL in this case, so the check finds nothing and the peer is removed when it ideally shouldn't be. One way to fix this class of inconsistency would be to update the peer list before the volume data, but that is a significant effort in itself.
Another way of fixing it would be to import the brick's uuid along with the volume data and simply update it, instead of resolving it locally. This still needs validation, though.
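To make the failure mode and the proposed fix concrete, here is a minimal, self-contained C sketch (illustrative stand-ins only: struct brickinfo and peer_hosts_bricks are hypothetical names, not glusterd's actual structures or functions; uses libuuid, link with -luuid):

    #include <stdio.h>
    #include <string.h>
    #include <uuid/uuid.h>

    /* Illustrative stand-in for glusterd's per-brick record. */
    struct brickinfo {
        char   path[256];
        uuid_t uuid;   /* UUID of the node hosting the brick */
    };

    /* Stand-in for the detach-time check: does this peer host a brick? */
    static int
    peer_hosts_bricks(const uuid_t peer, struct brickinfo *bricks, int n)
    {
        for (int i = 0; i < n; i++)
            if (uuid_compare(bricks[i].uuid, peer) == 0)
                return 1;
        return 0;
    }

    int
    main(void)
    {
        uuid_t n2_uuid;              /* N2's UUID, as known to N1 */
        uuid_generate(n2_uuid);

        struct brickinfo brick;
        memset(&brick, 0, sizeof(brick));
        snprintf(brick.path, sizeof(brick.path), "node-2:/bricks/b2");

        /* The bug: N3 imported the volume before it learnt about N2,
         * so the brick's uuid was never resolved and stays all-zero;
         * N2 therefore looks brick-free and the detach is allowed. */
        printf("before fix: hosts bricks = %d\n",
               peer_hosts_bricks(n2_uuid, &brick, 1));   /* prints 0 */

        /* The proposed fix: ship the uuid inside the volume-import
         * payload and copy it, instead of resolving it against the
         * (still incomplete) local peer list. */
        uuid_copy(brick.uuid, n2_uuid);
        printf("after fix:  hosts bricks = %d\n",
               peer_hosts_bricks(n2_uuid, &brick, 1));   /* prints 1 */
        return 0;
    }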
http://review.gluster.org/13047 is posted for review upstream
The fix is now available in the rhgs-3.1.3 branch, hence moving the state to Modified.
Verified this issue using the build "glusterfs-3.7.9-1". Repeated the reproduction steps mentioned in the description section; the fix works properly, and detaching a node that is hosting bricks is no longer allowed. Moving to Verified with the above details.
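For reference, with the fixed build the detach attempt from step 5 is refused; the failure looks roughly like this (exact message text may vary between builds):

    [root@node-3 ~]# gluster peer detach node-2
    peer detach: failed: Brick(s) with the peer node-2 exist in cluster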
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1240