Bug 1037430 - peer detach force on a non-existing IP/hostname is causing Assertion
Summary: peer detach force on a non-existing IP/hostname is causing Assertion
Keywords:
Status: CLOSED EOL
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterd
Version: 2.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Bug Updates Notification Mailing List
QA Contact: storage-qa-internal@redhat.com
URL:
Whiteboard: glusterd
Depends On:
Blocks:
 
Reported: 2013-12-03 07:22 UTC by spandura
Modified: 2015-12-03 17:18 UTC (History)
1 user

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-12-03 17:18:20 UTC
Target Upstream Version:


Attachments

Description spandura 2013-12-03 07:22:40 UTC
Description of problem:
======================
On a 3 x 2 distributed-replicate volume (RHS AMIs on AWS as nodes), two instances, node2 and node3, were stopped.

node1(brick1), node2(brick2)----------> replicate-subvolume-0

node3(brick3), node4(brick4)----------> replicate-subvolume-1

node5(brick5), node6(brick6)----------> replicate-subvolume-2

Restarted node2 and node3. After the restart, node2 and node3 get new IPs/hostnames, while their glusterd UUIDs remain the same as before.

glusterd on node2 and node3 does not start because it is unable to resolve the bricks "brick2" and "brick3", since the brick definitions still contain node2's and node3's previous IPs/hostnames. Refer to bug https://bugzilla.redhat.com/show_bug.cgi?id=1036551

To re-add node2 and node3 to the cluster, the "/var/lib/glusterd/vols" directory was deleted from both nodes.

Performed "detach force" on node2 and node3 from node1. 

From node2's point of view, node1, node4, node5 and node6 were already in the befriended state, since "/var/lib/glusterd/peers" had not been removed.

Now, a peer probe was done on node2's new IP/hostname from node1.

For node2, node1 is already in the befriended state. When node1 sends a probe request, node2 does not re-initiate the connection process but simply sends an ACK for the probe request. Hence, from node1's perspective, node2 will always remain in the "Accepted peer request (Connected)" state. However, while establishing the connection, node1 sends the volume information to node2, and node2 updates it.

Note: Even though node1 has not moved node2 to the "Peer in Cluster (Connected)" state, the volume information is still sent from node1 to node2.

From node2, if we try to "peer detach force" node3's old IP, the peer detach is unsuccessful; this triggers an assertion, and an ERROR message is reported in the glusterd log file:

E [glusterd-utils.c:4612:glusterd_friend_brick_belongs] (-->/usr/lib64/glusterfs/3.4.0.44.1u2rhs/xlator/mgmt/glusterd.so(glusterd_big_locked_handler+0x3f) [0x7fd286e4060f] (-->/usr/lib64/glusterfs/3.4.0.44.1u2rhs/xlator/mgmt/glusterd.so(__glusterd_handle_cli_deprobe+0x2e6) [0x7fd286e50326] (-->/usr/lib64/glusterfs/3.4.0.44.1u2rhs/xlator/mgmt/glusterd.so(glusterd_all_volume_cond_check+0x8f) [0x7fd286e5ffef]))) 0-: Assertion failed: 0"


Version-Release number of selected component (if applicable):
=============================================================
glusterfs 3.4.0.44.1u2rhs built on Nov 25 2013 08:17:39

How reproducible:
==================

Steps to Reproduce:
====================
1. Create a 2 x 2 (node1, node2, node3, node4) distributed-replicate volume and start it.

2. Stop node2 and node3.

3. Bring back node2 and node3. The IP addresses/hostnames of the nodes change, and glusterd does not start ( https://bugzilla.redhat.com/show_bug.cgi?id=1036551 ).

4. Remove the "/var/lib/glusterd/vols" directory from node2 and node3.

5. From node1: gluster peer detach "node2_old_ip" force and gluster peer detach "node3_old_ip" force

6. From node1: gluster peer probe "node2_new_ip"

Node1 puts node2 in the "Accepted peer request (Connected)" state. Node1 sends the volume information to node2, and node2 updates it.

7. From node2: gluster peer detach "node3_old_ip" force

Actual results:
================
root@ip-10-114-246-246 [Dec-02-2013-10:37:40] >gluster peer detach 10.111.67.22
peer detach: failed: Brick(s) with the peer 10.111.67.22 exist in cluster

Expected results:
==================
The assertion message in the glusterd log file should be replaced by an appropriate failure message, e.g. "Unable to resolve brick path".

Comment 2 Vivek Agarwal 2015-12-03 17:18:20 UTC
Thank you for submitting this issue for consideration in Red Hat Gluster Storage. The release you asked us to review is now End of Life. Please see https://access.redhat.com/support/policy/updates/rhs/

If you can reproduce this bug against a currently maintained version of Red Hat Gluster Storage, please feel free to file a new report against the current release.

