Description of problem:
======================
On a 3 x 2 distribute-replicate volume (RHS AMIs on AWS as nodes), two instances, node2 and node3, were stopped.

node1 (brick1), node2 (brick2) ----------> replicate-subvolume-0
node3 (brick3), node4 (brick4) ----------> replicate-subvolume-1
node5 (brick5), node6 (brick6) ----------> replicate-subvolume-2

node2 and node3 were then restarted. On restart, each node gets a new IP/hostname while its glusterd UUID remains the same as before. glusterd on node2 and node3 does not start because it is unable to resolve the bricks "brick2" and "brick3", whose paths still contain node2's and node3's previous IP/hostname. Refer to bug https://bugzilla.redhat.com/show_bug.cgi?id=1036551

To re-add node2 and node3 to the cluster, the "/var/lib/glusterd/vols" directory was deleted from both nodes, and "detach force" was performed on node2 and node3 from node1. node1, node4, node5 and node6 were still in the befriended state with node2, since "/var/lib/glusterd/peers" had not been removed.

Now, a peer probe of node2's new IP/hostname was done from node1. From node2's point of view, node1 is already in the befriended state, so node2 does not re-initiate the connection process when node1 sends the probe request; it just sends an ACK for the probe request. Hence, for node1, node2 will always remain in the "Accepted peer request (Connected)" state. While establishing the connection, however, node1 still sends the volume information to node2, and node2 updates this volume information.

Note: even though node1 has not moved node2 to the "Peer in Cluster (Connected)" state, the volume information is sent from node1 to node2.

From node2, trying "peer detach force" on node3's old IP fails; this triggers an assert, and an ERROR message is reported in the glusterd log file.
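The handshake asymmetry described above can be illustrated with a small model. This is a hypothetical Python sketch: the state strings mirror gluster's peer states, but the `Peer` class and its methods are invented for illustration and are not glusterd code.

```python
# Illustrative model of the peer-probe handshake described above.
# State names mirror glusterd's peer states; everything else is invented.

class Peer:
    def __init__(self, name):
        self.name = name
        self.friends = {}  # peer name -> state string

    def probe(self, other):
        """Prober marks the probed node as a pending ("accepted") request."""
        self.friends[other.name] = "Accepted peer request (Connected)"
        other.handle_probe(self)

    def handle_probe(self, prober):
        """The probed node either re-initiates the handshake (normal case)
        or, if it already befriends the prober, just ACKs (the bug)."""
        if prober.name in self.friends:
            return  # already befriended: ACK only, no reverse handshake
        # normal path: befriend back, letting both sides reach the final state
        self.friends[prober.name] = "Peer in Cluster (Connected)"
        prober.friends[self.name] = "Peer in Cluster (Connected)"

# node2's stale peer data still lists node1 as a friend, since
# /var/lib/glusterd/peers was never removed
node1, node2 = Peer("node1"), Peer("node2")
node2.friends["node1"] = "Peer in Cluster (Connected)"

node1.probe(node2)
print(node1.friends["node2"])  # stuck in "Accepted peer request (Connected)"
```

In this model, node2's early return on the already-befriended path is what leaves node1 permanently in the intermediate state, matching the behaviour reported above.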
E [glusterd-utils.c:4612:glusterd_friend_brick_belongs] (-->/usr/lib64/glusterfs/3.4.0.44.1u2rhs/xlator/mgmt/glusterd.so(glusterd_big_locked_handler+0x3f) [0x7fd286e4060f] (-->/usr/lib64/glusterfs/3.4.0.44.1u2rhs/xlator/mgmt/glusterd.so(__glusterd_handle_cli_deprobe+0x2e6) [0x7fd286e50326] (-->/usr/lib64/glusterfs/3.4.0.44.1u2rhs/xlator/mgmt/glusterd.so(glusterd_all_volume_cond_check+0x8f) [0x7fd286e5ffef]))) 0-: Assertion failed: 0

Version-Release number of selected component (if applicable):
=============================================================
glusterfs 3.4.0.44.1u2rhs built on Nov 25 2013 08:17:39

How reproducible:
=================

Steps to Reproduce:
===================
1. Create a 2 x 2 (node1, node2, node3, node4) distribute-replicate volume and start the volume.
2. Stop node2 and node3.
3. Bring back node2 and node3. The IP address/hostname of the nodes changes, and glusterd does not start (https://bugzilla.redhat.com/show_bug.cgi?id=1036551).
4. Remove the vols directory "/var/lib/glusterd/vols" from node2 and node3.
5. From node1: gluster peer detach "node2_old_ip" force, and gluster peer detach "node3_old_ip" force.
6. From node1: gluster peer probe "node2_new_ip". node1 puts node2 in the "Accepted peer request (Connected)" state, sends the volume information to node2, and node2 updates this volume information.
7. From node2: gluster peer detach "node3_old_ip" force.

Actual results:
================
root@ip-10-114-246-246 [Dec-02-2013-10:37:40] >gluster peer detach 10.111.67.22
peer detach: failed: Brick(s) with the peer 10.111.67.22 exist in cluster

Expected results:
==================
The assertion message in the glusterd log file should be replaced by an appropriate failure message, i.e. "Unable to resolve brick path".
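The expected behaviour can be sketched as follows. This is a hypothetical Python model, not glusterd's C code (the real change would be in glusterd_friend_brick_belongs); the `known_hosts` set and `friend_brick_belongs` function here are invented to show the requested control flow: return a descriptive failure instead of asserting when a brick's host cannot be resolved.

```python
# Hypothetical sketch of the requested fix: fail with a clear message
# instead of asserting when a brick's host cannot be resolved.

known_hosts = {"node1", "node4", "node5", "node6"}  # resolvable peers

def friend_brick_belongs(brick_host):
    """Return (ok, error) rather than hitting an assert on an
    unresolved brick host (old behaviour: 'Assertion failed: 0')."""
    if brick_host not in known_hosts:
        return False, "Unable to resolve brick path for host " + brick_host
    return True, None

ok, err = friend_brick_belongs("10.111.67.22")  # node3's stale IP
print(ok, err)
```

With this shape, the detach path could log and report the returned error string to the CLI instead of leaving an assertion backtrace in the glusterd log.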
Thank you for submitting this issue for consideration in Red Hat Gluster Storage. The release against which you requested this review is now End of Life. Please see https://access.redhat.com/support/policy/updates/rhs/. If you can reproduce this bug against a currently maintained version of Red Hat Gluster Storage, please feel free to file a new report against the current release.