Bug 1023865

Summary: Unable to detach a peer from the trusted storage pool when any one of the peers is "Disconnected"
Product: Red Hat Gluster Storage Reporter: spandura
Component: glusterd Assignee: Bug Updates Notification Mailing List <rhs-bugs>
Status: CLOSED EOL QA Contact: spandura
Severity: high
Priority: unspecified
Version: 2.1 CC: amukherj, mark.shine, nerawat, vbellur
Hardware: Unspecified
OS: Unspecified
Whiteboard: glusterd
Doc Type: Bug Fix
Last Closed: 2015-12-03 17:11:54 UTC Type: Bug

Description spandura 2013-10-28 07:23:39 UTC
Description of problem:
=========================
Consider a use case where we have 6 storage nodes and a 2 x 3 distribute-replicate volume with 1 brick on each node. 2 nodes in the cluster crashed, and we have to replace their bricks.

Following are the steps suggested to replace the bricks. ( http://documentation-devel.engineering.redhat.com/docs/en-US/Red_Hat_Storage/2.1/html/Administration_Guide/sect-User_Guide-Managing_Volumes-Migrating.html )  

1. Add 2 new nodes into the trusted storage pool. 

2. Replace the bricks using the command "gluster volume replace-brick VOLNAME BRICK NEW-BRICK commit force" 

3. If replace-brick is successful, detach the nodes that were replaced from the trusted storage pool. ( http://documentation-devel.engineering.redhat.com/docs/en-US/Red_Hat_Storage/2.1/html/Administration_Guide/ch07s02.html )

We are not able to detach the dead nodes from the trusted storage pool even though they are not part of any volume. This is because "peer detach" fails when any of the peers in the trusted storage pool is in the "Disconnected" state.
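The replacement workflow above can be sketched as gluster CLI commands; the hostnames and brick paths here are hypothetical placeholders, not from the original report:

```shell
# Hypothetical hostnames and brick paths -- adjust for the actual cluster.
# 1. Add 2 new nodes into the trusted storage pool.
gluster peer probe server7
gluster peer probe server8

# 2. Replace the bricks of the crashed nodes with bricks on the new nodes.
gluster volume replace-brick VOLNAME server1:/rhs/brick1 server7:/rhs/brick1 commit force
gluster volume replace-brick VOLNAME server2:/rhs/brick1 server8:/rhs/brick1 commit force

# 3. Detach the dead nodes -- this is the step that fails while any peer
#    is in the "Disconnected" state.
gluster peer detach server1
gluster peer detach server2
```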

Version-Release number of selected component (if applicable):
===============================================================
glusterfs 3.4.0.36rhs built on Oct 22 2013 10:56:18

How reproducible:
====================
Often

Steps to Reproduce:
=====================
1. Add 6 servers to the trusted storage pool.

2. Stop glusterd on one of the nodes.

3. Execute "peer detach" for the stopped node from any of the other nodes.
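The stuck state can be spotted in "gluster peer status" output before attempting the detach. A minimal sketch, using a hard-coded sample of the status output (hostnames and UUIDs are made up), since the live output is environment-specific:

```shell
# Sample `gluster peer status` output captured after step 2 (made-up values).
# In a live cluster, use: status=$(gluster peer status)
status='Hostname: server2
Uuid: 3e1f0000-0000-0000-0000-000000000002
State: Peer in Cluster (Connected)

Hostname: server3
Uuid: 9ab40000-0000-0000-0000-000000000003
State: Peer in Cluster (Disconnected)'

# Count peers reported as Disconnected -- if nonzero, `peer detach` will fail
# with: "peer detach: failed: One of the peers is probably down."
disconnected=$(printf '%s\n' "$status" | grep -c 'Disconnected')
echo "disconnected peers: $disconnected"
```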

Actual results:
==================
peer detach: failed: One of the peers is probably down. Check with 'peer status'

Expected results:
===================
Detaching the peer should be handled gracefully.

Comment 2 Vivek Agarwal 2015-12-03 17:11:54 UTC
Thank you for submitting this issue for consideration in Red Hat Gluster Storage. The release for which you requested us to review, is now End of Life. Please See https://access.redhat.com/support/policy/updates/rhs/

If you can reproduce this bug against a currently maintained version of Red Hat Gluster Storage, please feel free to file a new report against the current release.

Comment 3 MShine 2016-07-12 11:04:52 UTC
I have reproduced exactly the same error on glusterfs 3.7.

Comment 4 Atin Mukherjee 2016-07-12 11:23:03 UTC
If the dead peer doesn't host any of the volumes (bricks to be precise), you should be able to detach it.
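In that case the detach can be attempted directly, falling back to the CLI's force option if glusterd still refuses; the hostname is a hypothetical placeholder:

```shell
# Detach a dead peer that hosts no bricks (hostname is hypothetical).
gluster peer detach server3

# If glusterd still refuses because the peer is unreachable:
gluster peer detach server3 force
```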

Comment 5 MShine 2016-07-12 12:19:57 UTC
I agree, but I was getting the same error as above. I solved the issue by going into /var/lib/glusterfs/peer/ and removing all files there. Then, when I restarted the service, the peers were gone. This was the only way; I tried every scenario and kept getting: "peer detach: failed: One of the peers is probably down. Check with 'peer status'"

This was a fresh install on Ubuntu 14.04, running in my test environment on VMware ESXi 6.0u2.

gluster peer detach <HOSTNAME OR IP> has never worked. The only actions I took before trying to detach were installing glusterfs from the PPA and running peer probe.

Anyway, I found my own workaround.
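For reference, the manual cleanup described above looks roughly like this. Note that on most installs glusterd keeps its peer state under /var/lib/glusterd/peers/ (the path in the comment above may be setup-specific), and this is a destructive last resort that wipes glusterd's record of all peers:

```shell
# Destructive last resort -- removes glusterd's record of ALL peers on this node.
service glusterfs-server stop      # Ubuntu 14.04; `systemctl stop glusterd` elsewhere
rm -f /var/lib/glusterd/peers/*    # standard path; comment above reports a different one
service glusterfs-server start
gluster peer status                # should now report no peers
```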

Thank you for your reply.