Bug 1572534

Summary: Unable to replace faulty brick
Product: [Community] GlusterFS
Reporter: Plazmus <marcin.wyrembak>
Component: glusterd
Assignee: bugs <bugs>
Status: CLOSED EOL
QA Contact:
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.0
CC: bugs
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-06-20 18:29:39 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Plazmus 2018-04-27 09:12:18 UTC
Description of problem:

I have a cluster replica 3.
I terminated one node, built a new one and re-added it to the pool with:
gluster peer probe <new_node>

I started replacing the failed bricks with the following command (at the time, the two healthy nodes were quite busy; an rsync was running):
gluster volume replace-brick vol_name <old_node>:/shares/red/brick <new_node>:/shares/red/brick commit force

Replacement of one of the bricks failed, but when I checked with:
gluster vol info

the brick looked as if it had been replaced:
Bricks:
Brick1: <healthy_node1>:/shares/red/brick
Brick2: <new_node2>:/shares/red/brick
Brick3: <healthy_node3>:/shares/red/brick

I attempted to detach the terminated node with:
gluster peer detach <faulty_node>

peer detach: failed: One of the peers is probably down. Check with 'peer status'


Steps to Reproduce:
It might be hard to reproduce, as I've replaced nodes previously without any issues. It seems that the load on the box prevented the brick replacement from completing, and the cluster is now in a half-broken state.

Any way to work around it?

Thanks

Comment 1 Plazmus 2018-04-27 11:14:00 UTC
I managed to resolve/work around it.
To fix it, I had to add the peer file in /var/lib/glusterd/peers, which was present on the healthy nodes but for some reason missing on the new one.

After that I was able to replace-brick with a new one.
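The workaround described above can be sketched roughly as follows. This is a reconstruction, not the reporter's exact commands: the peer-file name (glusterd stores one file per peer, named by the peer's UUID, under /var/lib/glusterd/peers/) and the node placeholders are assumptions.

```shell
# On a healthy node, find which peer file refers to the terminated node
# (each file is named by the peer's UUID and records its hostname):
grep -l '<old_node>' /var/lib/glusterd/peers/*

# Copy that peer file to the same path on the new node
# (<peer-uuid> is a placeholder for the filename found above):
scp /var/lib/glusterd/peers/<peer-uuid> <new_node>:/var/lib/glusterd/peers/

# Restart glusterd on the new node so it picks up the peer entry:
systemctl restart glusterd

# Then retry the brick replacement:
gluster volume replace-brick vol_name \
    <old_node>:/shares/red/brick <new_node>:/shares/red/brick commit force
```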

The only issue left is that I can't remove the disconnected peers; I'm still getting:

peer detach: failed: One of the peers is probably down. Check with 'peer status'
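For a stale peer that is permanently gone, the gluster CLI accepts a `force` keyword on `peer detach` that skips the "peer is down" check. A minimal sketch, assuming the dead node holds no data that still needs to be recovered:

```shell
# Confirm which peer is stuck in a disconnected state:
gluster peer status

# Force-detach the dead peer, bypassing the connectivity check:
gluster peer detach <faulty_node> force
```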

Comment 2 Shyamsundar 2018-06-20 18:29:39 UTC
This bug is reported against a version of Gluster that is no longer maintained
(or has been EOL'd). See https://www.gluster.org/release-schedule/ for the
versions currently maintained.

As a result this bug is being closed.

If the bug persists on a maintained version of gluster or against the mainline
gluster repository, request that it be reopened and the Version field be marked
appropriately.