Bug 1572534 - Unable to replace faulty brick
Summary: Unable to replace faulty brick
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: 4.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-04-27 09:12 UTC by Plazmus
Modified: 2018-06-20 18:29 UTC (History)
CC List: 1 user

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-06-20 18:29:39 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description Plazmus 2018-04-27 09:12:18 UTC
Description of problem:

I have a cluster replica 3.
I terminated one node, built a new one and re-added it to the pool with:
gluster peer probe <new_node>

I started replacing the failed bricks with the following command (at the time, the 2 healthy nodes were quite busy; I had an rsync running):
gluster volume replace-brick vol_name <old_node>:/shares/red/brick <new_node>:/shares/red/brick commit force
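After a `replace-brick ... commit force`, data is rebuilt onto the new brick by self-heal. A sketch of how progress could have been checked at this point (the volume name and brick paths are the placeholders from this report):

```shell
# Confirm the new brick was picked up and its process is online
gluster volume status vol_name

# Watch self-heal progress on the replica; pending entries should drain to zero
gluster volume heal vol_name info
```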

Replacement of one of the bricks failed, but when I checked with:
gluster vol info

the brick looked like it had been replaced:
Bricks:
Brick1: <healthy_node1>:/shares/red/brick
Brick2: <new_node2>:/shares/red/brick
Brick3: <healthy_node3>:/shares/red/brick

I attempted to detach terminated node with:
gluster peer detach <faulty_node>

peer detach: failed: One of the peers is probably down. Check with 'peer status'


Steps to Reproduce:
It might be hard to reproduce, as I've replaced nodes previously without any issues. It seems that load on the box prevented the brick replacement from completing, and now it's in a half-broken state.

Any way to work around it?

Thanks

Comment 1 Plazmus 2018-04-27 11:14:00 UTC
I managed to resolve/work around it.
To fix it I had to add the peer file under /var/lib/glusterd/peers/, which was present on the healthy nodes but for some reason missing on the new one.

After that I was able to replace-brick with a new one.
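A sketch of the workaround described above, assuming the peer file (named after the peer's UUID) can simply be copied over from a healthy node; the hostnames and UUID below are placeholders, not values from this report:

```shell
# On a healthy node, find the peer file describing the affected peer
grep -l 'hostname1=<faulty_node>' /var/lib/glusterd/peers/*

# Copy that file to the same path on the node where it is missing
scp /var/lib/glusterd/peers/<peer_uuid> <new_node>:/var/lib/glusterd/peers/

# Restart glusterd on the receiving node so it re-reads its peer state
systemctl restart glusterd
```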

The only issue left is that I can't remove disconnected peers, still getting:

peer detach: failed: One of the peers is probably down. Check with 'peer status'
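One possible way around this error for a peer that is permanently gone is a forced detach; this is a suggestion based on the standard gluster CLI, not something verified in this report:

```shell
# Confirm the peer really is disconnected
gluster peer status

# Force removal of the dead peer from the trusted pool
gluster peer detach <faulty_node> force
```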

Comment 2 Shyamsundar 2018-06-20 18:29:39 UTC
This bug was reported against a version of Gluster that is no longer maintained
(or has been EOL'd). See https://www.gluster.org/release-schedule/ for the
versions currently maintained.

As a result this bug is being closed.

If the bug persists on a maintained version of gluster or against the mainline
gluster repository, request that it be reopened and the Version field be marked
appropriately.

