Bug 1572534

Summary: Unable to replace faulty brick
Product: [Community] GlusterFS
Reporter: Plazmus <marcin.wyrembak>
Component: glusterd
Assignee: bugs <bugs>
Status: CLOSED EOL
QA Contact:
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.0
CC: bugs
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-06-20 18:29:39 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Plazmus 2018-04-27 09:12:18 UTC
Description of problem:

I have a cluster replica 3.
I terminated one node, built a new one and re-added it to the pool with:
gluster peer probe <new_node>

I started replacing the failed bricks with the following command (at the time, the two healthy nodes were quite busy; an rsync was running):
gluster volume replace-brick vol_name <old_node>:/shares/red/brick <new_node>:/shares/red/brick commit force

Replacement of one of the bricks failed, but when I checked with:
gluster vol info

the brick looked as if it had been replaced:
Bricks:
Brick1: <healthy_node1>:/shares/red/brick
Brick2: <new_node2>:/shares/red/brick
Brick3: <healthy_node3>:/shares/red/brick

I attempted to detach the terminated node with:
gluster peer detach <faulty_node>

peer detach: failed: One of the peers is probably down. Check with 'peer status'


Steps to Reproduce:
It might be hard to reproduce, as I've replaced nodes previously without any issues. It seems that the load on the box prevented the brick replacement from completing, and the cluster is now in a half-broken state.

Any way to work around it?

Thanks

Comment 1 Plazmus 2018-04-27 11:14:00 UTC
I managed to resolve/work around it.
To fix it, I had to add the peer file in /var/lib/glusterd/peers, which was present on the healthy nodes but for some reason missing on the new one.

After that I was able to replace-brick with a new one.
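The workaround described above can be sketched roughly as follows. This is a reconstruction, not the reporter's exact commands: the peer-file name (glusterd stores one file per peer, named by the peer's UUID, under /var/lib/glusterd/peers/) and the node placeholders are assumptions.

```shell
# On a healthy node, find which peer file refers to the terminated node
# (each file is named by the peer's UUID and records its hostname):
grep -l '<old_node>' /var/lib/glusterd/peers/*

# Copy that peer file to the same path on the new node
# (<peer-uuid> is a placeholder for the filename found above):
scp /var/lib/glusterd/peers/<peer-uuid> <new_node>:/var/lib/glusterd/peers/

# Restart glusterd on the new node so it picks up the peer entry:
systemctl restart glusterd

# Then retry the brick replacement:
gluster volume replace-brick vol_name \
    <old_node>:/shares/red/brick <new_node>:/shares/red/brick commit force
```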

The only issue left is that I can't remove the disconnected peers; I'm still getting:

peer detach: failed: One of the peers is probably down. Check with 'peer status'
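For a stale peer that is permanently gone, the gluster CLI accepts a `force` keyword on `peer detach` that skips the "peer is down" check. A minimal sketch, assuming the dead node holds no data that still needs to be recovered:

```shell
# Confirm which peer is stuck in a disconnected state:
gluster peer status

# Force-detach the dead peer, bypassing the connectivity check:
gluster peer detach <faulty_node> force
```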

Comment 2 Shyamsundar 2018-06-20 18:29:39 UTC
This bug is reported against a version of Gluster that is no longer maintained
(or has been EOL'd). See https://www.gluster.org/release-schedule/ for the
versions currently maintained.

As a result this bug is being closed.

If the bug persists on a maintained version of gluster or against the mainline
gluster repository, request that it be reopened and the Version field be marked
appropriately.