Bug 1572534 - Unable to replace faulty brick
Summary: Unable to replace faulty brick
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: 4.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-04-27 09:12 UTC by Plazmus
Modified: 2018-06-20 18:29 UTC (History)
CC List: 1 user

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-06-20 18:29:39 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description Plazmus 2018-04-27 09:12:18 UTC
Description of problem:

I have a cluster replica 3.
I terminated one node, built a new one and re-added it to the pool with:
gluster peer probe <new_node>

I started replacing the failed bricks with the following command (at the time, the 2 healthy nodes were quite busy; I had an rsync running):
gluster volume replace-brick vol_name <old_node>:/shares/red/brick <new_node>:/shares/red/brick commit force
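After a `replace-brick ... commit force`, data is rebuilt onto the new brick by self-heal. A sketch of how progress could have been checked at this point (the volume name and brick paths are the placeholders from this report):

```shell
# Confirm the new brick was picked up and its process is online
gluster volume status vol_name

# Watch self-heal progress on the replica; pending entries should drain to zero
gluster volume heal vol_name info
```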

Replacement of one of the bricks failed, but when I checked with:
gluster vol info

the brick looked like it had been replaced:
Bricks:
Brick1: <healthy_node1>:/shares/red/brick
Brick2: <new_node2>:/shares/red/brick
Brick3: <healthy_node3>:/shares/red/brick

I attempted to detach terminated node with:
gluster peer detach <faulty_node>

peer detach: failed: One of the peers is probably down. Check with 'peer status'


Steps to Reproduce:
It might be hard to reproduce, as I've replaced nodes previously without any issues. It seems that load on the box prevented the brick replacement from completing, and now it's in a half-broken state.

Any way to work around it?

Thanks

Comment 1 Plazmus 2018-04-27 11:14:00 UTC
I managed to resolve/work around it.
To fix it I had to add the peer file under /var/lib/glusterd/peers/, which was present on the healthy nodes but for some reason missing on the new one.

After that I was able to replace-brick with a new one.
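A sketch of the workaround described above, assuming the peer file (named after the peer's UUID) can simply be copied over from a healthy node; the hostnames and UUID below are placeholders, not values from this report:

```shell
# On a healthy node, find the peer file describing the affected peer
grep -l 'hostname1=<faulty_node>' /var/lib/glusterd/peers/*

# Copy that file to the same path on the node where it is missing
scp /var/lib/glusterd/peers/<peer_uuid> <new_node>:/var/lib/glusterd/peers/

# Restart glusterd on the receiving node so it re-reads its peer state
systemctl restart glusterd
```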

The only issue left is that I can't remove disconnected peers, still getting:

peer detach: failed: One of the peers is probably down. Check with 'peer status'
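One possible way around this error for a peer that is permanently gone is a forced detach; this is a suggestion based on the standard gluster CLI, not something verified in this report:

```shell
# Confirm the peer really is disconnected
gluster peer status

# Force removal of the dead peer from the trusted pool
gluster peer detach <faulty_node> force
```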

Comment 2 Shyamsundar 2018-06-20 18:29:39 UTC
This bug was reported against a version of Gluster that is no longer maintained
(or has been EOL'd). See https://www.gluster.org/release-schedule/ for the
versions currently maintained.

As a result this bug is being closed.

If the bug persists on a maintained version of gluster or against the mainline
gluster repository, request that it be reopened and the Version field be marked
appropriately.

