Description of problem:
I have a cluster with replica 3. I terminated one node, built a new one, and re-added it to the pool with:

  gluster peer probe <new_node>

I then started replacing the failed bricks with (at that time the 2 healthy nodes were quite busy, I had an rsync running):

  gluster volume replace-brick vol_name <old_node>:/shares/red/brick <new_node>:/shares/red/brick commit force

Replacement of one of the bricks failed, but when I checked with:

  gluster vol info

the brick looks like it has been replaced:

Bricks:
Brick1: <healthy_node1>:/shares/red/brick
Brick2: <new_node2>:/shares/red/brick
Brick3: <healthy_node3>:/shares/red/brick

I then attempted to detach the terminated node with:

  gluster peer detach <faulty_node>
  peer detach: failed: One of the peers is probably down. Check with 'peer status'

Steps to Reproduce:
It might be hard to reproduce, as I've replaced nodes previously without any issues. It seems that the load on the boxes prevented the brick replacement from completing, and now it's in a half-broken state. Is there any way to work around it? Thanks
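For anyone looking at this, the current state can be inspected with the standard CLI commands below (vol_name and the node names are placeholders, as above):

  gluster peer status                 # shows peer UUIDs and connected/disconnected state
  gluster volume status vol_name      # shows which bricks and their processes are online
  gluster volume heal vol_name info   # shows entries still pending self-heal after the replace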
I managed to resolve/work around it. To fix it I had to add the peer file in /var/lib/glusterd/peers; it was present on the healthy nodes but for some reason missing on the new one. After that I was able to replace-brick with a new one. The only issue left is that I can't remove the disconnected peers, I still get:

  peer detach: failed: One of the peers is probably down. Check with 'peer status'
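In case it helps anyone else, a rough sketch of that peer-file fix, assuming the usual glusterd layout (each peer has a file named after its UUID; the contents shown are typical and the hostnames/UUIDs are placeholders):

  # on a healthy node: list the peer files and look at the one the new node is missing
  ls /var/lib/glusterd/peers/
  cat /var/lib/glusterd/peers/<peer_uuid>
  # typical contents:
  #   uuid=<peer_uuid>
  #   state=3
  #   hostname1=<peer_hostname>

  # on the new node: copy the missing file over, then restart glusterd
  scp <healthy_node1>:/var/lib/glusterd/peers/<peer_uuid> /var/lib/glusterd/peers/
  service glusterd restart   # or: systemctl restart glusterd

For the remaining detach problem: the CLI does accept "gluster peer detach <host> force", but I can't say whether it clears a peer that is already down and in this half-removed state.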
This bug is reported against a version of Gluster that is no longer maintained (or has been EOL'd). See https://www.gluster.org/release-schedule/ for the versions currently maintained. As a result this bug is being closed. If the bug persists on a maintained version of Gluster or against the mainline Gluster repository, please request that it be reopened and mark the Version field appropriately.