Description of problem: In a replica 3 volume, if one brick is down and a replace-brick is attempted on one of the other two bricks, setting the trusted.replace-brick xattr fails with a read-only error, leaving glusterd in an inconsistent state.
REVIEW: https://review.gluster.org/17828 (tests: replace brick failure shouldn't corrupt volfiles) posted (#1) for review on master by Raghavendra Talur (rtalur)
Code snippet in the case of a replicate volume:

glusterd_op_perform_replace_brick()
{
    1. glusterd_op_perform_remove_brick();
    2. ret = glusterd_handle_replicate_brick_ops();
       if (ret) {
           goto out;
       }
    3. glusterd_create_volfiles_and_notify_services();
}

With the test mentioned in the bug description, the ret value is non-zero in step 2, so glusterd would need to roll back the changes it made before step 2, i.e. it needs to undo the remove-brick done in step 1.

Changing the component to glusterd based on the above.
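The rollback described in this comment can be sketched as follows. This is only a minimal illustration, not glusterd code: struct volume, swap_brick(), mark_replicate_brick() and replace_brick_with_rollback() are hypothetical stand-ins, and the simulate_failure flag merely imitates the step-2 failure reported in this bug. The idea is to snapshot the in-memory brick list before step 1 and restore it if step 2 fails.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_BRICKS 8
#define BRICK_LEN  64

/* Hypothetical, heavily simplified stand-in for glusterd's volume info. */
struct volume {
    char bricks[MAX_BRICKS][BRICK_LEN];
    int  brick_count;
};

/* Stand-in for step 1: replace the src brick entry with dst.
 * Returns the index of the replaced brick, or -1 if src is not found. */
static int swap_brick(struct volume *vol, const char *src, const char *dst)
{
    for (int i = 0; i < vol->brick_count; i++) {
        if (strcmp(vol->bricks[i], src) == 0) {
            snprintf(vol->bricks[i], BRICK_LEN, "%s", dst);
            return i;
        }
    }
    return -1;
}

/* Stand-in for step 2: the failure is simulated by a flag (assumption). */
static int mark_replicate_brick(int simulate_failure)
{
    return simulate_failure ? -1 : 0;
}

/* Performs steps 1 and 2, restoring the saved brick list if step 2 fails. */
int replace_brick_with_rollback(struct volume *vol, const char *src,
                                const char *dst, int simulate_failure)
{
    assert(vol != NULL);

    struct volume saved = *vol;   /* snapshot taken before step 1 */

    if (swap_brick(vol, src, dst) < 0)
        return -1;                /* nothing was changed, nothing to undo */

    if (mark_replicate_brick(simulate_failure) != 0) {
        *vol = saved;             /* undo step 1 */
        return -1;
    }

    /* Step 3 (regenerate volfiles, notify services) would run only here. */
    return 0;
}
```

With a volume holding bricks h1:/b1, h2:/b2 and h3:/b3, calling replace_brick_with_rollback(&v, "h2:/b2", "h4:/b4", 1) returns -1 and leaves h2:/b2 in place, so later volfile generation would still see the original brick list.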
Ravi,

If we really need to fix this issue, then IMO replace-brick should be implemented in the mgmt_v3 framework, as it provides a sort of rollback mechanism in the form of the post-commit phase (snapshot does use it in a few areas). With GD2, the infra for rolling back transactions will be in place. But whatever the case, the GlusterD team won't be implementing this change, so I'm moving this bug back to replicate.
(In reply to Atin Mukherjee from comment #3)
> Ravi,
>
> If we'd really need to fix this issue, then IMO replace brick should be
> implemented in mgmt_v3 framework as it provides a sort of rollback mechanism
> in the form of post commit phase (snapshot does use it in few areas). With
> GD2, infra for rolling back the transactions will be in place.

Okay, I think this bug can be taken in for glusterfs-4.0 using GD2. Pranith, feel free to add your thoughts if you feel otherwise.
COMMIT: https://review.gluster.org/17828 committed in master by Raghavendra Talur (rtalur)
------
commit 668df4e7e452aa26f0e0fbd15691fab0edc83014
Author: Raghavendra Talur <rtalur>
Date: Thu Jul 20 02:54:30 2017 +0530

    tests: replace brick failure shouldn't corrupt volfiles

    This is a test to present the known issue. It will be skipped
    as it has the known issue marker.

    Change-Id: Id6fa5d323abe0bc76a58cd92cb8e52fcde41b49b
    BUG: 1473026
    Signed-off-by: Raghavendra Talur <rtalur>
    Reviewed-on: https://review.gluster.org/17828
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Ravishankar N <ravishankar>
    Reviewed-by: Atin Mukherjee <amukherj>
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.13.0, please open a new bug report. glusterfs-3.13.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution. [1] http://lists.gluster.org/pipermail/announce/2017-December/000087.html [2] https://www.gluster.org/pipermail/gluster-users/