Bug 1473026 - replace-brick failure leaves glusterd in inconsistent state
Summary: replace-brick failure leaves glusterd in inconsistent state
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-07-19 21:18 UTC by Raghavendra Talur
Modified: 2017-12-08 17:35 UTC (History)
5 users (show)

Fixed In Version: glusterfs-3.13.0
Clone Of:
Environment:
Last Closed: 2017-12-08 17:35:17 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Raghavendra Talur 2017-07-19 21:18:29 UTC
Description of problem:
In a replica 3 scenario, if one of the bricks is down for some reason and a replace-brick is attempted on one of the other two bricks, then setting trusted.replace-brick fails with a read-only error, leaving glusterd in an inconsistent state.
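
For context, the replicate-specific part of the commit roughly amounts to setting the virtual trusted.replace-brick xattr through a client mount of the volume, and AFR refuses the set when it has no quorum. Below is a minimal sketch of how such a call surfaces the read-only error; the mount path and xattr value are made up for illustration and are not glusterd's actual values.

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/xattr.h>

int
main (void)
{
        /* Both values below are illustrative only. */
        const char *mnt   = "/tmp/replace-brick-mount"; /* temporary client mount */
        const char *value = "testvol-client-1";         /* id of the new brick    */

        if (setxattr (mnt, "trusted.replace-brick", value, strlen (value), 0) == -1) {
                /* With one replica brick already down and another being
                 * replaced, AFR lacks quorum and rejects the operation;
                 * errno comes back as EROFS (read-only file system). */
                fprintf (stderr, "setxattr(trusted.replace-brick): %s\n",
                         strerror (errno));
                return 1;
        }
        return 0;
}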

Comment 1 Worker Ant 2017-07-19 21:26:23 UTC
REVIEW: https://review.gluster.org/17828 (tests: replace brick failure shouldn't corrupt volfiles) posted (#1) for review on master by Raghavendra Talur (rtalur)

Comment 2 Ravishankar N 2017-07-20 05:00:29 UTC
Code snippet in the case of a replicate volume:

glusterd_op_perform_replace_brick()
{
        /* step 1 */
        glusterd_op_perform_remove_brick();

        /* step 2 */
        ret = glusterd_handle_replicate_brick_ops();
        if (ret) {
                goto out;
        }

        /* step 3 */
        glusterd_create_volfiles_and_notify_services();
}

With the test mentioned in the bug description, the ret value is non-zero at step 2. So glusterd would need to roll back the changes it made before step 2, i.e. it needs to undo the remove-brick done in step 1.
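
As a rough illustration of the rollback described above, here is a minimal, self-contained sketch. The helpers are trivial stand-ins for the glusterd functions named in the snippet, and undo_remove_brick() is a hypothetical helper for re-adding the brick removed in step 1, not an existing glusterd function.

#include <stdio.h>

static int perform_remove_brick (void)       { return 0;  }  /* step 1 stub */
static int handle_replicate_brick_ops (void) { return -1; }  /* step 2 stub: fails here */
static int undo_remove_brick (void)          { return 0;  }  /* hypothetical rollback */
static int create_volfiles_and_notify (void) { return 0;  }  /* step 3 stub */

static int
perform_replace_brick (void)
{
        int ret;

        ret = perform_remove_brick ();          /* step 1 */
        if (ret)
                goto out;

        ret = handle_replicate_brick_ops ();    /* step 2 */
        if (ret) {
                /* Undo step 1 so the in-memory volume definition and the
                 * volfiles are left exactly as they were before the command. */
                (void) undo_remove_brick ();
                goto out;
        }

        ret = create_volfiles_and_notify ();    /* step 3 */
out:
        return ret;
}

int
main (void)
{
        int ret = perform_replace_brick ();
        printf ("replace-brick %s\n", ret ? "failed, state rolled back" : "succeeded");
        return ret ? 1 : 0;
}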

Changing the component to glusterd based on the above.

Comment 3 Atin Mukherjee 2017-07-20 08:36:24 UTC
Ravi,

If we really need to fix this issue, then IMO replace-brick should be implemented in the mgmt_v3 framework, as it provides a rollback mechanism of sorts in the form of a post-commit phase (snapshot already uses it in a few areas). With GD2, the infrastructure for rolling back transactions will be in place. Whatever the case, the GlusterD team won't be implementing this change, so I'm moving this bug back to replicate.
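
Purely for illustration, here is a toy sketch of the kind of phased transaction with a post-commit slot that the comment above has in mind; the struct, phase names, and handlers are invented for this example and are not glusterd's mgmt_v3 API.

#include <stdio.h>

typedef int (*phase_fn) (void);

struct txn_ops {
        phase_fn pre_validate;
        phase_fn commit;
        phase_fn post_commit;  /* always runs; a failed commit can be undone here */
};

static int rb_pre_validate (void) { puts ("pre-validate: bricks and peers checked"); return 0;  }
static int rb_commit       (void) { puts ("commit: replicate brick op failed");      return -1; }
static int rb_post_commit  (void) { puts ("post-commit: rolling back remove-brick"); return 0;  }

static int
run_txn (const struct txn_ops *ops)
{
        int ret = ops->pre_validate ();
        if (ret)
                return ret;

        ret = ops->commit ();
        /* The post-commit phase runs whether commit succeeded or not, which
         * gives the operation a well-defined place to undo partial work. */
        (void) ops->post_commit ();
        return ret;
}

int
main (void)
{
        struct txn_ops replace_brick = { rb_pre_validate, rb_commit, rb_post_commit };
        return run_txn (&replace_brick) ? 1 : 0;
}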

Comment 4 Ravishankar N 2017-07-20 09:05:29 UTC
(In reply to Atin Mukherjee from comment #3)
> Ravi,
> 
> If we really need to fix this issue, then IMO replace-brick should be
> implemented in the mgmt_v3 framework, as it provides a rollback mechanism of
> sorts in the form of a post-commit phase (snapshot already uses it in a few
> areas). With GD2, the infrastructure for rolling back transactions will be in
> place.

Okay, I think this bug can be taken up for glusterfs-4.0 using GD2. Pranith, feel free to add your thoughts if you feel otherwise.

Comment 5 Worker Ant 2017-07-24 07:27:13 UTC
COMMIT: https://review.gluster.org/17828 committed in master by Raghavendra Talur (rtalur) 
------
commit 668df4e7e452aa26f0e0fbd15691fab0edc83014
Author: Raghavendra Talur <rtalur>
Date:   Thu Jul 20 02:54:30 2017 +0530

    tests: replace brick failure shouldn't corrupt volfiles
    
    This is a test to present the known issue. It will be skipped as it has
    the known issue marker.
    
    Change-Id: Id6fa5d323abe0bc76a58cd92cb8e52fcde41b49b
    BUG: 1473026
    Signed-off-by: Raghavendra Talur <rtalur>
    Reviewed-on: https://review.gluster.org/17828
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Ravishankar N <ravishankar>
    Reviewed-by: Atin Mukherjee <amukherj>

Comment 6 Shyamsundar 2017-12-08 17:35:17 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.13.0, please open a new bug report.

glusterfs-3.13.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-December/000087.html
[2] https://www.gluster.org/pipermail/gluster-users/

