Bug 1256265 - Data Loss: Remove-brick commit passing when remove-brick process has not even started (due to killing glusterd)
Summary: Data Loss: Remove-brick commit passing when remove-brick process has not even started (due to killing glusterd)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: 3.7.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Atin Mukherjee
QA Contact:
URL:
Whiteboard:
Depends On: 1236038 1245045
Blocks:
 
Reported: 2015-08-24 08:46 UTC by Atin Mukherjee
Modified: 2015-09-09 09:40 UTC
CC List: 9 users

Fixed In Version: glusterfs-3.7.4
Clone Of: 1245045
Environment:
Last Closed: 2015-09-09 09:40:22 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Comment 1 Anand Avati 2015-08-24 08:47:07 UTC
REVIEW: http://review.gluster.org/11996 (glusterd: Don't allow remove brick start/commit if glusterd is down on the host of the brick) posted (#1) for review on release-3.7 by Atin Mukherjee (amukherj)

Comment 2 Anand Avati 2015-08-28 09:02:53 UTC
COMMIT: http://review.gluster.org/11996 committed in release-3.7 by Kaushal M (kaushal) 
------
commit f51ffaeda4c87b682b7865c26befd75fe1c8cb25
Author: Atin Mukherjee <amukherj>
Date:   Tue Jul 21 09:57:43 2015 +0530

    glusterd: Don't allow remove brick start/commit if glusterd is down on the host of the brick
    
    Backport of http://review.gluster.org/#/c/11726/
    
    The remove-brick stage blindly starts the remove-brick operation even if the
    glusterd instance on the node hosting the brick is down. Operationally this is
    incorrect, and it can result in an inconsistent rebalance status across the
    nodes: the originator of the command will keep the rebalance status at
    'DEFRAG_NOT_STARTED', while the glusterd instances on the other nodes, once
    they come back up, will trigger the rebalance and mark the status as completed
    when the rebalance finishes.
    
    This patch fixes two things:
    1. Add a validation in remove-brick to check whether all the peers hosting the
    bricks to be removed are up.
    
    2. Don't copy volinfo->rebal.dict from stale volinfo during restore, as this
    might end up in an inconsistent node_state.info file, resulting in a volume
    status command failure.
    
    Change-Id: Ia4a76865c05037d49eec5e3bbfaf68c1567f1f81
    BUG: 1256265
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: http://review.gluster.org/11726
    Tested-by: NetBSD Build System <jenkins.org>
    Reviewed-by: N Balachandran <nbalacha>
    Reviewed-by: Krishnan Parthasarathi <kparthas>
    Reviewed-on: http://review.gluster.org/11996
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Kaushal M <kaushal>

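For readers following the fix, here is a minimal, self-contained sketch of the validation described in point (1) of the commit message above. All names in it (peer_t, brick_t, validate_remove_brick_peers) are illustrative stand-ins invented for this example, not the actual glusterd types or the code from the patch. The idea is: before staging a remove-brick start or commit, walk every brick being removed and refuse the operation if the glusterd on the peer hosting that brick is not connected.

    /* Hypothetical sketch of fix (1); not the real glusterd code. */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>
    
    typedef struct {
        char hostname[256];
        bool connected;      /* is the glusterd on this peer reachable? */
    } peer_t;
    
    typedef struct {
        char path[4096];
        peer_t *host;        /* peer that hosts this brick */
    } brick_t;
    
    /* Return false (and report why) if any brick to be removed lives on a
     * peer whose glusterd is down; the staging phase should then fail
     * instead of blindly starting the remove-brick/rebalance. */
    static bool
    validate_remove_brick_peers(brick_t **bricks, size_t nbricks)
    {
        for (size_t i = 0; i < nbricks; i++) {
            peer_t *peer = bricks[i]->host;
            if (peer && !peer->connected) {
                fprintf(stderr,
                        "remove-brick rejected: glusterd on %s is down "
                        "(brick %s)\n", peer->hostname, bricks[i]->path);
                return false;
            }
        }
        return true;
    }
    
    int main(void)
    {
        /* Example: one brick hosted on a peer whose glusterd is down. */
        peer_t down_peer = { .hostname = "node2", .connected = false };
        brick_t brick = { .path = "/bricks/b1", .host = &down_peer };
        brick_t *bricks[] = { &brick };
    
        if (!validate_remove_brick_peers(bricks, 1))
            return 1;   /* staging fails, as the patch intends */
        return 0;
    }

Fix (2), not copying volinfo->rebal.dict from a stale volinfo during restore, is a restore-path detail and is not covered by this sketch.
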
Comment 3 Kaushal 2015-09-09 09:40:22 UTC
This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed with glusterfs-3.7.4, please open a new bug report.

glusterfs-3.7.4 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/12496
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
