Bug 1256265

Summary: Data Loss: Remove brick commit passing when remove-brick process has not even started (due to killing glusterd)

Product: [Community] GlusterFS
Component: glusterd
Version: 3.7.3
Status: CLOSED CURRENTRELEASE
Severity: urgent
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Keywords: Triaged
Reporter: Atin Mukherjee <amukherj>
Assignee: Atin Mukherjee <amukherj>
CC: bugs, gluster-bugs, kaushal, nbalacha, nchilaka, nsathyan, rhs-bugs, sabansal, storage-qa-internal
Fixed In Version: glusterfs-3.7.4
Doc Type: Bug Fix
Type: Bug
Clone Of: 1245045
Last Closed: 2015-09-09 09:40:22 UTC
Bug Depends On: 1236038, 1245045

Comment 1 Anand Avati 2015-08-24 08:47:07 UTC
REVIEW: http://review.gluster.org/11996 (glusterd: Don't allow remove brick start/commit if glusterd is down on the host of the brick) posted (#1) for review on release-3.7 by Atin Mukherjee (amukherj)

Comment 2 Anand Avati 2015-08-28 09:02:53 UTC
COMMIT: http://review.gluster.org/11996 committed in release-3.7 by Kaushal M (kaushal) 
------
commit f51ffaeda4c87b682b7865c26befd75fe1c8cb25
Author: Atin Mukherjee <amukherj>
Date:   Tue Jul 21 09:57:43 2015 +0530

    glusterd: Don't allow remove brick start/commit if glusterd is down on the host of the brick
    
    Backport of http://review.gluster.org/#/c/11726/
    
    The remove-brick stage blindly starts the remove-brick operation even if the
    glusterd instance on the node hosting the brick is down. Operationally this is
    incorrect and can result in an inconsistent rebalance status across the nodes:
    the originator of the command keeps the rebalance status at
    'DEFRAG_NOT_STARTED', while the glusterd instances on the other nodes, once
    they come back up, trigger the rebalance and mark the status as completed when
    it finishes.
    
    This patch fixes two things:
    1. Add a validation in remove-brick to check whether all the peers hosting the
    bricks to be removed are up.
    
    2. Don't copy volinfo->rebal.dict from the stale volinfo during restore, as this
    could end up in an inconsistent node_state.info file, causing the volume status
    command to fail.
    
    Change-Id: Ia4a76865c05037d49eec5e3bbfaf68c1567f1f81
    BUG: 1256265
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: http://review.gluster.org/11726
    Tested-by: NetBSD Build System <jenkins.org>
    Reviewed-by: N Balachandran <nbalacha>
    Reviewed-by: Krishnan Parthasarathi <kparthas>
    Reviewed-on: http://review.gluster.org/11996
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Kaushal M <kaushal>
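
The sketch below illustrates the shape of fix (1) from the commit message above: refuse a remove-brick start/commit unless every peer hosting a brick to be removed is reachable. It is a minimal, self-contained C example; the struct and function names (brick_t, peer_t, validate_remove_brick_peers) are hypothetical stand-ins for illustration, not the actual glusterd data structures or APIs (the real daemon tracks peers and their connection state internally).

/* Minimal, self-contained sketch (hypothetical types and names, not the real
 * glusterd structures): reject a remove-brick start/commit unless every peer
 * hosting one of the bricks to be removed is currently connected. */

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    const char *hostname;  /* host that owns the brick */
    const char *path;      /* brick path on that host */
} brick_t;

typedef struct {
    const char *hostname;  /* peer's hostname */
    bool        connected; /* is glusterd on this peer reachable? */
} peer_t;

/* Return true only if every brick's host appears in the peer list and that
 * peer is connected. On failure, write a CLI-style error into errbuf so the
 * operation can be rejected up front, before any rebalance is kicked off. */
static bool
validate_remove_brick_peers(const brick_t *bricks, size_t nbricks,
                            const peer_t *peers, size_t npeers,
                            char *errbuf, size_t errlen)
{
    for (size_t i = 0; i < nbricks; i++) {
        bool up = false;
        for (size_t j = 0; j < npeers; j++) {
            if (strcmp(bricks[i].hostname, peers[j].hostname) == 0) {
                up = peers[j].connected;
                break;
            }
        }
        if (!up) {
            snprintf(errbuf, errlen,
                     "glusterd on %s (hosting brick %s:%s) is down; "
                     "retry remove-brick once the peer is back online",
                     bricks[i].hostname, bricks[i].hostname, bricks[i].path);
            return false;
        }
    }
    return true;
}

int
main(void)
{
    brick_t bricks[] = { { "node2", "/bricks/b2" } };
    peer_t  peers[]  = { { "node1", true }, { "node2", false } };
    char    err[256] = "";

    if (!validate_remove_brick_peers(bricks, 1, peers, 2, err, sizeof(err)))
        fprintf(stderr, "remove-brick rejected: %s\n", err);
    return 0;
}

Run as-is, the example rejects the operation because the hypothetical peer node2, which hosts the brick being removed, is marked as down; that is the situation the unpatched code let slip through, leaving the originator stuck at 'DEFRAG_NOT_STARTED' while other nodes completed the rebalance later.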

Comment 3 Kaushal 2015-09-09 09:40:22 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.4, please open a new bug report.

glusterfs-3.7.4 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/12496
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user