Bug 1256265

Summary: Data Loss: Remove brick commit passing when remove-brick process has not even started (due to killing glusterd)

Product: [Community] GlusterFS
Component: glusterd
Version: 3.7.3
Status: CLOSED CURRENTRELEASE
Severity: urgent
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Keywords: Triaged
Reporter: Atin Mukherjee <amukherj>
Assignee: Atin Mukherjee <amukherj>
CC: bugs, gluster-bugs, kaushal, nbalacha, nchilaka, nsathyan, rhs-bugs, sabansal, storage-qa-internal
Fixed In Version: glusterfs-3.7.4
Doc Type: Bug Fix
Type: Bug
Clone Of: 1245045
Last Closed: 2015-09-09 09:40:22 UTC
Bug Depends On: 1236038, 1245045

Comment 1 Anand Avati 2015-08-24 08:47:07 UTC
REVIEW: http://review.gluster.org/11996 (glusterd: Don't allow remove brick start/commit if glusterd is down on the host of the brick) posted (#1) for review on release-3.7 by Atin Mukherjee (amukherj)

Comment 2 Anand Avati 2015-08-28 09:02:53 UTC
COMMIT: http://review.gluster.org/11996 committed in release-3.7 by Kaushal M (kaushal) 
------
commit f51ffaeda4c87b682b7865c26befd75fe1c8cb25
Author: Atin Mukherjee <amukherj>
Date:   Tue Jul 21 09:57:43 2015 +0530

    glusterd: Don't allow remove brick start/commit if glusterd is down on the host of the brick
    
    Backport of http://review.gluster.org/#/c/11726/
    
    The remove-brick stage blindly starts the remove-brick operation even if the
    glusterd instance on the node hosting the brick is down. Operationally this is
    incorrect and can result in an inconsistent rebalance status across the nodes:
    the originator of the command keeps the rebalance status at
    'DEFRAG_NOT_STARTED', while the glusterd instances on the other nodes, once
    they come back up, trigger the rebalance and mark the status as completed when
    it finishes.
    
    This patch fixes two things:
    1. Add a validation in remove-brick to check whether all the peers hosting the
    bricks to be removed are up.
    
    2. Don't copy volinfo->rebal.dict from the stale volinfo during restore, as this
    could end up in an inconsistent node_state.info file, causing the volume status
    command to fail.
    
    Change-Id: Ia4a76865c05037d49eec5e3bbfaf68c1567f1f81
    BUG: 1256265
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: http://review.gluster.org/11726
    Tested-by: NetBSD Build System <jenkins.org>
    Reviewed-by: N Balachandran <nbalacha>
    Reviewed-by: Krishnan Parthasarathi <kparthas>
    Reviewed-on: http://review.gluster.org/11996
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Kaushal M <kaushal>
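
The sketch below illustrates the shape of fix (1) from the commit message above: refuse a remove-brick start/commit unless every peer hosting a brick to be removed is reachable. It is a minimal, self-contained C example; the struct and function names (brick_t, peer_t, validate_remove_brick_peers) are hypothetical stand-ins for illustration, not the actual glusterd data structures or APIs (the real daemon tracks peers and their connection state internally).

/* Minimal, self-contained sketch (hypothetical types and names, not the real
 * glusterd structures): reject a remove-brick start/commit unless every peer
 * hosting one of the bricks to be removed is currently connected. */

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    const char *hostname;  /* host that owns the brick */
    const char *path;      /* brick path on that host */
} brick_t;

typedef struct {
    const char *hostname;  /* peer's hostname */
    bool        connected; /* is glusterd on this peer reachable? */
} peer_t;

/* Return true only if every brick's host appears in the peer list and that
 * peer is connected. On failure, write a CLI-style error into errbuf so the
 * operation can be rejected up front, before any rebalance is kicked off. */
static bool
validate_remove_brick_peers(const brick_t *bricks, size_t nbricks,
                            const peer_t *peers, size_t npeers,
                            char *errbuf, size_t errlen)
{
    for (size_t i = 0; i < nbricks; i++) {
        bool up = false;
        for (size_t j = 0; j < npeers; j++) {
            if (strcmp(bricks[i].hostname, peers[j].hostname) == 0) {
                up = peers[j].connected;
                break;
            }
        }
        if (!up) {
            snprintf(errbuf, errlen,
                     "glusterd on %s (hosting brick %s:%s) is down; "
                     "retry remove-brick once the peer is back online",
                     bricks[i].hostname, bricks[i].hostname, bricks[i].path);
            return false;
        }
    }
    return true;
}

int
main(void)
{
    brick_t bricks[] = { { "node2", "/bricks/b2" } };
    peer_t  peers[]  = { { "node1", true }, { "node2", false } };
    char    err[256] = "";

    if (!validate_remove_brick_peers(bricks, 1, peers, 2, err, sizeof(err)))
        fprintf(stderr, "remove-brick rejected: %s\n", err);
    return 0;
}

Run as-is, the example rejects the operation because the hypothetical peer node2, which hosts the brick being removed, is marked as down; that is the situation the unpatched code let slip through, leaving the originator stuck at 'DEFRAG_NOT_STARTED' while other nodes completed the rebalance later.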

Comment 3 Kaushal 2015-09-09 09:40:22 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.4, please open a new bug report.

glusterfs-3.7.4 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/12496
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user