1256265 – Data Loss: Remove brick commit passing when remove-brick process has not even started (due to killing glusterd)


Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	glusterd
Sub Component:
Version:	3.7.3
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	urgent
Target Milestone:	---
Assignee:	Atin Mukherjee
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:	1236038 1245045
Blocks:

Reported:	2015-08-24 08:46 UTC by Atin Mukherjee
Modified:	2015-09-09 09:40 UTC
CC List:	9 users
Fixed In Version:	glusterfs-3.7.4
Clone Of:	1245045
Environment:
Last Closed:	2015-09-09 09:40:22 UTC
Regression:	---
Mount Type:	---
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:


Comment 1 Anand Avati 2015-08-24 08:47:07 UTC

REVIEW: http://review.gluster.org/11996 (glusterd: Don't allow remove brick start/commit if glusterd is down of the host of the brick) posted (#1) for review on release-3.7 by Atin Mukherjee (amukherj)

Comment 2 Anand Avati 2015-08-28 09:02:53 UTC

COMMIT: http://review.gluster.org/11996 committed in release-3.7 by Kaushal M (kaushal) 
------
commit f51ffaeda4c87b682b7865c26befd75fe1c8cb25
Author: Atin Mukherjee <amukherj>
Date:   Tue Jul 21 09:57:43 2015 +0530

    glusterd: Don't allow remove brick start/commit if glusterd is down of the host of the brick
    
    Backport of http://review.gluster.org/#/c/11726/
    
    The remove-brick stage blindly starts the remove-brick operation even if the
    glusterd instance on the node hosting the brick is down. This is operationally
    incorrect and can result in an inconsistent rebalance status across the
    nodes: the originator of the command will always report the rebalance status
    as 'DEFRAG_NOT_STARTED', whereas the glusterd instances on the other nodes,
    once they come up, will trigger the rebalance and set the status to completed
    when the rebalance finishes.
    
    This patch fixes two things:
    1. Add a validation in remove brick to check whether all the peers hosting the
    bricks to be removed are up.
    
    2. Don't copy volinfo->rebal.dict from the stale volinfo during restore, as this
    might result in an inconsistent node_state.info file and cause the volume status
    command to fail.
    
    Change-Id: Ia4a76865c05037d49eec5e3bbfaf68c1567f1f81
    BUG: 1256265
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: http://review.gluster.org/11726
    Tested-by: NetBSD Build System <jenkins.org>
    Reviewed-by: N Balachandran <nbalacha>
    Reviewed-by: Krishnan Parthasarathi <kparthas>
    Reviewed-on: http://review.gluster.org/11996
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Kaushal M <kaushal>
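
The first fix described in the commit message (rejecting remove-brick unless every peer hosting an affected brick is up) can be illustrated with a small model. This is only a sketch under stated assumptions, not glusterd's actual C implementation: the `Brick` type, the `validate_remove_brick` function, the `connected_peers` set, and the error text are all hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class Brick:
    """Hypothetical stand-in for a brick: the host serving it and its path."""
    host: str
    path: str


def validate_remove_brick(
    bricks: Iterable[Brick],
    connected_peers: Set[str],
    local_host: str,
) -> Tuple[bool, str]:
    """Return (ok, error) for a remove-brick start/commit request.

    The request is rejected if any brick to be removed lives on a remote
    host whose glusterd is not currently connected. Bricks on the local
    host are accepted, since the local daemon is the one running the check.
    """
    for brick in bricks:
        if brick.host == local_host:
            continue  # the originator itself is necessarily up
        if brick.host not in connected_peers:
            return False, (
                f"glusterd is down on {brick.host}, "
                f"which hosts brick {brick.host}:{brick.path}"
            )
    return True, ""
```

With a check of this shape in the staging phase, a commit issued while the brick's host is unreachable fails up front instead of succeeding before any data has been migrated.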

Comment 3 Kaushal 2015-09-09 09:40:22 UTC

This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed in glusterfs-3.7.4, please open a new bug report.

glusterfs-3.7.4 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/12496
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
