Bug 1256265 - Data Loss:Remove brick commit passing when remove-brick process has not even started(due to killing glusterd)
Data Loss:Remove brick commit passing when remove-brick process has not even ...
Product: GlusterFS
Classification: Community
Component: glusterd (Show other bugs)
Unspecified Unspecified
unspecified Severity urgent
: ---
: ---
Assigned To: Atin Mukherjee
: Triaged
Depends On: 1236038 1245045
  Show dependency treegraph
Reported: 2015-08-24 04:46 EDT by Atin Mukherjee
Modified: 2015-09-09 05:40 EDT (History)
9 users (show)

See Also:
Fixed In Version: glusterfs-3.7.4
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1245045
Last Closed: 2015-09-09 05:40:22 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments (Terms of Use)

  None (edit)
Comment 1 Anand Avati 2015-08-24 04:47:07 EDT
REVIEW: http://review.gluster.org/11996 (glusterd: Don't allow remove brick start/commit if glusterd is down of the host of the brick) posted (#1) for review on release-3.7 by Atin Mukherjee (amukherj@redhat.com)
Comment 2 Anand Avati 2015-08-28 05:02:53 EDT
COMMIT: http://review.gluster.org/11996 committed in release-3.7 by Kaushal M (kaushal@redhat.com) 
commit f51ffaeda4c87b682b7865c26befd75fe1c8cb25
Author: Atin Mukherjee <amukherj@redhat.com>
Date:   Tue Jul 21 09:57:43 2015 +0530

    glusterd: Don't allow remove brick start/commit if glusterd is down of the host of the brick
    Backport of http://review.gluster.org/#/c/11726/
    remove brick stage blindly starts the remove brick operation even if the
    glusterd instance of the node hosting the brick is down. Operationally its
    incorrect and this could result into a inconsistent rebalance status across all
    the nodes as the originator of this command will always have the rebalance
    status to 'DEFRAG_NOT_STARTED', however when the glusterd instance on the other
    nodes comes up, will trigger rebalance and make the status to completed once the
    rebalance is finished.
    This patch fixes two things:
    1. Add a validation in remove brick to check whether all the peers hosting the
    bricks to be removed are up.
    2. Don't copy volinfo->rebal.dict from stale volinfo during restore as this
    might end up in a incosistent node_state.info file resulting into volume status
    command failure.
    Change-Id: Ia4a76865c05037d49eec5e3bbfaf68c1567f1f81
    BUG: 1256265
    Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
    Reviewed-on: http://review.gluster.org/11726
    Tested-by: NetBSD Build System <jenkins@build.gluster.org>
    Reviewed-by: N Balachandran <nbalacha@redhat.com>
    Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
    Reviewed-on: http://review.gluster.org/11996
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Kaushal M <kaushal@redhat.com>
Comment 3 Kaushal 2015-09-09 05:40:22 EDT
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.4, please open a new bug report.

glusterfs-3.7.4 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/12496
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

Note You need to log in before you can comment on or make changes to this bug.