Bug 1443319 - Don't wind post-op on a brick where the fop phase failed.
Summary: Don't wind post-op on a brick where the fop phase failed.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: 3.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Ravishankar N
QA Contact:
URL:
Whiteboard:
Depends On: 1438255
Blocks: 1394118 glusterfs-3.8.12
 
Reported: 2017-04-19 06:07 UTC by Ravishankar N
Modified: 2017-05-29 04:59 UTC
CC: 1 user

Fixed In Version: glusterfs-3.8.12
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1438255
Environment:
Last Closed: 2017-05-29 04:59:32 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Ravishankar N 2017-04-19 06:07:55 UTC
+++ This bug was initially created as a clone of Bug #1438255 +++

Problem:
    In afr-v2, self-blaming xattrs are absent by design. However, if the FOP
    phase failed on a brick with an error other than ENOTCONN (or even with
    ENOTCONN, if the connection was regained before the post-op was wound),
    the post-op was also wound on the failed brick, setting self-blaming
    xattrs on it. This could lead to undesired results, such as healing of
    files in split-brain.

    Fix:
    If a fop failed on a brick where the pre-op succeeded, do not
    perform the post-op on it. This also has the desired effect of not
    resetting the dirty xattr on that brick, which is correct behaviour:
    if the fop failed on a brick, there is no reason to clear the dirty
    bit, since it serves as an indication of the failure.

--- Additional comment from Worker Ant on 2017-04-02 09:12:51 EDT ---

REVIEW: https://review.gluster.org/16976 (afr: don't do a post-op on a brick if op failed) posted (#1) for review on master by Ravishankar N (ravishankar)

--- Additional comment from Worker Ant on 2017-04-05 00:49:26 EDT ---

REVIEW: https://review.gluster.org/16976 (afr: don't do a post-op on a brick if op failed) posted (#2) for review on master by Ravishankar N (ravishankar)

--- Additional comment from Worker Ant on 2017-04-10 07:37:37 EDT ---

REVIEW: https://review.gluster.org/16976 (afr: don't do a post-op on a brick if op failed) posted (#3) for review on master by Ravishankar N (ravishankar)

--- Additional comment from Worker Ant on 2017-04-12 12:57:36 EDT ---

REVIEW: https://review.gluster.org/16976 (afr: don't do a post-op on a brick if op failed) posted (#4) for review on master by Ravishankar N (ravishankar)

--- Additional comment from Worker Ant on 2017-04-14 06:38:17 EDT ---

REVIEW: https://review.gluster.org/16976 (afr: don't do a post-op on a brick if op failed) posted (#5) for review on master by Ravishankar N (ravishankar)

--- Additional comment from Worker Ant on 2017-04-17 01:57:00 EDT ---

REVIEW: https://review.gluster.org/16976 (afr: don't do a post-op on a brick if op failed) posted (#6) for review on master by Ravishankar N (ravishankar)

--- Additional comment from Worker Ant on 2017-04-18 22:29:33 EDT ---

COMMIT: https://review.gluster.org/16976 committed in master by Pranith Kumar Karampuri (pkarampu) 
------
commit 10dad995c989e9d77c341135d7c48817baba966c
Author: Ravishankar N <ravishankar>
Date:   Sun Apr 2 18:08:04 2017 +0530

    afr: don't do a post-op on a brick if op failed
    
    Problem:
    In afr-v2, self-blaming xattrs are not there by design. But if the FOP
    failed on a brick due to an error other than ENOTCONN (or even due to
    ENOTCONN, but we regained connection before postop was wound), we wind
    the post-op also on the failed brick, leading to setting self-blaming
    xattrs on that brick. This can lead to undesired results like healing of
    files in split-brain etc.
    
    Fix:
    If a fop failed on a brick on which pre-op was successful, do not
    perform post-op on it. This also produces the desired effect of not
    resetting the dirty xattr on the brick, which is how it should be
    because if the fop failed on a brick, there is no reason to clear the
    dirty bit which actually serves as an indication of the failure.
    
    Change-Id: I5f1caf4d1b39f36cf8093ccef940118638caa9c4
    BUG: 1438255
    Signed-off-by: Ravishankar N <ravishankar>
    Reviewed-on: https://review.gluster.org/16976
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>

Comment 1 Worker Ant 2017-04-19 11:29:05 UTC
REVIEW: https://review.gluster.org/17082 (afr: don't do a post-op on a brick if op failed) posted (#1) for review on release-3.8 by Ravishankar N (ravishankar)

Comment 2 Worker Ant 2017-04-29 11:26:27 UTC
COMMIT: https://review.gluster.org/17082 committed in release-3.8 by Niels de Vos (ndevos) 
------
commit a6d313d12c98cf533c6bbb10f491dd2ec48ca89c
Author: Ravishankar N <ravishankar>
Date:   Wed Apr 19 16:40:05 2017 +0530

    afr: don't do a post-op on a brick if op failed
    
    Problem:
    In afr-v2, self-blaming xattrs are not there by design. But if the FOP
    failed on a brick due to an error other than ENOTCONN (or even due to
    ENOTCONN, but we regained connection before postop was wound), we wind
    the post-op also on the failed brick, leading to setting self-blaming
    xattrs on that brick. This can lead to undesired results like healing of
    files in split-brain etc.
    
    Fix:
    If a fop failed on a brick on which pre-op was successful, do not
    perform post-op on it. This also produces the desired effect of not
    resetting the dirty xattr on the brick, which is how it should be
    because if the fop failed on a brick, there is no reason to clear the
    dirty bit which actually serves as an indication of the failure.
    
    > Reviewed-on: https://review.gluster.org/16976
    > Smoke: Gluster Build System <jenkins.org>
    > NetBSD-regression: NetBSD Build System <jenkins.org>
    > CentOS-regression: Gluster Build System <jenkins.org>
    > Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    
    Change-Id: I5f1caf4d1b39f36cf8093ccef940118638caa9c4
    BUG: 1443319
    Signed-off-by: Ravishankar N <ravishankar>
    Reviewed-on: https://review.gluster.org/17082
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>

Comment 3 Niels de Vos 2017-05-29 04:59:32 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.12, please open a new bug report.

glusterfs-3.8.12 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2017-May/000072.html
[2] https://www.gluster.org/pipermail/gluster-users/

