Bug 1443501 - Don't wind post-op on a brick where the fop phase failed.
Summary: Don't wind post-op on a brick where the fop phase failed.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: 3.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Ravishankar N
QA Contact:
URL:
Whiteboard:
Depends On: 1438255
Blocks: 1394118 glusterfs-3.10.2
 
Reported: 2017-04-19 11:30 UTC by Ravishankar N
Modified: 2017-05-31 20:46 UTC
CC List: 2 users

Fixed In Version: glusterfs-3.10.2
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1438255
Environment:
Last Closed: 2017-05-31 20:46:29 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Ravishankar N 2017-04-19 11:30:41 UTC
+++ This bug was initially created as a clone of Bug #1438255 +++

Problem:
    In afr-v2, self-blaming xattrs are intentionally absent from the design.
    But if the FOP fails on a brick due to an error other than ENOTCONN (or
    even due to ENOTCONN, if the connection is regained before the post-op is
    wound), we wind the post-op on the failed brick as well, which sets
    self-blaming xattrs on that brick. This can lead to undesired results,
    such as healing files that are in split-brain.

Fix:
    If a FOP failed on a brick on which the pre-op succeeded, do not perform
    the post-op on that brick. This also has the desired effect of leaving
    the dirty xattr set on that brick, which is how it should be: if the FOP
    failed there, there is no reason to clear the dirty bit, since it is
    precisely what indicates the failure.
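
To make the decision concrete, below is a minimal, self-contained C model of the
transaction behaviour described above. It is only an illustration under
simplified assumptions: the types and helpers (brick_t, wind_pre_op,
wind_post_op) are hypothetical and do not exist in the GlusterFS sources, where
the real logic lives in afr-transaction.c. The point it shows is the check in
the post-op phase: a brick whose pre-op succeeded but whose FOP failed is
skipped, so its dirty xattr stays set and keeps flagging that the brick needs a
heal.

/* Hypothetical, simplified model of the AFR changelog transaction phases.
 * Not the actual afr-transaction.c code. */
#include <stdbool.h>
#include <stdio.h>

#define NUM_BRICKS 3

typedef struct {
    const char *name;
    bool pre_op_done;   /* pre-op succeeded: dirty xattr was set           */
    bool fop_failed;    /* the actual FOP (write, setattr, ...) failed     */
    bool dirty;         /* modelled state of the on-disk dirty xattr       */
} brick_t;

/* Pre-op: mark the brick dirty before attempting the FOP. */
static void wind_pre_op(brick_t *b) {
    b->pre_op_done = true;
    b->dirty = true;
}

/* Post-op: clear the dirty xattr (in real AFR it also sets pending xattrs
 * that blame the bricks the FOP failed on). */
static void wind_post_op(brick_t *b) {
    b->dirty = false;
}

int main(void) {
    brick_t bricks[NUM_BRICKS] = {
        { .name = "brick-0", .fop_failed = false },
        { .name = "brick-1", .fop_failed = true  },  /* FOP fails here */
        { .name = "brick-2", .fop_failed = false },
    };

    for (int i = 0; i < NUM_BRICKS; i++)
        wind_pre_op(&bricks[i]);

    /* ... the FOP itself is wound here; bricks[i].fop_failed records
     * the per-brick outcome ... */

    for (int i = 0; i < NUM_BRICKS; i++) {
        /* The fix: skip the post-op on a brick where the FOP failed, even
         * though its pre-op succeeded.  Its dirty xattr stays set and keeps
         * indicating that the brick needs a heal. */
        if (bricks[i].pre_op_done && !bricks[i].fop_failed)
            wind_post_op(&bricks[i]);
    }

    for (int i = 0; i < NUM_BRICKS; i++)
        printf("%s: dirty=%d\n", bricks[i].name, bricks[i].dirty);

    return 0;
}

Running the model prints dirty=1 only for brick-1, the brick the FOP failed on,
which is the behaviour the patch is after.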

--- Additional comment from Worker Ant on 2017-04-02 09:12:51 EDT ---

REVIEW: https://review.gluster.org/16976 (afr: don't do a post-op on a brick if op failed) posted (#1) for review on master by Ravishankar N (ravishankar)

--- Additional comment from Worker Ant on 2017-04-05 00:49:26 EDT ---

REVIEW: https://review.gluster.org/16976 (afr: don't do a post-op on a brick if op failed) posted (#2) for review on master by Ravishankar N (ravishankar)

--- Additional comment from Worker Ant on 2017-04-10 07:37:37 EDT ---

REVIEW: https://review.gluster.org/16976 (afr: don't do a post-op on a brick if op failed) posted (#3) for review on master by Ravishankar N (ravishankar)

--- Additional comment from Worker Ant on 2017-04-12 12:57:36 EDT ---

REVIEW: https://review.gluster.org/16976 (afr: don't do a post-op on a brick if op failed) posted (#4) for review on master by Ravishankar N (ravishankar)

--- Additional comment from Worker Ant on 2017-04-14 06:38:17 EDT ---

REVIEW: https://review.gluster.org/16976 (afr: don't do a post-op on a brick if op failed) posted (#5) for review on master by Ravishankar N (ravishankar)

--- Additional comment from Worker Ant on 2017-04-17 01:57:00 EDT ---

REVIEW: https://review.gluster.org/16976 (afr: don't do a post-op on a brick if op failed) posted (#6) for review on master by Ravishankar N (ravishankar)

--- Additional comment from Worker Ant on 2017-04-18 22:29:33 EDT ---

COMMIT: https://review.gluster.org/16976 committed in master by Pranith Kumar Karampuri (pkarampu) 
------
commit 10dad995c989e9d77c341135d7c48817baba966c
Author: Ravishankar N <ravishankar>
Date:   Sun Apr 2 18:08:04 2017 +0530

    afr: don't do a post-op on a brick if op failed
    
    Problem:
    In afr-v2, self-blaming xattrs are not there by design. But if the FOP
    failed on a brick due to an error other than ENOTCONN (or even due to
    ENOTCONN, but we regained connection before postop was wound), we wind
    the post-op also on the failed brick, leading to setting self-blaming
    xattrs on that brick. This can lead to undesired results like healing of
    files in split-brain etc.
    
    Fix:
    If a fop failed on a brick on which pre-op was successful, do not
    perform post-op on it. This also produces the desired effect of not
    resetting the dirty xattr on the brick, which is how it should be
    because if the fop failed on a brick, there is no reason to clear the
    dirty bit which actually serves as an indication of the failure.
    
    Change-Id: I5f1caf4d1b39f36cf8093ccef940118638caa9c4
    BUG: 1438255
    Signed-off-by: Ravishankar N <ravishankar>
    Reviewed-on: https://review.gluster.org/16976
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>

Comment 1 Worker Ant 2017-04-19 11:31:47 UTC
REVIEW: https://review.gluster.org/17083 (afr: don't do a post-op on a brick if op failed) posted (#1) for review on release-3.10 by Ravishankar N (ravishankar)

Comment 2 Worker Ant 2017-04-27 10:46:58 UTC
COMMIT: https://review.gluster.org/17083 committed in release-3.10 by Raghavendra Talur (rtalur) 
------
commit a8d293c361fb3b0daa2a83032f3b87e89a46021d
Author: Ravishankar N <ravishankar>
Date:   Sun Apr 2 18:08:04 2017 +0530

    afr: don't do a post-op on a brick if op failed
    
    Problem:
    In afr-v2, self-blaming xattrs are not there by design. But if the FOP
    failed on a brick due to an error other than ENOTCONN (or even due to
    ENOTCONN, but we regained connection before postop was wound), we wind
    the post-op also on the failed brick, leading to setting self-blaming
    xattrs on that brick. This can lead to undesired results like healing of
    files in split-brain etc.
    
    Fix:
    If a fop failed on a brick on which pre-op was successful, do not
    perform post-op on it. This also produces the desired effect of not
    resetting the dirty xattr on the brick, which is how it should be
    because if the fop failed on a brick, there is no reason to clear the
    dirty bit which actually serves as an indication of the failure.
    
    > Reviewed-on: https://review.gluster.org/16976
    > Smoke: Gluster Build System <jenkins.org>
    > NetBSD-regression: NetBSD Build System <jenkins.org>
    > CentOS-regression: Gluster Build System <jenkins.org>
    > Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    (cherry picked from commit 10dad995c989e9d77c341135d7c48817baba966c)
    
    Change-Id: I5f1caf4d1b39f36cf8093ccef940118638caa9c4
    BUG: 1443501
    Signed-off-by: Ravishankar N <ravishankar>
    Reviewed-on: https://review.gluster.org/17083
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>

Comment 3 Raghavendra Talur 2017-05-31 20:46:29 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.10.2, please open a new bug report.

