Bug 1443501 - Don't wind post-op on a brick where the fop phase failed.
Summary: Don't wind post-op on a brick where the fop phase failed.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: 3.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Ravishankar N
QA Contact:
URL:
Whiteboard:
Depends On: 1438255
Blocks: 1394118 glusterfs-3.10.2
 
Reported: 2017-04-19 11:30 UTC by Ravishankar N
Modified: 2017-05-31 20:46 UTC
CC List: 2 users

Fixed In Version: glusterfs-3.10.2
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1438255
Environment:
Last Closed: 2017-05-31 20:46:29 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Ravishankar N 2017-04-19 11:30:41 UTC
+++ This bug was initially created as a clone of Bug #1438255 +++

Problem:
    In afr-v2, self-blaming xattrs are intentionally absent from the design.
    But if the FOP fails on a brick due to an error other than ENOTCONN (or
    even due to ENOTCONN, if the connection is regained before the post-op is
    wound), we wind the post-op on the failed brick as well, which sets
    self-blaming xattrs on that brick. This can lead to undesired results,
    such as healing files that are in split-brain.

Fix:
    If a FOP failed on a brick on which the pre-op succeeded, do not perform
    the post-op on that brick. This also has the desired effect of leaving
    the dirty xattr set on that brick, which is how it should be: if the FOP
    failed there, there is no reason to clear the dirty bit, since it is
    precisely what indicates the failure.
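
To make the decision concrete, below is a minimal, self-contained C model of the
transaction behaviour described above. It is only an illustration under
simplified assumptions: the types and helpers (brick_t, wind_pre_op,
wind_post_op) are hypothetical and do not exist in the GlusterFS sources, where
the real logic lives in afr-transaction.c. The point it shows is the check in
the post-op phase: a brick whose pre-op succeeded but whose FOP failed is
skipped, so its dirty xattr stays set and keeps flagging that the brick needs a
heal.

/* Hypothetical, simplified model of the AFR changelog transaction phases.
 * Not the actual afr-transaction.c code. */
#include <stdbool.h>
#include <stdio.h>

#define NUM_BRICKS 3

typedef struct {
    const char *name;
    bool pre_op_done;   /* pre-op succeeded: dirty xattr was set           */
    bool fop_failed;    /* the actual FOP (write, setattr, ...) failed     */
    bool dirty;         /* modelled state of the on-disk dirty xattr       */
} brick_t;

/* Pre-op: mark the brick dirty before attempting the FOP. */
static void wind_pre_op(brick_t *b) {
    b->pre_op_done = true;
    b->dirty = true;
}

/* Post-op: clear the dirty xattr (in real AFR it also sets pending xattrs
 * that blame the bricks the FOP failed on). */
static void wind_post_op(brick_t *b) {
    b->dirty = false;
}

int main(void) {
    brick_t bricks[NUM_BRICKS] = {
        { .name = "brick-0", .fop_failed = false },
        { .name = "brick-1", .fop_failed = true  },  /* FOP fails here */
        { .name = "brick-2", .fop_failed = false },
    };

    for (int i = 0; i < NUM_BRICKS; i++)
        wind_pre_op(&bricks[i]);

    /* ... the FOP itself is wound here; bricks[i].fop_failed records
     * the per-brick outcome ... */

    for (int i = 0; i < NUM_BRICKS; i++) {
        /* The fix: skip the post-op on a brick where the FOP failed, even
         * though its pre-op succeeded.  Its dirty xattr stays set and keeps
         * indicating that the brick needs a heal. */
        if (bricks[i].pre_op_done && !bricks[i].fop_failed)
            wind_post_op(&bricks[i]);
    }

    for (int i = 0; i < NUM_BRICKS; i++)
        printf("%s: dirty=%d\n", bricks[i].name, bricks[i].dirty);

    return 0;
}

Running the model prints dirty=1 only for brick-1, the brick the FOP failed on,
which is the behaviour the patch is after.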

--- Additional comment from Worker Ant on 2017-04-02 09:12:51 EDT ---

REVIEW: https://review.gluster.org/16976 (afr: don't do a post-op on a brick if op failed) posted (#1) for review on master by Ravishankar N (ravishankar)

--- Additional comment from Worker Ant on 2017-04-05 00:49:26 EDT ---

REVIEW: https://review.gluster.org/16976 (afr: don't do a post-op on a brick if op failed) posted (#2) for review on master by Ravishankar N (ravishankar)

--- Additional comment from Worker Ant on 2017-04-10 07:37:37 EDT ---

REVIEW: https://review.gluster.org/16976 (afr: don't do a post-op on a brick if op failed) posted (#3) for review on master by Ravishankar N (ravishankar)

--- Additional comment from Worker Ant on 2017-04-12 12:57:36 EDT ---

REVIEW: https://review.gluster.org/16976 (afr: don't do a post-op on a brick if op failed) posted (#4) for review on master by Ravishankar N (ravishankar)

--- Additional comment from Worker Ant on 2017-04-14 06:38:17 EDT ---

REVIEW: https://review.gluster.org/16976 (afr: don't do a post-op on a brick if op failed) posted (#5) for review on master by Ravishankar N (ravishankar)

--- Additional comment from Worker Ant on 2017-04-17 01:57:00 EDT ---

REVIEW: https://review.gluster.org/16976 (afr: don't do a post-op on a brick if op failed) posted (#6) for review on master by Ravishankar N (ravishankar)

--- Additional comment from Worker Ant on 2017-04-18 22:29:33 EDT ---

COMMIT: https://review.gluster.org/16976 committed in master by Pranith Kumar Karampuri (pkarampu) 
------
commit 10dad995c989e9d77c341135d7c48817baba966c
Author: Ravishankar N <ravishankar>
Date:   Sun Apr 2 18:08:04 2017 +0530

    afr: don't do a post-op on a brick if op failed
    
    Problem:
    In afr-v2, self-blaming xattrs are not there by design. But if the FOP
    failed on a brick due to an error other than ENOTCONN (or even due to
    ENOTCONN, but we regained connection before postop was wound), we wind
    the post-op also on the failed brick, leading to setting self-blaming
    xattrs on that brick. This can lead to undesired results like healing of
    files in split-brain etc.
    
    Fix:
    If a fop failed on a brick on which pre-op was successful, do not
    perform post-op on it. This also produces the desired effect of not
    resetting the dirty xattr on the brick, which is how it should be
    because if the fop failed on a brick, there is no reason to clear the
    dirty bit which actually serves as an indication of the failure.
    
    Change-Id: I5f1caf4d1b39f36cf8093ccef940118638caa9c4
    BUG: 1438255
    Signed-off-by: Ravishankar N <ravishankar>
    Reviewed-on: https://review.gluster.org/16976
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>

Comment 1 Worker Ant 2017-04-19 11:31:47 UTC
REVIEW: https://review.gluster.org/17083 (afr: don't do a post-op on a brick if op failed) posted (#1) for review on release-3.10 by Ravishankar N (ravishankar)

Comment 2 Worker Ant 2017-04-27 10:46:58 UTC
COMMIT: https://review.gluster.org/17083 committed in release-3.10 by Raghavendra Talur (rtalur) 
------
commit a8d293c361fb3b0daa2a83032f3b87e89a46021d
Author: Ravishankar N <ravishankar>
Date:   Sun Apr 2 18:08:04 2017 +0530

    afr: don't do a post-op on a brick if op failed
    
    Problem:
    In afr-v2, self-blaming xattrs are not there by design. But if the FOP
    failed on a brick due to an error other than ENOTCONN (or even due to
    ENOTCONN, but we regained connection before postop was wound), we wind
    the post-op also on the failed brick, leading to setting self-blaming
    xattrs on that brick. This can lead to undesired results like healing of
    files in split-brain etc.
    
    Fix:
    If a fop failed on a brick on which pre-op was successful, do not
    perform post-op on it. This also produces the desired effect of not
    resetting the dirty xattr on the brick, which is how it should be
    because if the fop failed on a brick, there is no reason to clear the
    dirty bit which actually serves as an indication of the failure.
    
    > Reviewed-on: https://review.gluster.org/16976
    > Smoke: Gluster Build System <jenkins.org>
    > NetBSD-regression: NetBSD Build System <jenkins.org>
    > CentOS-regression: Gluster Build System <jenkins.org>
    > Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    (cherry picked from commit 10dad995c989e9d77c341135d7c48817baba966c)
    
    Change-Id: I5f1caf4d1b39f36cf8093ccef940118638caa9c4
    BUG: 1443501
    Signed-off-by: Ravishankar N <ravishankar>
    Reviewed-on: https://review.gluster.org/17083
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>

Comment 3 Raghavendra Talur 2017-05-31 20:46:29 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.10.2, please open a new bug report.

