Bug 1437773 - Undo pending xattrs only on the up bricks
Summary: Undo pending xattrs only on the up bricks
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: replicate
Version: rhgs-3.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: RHGS 3.3.0
Assignee: Karthik U S
QA Contact: Nag Pavan Chilakam
URL:
Whiteboard:
Depends On: 1433571
Blocks: 1417151 1436203 1436231
 
Reported: 2017-03-31 07:07 UTC by Karthik U S
Modified: 2017-09-21 04:35 UTC
CC List: 8 users

Fixed In Version: glusterfs-3.8.4-21
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1433571
Environment:
Last Closed: 2017-09-21 04:35:56 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2017:2774 0 normal SHIPPED_LIVE glusterfs bug fix and enhancement update 2017-09-21 08:16:29 UTC

Description Karthik U S 2017-03-31 07:07:48 UTC
+++ This bug was initially created as a clone of Bug #1433571 +++

Description of problem:
While doing a conservative merge, the pending xattrs for a brick are reset even if that brick is down. This can lead to data-loss situations.
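
For context, AFR records these pending operations in changelog extended attributes named trusted.afr.<volname>-client-<N> on the other replicas. A rough illustration of how they can be inspected (volume name "repl3", brick path, and client index are assumptions; values are only indicative):

    # Illustrative only: on brick b2, b1 maps to client-0; the last 8 hex
    # digits are the entry-pending counter.
    getfattr -d -m trusted.afr -e hex /bricks/b2
    # trusted.afr.repl3-client-0=0x000000000000000000000001
    #
    # The bug: the conservative merge between the up bricks also resets this
    # changelog slot for the down brick, so nothing blames b1 anymore.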


Version-Release number of selected component (if applicable):
mainline


How reproducible:
Always

Steps to Reproduce:
1. Create a 3 way replicated volume and set cluster.quorum-type to none
2. Bring bricks b1 & b2 down, and create file f1
3. Bring b3 down & b1 up, and create file f2
4. Bring b1 down & b2 up, and create file f3
5. Bring b3 up. shd will do the conservative merge, create f1 on b2 and f3 on b3, and undo the pending xattrs for b1 as well
6. Bring b1 up. Now b1 blames b2 & b3, but b2 & b3 do not blame b1. As part of the heal, b1 is considered the source, and f1 and f3 are deleted from both b2 & b3, leading to data loss
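
For reference, a minimal shell sketch of the setup in step 1 (volume name "repl3", node name "node1", brick and mount paths are placeholders; individual bricks are brought down by killing the corresponding brick process and brought back by restarting it):

    # Illustrative commands only.
    gluster volume create repl3 replica 3 \
        node1:/bricks/b1 node1:/bricks/b2 node1:/bricks/b3 force
    gluster volume set repl3 cluster.quorum-type none
    gluster volume start repl3
    mkdir -p /mnt/repl3 && mount -t glusterfs node1:/repl3 /mnt/repl3

    # Files for steps 2-4 are created from the client mount while the
    # corresponding bricks are down:
    touch /mnt/repl3/f1    # with b1 & b2 down
    touch /mnt/repl3/f2    # with b3 down, b1 up
    touch /mnt/repl3/f3    # with b1 down, b2 up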

Actual results:
Pending xattrs are reset even for bricks that are down

Expected results:
Undo pending should reset the values only for the bricks that are up

Additional info:

--- Additional comment from Worker Ant on 2017-03-18 04:23:19 EDT ---

REVIEW: https://review.gluster.org/16913 (cluster/afr: Undo pending xattrs only on the up bricks) posted (#1) for review on master by Karthik U S (ksubrahm)

--- Additional comment from Worker Ant on 2017-03-19 09:17:56 EDT ---

REVIEW: https://review.gluster.org/16913 (cluster/afr: Undo pending xattrs only on the up bricks) posted (#2) for review on master by Karthik U S (ksubrahm)

--- Additional comment from Worker Ant on 2017-03-20 02:13:06 EDT ---

REVIEW: https://review.gluster.org/16913 (cluster/afr: Undo pending xattrs only on the up bricks) posted (#3) for review on master by Karthik U S (ksubrahm)

--- Additional comment from Worker Ant on 2017-03-20 08:19:35 EDT ---

REVIEW: https://review.gluster.org/16913 (cluster/afr: Undo pending xattrs only on the up bricks) posted (#4) for review on master by Karthik U S (ksubrahm)

--- Additional comment from Worker Ant on 2017-03-27 00:27:03 EDT ---

REVIEW: https://review.gluster.org/16913 (cluster/afr: Undo pending xattrs only on the up bricks) posted (#5) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Worker Ant on 2017-03-27 00:52:42 EDT ---

REVIEW: https://review.gluster.org/16913 (cluster/afr: Undo pending xattrs only on the up bricks) posted (#6) for review on master by Karthik U S (ksubrahm)

--- Additional comment from Worker Ant on 2017-03-27 05:52:34 EDT ---

COMMIT: https://review.gluster.org/16913 committed in master by Pranith Kumar Karampuri (pkarampu) 
------
commit f91596e6566c605e70a31a60523d11f78a097c3c
Author: karthik-us <ksubrahm>
Date:   Sat Mar 18 13:44:56 2017 +0530

    cluster/afr: Undo pending xattrs only on the up bricks
    
    Problem:
    While doing conservative merge, even if a brick is down, it will reset
    the pending xattr on that. When that brick comes up, as part of the
    heal, it will consider this brick as the source and removes the entries
    on the other bricks, which leads to data loss.
    
    Fix:
    Undo pending only for the bricks which are up.
    
    Change-Id: I18436fa0bb1faa5f60531b357dea3f6b20446303
    BUG: 1433571
    Signed-off-by: karthik-us <ksubrahm>
    Reviewed-on: https://review.gluster.org/16913
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Ravishankar N <ravishankar>

Comment 2 Atin Mukherjee 2017-03-31 07:51:41 UTC
upstream patch : https://review.gluster.org/16913

Comment 3 Karthik U S 2017-03-31 10:15:30 UTC
downstream patch : https://code.engineering.redhat.com/gerrit/#/c/102101/

Comment 6 Atin Mukherjee 2017-04-04 07:26:39 UTC
downstream patch : https://code.engineering.redhat.com/gerrit/#/c/102101/

Comment 8 Nag Pavan Chilakam 2017-06-21 11:37:20 UTC
QA validation: I have run the case above and found it is fixed now.

Steps:
1. Create a 3 way replicated volume and set cluster.quorum-type to none
2. Bring bricks b1 & b2 down, and create file f1
3. Bring b3 down & b1 up, and create file f2
4. Bring b1 down & b2 up, and create file f3
5. Bring b3 up, shd will do the conservative merge and create f1 on b2 and f3 on b3

What is fixed:
After b2 and b3 are up, I no longer see the pending xattrs for b1 (on b2 and b3, the bricks that are up) being removed. They still exist, which is the fix.
That also means that after I bring b1 up, I no longer see b1 blaming b2 & b3, and hence no data loss.
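
The check behind this observation can be sketched roughly as follows (volume name "repl3" and brick paths are assumed for illustration; these are not the exact commands used by QE):

    # With b1 still down after the conservative merge, the changelog slot for
    # b1 (client-0 in this sketch) on the up bricks must stay non-zero:
    getfattr -d -m trusted.afr -e hex /bricks/b2 /bricks/b3
    # expected: trusted.afr.repl3-client-0 remains non-zero on both bricks
    gluster volume heal repl3 info    # entries pending heal towards b1 stay listed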

Also, I tested one case with renames, for which I raised bug 1463628 - file renames can lead to duplicate entries during a conservative merge, which also means two files having the same gfid.

However, the above BZ I raised is not caused by this fix.

Hence moving to VERIFIED.

Test version: 3.8.4-28 on el7.4beta

Comment 10 errata-xmlrpc 2017-09-21 04:35:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774

