+++ This bug was initially created as a clone of Bug #1433571 +++

Description of problem:
While doing a conservative merge, the pending xattrs on a brick are removed even if that brick is down. This can lead to data loss.

Version-Release number of selected component (if applicable):
mainline

How reproducible:
Always

Steps to Reproduce:
1. Create a 3 way replicated volume and set cluster.quorum-type to none
2. Bring bricks b1 & b2 down, and create file f1
3. Bring b3 down & b1 up, and create file f2
4. Bring b1 down & b2 up, and create file f3
5. Bring b3 up; shd will do the conservative merge, create f1 on b2 and f3 on b3, and undo the pending xattr for b1 as well
6. Bring b1 up. Now b1 blames b2 & b3, but b2 & b3 do not blame b1. As part of the heal, b1 is considered the source, and f1 and f3 are deleted from both b2 & b3, leading to data loss

Actual results:
Pending xattrs are being reset for bricks which are down as well

Expected results:
Undo pending should reset the values of only the bricks which are up

Additional info:

--- Additional comment from Worker Ant on 2017-03-18 04:23:19 EDT ---

REVIEW: https://review.gluster.org/16913 (cluster/afr: Undo pending xattrs only on the up bricks) posted (#1) for review on master by Karthik U S (ksubrahm)

--- Additional comment from Worker Ant on 2017-03-19 09:17:56 EDT ---

REVIEW: https://review.gluster.org/16913 (cluster/afr: Undo pending xattrs only on the up bricks) posted (#2) for review on master by Karthik U S (ksubrahm)

--- Additional comment from Worker Ant on 2017-03-20 02:13:06 EDT ---

REVIEW: https://review.gluster.org/16913 (cluster/afr: Undo pending xattrs only on the up bricks) posted (#3) for review on master by Karthik U S (ksubrahm)

--- Additional comment from Worker Ant on 2017-03-20 08:19:35 EDT ---

REVIEW: https://review.gluster.org/16913 (cluster/afr: Undo pending xattrs only on the up bricks) posted (#4) for review on master by Karthik U S (ksubrahm)

--- Additional comment from Worker Ant on 2017-03-27 00:27:03 EDT ---

REVIEW: https://review.gluster.org/16913 (cluster/afr: Undo pending xattrs only on the up bricks) posted (#5) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Worker Ant on 2017-03-27 00:52:42 EDT ---

REVIEW: https://review.gluster.org/16913 (cluster/afr: Undo pending xattrs only on the up bricks) posted (#6) for review on master by Karthik U S (ksubrahm)

--- Additional comment from Worker Ant on 2017-03-27 05:52:34 EDT ---

COMMIT: https://review.gluster.org/16913 committed in master by Pranith Kumar Karampuri (pkarampu)
------
commit f91596e6566c605e70a31a60523d11f78a097c3c
Author: karthik-us <ksubrahm>
Date:   Sat Mar 18 13:44:56 2017 +0530

    cluster/afr: Undo pending xattrs only on the up bricks

    Problem:
    While doing conservative merge, even if a brick is down, it will reset
    the pending xattr on that. When that brick comes up, as part of the
    heal, it will consider this brick as the source and removes the entries
    on the other bricks, which leads to data loss.

    Fix:
    Undo pending only for the bricks which are up.

    Change-Id: I18436fa0bb1faa5f60531b357dea3f6b20446303
    BUG: 1433571
    Signed-off-by: karthik-us <ksubrahm>
    Reviewed-on: https://review.gluster.org/16913
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Ravishankar N <ravishankar>
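
For readers who want to follow the reproduction steps without a cluster, here is a toy Python model of the scenario. It is only a sketch and not the AFR implementation: the `Replica` class, its `undo_only_up` flag, and the simplified "a brick is a source if no up brick blames it" rule are assumptions made for illustration. Pending xattrs are modelled as plain per-brick counters.

```python
BRICKS = ("b1", "b2", "b3")


class Replica:
    """Toy model of a 3-way replicated directory (hypothetical, not AFR code)."""

    def __init__(self, undo_only_up):
        # undo_only_up=False models the reported bug, True models the fix.
        self.undo_only_up = undo_only_up
        self.entries = {b: set() for b in BRICKS}
        # pending[a][b] > 0 means "brick a blames brick b".
        self.pending = {a: {b: 0 for b in BRICKS if b != a} for a in BRICKS}
        self.up = set(BRICKS)

    def set_up(self, *bricks):
        self.up = set(bricks)

    def create(self, name):
        # The entry lands on every up brick, which marks each down brick pending.
        for b in self.up:
            self.entries[b].add(name)
            for other in BRICKS:
                if other not in self.up:
                    self.pending[b][other] += 1

    def heal(self):
        up = sorted(self.up)
        blamed = {t for s in up for t in up if s != t and self.pending[s][t] > 0}
        sources = [b for b in up if b not in blamed]
        if sources:
            # Normal heal: sinks are made identical to a source.
            good = set(self.entries[sources[0]])
            undo_targets = up
        else:
            # Conservative merge: every up brick is blamed, so take the union.
            good = set().union(*(self.entries[b] for b in up))
            undo_targets = up if self.undo_only_up else list(BRICKS)
        for b in up:
            self.entries[b] = set(good)
            for t in undo_targets:
                if t != b:
                    self.pending[b][t] = 0


def run_scenario(undo_only_up):
    r = Replica(undo_only_up)
    r.set_up("b3"); r.create("f1")        # step 2: b1 & b2 down
    r.set_up("b1"); r.create("f2")        # step 3: b3 down, b1 up
    r.set_up("b2"); r.create("f3")        # step 4: b1 down, b2 up
    r.set_up("b2", "b3"); r.heal()        # step 5: b3 up, conservative merge
    r.set_up("b1", "b2", "b3"); r.heal()  # step 6: b1 up, heal again
    return r.entries


if __name__ == "__main__":
    print("buggy:", run_scenario(undo_only_up=False))
    print("fixed:", run_scenario(undo_only_up=True))
```

With `undo_only_up=False`, the model ends with only f2 on every brick, matching the data loss described in step 6. With `undo_only_up=True`, b2 and b3 keep blaming b1, no single source is chosen, and all three files survive.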
upstream patch : https://review.gluster.org/16913
downstream patch : https://code.engineering.redhat.com/gerrit/#/c/102101/
QA validation: I ran the case above and found it fixed.

Steps:
1. Create a 3 way replicated volume and set cluster.quorum-type to none
2. Bring bricks b1 & b2 down, and create file f1
3. Bring b3 down & b1 up, and create file f2
4. Bring b1 down & b2 up, and create file f3
5. Bring b3 up; shd will do the conservative merge and create f1 on b2 and f3 on b3

What is fixed: with b2 and b3 up, the pending xattrs for b1 (stored on the b2 and b3 bricks, which are up) are no longer removed. They still exist, which is the fix.
That also means that after I bring b1 up, I no longer see the "b1 blames b2 & b3" outcome from the original report, so there is no data loss.

I also tested one case with renames, for which I raised bug 1463628 - file renames can lead to duplicate entries during a conservative merge, which also means two files having the same gfid.
However, the BZ I raised above is not caused by this fix, hence moving this bug to verified.

Test version: 3.8.4-28 on el7.4beta
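
When repeating this verification, the pending xattrs on the up bricks have to be read by hand. Below is a small helper sketch for decoding one such value. It assumes the value is printed as hex (for example by `getfattr -e hex`) and that an AFR pending xattr is 12 bytes holding three big-endian 32-bit counters in the order data, metadata, entry. The function name and the sample value are hypothetical and were not taken from this bug.

```python
import struct


def decode_afr_pending(hex_value):
    """Split a 12-byte AFR pending xattr (hex string) into its three counters.

    Assumed layout: data, metadata, entry, each a big-endian 32-bit integer.
    """
    text = hex_value[2:] if hex_value.lower().startswith("0x") else hex_value
    raw = bytes.fromhex(text)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


if __name__ == "__main__":
    # Hypothetical value: one pending entry operation, nothing else pending.
    sample = "0x" + "00000000" + "00000000" + "00000001"
    print(decode_afr_pending(sample))
```

For the check described above, a non-zero `entry` counter for b1 on both b2 and b3 after step 5 would indicate that the pending xattr was preserved.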
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:2774