+++ This bug was initially created as a clone of Bug #1278744 +++

+++ This bug was initially created as a clone of Bug #1276989 +++

Description of problem:
if we run ec-readdir.t in a loop it fails. This only happens on mainline not on 3.7. So some regression is the reason. For now moving to bad-tests

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:

--- Additional comment from Vijay Bellur on 2015-11-01 21:35:01 EST ---

REVIEW: http://review.gluster.org/12481 (tests: Move ec-readdir.t to bad tests) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Vijay Bellur on 2015-11-02 06:43:16 EST ---

COMMIT: http://review.gluster.org/12481 committed in master by Raghavendra Talur (rtalur)
------
commit 56ccc0d2f4a30af9304852effbf2b68694d9f587
Author: Pranith Kumar K <pkarampu>
Date:   Mon Nov 2 07:56:51 2015 +0530

    tests: Move ec-readdir.t to bad tests

    Change-Id: Ie7f6d25cbc617ff347aeb7d77fc0a60924c83f09
    BUG: 1276989
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/12481
    Tested-by: Raghavendra Talur <rtalur>
    Reviewed-by: Raghavendra Talur <rtalur>

--- Additional comment from Vijay Bellur on 2015-11-11 08:48:30 EST ---

COMMIT: http://review.gluster.org/12562 committed in release-3.7 by Xavier Hernandez (xhernandez)
------
commit 06b888bbeac61aa1234b43e398431529988c28b6
Author: Pranith Kumar K <pkarampu>
Date:   Tue Nov 10 09:06:54 2015 +0530

    cluster/ec: fix bug in update_good

    Backport of http://review.gluster.com/12561

    Problem:
    Bricks that didn't participate in the fops are considered to be good.
    This is happening two fold.

    Examples:
    Case-1:
    1) 2+1 volume. 'd1' directory on Brick-0 is bad.
    2) readdir takes locks and lock->good_mask is '7'
    3) readdir does xattrop and fop->mask is '6'.
    4) because fop->expected is '1' lock->good_mask remains '7'

    Case-2:
    1) when all the bricks are up, it does lock + xattrop before op and
       figures out all the bricks are good.
    2) By the time second operation starts brick-0 is down. Now
       lock->good_mask will always have the '0' bit set as long as the
       operations are happening on it. because:
       "lock->good_mask &= ~fop->mask | fop->remaining"
       fop->mask doesn't have '0' th bit.
    3) When it comes time to perform the final xattrop in
       update_size_version brick-0 comes online because of which it gives
       the same version to brick-0 as well thinking it has participated in
       all the transactions till then, even when it didn't participate in
       the transactions.

    Fix:
    Case-1's fix: Update lock->good_mask in ec_prepare_update_cbk with
                  latest good/bad bricks
    Case-2's fix: Consider non-participating brick as bad.

    BUG: 1278744
    Change-Id: I5c2b07005107f3c067bac69da3b37ff39688bd69
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/12562
    Tested-by: NetBSD Build System <jenkins.org>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Xavier Hernandez <xhernandez>
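
For anyone following the mask arithmetic in Case-2, here is a minimal sketch in C. It is a simplified model and not the GlusterFS source: `fop_model_t`, `update_keep_absent`, `update_drop_absent` and `check_case2` are hypothetical names, and the `good` field (bricks that answered successfully) is an assumption that the commit message does not mention. The second function only illustrates the idea of "consider non-participating brick as bad"; it is not claimed to be the exact expression used in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified, hypothetical model of one fop; one bit per brick.
 * Only `mask` and `remaining` are named in the commit message;
 * `good` is an assumed field for the bricks that answered successfully. */
typedef struct {
    uintptr_t mask;      /* bricks the fop was sent to */
    uintptr_t good;      /* bricks that answered successfully (assumption) */
    uintptr_t remaining; /* bricks not yet tried */
} fop_model_t;

/* Rule that leaves bricks outside fop->mask untouched: their bits in
 * good_mask survive even though they took no part in the fop. */
static uintptr_t update_keep_absent(uintptr_t good_mask, const fop_model_t *fop)
{
    return good_mask & (~fop->mask | fop->good | fop->remaining);
}

/* Rule that treats a non-participating brick as bad: only bricks that
 * answered successfully, or are still to be tried, keep their bit. */
static uintptr_t update_drop_absent(uintptr_t good_mask, const fop_model_t *fop)
{
    return good_mask & (fop->good | fop->remaining);
}

/* Case-2 on a 2+1 volume: brick-0 went down after the lock was taken. */
void check_case2(void)
{
    uintptr_t good_mask = 0x7;           /* all three bricks good at lock time */
    fop_model_t fop = { 0x6, 0x6, 0x0 }; /* sent to bricks 1 and 2 only */

    assert(update_keep_absent(good_mask, &fop) == 0x7); /* brick-0 still looks good */
    assert(update_drop_absent(good_mask, &fop) == 0x6); /* brick-0 marked bad */
}
```

Calling `check_case2()` from a `main` runs both assertions. With the first rule, bit 0 can never be cleared while brick-0 is outside `fop->mask`, which matches the behaviour described in step 2 of Case-2; with the second rule, brick-0 drops out of the good mask as soon as it misses one operation.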
Verified by running ec-readdir.t for an hour; did not see any failures. Marking this as fixed.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0193.html