Bug 1280410 - ec-readdir.t is failing consistently

Summary: ec-readdir.t is failing consistently

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	disperse
Sub Component:
Version:	rhgs-3.1
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Target Release:	RHGS 3.1.2
Assignee:	Bug Updates Notification Mailing List
QA Contact:	Bhaskarakiran
Docs Contact:
URL:
Whiteboard:
Depends On:	1276989 1278744
Blocks:	1260783 1278539 glusterfs-3.7.7

Reported:	2015-11-11 16:56 UTC by Pranith Kumar K
Modified:	2016-11-23 23:12 UTC
CC List:	7 users
Fixed In Version:	glusterfs-3.7.5-7
Doc Type:	Bug Fix
Doc Text:
Clone Of:	1278744
Environment:
Last Closed:	2016-03-01 05:54:29 UTC
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2016:0193	0	normal	SHIPPED_LIVE	Red Hat Gluster Storage 3.1 update 2	2016-03-01 10:20:36 UTC

Description Pranith Kumar K 2015-11-11 16:56:42 UTC

+++ This bug was initially created as a clone of Bug #1278744 +++

+++ This bug was initially created as a clone of Bug #1276989 +++

Description of problem:
If ec-readdir.t is run in a loop, it fails. This only happens on mainline, not on 3.7, so a regression is the likely cause. For now, moving it to bad tests.

--- Additional comment from Vijay Bellur on 2015-11-01 21:35:01 EST ---

REVIEW: http://review.gluster.org/12481 (tests: Move ec-readdir.t to bad tests) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Vijay Bellur on 2015-11-02 06:43:16 EST ---

COMMIT: http://review.gluster.org/12481 committed in master by Raghavendra Talur (rtalur) 
------
commit 56ccc0d2f4a30af9304852effbf2b68694d9f587
Author: Pranith Kumar K <pkarampu>
Date:   Mon Nov 2 07:56:51 2015 +0530

    tests: Move ec-readdir.t to bad tests
    
    Change-Id: Ie7f6d25cbc617ff347aeb7d77fc0a60924c83f09
    BUG: 1276989
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/12481
    Tested-by: Raghavendra Talur <rtalur>
    Reviewed-by: Raghavendra Talur <rtalur>

--- Additional comment from Vijay Bellur on 2015-11-11 08:48:30 EST ---

COMMIT: http://review.gluster.org/12562 committed in release-3.7 by Xavier Hernandez (xhernandez) 
------
commit 06b888bbeac61aa1234b43e398431529988c28b6
Author: Pranith Kumar K <pkarampu>
Date:   Tue Nov 10 09:06:54 2015 +0530

    cluster/ec: fix bug in update_good
    
    Backport of http://review.gluster.com/12561
    
    Problem:
    Bricks that didn't participate in the fops are considered good. This happens
    in two ways.
    
    Examples:
    Case-1:
    1) 2+1 volume. 'd1' directory on Brick-0 is bad.
    2) readdir takes locks and lock->good_mask is '7'
    3) readdir does xattrop and fop->mask is '6'.
    4) Because fop->expected is '1', lock->good_mask remains '7'.
    
    Case-2:
    1) When all the bricks are up, it does lock + xattrop before the op and figures
       out that all the bricks are good.
    2) By the time the second operation starts, brick-0 is down. Now lock->good_mask
       keeps bit '0' set for as long as operations are happening on it, because in
       "lock->good_mask &= ~fop->mask | fop->remaining", fop->mask doesn't have
       bit 0 set.
    3) When it is time to perform the final xattrop in update_size_version, brick-0
       has come back online, so it is given the same version as the other bricks,
       as if it had participated in all the transactions until then, even though
       it didn't.
    
    Fix:
    Case-1's fix: Update lock->good_mask in ec_prepare_update_cbk with the latest
    good/bad bricks.
    Case-2's fix: Consider non-participating brick as bad.
    
    BUG: 1278744
    Change-Id: I5c2b07005107f3c067bac69da3b37ff39688bd69
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/12562
    Tested-by: NetBSD Build System <jenkins.org>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Xavier Hernandez <xhernandez>
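The bitmask bookkeeping in the commit message above can be modelled with a few lines of C. This is a simplified, hypothetical sketch, not the glusterfs code and not the actual patch: the types `model_lock_t` and `model_fop_t` and the functions `update_good_old`, `update_good_new` and `prepare_update_cbk_new` are invented for illustration, bit i stands for brick i, and it assumes the quoted expression `lock->good_mask &= ~fop->mask | fop->remaining` is applied together with keeping the participants that answered successfully.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified model of the per-lock bookkeeping described in
 * the commit message. Bit i set means "brick i". These are NOT the real
 * ec_lock_t / ec_fop_data_t structures from glusterfs. */
typedef struct {
    uintptr_t good_mask; /* bricks the lock currently trusts as good */
} model_lock_t;

typedef struct {
    uintptr_t mask;      /* bricks the fop was sent to (participants) */
    uintptr_t remaining; /* participants that have not answered yet */
    uintptr_t good;      /* participants that answered successfully */
    int expected;        /* answers the fop needs; 1 for single-answer fops */
} model_fop_t;

/* Pre-fix rule (assumption): the quoted expression, combined with keeping
 * the participants that succeeded. */
void update_good_old(model_lock_t *lock, const model_fop_t *fop)
{
    assert((fop->remaining & ~fop->mask) == 0); /* remaining is a subset of mask */
    if (fop->expected == 1)
        return; /* Case-1: single-answer fops never update the lock */
    /* Case-2: a brick outside fop->mask has its bit set in ~fop->mask,
     * so a brick that did not participate keeps its "good" bit. */
    lock->good_mask &= ~fop->mask | fop->remaining | fop->good;
}

/* Case-2 fix (model): a brick that did not participate is treated as bad;
 * only successful participants and those still answering stay good. */
void update_good_new(model_lock_t *lock, const model_fop_t *fop)
{
    assert((fop->remaining & ~fop->mask) == 0);
    if (fop->expected == 1)
        return;
    lock->good_mask &= fop->good | fop->remaining;
}

/* Case-1 fix (model): when the pre-op xattrop answers, narrow the lock's
 * good mask to the bricks that the xattrop reported as good. */
void prepare_update_cbk_new(model_lock_t *lock, uintptr_t xattrop_good)
{
    lock->good_mask &= xattrop_good;
}
```

For a 2+1 volume (bricks 0 to 2) with brick 0 down, a fop with mask 6, remaining 0, good 6 and expected 2 leaves good_mask at 7 under `update_good_old` (Case-2), while `update_good_new` narrows it to 6. For Case-1, a fop with expected 1 leaves good_mask at 7 under either rule; in this model it is `prepare_update_cbk_new(&lock, 6)` that brings it down to 6.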

Comment 4 Bhaskarakiran 2015-11-25 11:01:16 UTC

Verified by running ec-readdir.t repeatedly for an hour; no failures were seen. Marking this as fixed.

Comment 6 errata-xmlrpc 2016-03-01 05:54:29 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0193.html
