Bug 980548

Summary: intermittent failures of tests/bugs/bug-888174.t
Product: [Community] GlusterFS Reporter: Pranith Kumar K <pkarampu>
Component: replicate Assignee: Pranith Kumar K <pkarampu>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: mainline CC: gluster-bugs
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.4.0 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 985386 Environment:
Last Closed: 2013-07-24 17:57:00 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 985386    

Description Pranith Kumar K 2013-07-02 16:56:11 UTC
Description of problem:
Running tests/bugs/bug-888174.t from the test framework repeatedly in a while loop makes it fail intermittently.

Results of failed run:

Test Summary Report
-------------------
./tests/bugs/bug-888174.t                       (Wstat: 0 Tests: 25 Failed: 4)
  Failed tests:  22-25
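
The while loop mentioned above can be sketched roughly as follows. This is only an illustration, not the script actually used: the helper name `run_until_failure` and the `max_runs` limit are hypothetical, and the command to run is left to the caller.

```python
import subprocess


def run_until_failure(cmd, max_runs=100):
    """Run cmd repeatedly until it exits non-zero.

    Returns the 1-based number of the first failing run, or None if
    all max_runs runs succeeded. Hypothetical helper for illustration.
    """
    for attempt in range(1, max_runs + 1):
        result = subprocess.run(cmd)
        if result.returncode != 0:
            # First intermittent failure: stop so the logs can be inspected.
            return attempt
    return None
```

Assuming the test is run through `prove` from the top of the source tree (the summary format above suggests this, but it is an assumption), a call such as `run_until_failure(["prove", "-v", "./tests/bugs/bug-888174.t"])` would stop at the first failing iteration.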

How reproducible:
intermittent

Comment 1 Anand Avati 2013-07-02 17:08:38 UTC
REVIEW: http://review.gluster.org/5274 (cluster/afr: Make sure flush is unwound after post-op) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 2 Anand Avati 2013-07-03 00:30:35 UTC
REVIEW: http://review.gluster.org/5274 (cluster/afr: post-op should complete before starting flush) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 3 Anand Avati 2013-07-03 07:40:48 UTC
COMMIT: http://review.gluster.org/5274 committed in master by Vijay Bellur (vbellur) 
------
commit 29619b4ee78926160435da82f9db213161e040d4
Author: Pranith Kumar K <pkarampu>
Date:   Wed Jul 3 05:23:46 2013 +0530

    cluster/afr: post-op should complete before starting flush
    
    Problem:
    At the moment afr-flush makes sure that a delayed post-op
    is woken up but it does not wait for it to complete the
    post-op before flush unwinds.
    These are the steps that are happening:
    1) flush fop comes on an fd which wakes up a delayed post-op
    and continues with the flush fop.
    2) post-op sends fsync on the wire.
    3) flush completes and unwinds to fuse.
    4) graph switch happens on the fuse mount disconnecting the
    old graph's client connections to bricks.
    5) xattrop after fsync fails with ENOTCONN because the
    connections from old graph are taken down now.
    
    Fix:
    Wait for post-op to complete before starting to flush.
    We could make flush act similar to fsync (i.e.) wind
    flush as is but wait for post-op to complete before unwinding
    flush, but it is better to send flush as the final fop. So
    wind of flush will start after post-op is complete. Had to
    change fsync to accommodate this change.
    
    Change-Id: I93aa642647751969511718b0e137afbd067b388a
    BUG: 980548
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/5274
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>
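
The ordering problem described in the commit message can be illustrated with a toy model. This is a sketch under stated assumptions, not AFR code: the `OldGraph` class, the `flush` function and the `wait_for_post_op` flag are all hypothetical names, and the graph switch is modelled as a simple flag flip right after flush unwinds.

```python
import errno


class OldGraph:
    """Toy stand-in for the old graph's client connections to the bricks."""

    def __init__(self):
        self.connected = True
        self.log = []  # (fop name, result) in the order the fops were sent

    def fop(self, name):
        # Any fop sent after the connections are taken down fails with ENOTCONN.
        ret = 0 if self.connected else errno.ENOTCONN
        self.log.append((name, ret))
        return ret


def flush(graph, wait_for_post_op):
    """Simulate a flush on an fd that has a delayed post-op pending.

    Returns the result of the post-op's xattrop (0 or errno.ENOTCONN).
    """
    # The flush wakes up the delayed post-op, which first sends fsync.
    graph.fop("fsync")

    if wait_for_post_op:
        # Ordering after the fix: the post-op (fsync, then xattrop)
        # completes first, and only then is flush wound as the final fop.
        xattrop_ret = graph.fop("xattrop")
        graph.fop("flush")
        graph.connected = False  # graph switch after flush unwinds
    else:
        # Ordering before the fix: flush unwinds while the post-op is
        # still in flight, the graph switch disconnects the old graph,
        # and the xattrop that follows the fsync finds no connection.
        graph.fop("flush")
        graph.connected = False
        xattrop_ret = graph.fop("xattrop")
    return xattrop_ret
```

In this model, `flush(OldGraph(), wait_for_post_op=False)` returns `errno.ENOTCONN`, matching step 5 of the problem description, while `flush(OldGraph(), wait_for_post_op=True)` returns 0 because the xattrop is sent before the connections go down.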

Comment 4 Anand Avati 2013-08-13 12:26:23 UTC
REVIEW: http://review.gluster.org/5599 (mount/fuse: Add artifical delay before sending PARENT_DOWN) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)