Bug 980548 - intermittent failures of tests/bugs/bug-888174.t
Summary: intermittent failures of tests/bugs/bug-888174.t
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 985386
Reported: 2013-07-02 16:56 UTC by Pranith Kumar K
Modified: 2013-08-13 12:26 UTC (History)

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 985386
Environment:
Last Closed: 2013-07-24 17:57:00 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Description Pranith Kumar K 2013-07-02 16:56:11 UTC
Description of problem:
If tests/bugs/bug-888174.t from the test framework is run repeatedly in a while loop, it fails intermittently.

Results of failed run:

Test Summary Report
-------------------
./tests/bugs/bug-888174.t                       (Wstat: 0 Tests: 25 Failed: 4)
  Failed tests:  22-25
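
A minimal sketch of the "run it in a loop" reproduction described above, written in Python. The helper name run_until_failure and the max_runs limit are hypothetical. The prove invocation in the comment is an assumption based on the "Test Summary Report" output, not a command taken from this report.

```python
import subprocess


def run_until_failure(cmd, max_runs=100):
    """Run cmd repeatedly.

    Returns the 1-based number of the first run that exits non-zero,
    or None if all max_runs runs succeed.
    """
    for attempt in range(1, max_runs + 1):
        completed = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if completed.returncode != 0:
            return attempt
    return None


# Assumed usage, from the top of a glusterfs source tree (the exact
# command line is an assumption, not quoted from this bug):
#   run_until_failure(["prove", "-vf", "tests/bugs/bug-888174.t"])
```

Because the failure is intermittent, a run that returns None only means the race was not hit within max_runs iterations.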

Version-Release number of selected component (if applicable):


How reproducible:
intermittent

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Anand Avati 2013-07-02 17:08:38 UTC
REVIEW: http://review.gluster.org/5274 (cluster/afr: Make sure flush is unwound after post-op) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 2 Anand Avati 2013-07-03 00:30:35 UTC
REVIEW: http://review.gluster.org/5274 (cluster/afr: post-op should complete before starting flush) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 3 Anand Avati 2013-07-03 07:40:48 UTC
COMMIT: http://review.gluster.org/5274 committed in master by Vijay Bellur (vbellur) 
------
commit 29619b4ee78926160435da82f9db213161e040d4
Author: Pranith Kumar K <pkarampu>
Date:   Wed Jul 3 05:23:46 2013 +0530

    cluster/afr: post-op should complete before starting flush
    
    Problem:
    At the moment afr-flush makes sure that a delayed post-op
    is woken up but it does not wait for it to complete the
    post-op before flush unwinds.
    These are the steps that are happening:
    1) flush fop comes on an fd which wakes up a delayed post-op
    and continues with the flush fop.
    2) post-op sends fsync on the wire.
    3) flush completes and unwinds to fuse.
    4) graph switch happens on the fuse mount disconnecting the
    old graph's client connections to bricks.
    5) xattrop after fsync fails with ENOTCONN because the
    connections from old graph are taken down now.
    
    Fix:
    Wait for post-op to complete before starting to flush.
    We could make flush act similar to fsync (i.e.) wind
    flush as is but wait for post-op to complete before unwinding
    flush, but it is better to send flush as the final fop. So
    wind of flush will start after post-op is complete. Had to
    change fsync to accommodate this change.
    
    Change-Id: I93aa642647751969511718b0e137afbd067b388a
    BUG: 980548
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/5274
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>
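
The ordering problem in the commit message above can be illustrated with a small deterministic model in Python. This is only a sketch of the five steps listed under "Problem" and of the ordering chosen under "Fix". It is not AFR code, and the function name simulate_flush, the connected flag, and the log strings are all hypothetical.

```python
import errno


def simulate_flush(wait_for_post_op):
    """Model one flush on an fd that has a delayed post-op pending.

    Returns (event_log, xattrop_result), where xattrop_result is 0 on
    success or errno.ENOTCONN if the old graph's connection was already
    torn down when the xattrop was attempted.
    """
    log = []
    connected = True  # old graph's client connection to the brick

    def xattrop():
        # Last step of the post-op; it needs the old graph's connection.
        return 0 if connected else errno.ENOTCONN

    log.append("flush arrives and wakes the delayed post-op")
    log.append("post-op: fsync sent on the wire")

    if wait_for_post_op:
        # Fixed ordering: flush is wound only after post-op completes.
        result = xattrop()
        log.append("post-op: xattrop completed")
        log.append("flush wound, then unwound to fuse")
        connected = False
        log.append("graph switch: old graph's connections taken down")
    else:
        # Old ordering: flush unwinds while post-op is still in flight.
        log.append("flush unwound to fuse")
        connected = False
        log.append("graph switch: old graph's connections taken down")
        result = xattrop()
        log.append("post-op: xattrop attempted")

    return log, result
```

In this model, simulate_flush(False) ends with the xattrop seeing ENOTCONN, matching step 5 of the problem description, while simulate_flush(True) lets the xattrop finish before the graph switch can disconnect the old graph.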

Comment 4 Anand Avati 2013-08-13 12:26:23 UTC
REVIEW: http://review.gluster.org/5599 (mount/fuse: Add artifical delay before sending PARENT_DOWN) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

