Bug 985386

Summary: intermittent failures of tests/bugs/bug-888174.t
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Pranith Kumar K <pkarampu>
Component: glusterfs
Assignee: Bug Updates Notification Mailing List <rhs-bugs>
Status: CLOSED EOL
QA Contact: Pranith Kumar K <pkarampu>
Severity: unspecified
Docs Contact:
Priority: high
Version: 2.1
CC: gluster-bugs, nsathyan, rhs-bugs, spandura, vbellur
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 980548
Environment:
Last Closed: 2015-12-03 17:19:53 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 980548
Bug Blocks:

Description Pranith Kumar K 2013-07-17 11:19:51 UTC
+++ This bug was initially created as a clone of Bug #980548 +++

Description of problem:
If we run tests/bugs/bug-888174.t from the test framework in a while loop, it fails intermittently.
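
A minimal sketch of such a loop (assuming the test is driven with prove from a glusterfs source checkout; the path below is illustrative):

cd /path/to/glusterfs-source   # hypothetical checkout location
while prove -v tests/bugs/bug-888174.t; do
    :   # keep re-running until the test reports a failure
done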

Results of failed run:

Test Summary Report
-------------------
./tests/bugs/bug-888174.t                       (Wstat: 0 Tests: 25 Failed: 4)
  Failed tests:  22-25

Version-Release number of selected component (if applicable):


How reproducible:
intermittent

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

--- Additional comment from Anand Avati on 2013-07-02 13:08:38 EDT ---

REVIEW: http://review.gluster.org/5274 (cluster/afr: Make sure flush is unwound after post-op) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Anand Avati on 2013-07-02 20:30:35 EDT ---

REVIEW: http://review.gluster.org/5274 (cluster/afr: post-op should complete before starting flush) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Anand Avati on 2013-07-03 03:40:48 EDT ---

COMMIT: http://review.gluster.org/5274 committed in master by Vijay Bellur (vbellur) 
------
commit 29619b4ee78926160435da82f9db213161e040d4
Author: Pranith Kumar K <pkarampu>
Date:   Wed Jul 3 05:23:46 2013 +0530

    cluster/afr: post-op should complete before starting flush
    
    Problem:
    At the moment afr-flush makes sure that a delayed post-op
    is woken up but it does not wait for it to complete the
    post-op before flush unwinds.
    These are the steps that are happening:
    1) flush fop comes on an fd which wakes up a delayed post-op
    and continues with the flush fop.
    2) post-op sends fsync on the wire.
    3) flush completes and unwinds to fuse.
    4) graph switch happens on the fuse mount disconnecting the
    old graph's client connections to bricks.
    5) xattrop after fsync fails with ENOTCONN because the
    connections from old graph are taken down now.
    
    Fix:
    Wait for post-op to complete before starting to flush.
    We could make flush act similar to fsync (i.e.) wind
    flush as is but wait for post-op to complete before unwinding
    flush, but it is better to send flush as the final fop. So
    wind of flush will start after post-op is complete. Had to
    change fsync to accommodate this change.
    
    Change-Id: I93aa642647751969511718b0e137afbd067b388a
    BUG: 980548
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/5274
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>

Comment 2 spandura 2013-08-13 05:59:15 UTC
Found this issue on the build: 
=============================
glusterfs 3.4.0.18rhs built on Aug  7 2013 08:02:45

Steps to recreate the bug:
==========================
1. Create a 1 x 2 replicate volume.

2. Set write-behind to "off" and eager-lock to "on".

3. Start the volume

4. Create 2 fuse mounts from a client (RHEL 6.4).

5. Open an fd for a file from each mount point:
exec 5>>testfile1   # on fuse mount 1
exec 5>>testfile2   # on fuse mount 2

6. From each mount point, execute the following (10 times from each mount point):

# write the current time in milliseconds to fd 5 once a second, in the background
for i in `seq 1 10000`; do echo "$(($(date +%s%N)/1000000))" >&5 ; sleep 1 ; done & 

7. After some time, set write-behind back to "on" (a consolidated command sketch of these steps follows this list).
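
A hedged sketch of the above steps as commands; the hostnames, brick paths, volume name and option names are taken from the volume info below, while the client mount-point paths are assumptions:

# on a server node: create, configure and start the volume
gluster volume create vol_rep replica 2 king:/rhs/bricks/b0 hicks:/rhs/bricks/b1
gluster volume set vol_rep performance.write-behind off
gluster volume set vol_rep cluster.eager-lock on
gluster volume start vol_rep

# on the RHEL 6.4 client: two fuse mounts (mount-point paths are assumed)
mount -t glusterfs king:/vol_rep /mnt/fuse1
mount -t glusterfs king:/vol_rep /mnt/fuse2

# step 7: after the writers have been running for a while
gluster volume set vol_rep performance.write-behind on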

Actual result:
===============
The change-logs are not cleared. The file on both bricks is in the fool-fool state. 

Brick0:
=======
root@king [Aug-13-2013-11:27:31] >getfattr -d -e hex -m . /rhs/bricks/b0/testfile1
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/b0/testfile1
trusted.afr.vol_rep-client-0=0x000000400000000000000000
trusted.afr.vol_rep-client-1=0x000000400000000000000000
trusted.gfid=0xf7f1d74d3e63447289a245a0659c5bd4

root@king [Aug-13-2013-11:27:33] >getfattr -d -e hex -m . /rhs/bricks/b0/testfile2
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/b0/testfile2
trusted.afr.vol_rep-client-0=0x0000003c0000000000000000
trusted.afr.vol_rep-client-1=0x0000003c0000000000000000
trusted.gfid=0x2c687c491eb0416ca44c3586a1c579db

Brick1:
=======
root@hicks [Aug-13-2013-11:26:51] >getfattr -d -e hex -m . /rhs/bricks/b1/testfile1
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/b1/testfile1
trusted.afr.vol_rep-client-0=0x000000400000000000000000
trusted.afr.vol_rep-client-1=0x000000400000000000000000
trusted.gfid=0xf7f1d74d3e63447289a245a0659c5bd4

root@hicks [Aug-13-2013-11:26:53] >getfattr -d -e hex -m . /rhs/bricks/b1/testfile2
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/b1/testfile2
trusted.afr.vol_rep-client-0=0x0000003c0000000000000000
trusted.afr.vol_rep-client-1=0x0000003c0000000000000000
trusted.gfid=0x2c687c491eb0416ca44c3586a1c579db

root@king [Aug-13-2013-11:27:49] >gluster volume info vol_rep
 
Volume Name: vol_rep
Type: Replicate
Volume ID: 7b421b27-1cdd-4b93-b486-e8d0c9968f4d
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: king:/rhs/bricks/b0
Brick2: hicks:/rhs/bricks/b1
Options Reconfigured:
cluster.eager-lock: on
performance.write-behind: on

Comment 3 Vivek Agarwal 2015-12-03 17:19:53 UTC
Thank you for submitting this issue for consideration in Red Hat Gluster Storage. The release you asked us to review is now End of Life. Please see https://access.redhat.com/support/policy/updates/rhs/

If you can reproduce this bug against a currently maintained version of Red Hat Gluster Storage, please feel free to file a new report against the current release.