Bug 1379511

Summary: Fix spurious failures in open-behind.t
Product: [Community] GlusterFS
Reporter: Pranith Kumar K <pkarampu>
Component: tests
Assignee: bugs <bugs>
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: unspecified
Version: mainline
CC: bugs
Hardware: Unspecified
OS: Unspecified
Fixed In Version: glusterfs-3.10.0
Last Closed: 2017-03-06 17:28:10 UTC
Type: Bug

Description Pranith Kumar K 2016-09-27 02:39:20 UTC
Description of problem:
I found the following two problems:
    1) flush-behind is on by default, so the fact that a write completed does not
       mean the data is on disk; it could still be in write-behind's cache. This
       causes failures where a file written from one mount is expected to be
       visible on another mount, but sometimes is not.
    2) Sometimes the graph switch has not completed by the time the read is
       issued, so opens are not sent to the brick, which causes failures.

Found the following in the logs:
[2016-09-27 01:46:48.307358]:++++++++++ G_LOG:./tests/performance/open-behind.t: TEST: 61 hello-this-is-a-test-message1 cat /mnt/glusterfs/1/test-file1 ++++++++++
[2016-09-27 01:46:48.357606] I [login.c:76:gf_auth] 0-auth/login: allowed user names: 97e73f7f-d0e3-450f-8aa1-b7e540e01f3e
[2016-09-27 01:46:48.357620] I [MSGID: 115029] [server-handshake.c:693:server_setvolume] 0-patchy-server: accepted client from dhcp35-190.lab.eng.blr.redhat.com-24020-2016/09/27-01:46:26:934993-patchy-client-1-2-0 (version: 3.10dev)
[2016-09-27 01:46:48.359955] I [login.c:76:gf_auth] 0-auth/login: allowed user names: 97e73f7f-d0e3-450f-8aa1-b7e540e01f3e
[2016-09-27 01:46:48.359976] I [MSGID: 115029] [server-handshake.c:693:server_setvolume] 0-patchy-server: accepted client from dhcp35-190.lab.eng.blr.redhat.com-24060-2016/09/27-01:46:31:61113-patchy-client-1-2-0 (version: 3.10dev)

Based on the logs above, the graph is still coming up by the time the 'cat' command is sent.
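The fix for this race is to wait until the new graph has connected to the bricks before issuing the read. The patch adds helper functions to the test framework for that check; below is a generic, self-contained sketch of the same poll-until-success pattern (the function name `wait_until` and its interface are illustrative stand-ins, not the actual helpers from the patch, which inspect the client's connection state to every brick):

```shell
# Poll-until-success helper, sketched after the kind of wait function
# the patch adds. Usage: wait_until <timeout-seconds> <command...>
# Retries <command> roughly once per second until it succeeds or the
# timeout expires; returns the command's exit status.
wait_until() {
    timeout="$1"
    shift
    i=0
    while [ "$i" -lt "$timeout" ]; do
        if "$@"; then
            return 0
        fi
        sleep 1
        i=$((i + 1))
    done
    # one final attempt so the caller sees the command's real status
    "$@"
}
```

In the test, this pattern gates the 'cat' on the new graph being up and connected, instead of reading immediately after the graph switch.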



Comment 1 Pranith Kumar K 2016-09-27 02:40:14 UTC
01:48:37 ok 15, LINENUM:58
01:48:37 not ok 16 Got "" instead of "hello-this-is-a-test-message1", LINENUM:59
01:48:37 FAILED COMMAND: hello-this-is-a-test-message1 cat /mnt/glusterfs/1/test-file1
01:48:37 ok 17, LINENUM:61
01:48:37 ok 18, LINENUM:64
01:48:37 Failed 1/18 subtests 

This is the test failure caused by flush-behind.
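The flush-behind half of the fix is to turn the translator off, so that close() flushes write-behind's cache to the brick before the other mount reads. A sketch in the .t style used by Gluster's test suite ($CLI, $V0, $M0, $M1, TEST and EXPECT are the framework's usual variables and macros; these exact lines are illustrative, not quoted from the patch):

```shell
TEST $CLI volume set $V0 performance.flush-behind off
# With flush-behind disabled, the close() at the end of the redirection
# pushes the cached write to the brick before 'cat' runs on mount 1.
TEST "echo hello-this-is-a-test-message1 > $M0/test-file1"
EXPECT "hello-this-is-a-test-message1" cat $M1/test-file1
```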

Comment 2 Worker Ant 2016-09-27 02:42:29 UTC
REVIEW: http://review.gluster.org/15575 (tests: Fix races in open-behind.t) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 3 Worker Ant 2016-09-27 11:44:31 UTC
COMMIT: http://review.gluster.org/15575 committed in master by Raghavendra G (rgowdapp) 
------
commit cd072b61841c19ec942871e3f06519d2a938814b
Author: Pranith Kumar K <pkarampu>
Date:   Tue Sep 27 07:51:48 2016 +0530

    tests: Fix races in open-behind.t
    
    Problems:
    1) flush-behind is on by default, so just because write completes doesn't mean
       it will be on the disk, it could still be in write-behind's cache. This
       leads to failure where if you write from one mount and expect it to be there
       on the other mount, sometimes it won't be there.
    2) Sometimes the graph switch is not completing by the time we issue read which
       is leading to opens not being sent on brick leading to failures.
    
    Fixes:
    1) Disable flush-behind
    2) Add new functions to check the new graph is there and connected to bricks
       before 'cat' is executed.
    
    BUG: 1379511
    Change-Id: I0faed684e0dc70cfd2258ce6fdaed655ee915ae6
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/15575
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Raghavendra G <rgowdapp>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>

Comment 4 Shyamsundar 2017-03-06 17:28:10 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.10.0, please open a new bug report.

glusterfs-3.10.0 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-February/030119.html
[2] https://www.gluster.org/pipermail/gluster-users/