Bug 1475136

Summary: [Perf] : Large file sequential reads are off target by ~38% on FUSE/Ganesha
Product: Red Hat Gluster Storage
Component: distribute
Version: rhgs-3.3
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
Reporter: Ambarish <asoman>
Assignee: Nithya Balachandran <nbalacha>
QA Contact: Ambarish <asoman>
CC: amukherj, asoman, bturner, dang, jthottan, kkeithle, ksandha, mbenjamin, nbalacha, rcyriac, rhinduja, rhs-bugs, sanandpa, skoduri, storage-qa-internal
Keywords: Regression
Target Release: RHGS 3.3.0
Hardware: x86_64
OS: Linux
Fixed In Version: glusterfs-3.8.4-39
Doc Type: If docs needed, set a value
Cloned As: 1476665 (view as bug list)
Last Closed: 2017-09-21 05:04:21 UTC
Type: Bug
Bug Depends On: 1476665
Bug Blocks: 1417151, 1475176, 1479303

Description Ambarish 2017-07-26 07:14:36 UTC
Description of problem:
-----------------------

A regression appears to have been introduced in recent builds for large-file sequential reads:

3.3 : 2480044.05 kB/sec	
3.8.4-35 : 1538178.2 kB/sec


Regression : ~38%

This is on a vanilla volume, without PR, NL, or mdcache.
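For clarity, the ~38% figure is the relative throughput drop against the baseline; a minimal Python check using the numbers reported above:

```python
# Regression arithmetic for the sequential-read throughput numbers above.
baseline_kbps = 2480044.05  # baseline release (RHGS 3.2, per the correction in comment 5)
current_kbps = 1538178.2    # glusterfs 3.8.4-35

# Relative drop versus the baseline.
drop = (baseline_kbps - current_kbps) / baseline_kbps
print(f"Regression: {drop:.1%}")  # -> Regression: 38.0%
```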

Version-Release number of selected component (if applicable):
-------------------------------------------------------------

3.8.4-35

How reproducible:
-----------------

100%

Additional info:
----------------
Volume Name: testvol
Type: Distributed-Replicate
Volume ID: 4b52bfb8-28fd-4e0f-8ee0-eb8116a296c4
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gqas013.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick0
Brick2: gqas005.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick1
Brick3: gqas006.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick2
Brick4: gqas008.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick3
Options Reconfigured:
server.allow-insecure: on
performance.stat-prefetch: off
transport.address-family: inet
nfs.disable: on

Comment 5 Ambarish 2017-07-26 07:22:54 UTC
(In reply to Ambarish from comment #0)
> A regression seems to have been introduced in recent bits on large file seq
> reads :
> 
> 3.3 : 2480044.05 kB/sec
> 3.8.4-35 : 1538178.2 kB/sec
Ugh!

Typo.

I meant :

 3.2 : 2480044.05 kB/sec	
 3.8.4-35 : 1538178.2 kB/sec
 
 
 Regression : ~38%

Comment 20 Atin Mukherjee 2017-07-31 13:03:28 UTC
upstream patch : https://review.gluster.org/#/c/17922/

Comment 22 Atin Mukherjee 2017-08-07 06:40:57 UTC
Patch mentioned in comment 20 is not valid any more. New upstream patch : https://review.gluster.org/#/c/17976/

Comment 28 Karan Sandha 2017-08-11 08:32:36 UTC
Tested 3.8.4-39 with sequential reads on a FUSE mount; the regression was not seen. Ran two separate iterations to verify the fix.
Waiting for the Ganesha results before marking this verified. Over to you, Ambarish.

Comment 29 Ambarish 2017-08-13 06:45:19 UTC
Numbers are consistently close to my baseline on EC/FUSE and Dist-Rep/Ganesha:

*EC/FUSE*:

Baseline : 2644530 kB/sec
3.8.4-40 : 2477115.03 kB/sec

Regression : -6%

*Dist/Rep/Ganesha*:

Baseline(3.2) : 1430299 kB/sec
3.8.4-40 : 1426718 kB/sec

Regression : negligible (~0.25%)


These are within the allowable limits for my tests between releases.

Based on https://bugzilla.redhat.com/show_bug.cgi?id=1475136#c28 and these results, I am moving the bug to Verified.
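The pass/fail reasoning behind the verification above can be sketched as follows (a standalone Python illustration; the 10% tolerance is an assumed value for illustration, not the actual QE threshold):

```python
def regression(baseline_kbps: float, current_kbps: float) -> float:
    """Relative throughput drop versus baseline, as a fraction."""
    return (baseline_kbps - current_kbps) / baseline_kbps

# Verification-run numbers from this comment (kB/sec).
runs = {
    "EC/FUSE": (2644530, 2477115.03),        # ~6.3% drop
    "Dist-Rep/Ganesha": (1430299, 1426718),  # ~0.25% drop
}

TOLERANCE = 0.10  # assumed allowable drop between releases (illustrative only)

for name, (baseline, current) in runs.items():
    r = regression(baseline, current)
    verdict = "PASS" if r <= TOLERANCE else "FAIL"
    print(f"{name}: {r:.2%} drop -> {verdict}")
```

Both verification runs land well inside the assumed tolerance, while the original ~38% drop would not.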

Comment 31 errata-xmlrpc 2017-09-21 05:04:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774