Bug 1475136

Summary: [Perf] : Large file sequential reads are off target by ~38% on FUSE/Ganesha
Product: Red Hat Gluster Storage
Component: distribute
Version: rhgs-3.3
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
Reporter: Ambarish <asoman>
Assignee: Nithya Balachandran <nbalacha>
QA Contact: Ambarish <asoman>
CC: amukherj, asoman, bturner, dang, jthottan, kkeithle, ksandha, mbenjamin, nbalacha, rcyriac, rhinduja, rhs-bugs, sanandpa, skoduri, storage-qa-internal
Keywords: Regression
Target Release: RHGS 3.3.0
Hardware: x86_64
OS: Linux
Fixed In Version: glusterfs-3.8.4-39
Doc Type: If docs needed, set a value
Cloned As: 1476665 (view as bug list)
Last Closed: 2017-09-21 05:04:21 UTC
Type: Bug
Bug Depends On: 1476665
Bug Blocks: 1417151, 1475176, 1479303

Description Ambarish 2017-07-26 07:14:36 UTC
Description of problem:
-----------------------

A regression appears to have been introduced in recent builds for large-file sequential reads:

3.3 : 2480044.05 kB/sec	
3.8.4-35 : 1538178.2 kB/sec


Regression : ~38%

This is on a vanilla volume, without PR, NL, or mdcache.
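For clarity, the ~38% figure is the relative throughput drop against the baseline; a minimal Python check using the numbers reported above:

```python
# Regression arithmetic for the sequential-read throughput numbers above.
baseline_kbps = 2480044.05  # baseline release (RHGS 3.2, per the correction in comment 5)
current_kbps = 1538178.2    # glusterfs 3.8.4-35

# Relative drop versus the baseline.
drop = (baseline_kbps - current_kbps) / baseline_kbps
print(f"Regression: {drop:.1%}")  # -> Regression: 38.0%
```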

Version-Release number of selected component (if applicable):
-------------------------------------------------------------

3.8.4-35

How reproducible:
-----------------

100%

Additional info:
----------------
Volume Name: testvol
Type: Distributed-Replicate
Volume ID: 4b52bfb8-28fd-4e0f-8ee0-eb8116a296c4
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gqas013.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick0
Brick2: gqas005.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick1
Brick3: gqas006.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick2
Brick4: gqas008.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick3
Options Reconfigured:
server.allow-insecure: on
performance.stat-prefetch: off
transport.address-family: inet
nfs.disable: on

Comment 5 Ambarish 2017-07-26 07:22:54 UTC
(In reply to Ambarish from comment #0)
> A regression seems to have been introduced in recent bits on large file seq
> reads :
> 
> 3.3 : 2480044.05 kB/sec
> 3.8.4-35 : 1538178.2 kB/sec
Ugh!

Typo.

I meant :

 3.2 : 2480044.05 kB/sec	
 3.8.4-35 : 1538178.2 kB/sec
 
 
 Regression : ~38%

Comment 20 Atin Mukherjee 2017-07-31 13:03:28 UTC
upstream patch : https://review.gluster.org/#/c/17922/

Comment 22 Atin Mukherjee 2017-08-07 06:40:57 UTC
Patch mentioned in comment 20 is not valid any more. New upstream patch : https://review.gluster.org/#/c/17976/

Comment 28 Karan Sandha 2017-08-11 08:32:36 UTC
Tested 3.8.4-39 with sequential reads on a FUSE mount; the regression was not seen. Ran two separate iterations to verify the fix.
Waiting for the Ganesha results before marking this verified. Over to you, Ambarish.

Comment 29 Ambarish 2017-08-13 06:45:19 UTC
Numbers are consistently close to my baseline on EC/FUSE and Dist-Rep/Ganesha:

*EC/FUSE*:

Baseline : 2644530 kB/sec
3.8.4-40 : 2477115.03 kB/sec

Regression : -6%

*Dist/Rep/Ganesha*:

Baseline(3.2) : 1430299 kB/sec
3.8.4-40 : 1426718 kB/sec

Regression : negligible (~0.25%)


These are within the allowable limits for my tests between releases.

Based on https://bugzilla.redhat.com/show_bug.cgi?id=1475136#c28 and these results, I am moving the bug to Verified.
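The pass/fail reasoning behind the verification above can be sketched as follows (a standalone Python illustration; the 10% tolerance is an assumed value for illustration, not the actual QE threshold):

```python
def regression(baseline_kbps: float, current_kbps: float) -> float:
    """Relative throughput drop versus baseline, as a fraction."""
    return (baseline_kbps - current_kbps) / baseline_kbps

# Verification-run numbers from this comment (kB/sec).
runs = {
    "EC/FUSE": (2644530, 2477115.03),        # ~6.3% drop
    "Dist-Rep/Ganesha": (1430299, 1426718),  # ~0.25% drop
}

TOLERANCE = 0.10  # assumed allowable drop between releases (illustrative only)

for name, (baseline, current) in runs.items():
    r = regression(baseline, current)
    verdict = "PASS" if r <= TOLERANCE else "FAIL"
    print(f"{name}: {r:.2%} drop -> {verdict}")
```

Both verification runs land well inside the assumed tolerance, while the original ~38% drop would not.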

Comment 31 errata-xmlrpc 2017-09-21 05:04:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774