Bug 1324612

Summary: [Perf] : Large File Random Write performance is off target by 14% on gNFS mounts
Product: Red Hat Gluster Storage
Reporter: Ambarish <asoman>
Component: replicate
Assignee: Ravishankar N <ravishankar>
Status: CLOSED WONTFIX
QA Contact: Ambarish <asoman>
Severity: high
Priority: high
Version: rhgs-3.1
CC: amukherj, asoman, pkarampu, pprakash, rcyriac, rhinduja, rhs-bugs, sasundar
Keywords: Regression, ZStream
Flags: asoman: needinfo+
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Type: Bug
Last Closed: 2018-02-14 09:11:01 UTC

Attachments: Iozone console logs

Description Ambarish 2016-04-06 19:13:51 UTC
Created attachment 1144328 [details]
Iozone console logs

Description of problem:

I see a regression in random write throughput on gNFS mounts.

This is from one of the automated runs :

With 3.1.2 (baseline): mean random write throughput = 273697.822500 KB/s

With 3.1.3: mean random write throughput = 233082.450000 KB/s

Regression: ~14.8%
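
The figure follows from the two means above:

(273697.82 - 233082.45) / 273697.82 ≈ 0.148, i.e. a ~14.8% drop relative to the 3.1.2 baseline.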

Version-Release number of selected component (if applicable):

glusterfs-3.7.5-19.el6rhs.x86_64

How reproducible:

2/2

Steps to Reproduce:

1. Run the iozone random R/W test (-i 2) on gNFS mounts with RHGS 3.1.2 thrice
2. Run the same test thrice after upgrading to RHGS 3.1.3
3. Compare the mean throughputs; they should not vary by more than 10%


Actual results:

Random write throughput regressed by >10% (~14.8%, see above)


Expected results:

Throughput within 10% of the 3.1.2 baseline (the regression threshold)


Additional info:


OS: RHEL 6.7

Iozone was used in a distributed, multithreaded manner with a 2G file size, a 64K record size, and a total of 16 threads.
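
For reference, an invocation consistent with these parameters would look roughly like the following; the clients.ioz file name and its contents are illustrative (the exact command is in the attached console logs):

# clients.ioz lists one "<client> <work dir> <path to iozone>" line per thread (hypothetical file)
iozone -+m clients.ioz -i 0 -i 2 -c -e -w -r 64k -s 2g -t 16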


The setup consisted of 4 servers and 4 clients (one mount per server) on a 10GbE network.
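
gNFS is the in-built Gluster NFS server and is mounted as plain NFSv3; an illustrative mount from one of the clients (the mount point is hypothetical):

mount -t nfs -o vers=3 gqas001.sbu.lab.eng.bos.redhat.com:/testvol /mnt/testvol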

Volume Settings:

[root@gqas001 ~]# gluster v info

 
Volume Name: testvol
Type: Distributed-Replicate
Volume ID: 2a668beb-7f26-48f9-8550-157108fe1a55
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gqas001.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick0
Brick2: gqas014.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick1
Brick3: gqas015.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick2
Brick4: gqas016.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick3
Options Reconfigured:
performance.readdir-ahead: on
performance.stat-prefetch: off
server.allow-insecure: on
[root@gqas001 ~]# 
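
For reference, the non-default options listed under "Options Reconfigured" above are applied with the standard gluster CLI, e.g.:

gluster volume set testvol performance.stat-prefetch off
gluster volume set testvol server.allow-insecure on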

Console logs attached.

Comment 10 Ambarish 2016-04-07 13:31:34 UTC
Ugggh!
I meant the version number is glusterfs-3.7.9-1.el6rhs.x86_64.

Comment 15 Ambarish 2016-04-28 11:21:13 UTC
I tried thrice but wasn't able to reproduce it on the 3.7.9-2 build.
I'll test once 3.7.9-3 is out, and if the numbers are consistent with my baseline, I'll lower the priority / close the bug.

Comment 16 Pranith Kumar K 2016-05-05 03:32:22 UTC
(In reply to Ambarish from comment #15)
> I tried thrice but wasn't able to reproduce it on the 3.7.9-2 build.
> I'll test once 3.7.9-3 is out, and if the numbers are consistent with my
> baseline, I'll lower the priority / close the bug.

Please close the bug if we are not able to re-create it; there isn't much dev can debug if we only lower the priority, and the bug will just linger on. You can always re-open it if it happens again.