Bug 1408655

Summary: [Perf] : mkdirs are 85% slower on EC
Product: [Red Hat Storage] Red Hat Gluster Storage
Component: disperse
Version: rhgs-3.2
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
Target Milestone: ---
Target Release: RHGS 3.2.0
Keywords: Regression
Reporter: Ambarish <asoman>
Assignee: Ashish Pandey <aspandey>
QA Contact: Ambarish <asoman>
Docs Contact:
CC: amukherj, mchangir, pkarampu, rcyriac, rhinduja, rhs-bugs, storage-qa-internal
Whiteboard:
Fixed In Version: glusterfs-3.8.4-18
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-03-23 06:00:29 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1351528

Description Ambarish 2016-12-26 08:54:18 UTC
Description of problem:
-----------------------

Testbed : 12 x (4+2), 6 servers, 6 workload-generating clients.

Benchmark : 3.1.3 with io-threads enabled.

3.2 testing was done with io-threads enabled and md-cache parameters set.

There is a massive regression on mkdirs from 3.1.3 to 3.2 on EC over FUSE and gNFS:

****
FUSE
****

3.1.3 : 674 files/sec
3.2   : 90 files/sec

Regression : 86%

*****
gNFS
*****

3.1.3 : 319 files/sec
3.2   :  90 files/sec

Regression : 70%
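
For reference, the regression figures above follow from (baseline - current) / baseline. A minimal sketch of that arithmetic (throughput numbers taken from this report; the helper name is only illustrative):

# Regression = (baseline - current) / baseline, as a percentage.
def regression_pct(baseline, current):
    return (baseline - current) / baseline * 100.0

print("FUSE: %.0f%%" % regression_pct(674, 90))  # ~87%, reported above as 85-86%
print("gNFS: %.0f%%" % regression_pct(319, 90))  # ~72%, reported above as 70%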


Version-Release number of selected component (if applicable):
------------------------------------------------------------

glusterfs-3.8.4-10.el7rhgs.x86_64

How reproducible:
------------------

Every which way I try.


Actual results:
----------------

Performance regression of nearly 80% on mkdirs (86% over FUSE, 70% over gNFS).

Expected results:
-----------------

The regression threshold is ±10%.

Additional info:
----------------
> Server Profile will be attached.

> Client and Server OS : RHEL 7.3

> *Vol config* :

Volume Name: butcher
Type: Distributed-Disperse
Volume ID: 4a377ad9-0c87-4553-b45f-95ab0590c055
Status: Started
Snapshot Count: 0
Number of Bricks: 12 x (4 + 2) = 72
Transport-type: tcp
Bricks:
Brick1: gqas008.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick2: gqas009.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick3: gqas010.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick4: gqas014.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick5: gqas001.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick6: gqas016.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick7: gqas008.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick8: gqas009.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick9: gqas010.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick10: gqas014.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick11: gqas001.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick12: gqas016.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick13: gqas008.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick14: gqas009.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick15: gqas010.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick16: gqas014.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick17: gqas001.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick18: gqas016.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick19: gqas008.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick20: gqas009.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick21: gqas010.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick22: gqas014.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick23: gqas001.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick24: gqas016.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick25: gqas008.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick26: gqas009.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick27: gqas010.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick28: gqas014.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick29: gqas001.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick30: gqas016.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick31: gqas008.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick32: gqas009.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick33: gqas010.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick34: gqas014.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick35: gqas001.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick36: gqas016.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick37: gqas008.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick38: gqas009.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick39: gqas010.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick40: gqas014.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick41: gqas001.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick42: gqas016.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick43: gqas008.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick44: gqas009.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick45: gqas010.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick46: gqas014.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick47: gqas001.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick48: gqas016.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick49: gqas008.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick50: gqas009.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick51: gqas010.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick52: gqas014.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick53: gqas001.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick54: gqas016.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick55: gqas008.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick56: gqas009.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick57: gqas010.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick58: gqas014.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick59: gqas001.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick60: gqas016.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick61: gqas008.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick62: gqas009.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick63: gqas010.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick64: gqas014.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick65: gqas001.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick66: gqas016.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick67: gqas008.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Brick68: gqas009.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Brick69: gqas010.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Brick70: gqas014.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Brick71: gqas001.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Brick72: gqas016.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
cluster.lookup-optimize: on
performance.client-io-threads: on
nfs.disable: off
performance.readdir-ahead: on
transport.address-family: inet
server.event-threads: 4
client.event-threads: 4
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.cache-samba-metadata: on
performance.cache-invalidation: on
performance.md-cache-timeout: 600

Comment 6 Ashish Pandey 2016-12-27 12:08:29 UTC
The root cause (RC) of this bug is similar to that of https://bugzilla.redhat.com/show_bug.cgi?id=1406723

master patch - 
http://review.gluster.org/#/c/16298/

Comment 7 Atin Mukherjee 2017-03-03 06:48:13 UTC
A new upstream patch, https://review.gluster.org/#/c/16821/, has been posted with a different approach.

Comment 10 Ambarish 2017-03-07 11:29:05 UTC
3.1.3 : 674 files/sec

3.8.4-14 : 521 files/sec


Regression : ~23%


The regression has certainly come down from 85% to 23%, but it is still not within the regression threshold of ±10%.


Moving it back to Dev for a re-look.

Comment 11 Pranith Kumar K 2017-03-07 11:35:44 UTC
(In reply to Ambarish from comment #10)
> 3.1.3 : 674 files/sec
> 
> 3.8.4-14 : 521 files/sec
> 
> 
> Regression : ~23%
> 
> 
> The regression has certainly come down from 85% to 23%, but it is still
> not within the regression threshold of ±10%.
> 
> 
> Moving it back to Dev for a re-look.

The number of network operations is exactly the same as in 3.1.3, but the average latencies increased in 3.2.0. Based on the testing I did, it seems to be because of the l[g/s]etxattr calls for trusted.ec.dirty. I just provided a build to Ambarish to confirm whether that brings the numbers closer.
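
To put the latency observation in perspective, here is a back-of-the-envelope sketch (the function and every number in it are assumptions for illustration, not values from the attached profile): for a largely serial mkdir workload, throughput is roughly the inverse of per-operation latency, so a few tenths of a millisecond of extra lgetxattr/lsetxattr work on trusted.ec.dirty per mkdir is already enough to produce a regression of this size.

# Illustrative model only: serial client, throughput ~ 1000 ms / per-op latency in ms.
def est_mkdirs_per_sec(base_latency_ms, extra_xattr_calls, xattr_cost_ms):
    return 1000.0 / (base_latency_ms + extra_xattr_calls * xattr_cost_ms)

# Assumed figures, not measured: 1.5 ms baseline mkdir latency, 0.2 ms per extra xattr call.
print(est_mkdirs_per_sec(1.5, 0, 0.2))  # ~667 mkdirs/sec without the extra xattr cost
print(est_mkdirs_per_sec(1.5, 2, 0.2))  # ~526 mkdirs/sec with two extra xattr calls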

Comment 12 Ambarish 2017-03-07 12:59:59 UTC
The fix seems to work :)


3.1.3 : 674 files/sec

Pranith's bench build : 696 files/sec

Comment 13 Atin Mukherjee 2017-03-07 13:04:11 UTC
upstream patch : https://review.gluster.org/16865

Comment 14 Pranith Kumar K 2017-03-07 14:11:54 UTC
https://code.engineering.redhat.com/gerrit/99547

Comment 15 Milind Changire 2017-03-08 12:20:45 UTC
Bug added to erratum RHSA-2016:24866-04.
Moving BZ to ON_QA.

Comment 16 Ambarish 2017-03-09 15:41:30 UTC
Tested on glusterfs-3.8.4-18 : 

3.1.3 : 674 files/sec

3.8.4-18 : 682 files/sec


Happily moving this to Verified.

Comment 18 errata-xmlrpc 2017-03-23 06:00:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html