Bug 1408655 - [Perf] : mkdirs are 85% slower on EC
Summary: [Perf] : mkdirs are 85% slower on EC
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: disperse
Version: rhgs-3.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.2.0
Assignee: Ashish Pandey
QA Contact: Ambarish
URL:
Whiteboard:
Depends On:
Blocks: 1351528
 
Reported: 2016-12-26 08:54 UTC by Ambarish
Modified: 2017-03-28 06:51 UTC
CC: 7 users

Fixed In Version: glusterfs-3.8.4-18
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-03-23 06:00:29 UTC
Embargoed:


Attachments


Links:
Red Hat Bugzilla 1406723 (unspecified, CLOSED): [Perf] : significant Performance regression seen with disperse volume when compared with 3.1.3 (last updated 2021-02-22 00:41:40 UTC)
Red Hat Product Errata RHSA-2017:0486 (normal, SHIPPED_LIVE): Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update (last updated 2017-03-23 09:18:45 UTC)

Internal Links: 1406723

Description Ambarish 2016-12-26 08:54:18 UTC
Description of problem:
-----------------------

Testbed : 12 x (4+2), 6 servers, 6 workload-generating clients.

Benchmark : 3.1.3 with io-threads enabled.

3.2 testing was done with io-threads enabled and the mdcache parameters set.

There is a massive regression on mkdirs from 3.1.3 to 3.2 on EC, over both FUSE and gNFS:

****
FUSE
****

3.1.3 : 674 files/sec
3.2   : 90 files/sec

Regression : 86%

*****
gNFS
*****

3.1.3 : 319 files/sec
3.2   :  90 files/sec

Regression : 70%
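(For reference, regression here is computed as 1 - (new rate / old rate); e.g., 1 - 90/674 ≈ 0.866, i.e. ~86%, for FUSE.)

A rough way to measure this kind of mkdir throughput from a client is a timed loop over mkdir on the mount. This is only an illustrative sketch, not the actual workload generator used for these runs (which is not named in this report); the mount point and directory count below are assumptions:

#!/bin/bash
# Illustrative mkdir throughput probe; NOT the benchmark used for the numbers above.
# Assumes the volume is FUSE-mounted at /mnt/butcher (hypothetical path).
MNT=/mnt/butcher
COUNT=5000
start=$(date +%s.%N)
for i in $(seq 1 "$COUNT"); do
    mkdir "$MNT/dir.$i"
done
end=$(date +%s.%N)
echo "mkdirs/sec: $(echo "$COUNT / ($end - $start)" | bc -l)"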


Version-Release number of selected component (if applicable):
------------------------------------------------------------

glusterfs-3.8.4-10.el7rhgs.x86_64

How reproducible:
------------------

Every which way I try.


Actual results:
----------------

Performance regression of ~86% (FUSE) and ~70% (gNFS) on mkdirs.

Expected results:
-----------------

The regression threshold is ±10%.

Additional info:
----------------
> Server Profile will be attached.

> Client and Server OS : RHEL 7.3

> *Vol config* :

Volume Name: butcher
Type: Distributed-Disperse
Volume ID: 4a377ad9-0c87-4553-b45f-95ab0590c055
Status: Started
Snapshot Count: 0
Number of Bricks: 12 x (4 + 2) = 72
Transport-type: tcp
Bricks:
Brick1: gqas008.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick2: gqas009.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick3: gqas010.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick4: gqas014.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick5: gqas001.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick6: gqas016.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick7: gqas008.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick8: gqas009.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick9: gqas010.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick10: gqas014.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick11: gqas001.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick12: gqas016.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick13: gqas008.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick14: gqas009.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick15: gqas010.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick16: gqas014.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick17: gqas001.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick18: gqas016.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick19: gqas008.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick20: gqas009.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick21: gqas010.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick22: gqas014.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick23: gqas001.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick24: gqas016.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick25: gqas008.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick26: gqas009.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick27: gqas010.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick28: gqas014.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick29: gqas001.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick30: gqas016.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick31: gqas008.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick32: gqas009.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick33: gqas010.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick34: gqas014.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick35: gqas001.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick36: gqas016.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick37: gqas008.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick38: gqas009.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick39: gqas010.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick40: gqas014.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick41: gqas001.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick42: gqas016.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick43: gqas008.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick44: gqas009.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick45: gqas010.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick46: gqas014.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick47: gqas001.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick48: gqas016.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick49: gqas008.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick50: gqas009.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick51: gqas010.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick52: gqas014.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick53: gqas001.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick54: gqas016.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick55: gqas008.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick56: gqas009.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick57: gqas010.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick58: gqas014.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick59: gqas001.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick60: gqas016.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick61: gqas008.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick62: gqas009.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick63: gqas010.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick64: gqas014.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick65: gqas001.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick66: gqas016.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick67: gqas008.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Brick68: gqas009.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Brick69: gqas010.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Brick70: gqas014.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Brick71: gqas001.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Brick72: gqas016.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
cluster.lookup-optimize: on
performance.client-io-threads: on
nfs.disable: off
performance.readdir-ahead: on
transport.address-family: inet
server.event-threads: 4
client.event-threads: 4
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.cache-samba-metadata: on
performance.cache-invalidation: on
performance.md-cache-timeout: 600
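For completeness: with diagnostics.latency-measurement and diagnostics.count-fop-hits already enabled above, the server profile referenced under "Additional info" can be gathered with the standard gluster profile commands. A minimal sketch (the output path is an example):

gluster volume profile butcher start
# ... run the mkdir workload ...
gluster volume profile butcher info > /tmp/butcher-profile.txt
gluster volume profile butcher stop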

Comment 6 Ashish Pandey 2016-12-27 12:08:29 UTC
The root cause for this bug is similar to that of https://bugzilla.redhat.com/show_bug.cgi?id=1406723

master patch - 
http://review.gluster.org/#/c/16298/

Comment 7 Atin Mukherjee 2017-03-03 06:48:13 UTC
A new upstream patch, https://review.gluster.org/#/c/16821/, has been posted with an alternative approach.

Comment 10 Ambarish 2017-03-07 11:29:05 UTC
3.1.3 : 674 files/sec

3.8.4-14 : 521 files/sec


Regression : ~23%


The regression has certainly mellowed down from 85% to 23%, but it is still not within my regression threshold.


Moving it back to Dev for a re-look.

Comment 11 Pranith Kumar K 2017-03-07 11:35:44 UTC
(In reply to Ambarish from comment #10)
> 3.1.3 : 674 files/sec
> 
> 3.8.4-14 : 521 files/sec
> 
> 
> Regression : ~23%
> 
> 
> The regression has certainly mellowed down from 85% to 23%, but it is still
> not within my regression threshold.
> 
> 
> Moving it back to Dev for a re-look.

The number of network operations is exactly the same as in 3.1.3, but the average latencies increased in 3.2.0. Based on the testing I did, it seems to be because of the l[g/s]etxattr calls for trusted.ec.dirty. I just provided a build to Ambarish to confirm whether that brings the numbers closer.
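To make the latency observation above concrete, the xattr in question can be inspected directly on a brick backend. A minimal sketch (run as root on a server node; the brick path is one from this setup, the directory name is hypothetical):

# Dump the EC dirty marker for a directory's backend copy:
getfattr -n trusted.ec.dirty -e hex /bricks1/brick/testdir

Per this comment, each mkdir in 3.2 was incurring extra getxattr/setxattr network round-trips for this key; the test build was meant to avoid those.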

Comment 12 Ambarish 2017-03-07 12:59:59 UTC
The fix seems to work :)


3.1.3 : 674 files/sec

Pranith's bench build : 696 files/sec

Comment 13 Atin Mukherjee 2017-03-07 13:04:11 UTC
upstream patch : https://review.gluster.org/16865

Comment 14 Pranith Kumar K 2017-03-07 14:11:54 UTC
https://code.engineering.redhat.com/gerrit/99547

Comment 15 Milind Changire 2017-03-08 12:20:45 UTC
Bug added to erratum RHSA-2016:24866-04.
Moving BZ to ON_QA.

Comment 16 Ambarish 2017-03-09 15:41:30 UTC
Tested on glusterfs-3.8.4-18 : 

3.1.3 : 674 files/sec

3.8.4-18 : 682 files/sec


Happily moving this to Verified.

Comment 18 errata-xmlrpc 2017-03-23 06:00:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html

