Description of problem:
-----------------------
Testbed : 12*(4+2), 6 servers, 6 workload generating clients.

Benchmark : 3.1.3 with io-threads enabled.

3.2 testing was done with io-threads enabled and mdcache parameters set.

There is a massive regression on mkdirs from 3.1.3 to 3.2 on EC over FUSE and gNFS :

**** FUSE ****

3.1.3 : 674 files/sec
3.2 : 90 files/sec
Regression : 86%

***** gNFS *****

3.1.3 : 319 files/sec
3.2 : 90 files/sec
Regression : 70%

Version-Release number of selected component (if applicable):
------------------------------------------------------------
glusterfs-3.8.4-10.el7rhgs.x86_64

How reproducible:
------------------
Every which way I try.

Actual results:
----------------
Perf Regression of nearly 80% on mkdirs.

Expected results:
-----------------
Regression Threshold is +-10%.

Additional info:
----------------
> Server Profile will be attached.
> Client and Server OS : RHEL 7.3
> *Vol config* :

Volume Name: butcher
Type: Distributed-Disperse
Volume ID: 4a377ad9-0c87-4553-b45f-95ab0590c055
Status: Started
Snapshot Count: 0
Number of Bricks: 12 x (4 + 2) = 72
Transport-type: tcp
Bricks:
Brick1: gqas008.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick2: gqas009.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick3: gqas010.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick4: gqas014.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick5: gqas001.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick6: gqas016.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick7: gqas008.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick8: gqas009.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick9: gqas010.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick10: gqas014.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick11: gqas001.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick12: gqas016.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick13: gqas008.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick14: gqas009.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick15: gqas010.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick16: gqas014.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick17: gqas001.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick18: gqas016.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick19: gqas008.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick20: gqas009.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick21: gqas010.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick22: gqas014.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick23: gqas001.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick24: gqas016.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick25: gqas008.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick26: gqas009.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick27: gqas010.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick28: gqas014.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick29: gqas001.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick30: gqas016.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick31: gqas008.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick32: gqas009.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick33: gqas010.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick34: gqas014.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick35: gqas001.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick36: gqas016.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick37: gqas008.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick38: gqas009.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick39: gqas010.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick40: gqas014.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick41: gqas001.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick42: gqas016.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick43: gqas008.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick44: gqas009.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick45: gqas010.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick46: gqas014.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick47: gqas001.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick48: gqas016.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick49: gqas008.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick50: gqas009.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick51: gqas010.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick52: gqas014.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick53: gqas001.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick54: gqas016.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick55: gqas008.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick56: gqas009.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick57: gqas010.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick58: gqas014.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick59: gqas001.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick60: gqas016.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick61: gqas008.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick62: gqas009.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick63: gqas010.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick64: gqas014.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick65: gqas001.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick66: gqas016.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick67: gqas008.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Brick68: gqas009.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Brick69: gqas010.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Brick70: gqas014.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Brick71: gqas001.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Brick72: gqas016.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
cluster.lookup-optimize: on
performance.client-io-threads: on
nfs.disable: off
performance.readdir-ahead: on
transport.address-family: inet
server.event-threads: 4
client.event-threads: 4
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.cache-samba-metadata: on
performance.cache-invalidation: on
performance.md-cache-timeout: 600
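For reference, the regression figures can be recomputed from the raw mkdir rates as (baseline - current) / baseline. The following is a minimal Python sketch under that assumption; the function and variable names are illustrative and not part of any benchmark tool used in this report. With the rates quoted in the description it gives roughly 86.6% for FUSE and 71.8% for gNFS, both far outside the +-10% threshold.

```python
def regression_pct(baseline: float, current: float) -> float:
    """Percentage drop of `current` relative to `baseline` (positive = slower)."""
    return (baseline - current) / baseline * 100.0


# The +-10% regression threshold stated under "Expected results".
THRESHOLD_PCT = 10.0

# mkdir rates in files/sec as quoted in the description: (3.1.3, 3.2).
results = {
    "FUSE": (674.0, 90.0),
    "gNFS": (319.0, 90.0),
}

for access, (baseline, current) in results.items():
    pct = regression_pct(baseline, current)
    verdict = "within" if abs(pct) <= THRESHOLD_PCT else "outside"
    print(f"{access}: {pct:.1f}% regression ({verdict} threshold)")
```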
The root cause (RC) of this bug is similar to that of https://bugzilla.redhat.com/show_bug.cgi?id=1406723

Master patch: http://review.gluster.org/#/c/16298/
A new upstream patch, https://review.gluster.org/#/c/16821/, has been posted with an alternative approach.
3.1.3 : 674 files/sec
3.8.4-14 : 521 files/sec
Regression : ~23%

The regression has certainly mellowed down from 85% to 23%, but it is still not within my Regression Threshold.

Moving it back to Dev for a re-look.
(In reply to Ambarish from comment #10)
> 3.1.3 : 674 files/sec
> 3.8.4-14 : 521 files/sec
> Regression : ~23%
>
> The regression has certainly mellowed down from 85% to 23%,but it is still
> not within my Regression Threshold.
>
> Moving it back to Dev for a re-look.

The number of network operations is exactly the same as in 3.1.3, but the average latencies increased in 3.2.0. Based on the testing I did, this seems to be caused by l[g/s]etxattr for trusted.ec.dirty. I have provided a build to Ambarish to confirm whether that brings the number closer.
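To get a rough feel for the per-call cost of extended attribute updates on a given filesystem, something like the Python sketch below can be used. It is only an illustration and not the method used in this analysis. It assumes a Linux host, times `os.setxattr`/`os.getxattr` with `follow_symlinks=False` (so symlinks are not followed, as with the l*xattr calls), and uses a hypothetical `user.demo.dirty` attribute with an arbitrary 16-byte placeholder value, because the `trusted.*` namespace needs root and trusted.ec.dirty is managed by EC itself.

```python
import os
import tempfile
import time


def time_xattr_roundtrips(path, name="user.demo.dirty", rounds=1000):
    """Return (avg_set_seconds, avg_get_seconds), or None if xattrs are unavailable.

    `name` is a hypothetical stand-in attribute, not trusted.ec.dirty.
    """
    if not hasattr(os, "setxattr"):
        return None  # os.setxattr/os.getxattr are only available on Linux
    value = b"\x00" * 16  # arbitrary placeholder payload
    try:
        start = time.perf_counter()
        for _ in range(rounds):
            os.setxattr(path, name, value, follow_symlinks=False)
        set_avg = (time.perf_counter() - start) / rounds

        start = time.perf_counter()
        for _ in range(rounds):
            os.getxattr(path, name, follow_symlinks=False)
        get_avg = (time.perf_counter() - start) / rounds
    except OSError:
        return None  # e.g. the filesystem does not support user xattrs
    return set_avg, get_avg


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        result = time_xattr_roundtrips(scratch, rounds=100)
        if result is None:
            print("xattrs not supported here")
        else:
            print(f"set: {result[0] * 1e6:.1f} us, get: {result[1] * 1e6:.1f} us")
```

Pointing `path` at a directory on the filesystem under test and comparing the averages between builds gives a coarse signal only; the profile data attached to the bug remains the authoritative source for latencies.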
The fix seems to work :)

3.1.3 : 674 files/sec
Pranith's bench build : 696 files/sec
upstream patch : https://review.gluster.org/16865
https://code.engineering.redhat.com/gerrit/99547
Bug added to erratum RHSA-2016:24866-04; moving the BZ to ON_QA.
Tested on glusterfs-3.8.4-18 :

3.1.3 : 674 files/sec
3.8.4-18 : 682 files/sec

Happily moving this to Verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2017-0486.html