Description of problem:

Testbed : 12 x (4 + 2) volume, 6 servers, 6 workload-generating clients.

Benchmark : 3.1.3 with io-threads enabled.

3.2 testing was done with io-threads enabled and the md-cache parameters set (see the volume options below).
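The report does not name the workload generator used. As a rough, hypothetical sketch of how an mkdir rate in files/sec can be measured, the function below times directory creation from one client. The name `mkdir_rate` and the `dir_NNNNNN` naming scheme are illustrative assumptions; a real run would point `root` at the FUSE or gNFS mount and drive many clients and threads in parallel.

```python
import os
import tempfile
import time


def mkdir_rate(root: str, count: int = 1000) -> float:
    """Create `count` empty directories under `root` and return directories per second.

    Hypothetical single-threaded sketch, not the benchmark tool used in this report.
    """
    start = time.perf_counter()
    for i in range(count):
        os.mkdir(os.path.join(root, f"dir_{i:06d}"))
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # A temporary local directory is used only so the sketch runs anywhere;
    # replace it with a directory on the mounted volume under test.
    with tempfile.TemporaryDirectory() as root:
        print(f"{mkdir_rate(root, 500):.0f} mkdirs/sec")
```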

There is a massive regression in mkdir throughput from 3.1.3 to 3.2 on EC volumes over FUSE and gNFS:


3.1.3 : 674 files/sec
3.2   : 90 files/sec

Regression : 86%


3.1.3 : 319 files/sec
3.2   :  90 files/sec

Regression : ~72%
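The regression figures above are the relative drop in throughput from the 3.1.3 baseline. A minimal sketch of that arithmetic, applied to the two data sets above (the function name `regression_pct` is illustrative):

```python
def regression_pct(baseline: float, observed: float) -> float:
    """Percentage drop in throughput relative to the baseline (positive means slower)."""
    if baseline <= 0:
        raise ValueError("baseline throughput must be positive")
    return (baseline - observed) / baseline * 100.0


# mkdir throughput in files/sec: (3.1.3 baseline, 3.2 result) for each data set above.
runs = [
    ("first data set", 674, 90),
    ("second data set", 319, 90),
]

for label, base, new in runs:
    print(f"{label}: {regression_pct(base, new):.1f}% regression")
```

This prints roughly 86.6% and 71.8% for the two data sets.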

Version-Release number of selected component (if applicable):


How reproducible:

Every which way I try.

Actual results:

Performance regression of roughly 70-86% on mkdirs.

Expected results:

mkdir throughput within the regression threshold of ±10% of 3.1.3.
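A small sketch of the ±10% acceptance check, assuming the threshold is applied symmetrically to throughput relative to the 3.1.3 baseline. The helper name `within_threshold` and the value 700 are hypothetical and used only for illustration.

```python
def within_threshold(baseline: float, observed: float, tolerance: float = 0.10) -> bool:
    """True if observed throughput is within +/- tolerance (a fraction) of the baseline."""
    if baseline <= 0:
        raise ValueError("baseline throughput must be positive")
    return abs(observed - baseline) / baseline <= tolerance


# 3.1.3 baseline (674 files/sec) vs. the 3.2 result (90 files/sec): outside the band.
print(within_threshold(674, 90))   # False
# A hypothetical result of 700 files/sec would fall inside the band.
print(within_threshold(674, 700))  # True
```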

Additional info:
> Server Profile will be attached.

> Client and Server OS : RHEL 7.3

> *Vol config* :

Volume Name: butcher
Type: Distributed-Disperse
Volume ID: 4a377ad9-0c87-4553-b45f-95ab0590c055
Status: Started
Snapshot Count: 0
Number of Bricks: 12 x (4 + 2) = 72
Transport-type: tcp
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
cluster.lookup-optimize: on
performance.client-io-threads: on
nfs.disable: off
performance.readdir-ahead: on
transport.address-family: inet
server.event-threads: 4
client.event-threads: 4
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.cache-samba-metadata: on
performance.cache-invalidation: on
performance.md-cache-timeout: 600
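To reproduce the "Options Reconfigured" list on another test volume, each entry maps onto a `gluster volume set <VOLNAME> <KEY> <VALUE>` call. The sketch below only prints those commands as a dry run; the helper name `volume_set_commands` is hypothetical. Note that the two `diagnostics.*` options are typically switched on by `gluster volume profile <VOLNAME> start` rather than set by hand.

```python
import shlex

# "Options Reconfigured" from the volume info above.
OPTIONS = {
    "diagnostics.count-fop-hits": "on",
    "diagnostics.latency-measurement": "on",
    "cluster.lookup-optimize": "on",
    "performance.client-io-threads": "on",
    "nfs.disable": "off",
    "performance.readdir-ahead": "on",
    "transport.address-family": "inet",
    "server.event-threads": "4",
    "client.event-threads": "4",
    "features.cache-invalidation": "on",
    "features.cache-invalidation-timeout": "600",
    "performance.stat-prefetch": "on",
    "performance.cache-samba-metadata": "on",
    "performance.cache-invalidation": "on",
    "performance.md-cache-timeout": "600",
}


def volume_set_commands(volname: str, options: dict) -> list:
    """Build one shell-quoted `gluster volume set` command per option (dry run only)."""
    return [shlex.join(["gluster", "volume", "set", volname, key, value])
            for key, value in options.items()]


if __name__ == "__main__":
    for cmd in volume_set_commands("butcher", OPTIONS):
        print(cmd)
```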

Ashish Pandey 2016-12-27 12:08:29 UTC
The root cause (RC) of this bug is similar to that of https://bugzilla.redhat.com/show_bug.cgi?id=1406723

master patch - 

Atin Mukherjee 2017-03-03 06:48:13 UTC
A new upstream patch https://review.gluster.org/#/c/16821/ is posted with a different alternative.

Ambarish 2017-03-07 11:29:05 UTC
3.1.3 : 674 files/sec

3.8.4-14 : 521 files/sec

Regression : ~23%

The regression has certainly mellowed down from 85% to 23%, but it is still not within my regression threshold.

Moving it back to Dev for a re-look.

Pranith Kumar K 2017-03-07 11:35:44 UTC
(In reply to Ambarish from comment #10)
> 3.1.3 : 674 files/sec
> 3.8.4-14 : 521 files/sec
> Regression : ~23%
> The regression has certainly mellowed down from 85% to 23%,but it is still
> not within my Regression Threshold.
> Moving it back to Dev for a re-look.

The number of network operations is exactly the same as in 3.1.3, but the average latencies increased in 3.2.0. Based on the testing I did, this seems to be caused by the lgetxattr/lsetxattr calls on trusted.ec.dirty. I have just provided a build to Ambarish to confirm whether it brings the number closer.
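Why unchanged operation counts can still cost throughput: if the benchmark keeps a fixed number of operations outstanding (a closed-loop assumption, not stated in this report), Little's law gives throughput = concurrency / mean latency, so the throughput ratio maps inversely onto mean latency. A minimal sketch of that calculation, with the hypothetical helper `implied_latency_factor`:

```python
def implied_latency_factor(old_rate: float, new_rate: float) -> float:
    """Factor by which mean per-operation latency grew, via Little's law.

    Assumes a closed-loop workload with a fixed number of outstanding operations,
    so throughput = concurrency / mean latency and the concurrency term cancels.
    """
    if old_rate <= 0 or new_rate <= 0:
        raise ValueError("throughputs must be positive")
    return old_rate / new_rate


# 3.1.3 vs. 3.8.4-14 mkdir throughput (files/sec) from the comments above.
print(f"{implied_latency_factor(674, 521):.2f}x higher mean latency")
```

Under that assumption, the drop from 674 to 521 files/sec corresponds to mean mkdir latency rising by about 1.29x.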

Ambarish 2017-03-07 12:59:59 UTC
The fix seems to work :)

3.1.3 : 674 files/sec

Pranith's bench build : 696 files/sec

Atin Mukherjee 2017-03-07 13:04:11 UTC
upstream patch : https://review.gluster.org/16865

Pranith Kumar K 2017-03-07 14:11:54 UTC

Milind Changire 2017-03-08 12:20:45 UTC
bug added to erratum RHSA-2016:24866-04
moving bz to ON_QA

Ambarish 2017-03-09 15:41:30 UTC
Tested on glusterfs-3.8.4-18 : 

3.1.3 : 674 files/sec

3.8.4-18 : 682 files/sec

Happily moving this to Verified.

errata-xmlrpc 2017-03-23 06:00:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.


