|Summary:||[RFE] DHT performance improvements for directory operations|
|Product:||Red Hat Gluster Storage||Reporter:||Nithya Balachandran <nbalacha>|
|Component:||distribute||Assignee:||Raghavendra G <rgowdapp>|
|Status:||CLOSED ERRATA||QA Contact:||Sachin P Mali <smali>|
|Version:||rhgs-3.1||CC:||amukherj, jahernan, ksandha, msaini, rcyriac, rgowdapp, rhinduja, rhs-bugs, sheggodu, storage-qa-internal|
|Target Milestone:||---||Keywords:||FutureFeature, ZStream|
|Target Release:||RHGS 3.4.0|
|Fixed In Version:||Doc Type:||If docs needed, set a value|
|Doc Text:||Story Points:||---|
|Last Closed:||2018-09-04 06:29:40 UTC||Type:||Bug|
|oVirt Team:||---||RHEL 7.3 requirements from Atomic Host:|
|Cloudforms Team:||---||Target Upstream Version:|
|Bug Depends On:||1558995|
|Bug Blocks:||1118770, 1301474, 1336766, 1345828, 1503132|
Description Nithya Balachandran 2016-08-23 06:15:14 UTC
Description of problem:
The fixes for directory consistency use locks to prevent concurrent operations from trampling on each other. Taking these locks degrades performance. This BZ has been opened to track potential improvements to this approach.
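The following deliberately simplified Python model shows why directory operations need locking in the first place. It is an assumption for illustration, not DHT's actual algorithm. A directory "d" is created or removed one subvolume at a time, and the hypothetical rmdir visits subvolumes in the opposite order from mkdir. If the two operations interleave without a lock, the directory can end up present on only some subvolumes. If one operation runs to completion before the other starts, as a lock would enforce, every subvolume agrees.

```python
# Toy model (hypothetical): each set stands for one DHT subvolume and holds
# the directory names present on it. This is not the real DHT mkdir/rmdir logic.


def run(interleaved: bool) -> list[bool]:
    subvols: list[set[str]] = [set(), set(), set()]
    # mkdir creates "d" on subvolumes 0, 1, 2 in turn.
    mkdir_steps = [lambda s=s: s.add("d") for s in subvols]
    # rmdir removes "d" on subvolumes 2, 1, 0 in turn (assumed order).
    rmdir_steps = [lambda s=s: s.discard("d") for s in reversed(subvols)]
    if interleaved:
        # No lock: the two fops alternate one step at a time.
        steps = [step for pair in zip(mkdir_steps, rmdir_steps) for step in pair]
    else:
        # Lock held: mkdir completes before rmdir starts.
        steps = mkdir_steps + rmdir_steps
    for step in steps:
        step()
    # Report whether "d" exists on each subvolume.
    return ["d" in s for s in subvols]


print(run(interleaved=True))   # [False, False, True]: present on one subvolume only
print(run(interleaved=False))  # [False, False, False]: consistent everywhere
```

This guarantee has a cost, and that cost is what this BZ tracks: every serialized lock adds latency to the operation that must acquire it.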
Comment 4 Raghavendra G 2017-08-23 10:16:35 UTC
Patch [1] has been merged upstream in v3.12. It aims to reduce the performance penalty due to locking and also fixes other consistency issues. This fix is not present in rhgs-3.3.0 and can be targeted for rhgs-3.4.0.

[1] https://review.gluster.org/15472
Comment 5 Raghavendra G 2017-08-23 11:10:04 UTC
(In reply to Raghavendra G from comment #4)
> Patch [1] has been merged in upstream v3.12. This patch aims to bring down
> performance penalty due to locking and also fixes other consistency issues.
> This fix is not present in rhgs-3.3.0 and can be targeted for rhgs-3.4.0
>
> [1] https://review.gluster.org/15472

A more detailed breakdown of the impact of this patch on different fops:

* mkdir
Note that the locking penalty in the dht_mkdir codepath was already constant, irrespective of scale, before [1]. In fact, [1] increases the number of serialized locks acquired by one (though the penalty is still constant and independent of scale). So not much improvement or regression can be expected in this codepath. However, if there is parallel access (lookups) to the directory, [1] is expected to improve performance significantly, as the directory creation phase of selfheal acquired locks serially on all subvolumes (see "directory creation by selfheal during lookup").

* rmdir
Note that the performance penalty due to locking increased linearly with scale before [1], as we used to acquire locks on all subvolumes of DHT. With [1], the penalty is constant and independent of scale. So [1] is expected to improve rmdir performance significantly, especially when the number of DHT subvolumes is relatively large.

* renamedir
As with rmdir, the locking penalty increased linearly with scale before [1], because locks were acquired on all subvolumes of DHT, and it is constant with [1]. So [1] is expected to improve renamedir performance significantly, especially when the number of DHT subvolumes is relatively large.

* directory creation by selfheal during lookup
Before [1], directory creation during selfheal would acquire locks on all subvols. This is also the same lock that the mkdir codepath acquires while setting the layout, so a parallel heal on a directory being created can add this locking latency to mkdir. [1] makes this locking penalty constant irrespective of scale.

* layout healing by selfheal during lookup
No change in the locking algorithm is introduced by [1].

More details of the algorithm itself can be found at [2].

[2] https://github.com/gluster/glusterfs/blob/master/doc/developer-guide/dirops-transactions-in-dht.md
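The scaling argument in this breakdown can be sketched as a toy cost model. This is a hypothetical illustration, not the DHT implementation: the fop labels, the placeholder lock counts, and the uniform per-lock round-trip time are all assumptions. Only the shape comes from the comment. Before the patch, rmdir, renamedir, and directory creation during selfheal pay one serialized lock per subvolume. After it, they pay a constant number. mkdir stays constant in both cases but gains one extra lock.

```python
# Toy cost model (hypothetical): number of serialized lock acquisitions a
# directory operation performs, as a function of the number of DHT subvolumes.
# Only the shape follows the comment (linear before the patch, constant after);
# the constants are placeholders, not values from the DHT code.

MKDIR_BASE_LOCKS = 2   # placeholder: fixed lock count for mkdir before the patch
CONSTANT_LOCKS = 2     # placeholder: fixed lock count after the patch

# Hypothetical fop labels whose lock count grew with scale before the patch.
LINEAR_BEFORE = {"rmdir", "renamedir", "selfheal-mkdir"}


def serialized_locks(fop: str, subvols: int, patched: bool) -> int:
    """Return the modeled number of serialized lock acquisitions for one fop."""
    if subvols < 1:
        raise ValueError("need at least one subvolume")
    if fop == "mkdir":
        # Constant either way; the patch adds one more serialized lock.
        return MKDIR_BASE_LOCKS + (1 if patched else 0)
    if fop in LINEAR_BEFORE:
        # Before: one lock per subvolume, acquired serially.
        # After: a fixed number of locks, independent of scale.
        return CONSTANT_LOCKS if patched else subvols
    raise ValueError(f"fop not covered by this model: {fop}")


def lock_latency_ms(fop: str, subvols: int, patched: bool, rtt_ms: float) -> float:
    """Serialized locks multiply the per-lock round-trip time (assumed uniform)."""
    return serialized_locks(fop, subvols, patched) * rtt_ms


for n in (2, 6, 12, 24):
    before = lock_latency_ms("rmdir", n, patched=False, rtt_ms=0.5)
    after = lock_latency_ms("rmdir", n, patched=True, rtt_ms=0.5)
    print(f"{n:>2} subvols: rmdir lock latency {before:.1f} ms -> {after:.1f} ms")
```

With these placeholder values, the pre-patch rmdir lock latency grows with the subvolume count and the patched value stays flat. This is why the gain is expected to be largest on volumes with many DHT subvolumes, and why standalone mkdir should see no improvement.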
Comment 6 Raghavendra G 2017-09-04 06:50:22 UTC
Performance for the following operations should improve compared with rhgs-3.3.0:
* rmdir
* renamedir
* mkdir, when the directory is accessed in parallel with directory creation
* directory healing (for cases where a few subvolumes were down during directory creation and the directory is accessed later, after those subvolumes are back up)

No performance improvement is expected for:
* standalone mkdir (no parallel access during directory creation)
Comment 7 Raghavendra G 2017-09-04 07:08:00 UTC
(In reply to Raghavendra G from comment #6)
> Performance for the following operations should improve when compared with
> rhgs-3.3.0:
> * rmdir

Note that bz 1330235, which is CLOSED WONTFIX, will be fixed as part of the current bz.
Comment 10 Ambarish 2018-03-26 11:51:07 UTC
Karan has found two massive perf regressions in mkdirs and rmdirs on the latest interim build:
https://bugzilla.redhat.com/show_bug.cgi?id=1558995 - 30% regression on small-file rmdirs from 3.3.1
https://bugzilla.redhat.com/show_bug.cgi?id=1558994 - 47% regression in mkdir from 3.3.1

Note to self and other QEs: verification of this RFE would involve:
A) The above perf regressions being fixed.
B) Substantial perf improvement over the baseline (any RHGS build without these fixes).

Since on glusterfs-3.12.2-5 I find mkdirs and rmdirs to be VERY slow in the basic use case (Dist Rep + FUSE), and this RFE tracks perf improvements on directory operations, I cannot move this bug to Verified.
Comment 18 Raghavendra G 2018-07-23 11:14:55 UTC
From https://bugzilla.redhat.com/show_bug.cgi?id=1598424#c23:

331 total time               | 340 (bmux off) total time
========================================================
entrylk-total-time 10750     | entrylk-total-time 16746.1
getxattr-total-time 93.4708  |
opendir-total-time 3388.05   | opendir-total-time 4024.19
readdirp-total-time 3841.3   | readdirp-total-time 4394.42
inodelk-total-time 6131.27   | inodelk-total-time 341.158
finodelk-total-time 0.011168 |
rmdir-total-time 7849.47     | rmdir-total-time 9430.73
lookup-total-time 35545.3    | lookup-total-time 40633.7

331 total calls              | 340 (bmux off) total calls
========================================================
entrylk-total-calls 69120612 | entrylk-total-calls 70560624
getxattr-total-calls 383820  |
opendir-total-calls 17280936 | opendir-total-calls 17280936
readdirp-total-calls 5760317 | readdirp-total-calls 5760213
inodelk-total-calls 34560300 | inodelk-total-calls 1440162
finodelk-total-calls 48      |
rmdir-total-calls 17280072   | rmdir-total-calls 17280072
lookup-total-calls 18604130  | lookup-total-calls 21468343

331 total times              | 340 (bmux on) total times
========================================================
entrylk-total-time 10750     | entrylk-total-time 10090.8
getxattr-total-time 93.4708  | getxattr-total-time 0.117269
opendir-total-time 3388.05   | opendir-total-time 3252.09
readdirp-total-time 3841.3   | readdirp-total-time 3436.54
inodelk-total-time 6131.27   | inodelk-total-time 202.09
finodelk-total-time 0.011168 | finodelk-total-time 0.003563
rmdir-total-time 7849.47     | rmdir-total-time 6923.74
lookup-total-time 35545.3    | lookup-total-time 43867.4

331 total calls              | 340 (bmux on) total calls
========================================================
entrylk-total-calls 69120612 | entrylk-total-calls 70560624
getxattr-total-calls 383820  | getxattr-total-calls 864
opendir-total-calls 17280936 | opendir-total-calls 17281152
readdirp-total-calls 5760317 | readdirp-total-calls 5760213
inodelk-total-calls 34560300 | inodelk-total-calls 1440162
finodelk-total-calls 48      | finodelk-total-calls 48
rmdir-total-calls 17280072   | rmdir-total-calls 17280072
lookup-total-calls 18604130  | lookup-total-calls 18201803

Observe the number of inodelks:
bmux on:  inodelk-total-calls 34560300 | inodelk-total-calls 1440162
bmux off: inodelk-total-calls 34560300 | inodelk-total-calls 1440162

and the total time:
bmux on:  inodelk-total-time 6131.27 | inodelk-total-time 202.09
bmux off: inodelk-total-time 6131.27 | inodelk-total-time 341.158

So, from the perspective of DHT, there is an improvement. It is also observed that with bmux off there is an improvement in the number of rmdirs in 3.4.0 with respect to 3.3.1.

The gains from this RFE are offset by losses from bmux. As there are already separate bugs, bz 1598424 and bz 1598056, tracking the bmux regressions, I propose to move this bug to ON_QA.

I also need data for renamedir, as rmdir and renamedir are the two operations that benefit from this improvement, as already noted in comment #5.

@Karan, can you update the bug with perf numbers for renamedir?
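The inodelk drop can be put in relative terms with a few lines of arithmetic on the profile figures in this comment. This is only a back-of-the-envelope check: the dictionary keys are labels chosen for this sketch, and no units are assumed because only ratios are computed.

```python
# inodelk figures copied from the profile comparison in this comment
# (331 = RHGS 3.3.1, 340 = RHGS 3.4.0). Units are as reported by the
# profiling run; only ratios are computed here.
calls = {"331": 34560300, "340 bmux off": 1440162, "340 bmux on": 1440162}
total_time = {"331": 6131.27, "340 bmux off": 341.158, "340 bmux on": 202.09}


def reduction(before: float, after: float) -> float:
    """Fraction by which a metric dropped, e.g. 0.25 means 25% lower."""
    return 1.0 - after / before


call_drop = reduction(calls["331"], calls["340 bmux on"])
time_factor_off = total_time["331"] / total_time["340 bmux off"]
time_factor_on = total_time["331"] / total_time["340 bmux on"]

print(f"inodelk calls: {call_drop:.1%} fewer")  # 95.8% fewer
print(f"inodelk time: {time_factor_off:.1f}x lower (bmux off), "
      f"{time_factor_on:.1f}x lower (bmux on)")  # 18.0x, 30.3x
```

The call count is identical with bmux on and off, so the difference in inodelk time between the two 3.4.0 runs comes from per-call latency, not from DHT acquiring a different number of locks.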
Comment 19 Raghavendra G 2018-07-23 11:17:40 UTC
NOTE: The scope of this RFE is limited to improvements in DHT.
Comment 25 errata-xmlrpc 2018-09-04 06:29:40 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2607