Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1325857

Summary:	Multi-threaded SHD support
Product:	[Community] GlusterFS	Reporter:	Pranith Kumar K <pkarampu>
Component:	replicate	Assignee:	Pranith Kumar K <pkarampu>
Status:	CLOSED CURRENTRELEASE	QA Contact:
Severity:	high	Docs Contact:
Priority:	high
Version:	3.7.10	CC:	atalur, bugs, ndevos, paulds, pkarampu, ravishankar, rkavunga, rwareing
Target Milestone:	---	Keywords:	FutureFeature, Patch, Triaged
Target Release:	---
Hardware:	All
OS:	All
Whiteboard:
Fixed In Version:	glusterfs-3.7.12	Doc Type:	Enhancement
Doc Text:		Story Points:	---
Clone Of:	1221737	Environment:
Last Closed:	2016-06-28 12:14:18 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1221737
Bug Blocks:	1314724

Description Pranith Kumar K 2016-04-11 11:18:41 UTC

+++ This bug was initially created as a clone of Bug #1221737 +++

Multi-threading support is critical for two important use-cases:

Halo replication (separate patch): long-distance replication has high
latency, and parallel healing is required for performance. Use more
threads (16-32) for such use-cases.

Traditional clusters where bricks are being healed from scratch with large
numbers of small files (4-8 threads should be sufficient for these
use-cases).

The net result is a 2-30x improvement in SHD performance, depending on how
many threads you use and what kind of storage hardware backs your bricks.
For bricks with large numbers of small files, the effect is especially
dramatic.

NOTES: It's critical to ensure your bricks have enough threads available
via the performance.io-thread-count volume option. Based on my tests,
sizing this to 2x the number of SHD threads is a good place to start.
Failing to do so can DoS your bricks with SHD requests.
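
The sizing advice above reduces to simple arithmetic. Below is a minimal Python sketch that uses only the thread ranges and the 2x rule of thumb quoted in this bug; SHD_THREAD_RANGES, suggested_io_thread_count and plan are hypothetical helpers for illustration, not part of GlusterFS.

```python
# Rule-of-thumb ranges quoted in this bug. The dictionary keys are
# hypothetical labels, not GlusterFS option values.
SHD_THREAD_RANGES = {
    "halo": (16, 32),        # long-distance, high-latency replication
    "traditional": (4, 8),   # bricks healed from scratch, many small files
}


def suggested_io_thread_count(shd_threads: int, factor: int = 2) -> int:
    """Starting value for performance.io-thread-count.

    The bug suggests roughly 2x the number of SHD threads, so heal
    traffic does not starve (DoS) the bricks.
    """
    if shd_threads < 1:
        raise ValueError("shd_threads must be at least 1")
    return factor * shd_threads


def plan(use_case: str) -> dict:
    low, high = SHD_THREAD_RANGES[use_case]
    return {
        "shd_threads": (low, high),
        "io_thread_count": (suggested_io_thread_count(low),
                            suggested_io_thread_count(high)),
    }


if __name__ == "__main__":
    print(plan("halo"))         # {'shd_threads': (16, 32), 'io_thread_count': (32, 64)}
    print(plan("traditional"))  # {'shd_threads': (4, 8), 'io_thread_count': (8, 16)}
```

The factor is a parameter because "2x" is described as a starting point from the reporter's tests, not a hard limit.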

--- Additional comment from  on 2015-05-14 13:21:58 EDT ---



--- Additional comment from Mohammed Rafi KC on 2015-05-19 09:14:38 EDT ---

Thank you Richard!

--- Additional comment from Anand Avati on 2015-05-20 11:02:55 EDT ---

REVIEW: http://review.gluster.org/10851 (Multi-threaded SHD support) posted (#1) for review on master by Kaushal M (kaushal)

--- Additional comment from Anand Avati on 2015-08-28 03:07:46 EDT ---

REVIEW: http://review.gluster.org/10851 (Multi-threaded SHD support) posted (#2) for review on master by Kaushal M (kaushal)

--- Additional comment from Vijay Bellur on 2016-03-17 01:32:39 EDT ---

REVIEW: http://review.gluster.org/13569 (syncop: Add parallel dir scan functionality) posted (#3) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Vijay Bellur on 2016-03-17 01:32:44 EDT ---

REVIEW: http://review.gluster.org/13755 (cluster/afr: Use parallel dir scan functionality) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Vijay Bellur on 2016-03-17 02:28:15 EDT ---

REVIEW: http://review.gluster.org/13755 (cluster/afr: Use parallel dir scan functionality) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Vijay Bellur on 2016-03-17 02:28:19 EDT ---

REVIEW: http://review.gluster.org/13569 (syncop: Add parallel dir scan functionality) posted (#4) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Vijay Bellur on 2016-04-04 01:11:14 EDT ---

REVIEW: http://review.gluster.org/13755 (cluster/afr: Use parallel dir scan functionality) posted (#3) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Vijay Bellur on 2016-04-04 01:11:18 EDT ---

REVIEW: http://review.gluster.org/13569 (syncop: Add parallel dir scan functionality) posted (#5) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Vijay Bellur on 2016-04-04 20:36:44 EDT ---

COMMIT: http://review.gluster.org/13569 committed in master by Jeff Darcy (jdarcy) 
------
commit c76a1690bbd909b1c2dd2c495e2a8352d599b14b
Author: Pranith Kumar K <pkarampu>
Date:   Thu Mar 17 09:32:02 2016 +0530

    syncop: Add parallel dir scan functionality
    
    Most of this functionality's ideas are contributed
    by Richard Wareing, in his patch:
    https://bugzilla.redhat.com/show_bug.cgi?id=1221737#c1
    
    VERY BIG thanks to him :-).
    
    After I started porting and testing the patch above, I found a few things we
    could improve based on the results of our testing.
    1) The patch reads all the indices before launching self-heals. In some customer
    cases I worked on, almost 5 million files/directories needed heal. With
    such a large number, the self-heal daemon would be OOM-killed if we went
    this route, so I modified this to launch heals subject to a queue-length
    limit.
    
    2) We found that for directory hierarchies, the multi-threaded self-heal
    patch was not giving better results than single-threaded self-heal
    because of ordering problems. We improved the index xlator to report
    the gfid type, so that all directories in the indices are healed before
    the files that follow them in that iteration of readdir output
    (http://review.gluster.org/13553). In our testing this led to zero
    self-heal errors, as we were only doing self-heals in parallel for files
    and not directories. I think we can further improve self-heal speed for
    directories by doing name heals in parallel, using techniques similar to
    those Richard's patch showed. I think the best way to do that would be to
    introduce synccond_t infra (pthread_cond_t-style infra for syncops),
    which I am planning to implement in future releases.
    
    3) Based on 1), 2) and the fact that afr already retries the indices
    in a loop, I removed the extra retries in the threads.
    
    4) After the refactor, bringing multi-threaded self-heal to ec would need
    only about 10 lines of changes, mostly for options initialization.
    
    Our tests found that we can easily saturate the network :-).
    
    High level description of the final feature:
    Traditionally, the self-heal daemon reads the indices (gfids) that need to be
    healed from the brick and initiates heal one gfid at a time. The goal of this
    feature is to parallelize self-heals wherever we can without regressing in any
    case. As part of this, the following knobs are introduced:
    1) We can launch 'max-jobs' heals in parallel.
    2) We can keep reading indices as long as the wait-q for heals doesn't go over
       'max-qlen'.
    Both values are passed as arguments to the multi-threaded dir_scan.
    
    As a first cut, we always heal directories serially, one at a time, but
    launch heals for files in parallel. In the future we can do name-heals of
    directories in parallel, but this is not implemented yet, for the reason
    mentioned in '2)' above.
    
    AFR/EC can introduce options like max-shd-threads/wait-qlength, which users
    can set to increase the rate of heals when they want. Please note that the
    options take effect only from the next crawl.
    
    BUG: 1221737
    Change-Id: I8fc0afc334def87797f6d41e309cefc722a317d2
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/13569
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>
    Reviewed-by: Jeff Darcy <jdarcy>
    Smoke: Gluster Build System <jenkins.com>
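
The two knobs described in commit c76a1690 ('max-jobs' and 'max-qlen'), together with the rule that directories are healed serially while files are healed in parallel, can be sketched in Python with a bounded queue.Queue and a fixed pool of worker threads. This is an illustration under stated assumptions, not the actual implementation, which is C code built on GlusterFS's syncop framework. mt_dir_scan, _try_heal and the (gfid, is_dir) input shape are all hypothetical.

```python
import queue
import threading
import time
from typing import Callable, Iterable, List, Tuple

_DONE = object()  # sentinel: tells one worker thread to exit


def _try_heal(heal: Callable[[str], None], gfid: str) -> None:
    try:
        heal(gfid)
    except Exception:
        # No retry here: the entry stays in the index and the next crawl
        # picks it up (point 3 of the commit message).
        pass


def mt_dir_scan(entries: Iterable[Tuple[str, bool]],
                heal: Callable[[str], None],
                max_jobs: int,
                max_qlen: int) -> int:
    """Heal index entries with up to max_jobs file heals in parallel.

    entries yields (gfid, is_dir) pairs in readdir order. Directories are
    healed inline, one at a time, before any later entry is queued. Files
    go onto a queue bounded by max_qlen and are drained by max_jobs worker
    threads; put() blocks while the queue is full, so memory use does not
    grow with the size of the index. Returns the peak observed queue length.
    """
    if max_jobs < 1 or max_qlen < 1:
        raise ValueError("max_jobs and max_qlen must both be >= 1")
    q = queue.Queue(maxsize=max_qlen)

    def worker() -> None:
        while True:
            item = q.get()
            if item is _DONE:
                return
            _try_heal(heal, item)

    threads = [threading.Thread(target=worker) for _ in range(max_jobs)]
    for t in threads:
        t.start()

    peak = 0
    for gfid, is_dir in entries:
        if is_dir:
            _try_heal(heal, gfid)  # serial, in the scanning thread
        else:
            q.put(gfid)            # blocks once max_qlen heals are waiting
            peak = max(peak, q.qsize())

    for _ in threads:
        q.put(_DONE)
    for t in threads:
        t.join()
    return peak


if __name__ == "__main__":
    healed: List[str] = []
    lock = threading.Lock()

    def fake_heal(gfid: str) -> None:
        time.sleep(0.001)  # stand-in for network/disk work
        with lock:
            healed.append(gfid)

    index = [("dir-0", True)] + [(f"file-{i}", False) for i in range(50)]
    peak = mt_dir_scan(index, fake_heal, max_jobs=4, max_qlen=8)
    print(len(healed), healed[0], peak <= 8)  # 51 dir-0 True
```

The blocking put() is what addresses the OOM concern in point 1: the scanner can never get more than max_qlen entries ahead of the heal workers, however many millions of gfids the index holds.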

--- Additional comment from Vijay Bellur on 2016-04-04 20:39:54 EDT ---

COMMIT: http://review.gluster.org/13755 committed in master by Jeff Darcy (jdarcy) 
------
commit d65419677cf784599d4352d94f626823f895a18b
Author: Pranith Kumar K <pkarampu>
Date:   Thu Mar 17 09:32:17 2016 +0530

    cluster/afr: Use parallel dir scan functionality
    
    BUG: 1221737
    Change-Id: I0ed71a72f0e33bd733723e00a01cf28378c5534e
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/13755
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>
    Smoke: Gluster Build System <jenkins.com>
    Reviewed-by: Jeff Darcy <jdarcy>
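
On the AFR side, the master commit message (point 3 and the closing note on options) implies a crawl loop: failed heals are retried by re-crawling the index rather than inside the heal threads, and tunables are read only at the start of a crawl. A minimal Python sketch of that control flow follows; ShdOptions, run_crawls, read_pending and heal_batch are hypothetical names, not AFR functions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class ShdOptions:
    # Hypothetical stand-ins for the max-shd-threads / wait-qlength style
    # tunables mentioned in the commit message; not GlusterFS identifiers.
    max_jobs: int = 1
    max_qlen: int = 1024


def run_crawls(read_pending: Callable[[], List[str]],
               heal_batch: Callable[[List[str], ShdOptions], None],
               current_options: Callable[[], ShdOptions],
               max_crawls: int = 10) -> List[ShdOptions]:
    """Crawl the index repeatedly until nothing is pending.

    Options are sampled once at the start of each crawl, so a change made
    mid-crawl only applies from the next crawl. Heals that fail are not
    retried inside a crawl; they stay in the index and the next iteration
    of this loop picks them up. Returns the options snapshot of each crawl.
    """
    used: List[ShdOptions] = []
    for _ in range(max_crawls):
        pending = read_pending()
        if not pending:
            break
        opts = current_options()  # fixed for the whole crawl
        used.append(opts)
        heal_batch(pending, opts)
    return used


if __name__ == "__main__":
    index = {"gfid-a", "gfid-b", "gfid-c"}
    flaky = {"gfid-b"}  # fails on its first attempt only
    settings = {"opts": ShdOptions(max_jobs=1)}

    def heal_batch(gfids: List[str], opts: ShdOptions) -> None:
        for g in gfids:
            if g in flaky:
                flaky.discard(g)  # will succeed on the next crawl
                continue          # left in the index; no in-crawl retry
            index.discard(g)
        settings["opts"] = ShdOptions(max_jobs=8)  # user raises threads mid-crawl

    used = run_crawls(lambda: sorted(index), heal_batch, lambda: settings["opts"])
    print([o.max_jobs for o in used])  # [1, 8]
```

In the demo, the option change made during the first crawl is only visible to the second crawl, which is also the crawl that heals the entry that failed the first time.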

Comment 1 Vijay Bellur 2016-04-11 11:30:51 UTC

REVIEW: http://review.gluster.org/13967 (syncop: Add parallel dir scan functionality) posted (#1) for review on release-3.7 by Pranith Kumar Karampuri (pkarampu)

Comment 2 Vijay Bellur 2016-04-14 19:28:27 UTC

REVIEW: http://review.gluster.org/13967 (syncop: Add parallel dir scan functionality) posted (#2) for review on release-3.7 by Pranith Kumar Karampuri (pkarampu)

Comment 3 Vijay Bellur 2016-04-17 01:56:19 UTC

COMMIT: http://review.gluster.org/13967 committed in release-3.7 by Pranith Kumar Karampuri (pkarampu) 
------
commit 02a235b5a5fcfffd17debfbf3fceeddffe171682
Author: Pranith Kumar K <pkarampu>
Date:   Thu Mar 17 09:32:02 2016 +0530

    syncop: Add parallel dir scan functionality
    
    Most of this functionality's ideas are contributed
    by Richard Wareing, in his patch:
    https://bugzilla.redhat.com/show_bug.cgi?id=1221737#c1
    
    VERY BIG thanks to him :-).
    
    After I started porting and testing the patch above, I found a few things we
    could improve based on the results of our testing.
    1) The patch reads all the indices before launching self-heals. In some customer
    cases I worked on, almost 5 million files/directories needed heal. With
    such a large number, the self-heal daemon would be OOM-killed if we went
    this route, so I modified this to launch heals subject to a queue-length
    limit.
    
    2) We found that for directory hierarchies, the multi-threaded self-heal
    patch was not giving better results than single-threaded self-heal
    because of ordering problems. We improved the index xlator to report
    the gfid type, so that all directories in the indices are healed before
    the files that follow them in that iteration of readdir output
    (http://review.gluster.org/13553). In our testing this led to zero
    self-heal errors, as we were only doing self-heals in parallel for files
    and not directories. I think we can further improve self-heal speed for
    directories by doing name heals in parallel, using techniques similar to
    those Richard's patch showed. I think the best way to do that would be to
    introduce synccond_t infra (pthread_cond_t-style infra for syncops),
    which I am planning to implement in future releases.
    
    3) Based on 1), 2) and the fact that afr already retries the indices
    in a loop, I removed the extra retries in the threads.
    
    4) After the refactor, bringing multi-threaded self-heal to ec would need
    only about 10 lines of changes, mostly for options initialization.
    
    Our tests found that we can easily saturate the network :-).
    
    High level description of the final feature:
    Traditionally, the self-heal daemon reads the indices (gfids) that need to be
    healed from the brick and initiates heal one gfid at a time. The goal of this
    feature is to parallelize self-heals wherever we can without regressing in any
    case. As part of this, the following knobs are introduced:
    1) We can launch 'max-jobs' heals in parallel.
    2) We can keep reading indices as long as the wait-q for heals doesn't go over
       'max-qlen'.
    Both values are passed as arguments to the multi-threaded dir_scan.
    
    As a first cut, we always heal directories serially, one at a time, but
    launch heals for files in parallel. In the future we can do name-heals of
    directories in parallel, but this is not implemented yet, for the reason
    mentioned in '2)' above.
    
    AFR/EC can introduce options like max-shd-threads/wait-qlength, which users
    can set to increase the rate of heals when they want. Please note that the
    options take effect only from the next crawl.
    
     >BUG: 1221737
     >Change-Id: I8fc0afc334def87797f6d41e309cefc722a317d2
     >Signed-off-by: Pranith Kumar K <pkarampu>
     >Reviewed-on: http://review.gluster.org/13569
     >NetBSD-regression: NetBSD Build System <jenkins.org>
     >CentOS-regression: Gluster Build System <jenkins.com>
     >Reviewed-by: Jeff Darcy <jdarcy>
     >Smoke: Gluster Build System <jenkins.com>
    
    BUG: 1325857
    Change-Id: I23235bbb923208eee6a8be711bbfb14350edb11b
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/13967
    Smoke: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>

Comment 4 Vijay Bellur 2016-04-17 02:53:02 UTC

REVIEW: http://review.gluster.org/14010 (cluster/afr: Use parallel dir scan functionality) posted (#1) for review on release-3.7 by Pranith Kumar Karampuri (pkarampu)

Comment 5 Vijay Bellur 2016-04-17 14:11:27 UTC

COMMIT: http://review.gluster.org/14010 committed in release-3.7 by Pranith Kumar Karampuri (pkarampu) 
------
commit 80fd2a0d8b3da20755a38195f62fc4d7fc5f7b52
Author: Pranith Kumar K <pkarampu>
Date:   Thu Mar 17 09:32:17 2016 +0530

    cluster/afr: Use parallel dir scan functionality
    
     >BUG: 1221737
     >Change-Id: I0ed71a72f0e33bd733723e00a01cf28378c5534e
     >Signed-off-by: Pranith Kumar K <pkarampu>
     >Reviewed-on: http://review.gluster.org/13755
     >Reviewed-on: http://review.gluster.org/13992
     >NetBSD-regression: NetBSD Build System <jenkins.org>
     >CentOS-regression: Gluster Build System <jenkins.com>
     >Smoke: Gluster Build System <jenkins.com>
     >Reviewed-by: Jeff Darcy <jdarcy>
    
    BUG: 1325857
    Change-Id: I7c6b2ea065edd7f5dafffeb42fd6c601b4ab8d14
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/14010
    Smoke: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>

Comment 6 Vijay Bellur 2016-04-18 07:05:21 UTC

REVIEW: http://review.gluster.org/14017 (op-version: Bump up op-version to 3.7.12) posted (#1) for review on release-3.7 by Pranith Kumar Karampuri (pkarampu)

Comment 7 Vijay Bellur 2016-04-19 05:23:32 UTC

COMMIT: http://review.gluster.org/14017 committed in release-3.7 by Pranith Kumar Karampuri (pkarampu) 
------
commit e4aef8290e8aac8d7fa345db8703a9c3f95a9f66
Author: Pranith Kumar K <pkarampu>
Date:   Mon Apr 18 12:28:34 2016 +0530

    op-version: Bump up op-version to 3.7.12
    
    BUG: 1325857
    Change-Id: I49286ba60281d543f2acacf45c4f824627ef4167
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/14017
    Smoke: Gluster Build System <jenkins.com>
    Reviewed-by: Krutika Dhananjay <kdhananj>
    CentOS-regression: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
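
The op-version bump is presumably needed because the AFR backport adds new volume options, and glusterd only accepts an option once the whole cluster's op-version has reached the version that introduced it. Below is a minimal Python sketch of that gating. It assumes the 3.x numbering convention major*10000 + minor*100 + patch (so 3.7.12 maps to 30712). op_version and option_allowed are hypothetical helpers, not glusterd functions.

```python
def op_version(release: str) -> int:
    """Encode a 3.x release string as a glusterd-style op-version.

    Assumes the convention major*10000 + minor*100 + patch,
    e.g. "3.7.12" -> 30712.
    """
    major, minor, patch = (int(p) for p in release.split("."))
    if not (0 <= minor < 100 and 0 <= patch < 100):
        raise ValueError(f"cannot encode {release!r}")
    return major * 10000 + minor * 100 + patch


def option_allowed(option_op_version: int, cluster_op_version: int) -> bool:
    """An option introduced at option_op_version may be set only once the
    cluster op-version (the level every peer supports) has reached it."""
    return cluster_op_version >= option_op_version


if __name__ == "__main__":
    new_option = op_version("3.7.12")                        # 30712
    print(option_allowed(new_option, op_version("3.7.11")))  # False
    print(option_allowed(new_option, op_version("3.7.12")))  # True
```

Gating on the cluster-wide value rather than the local binary's version prevents an option from being set while older peers that do not understand it are still in the pool.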

Comment 8 Kaushal 2016-06-28 12:14:18 UTC

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.12, please open a new bug report.

glusterfs-3.7.12 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-devel/2016-June/049918.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user