Bug 1221737 - Multi-threaded SHD support
Summary: Multi-threaded SHD support
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: mainline
Hardware: All
OS: All
Priority: high
Severity: high
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1314724 1325857
Reported: 2015-05-14 17:06 UTC by rwareing
Modified: 2016-06-16 13:01 UTC (History)

Fixed In Version: glusterfs-3.8rc2
Doc Type: Enhancement
Doc Text:
Feature: Multi-threaded SHD support.
Reason: Multi-threading support is critical for two important use cases: (1) Halo replication (separate patch): long-distance replication is high latency, so parallel healing is required for performance; use a higher thread count (16-32 threads) for such use cases. (2) Traditional clusters where bricks are being healed from scratch with large numbers of small files (4-8 threads should be sufficient for these use cases). The net result is anywhere from a 2x to 30x improvement in SHD performance, depending on how many threads you use and what kind of storage hardware backs your bricks. For bricks with large numbers of small files, the effect is especially dramatic.
NOTES: It's critical to ensure your bricks have a sufficient number of threads available via the performance.io-thread-count volume option. Based on my tests, sizing this to 2x the number of SHD threads is a good place to start. Failure to do so can DoS your bricks with SHD requests.
Result: 2-30x small-file healing performance improvement.
Clone Of:
: 1314724 1325857 (view as bug list)
Environment:
Last Closed: 2016-06-16 13:01:11 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Attachments
Patch to add multi-threaded SHD support to v3.6.x of GlusterFS. (47.15 KB, patch)
2015-05-14 17:21 UTC, rwareing

Description rwareing 2015-05-14 17:06:46 UTC
Multi-threading support is critical for two important use cases:

Halo replication (separate patch) - Long-distance replication is high
latency, so parallel healing is required for performance. Use a higher
thread count (16-32 threads) for such use cases.

Traditional clusters where bricks are being healed from scratch with large
numbers of small files (4-8 threads should be sufficient for these
use cases).

The net result is anywhere from a 2x to 30x improvement in SHD performance,
depending on how many threads you use and what kind of storage hardware
backs your bricks. For bricks with large numbers of small files, the effect
is especially dramatic.

NOTES: It's critical to ensure your bricks have a sufficient number of
threads available via the performance.io-thread-count volume option.
Based on my tests, sizing this to 2x the number of SHD threads is a good
place to start. Failure to do so can DoS your bricks with SHD requests.
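
The 2x sizing rule of thumb in the note above can be written down as a small helper. This is only a sketch of the arithmetic: the function names and the volume name "myvol" are hypothetical, the factor of 2 is the reporter's suggested starting point rather than a hard rule, and the helper only builds the `gluster volume set` command string without running it.

```python
def io_thread_count_for(shd_threads, factor=2):
    """Suggested performance.io-thread-count for a given number of SHD threads.

    The default factor of 2 follows the starting point suggested in the
    description; tune it for your own hardware.
    """
    if shd_threads < 1:
        raise ValueError("shd_threads must be at least 1")
    return shd_threads * factor


def volume_set_command(volname, shd_threads):
    """Build (but do not run) the command that applies the suggested value."""
    count = io_thread_count_for(shd_threads)
    return f"gluster volume set {volname} performance.io-thread-count {count}"


if __name__ == "__main__":
    # "myvol" is a placeholder volume name.
    print(volume_set_command("myvol", 8))
```

Check the result against the range your GlusterFS version accepts for performance.io-thread-count before applying it.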

Comment 1 rwareing 2015-05-14 17:21:58 UTC
Created attachment 1025537
Patch to add multi-threaded SHD support to v3.6.x of GlusterFS.

Comment 2 Mohammed Rafi KC 2015-05-19 13:14:38 UTC
Thank you Richard!

Comment 3 Anand Avati 2015-05-20 15:02:55 UTC
REVIEW: http://review.gluster.org/10851 (Multi-threaded SHD support) posted (#1) for review on master by Kaushal M (kaushal@redhat.com)

Comment 4 Anand Avati 2015-08-28 07:07:46 UTC
REVIEW: http://review.gluster.org/10851 (Multi-threaded SHD support) posted (#2) for review on master by Kaushal M (kaushal@redhat.com)

Comment 5 Vijay Bellur 2016-03-17 05:32:39 UTC
REVIEW: http://review.gluster.org/13569 (syncop: Add parallel dir scan functionality) posted (#3) for review on master by Pranith Kumar Karampuri (pkarampu@redhat.com)

Comment 6 Vijay Bellur 2016-03-17 05:32:44 UTC
REVIEW: http://review.gluster.org/13755 (cluster/afr: Use parallel dir scan functionality) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu@redhat.com)

Comment 7 Vijay Bellur 2016-03-17 06:28:15 UTC
REVIEW: http://review.gluster.org/13755 (cluster/afr: Use parallel dir scan functionality) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu@redhat.com)

Comment 8 Vijay Bellur 2016-03-17 06:28:19 UTC
REVIEW: http://review.gluster.org/13569 (syncop: Add parallel dir scan functionality) posted (#4) for review on master by Pranith Kumar Karampuri (pkarampu@redhat.com)

Comment 9 Vijay Bellur 2016-04-04 05:11:14 UTC
REVIEW: http://review.gluster.org/13755 (cluster/afr: Use parallel dir scan functionality) posted (#3) for review on master by Pranith Kumar Karampuri (pkarampu@redhat.com)

Comment 10 Vijay Bellur 2016-04-04 05:11:18 UTC
REVIEW: http://review.gluster.org/13569 (syncop: Add parallel dir scan functionality) posted (#5) for review on master by Pranith Kumar Karampuri (pkarampu@redhat.com)

Comment 11 Vijay Bellur 2016-04-05 00:36:44 UTC
COMMIT: http://review.gluster.org/13569 committed in master by Jeff Darcy (jdarcy@redhat.com) 
------
commit c76a1690bbd909b1c2dd2c495e2a8352d599b14b
Author: Pranith Kumar K <pkarampu@redhat.com>
Date:   Thu Mar 17 09:32:02 2016 +0530

    syncop: Add parallel dir scan functionality
    
    Most of this functionality's ideas are contributed
    by Richard Wareing, in his patch:
    https://bugzilla.redhat.com/show_bug.cgi?id=1221737#c1
    
    VERY BIG thanks to him :-).
    
    After starting porting/testing the patch above, I found a few things we can
    improve in this patch based on the results we got in testing.
    1) We are reading all the indices before we launch self-heals. In some customer
    cases I worked on there were almost 5 million files/directories that needed
    heal. With such a big number, the self-heal daemon will be OOM-killed if we go
    this route. So I modified this to launch heals based on a queue length
    limit.
    
    2) We found that for directory hierarchies, the multi-threaded self-heal
    patch was not giving better results compared to single-threaded
    self-heal because of ordering problems. We improved the index xlator to
    give the gfid type to make sure that all directories in the indices are
    healed before the files that follow in that iteration of readdir
    output (http://review.gluster.org/13553). In our testing this led to
    zero self-heal errors, as we were only doing self-heals in parallel
    for files and not directories. I think we can further improve self-heal
    speed for directories by doing name heals in parallel based on similar
    techniques Richard's patch showed. I think the best thing there would be to
    introduce synccond_t infra (pthread_cond_t kind of infra for syncops)
    which I am planning to implement for future releases.
    
    3) Based on 1), 2) and the fact that afr already does retries of the
    indices in a loop I removed retries again in the threads.
    
    4) After the refactor, the changes required to bring in multi-threaded
    self-heal for ec would just be ~10 lines, most of it will be about
    options initialization.
    
    Our tests found that we are able to easily saturate network :-).
    
    High level description of the final feature:
    Traditionally self-heal daemon reads the indices (gfids) that need to be healed
    from the brick and initiates heal one gfid at a time. Goal of this feature is
    to add parallelization to the way we do self-heals in a way we do not regress
    in any case but increase parallelization wherever we can. As part of this, the following
    knobs are introduced to improve parallelization:
    1) We can launch 'max-jobs' number of heals in parallel.
    2) We can keep reading indices as long as the wait-q for heals doesn't go over
       'max-qlen' passed as arguments to multi-threaded dir_scan.
    
    As a first cut, we always do healing of directories in serial order one at a time
    but for files we launch heals in parallel. In future we can do name-heals of dir
    in parallel, but this is not implemented as of now. Reason for this is mentioned
    already in '2)' above.
    
    AFR/EC can introduce options like max-shd-threads/wait-qlength which can be set
    by users to increase the rate of heals when they want. Please note that the
    options will take effect only for the next crawl.
    
    BUG: 1221737
    Change-Id: I8fc0afc334def87797f6d41e309cefc722a317d2
    Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
    Reviewed-on: http://review.gluster.org/13569
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
    Smoke: Gluster Build System <jenkins@build.gluster.com>
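
The 'max-jobs' / 'max-qlen' scheme described in the commit message above can be illustrated with a small model. This is a sketch in Python, not the syncop code from the patch: parallel_scan, its arguments, and the (gfid, is_dir) entry format are all hypothetical. It assumes the scanner heals directories itself, one at a time, and hands files to at most max_jobs workers through a wait queue that holds at most max_qlen entries, so the scanner stops reading indices when the queue is full instead of loading them all into memory.

```python
import queue
import threading


def parallel_scan(entries, heal, max_jobs=4, max_qlen=8):
    """Heal (gfid, is_dir) entries; return the gfids whose heal raised."""
    if max_jobs < 1 or max_qlen < 1:
        raise ValueError("max_jobs and max_qlen must be at least 1")

    work = queue.Queue(maxsize=max_qlen)  # put() blocks once max_qlen is reached
    stop = object()                       # sentinel telling a worker to exit
    failed = []
    failed_lock = threading.Lock()

    def try_heal(gfid):
        try:
            heal(gfid)
        except Exception:
            with failed_lock:
                failed.append(gfid)

    def worker():
        while True:
            item = work.get()
            if item is stop:
                return
            try_heal(item)

    workers = [threading.Thread(target=worker) for _ in range(max_jobs)]
    for t in workers:
        t.start()
    try:
        for gfid, is_dir in entries:
            if is_dir:
                try_heal(gfid)   # directories: serial, in the scanning thread
            else:
                work.put(gfid)   # files: queued for parallel heal
    finally:
        for _ in workers:
            work.put(stop)
        for t in workers:
            t.join()
    return failed
```

Because a directory is healed before the scanner moves on, every file that follows it in the listing is queued only after that directory heal has finished. Files queued earlier may still be healing in parallel at that point; the real implementation may order things differently.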

Comment 12 Vijay Bellur 2016-04-05 00:39:54 UTC
COMMIT: http://review.gluster.org/13755 committed in master by Jeff Darcy (jdarcy@redhat.com) 
------
commit d65419677cf784599d4352d94f626823f895a18b
Author: Pranith Kumar K <pkarampu@redhat.com>
Date:   Thu Mar 17 09:32:17 2016 +0530

    cluster/afr: Use parallel dir scan functionality
    
    BUG: 1221737
    Change-Id: I0ed71a72f0e33bd733723e00a01cf28378c5534e
    Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
    Reviewed-on: http://review.gluster.org/13755
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
    Smoke: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Jeff Darcy <jdarcy@redhat.com>

Comment 13 Vijay Bellur 2016-04-13 15:43:22 UTC
REVIEW: http://review.gluster.org/13992 (mgmt/glusterd: Change op-version for max-threads, shd-wait-qlength) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu@redhat.com)

Comment 14 Vijay Bellur 2016-04-14 08:24:17 UTC
COMMIT: http://review.gluster.org/13992 committed in master by Atin Mukherjee (amukherj@redhat.com) 
------
commit 4910caece70d5c3e28453174b990d2b764359e9a
Author: Pranith Kumar K <pkarampu@redhat.com>
Date:   Wed Apr 13 21:10:22 2016 +0530

    mgmt/glusterd: Change op-version for max-threads, shd-wait-qlength
    
    Change-Id: I0e2dcacfe0804737d2cff76d2a0ee51a520ccec2
    BUG: 1221737
    Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
    Reviewed-on: http://review.gluster.org/13992
    Smoke: Gluster Build System <jenkins@build.gluster.com>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.com>

Comment 15 Niels de Vos 2016-06-16 13:01:11 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

