Bug 1473129 - dht/rebalance: Improve rebalance crawl performance
Summary: dht/rebalance: Improve rebalance crawl performance
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: distribute
Version: 3.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Susant Kumar Palai
QA Contact:
URL:
Whiteboard:
Depends On: 1439571
Blocks:
 
Reported: 2017-07-20 05:28 UTC by Susant Kumar Palai
Modified: 2017-08-21 13:41 UTC (History)

Fixed In Version: glusterfs-3.10.5
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1439571
Environment:
Last Closed: 2017-08-21 13:41:22 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Susant Kumar Palai 2017-07-20 05:28:12 UTC
+++ This bug was initially created as a clone of Bug #1439571 +++

Description of problem:
This bug tracks development work on rebalance crawl performance.

The job of the crawler in rebalance is to fetch files from each local subvolume and push them to the migration queue if they are eligible for migration. We do a lookup on each entry received to determine its eligibility. Since the lookup is done on a local subvolume, we receive linkto files as well as regular files.

Rebalance currently does two lookups to separate linkto files from data files. Instead, it can get the linkto file information (both attr and xattr) from readdirp, which removes the cost of one lookup. The other lookup, on the data file, can be offloaded to the migrator threads.
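
The proposed flow can be sketched roughly as follows. This is a minimal Python model, not the actual DHT code (which is written in C): the `DirEntry`, `crawl`, and `migrate` names are hypothetical, and the linkto check (only the sticky bit set in the permission bits, plus a `trusted.glusterfs.dht.linkto` xattr) is an assumption about how a linkto file is recognized. It shows the crawler classifying entries purely from the stat and xattr data that comes back with readdirp, leaving the one remaining lookup to the migrator.

```python
import stat
from collections import deque
from dataclasses import dataclass, field

# Assumption: xattr name that marks a DHT linkto file.
LINKTO_XATTR = "trusted.glusterfs.dht.linkto"
# Assumption: a linkto file has only the sticky bit set in its permission bits.
LINKTO_MODE = stat.S_ISVTX


@dataclass
class DirEntry:
    """Hypothetical readdirp result: name plus the stat mode and xattrs."""
    name: str
    mode: int
    xattrs: dict = field(default_factory=dict)


def is_linkto(entry):
    """Decide from readdirp data alone, without a lookup."""
    return stat.S_IMODE(entry.mode) == LINKTO_MODE and LINKTO_XATTR in entry.xattrs


def crawl(entries, queue):
    """Crawler: issues no lookups; skips linkto files and queues the rest."""
    for entry in entries:
        if not is_linkto(entry):
            queue.append(entry)


def migrate(queue, lookup):
    """Migrator: performs the single remaining lookup per queued entry."""
    migrated = []
    while queue:
        entry = queue.popleft()
        if lookup(entry):  # lookup decides whether the file should move
            migrated.append(entry.name)
    return migrated


entries = [
    DirEntry("data.txt", stat.S_IFREG | 0o644),
    DirEntry("moved.txt", stat.S_IFREG | stat.S_ISVTX, {LINKTO_XATTR: b"subvol"}),
]
queue = deque()
crawl(entries, queue)
print([e.name for e in queue])
```

In this model the crawler only filters and enqueues, so the per-entry cost of the crawl no longer includes a network round trip.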

--- Additional comment from Worker Ant on 2017-04-06 14:11:07 MVT ---

REVIEW: https://review.gluster.org/15781 (dht/rebalance: Crawler performance improvement) posted (#6) for review on master by Susant Palai (spalai)

--- Additional comment from Worker Ant on 2017-04-06 14:16:23 MVT ---

REVIEW: https://review.gluster.org/15781 (dht/rebalance: Crawler performance improvement) posted (#7) for review on master by Susant Palai (spalai)

--- Additional comment from Worker Ant on 2017-04-10 09:20:06 MVT ---

COMMIT: https://review.gluster.org/15781 committed in master by Raghavendra G (rgowdapp) 
------
commit 656bf04955936319de4b8711debcc9931a7c778e
Author: Susant Palai <spalai>
Date:   Mon Nov 7 12:00:13 2016 +0530

    dht/rebalance: Crawler performance improvement
    
    The job of the crawler in rebalance is to fetch files from each
    local subvolume and push them to the migration queue if they are
    eligible for migration. We do a lookup on the entries received to
    figure out the eligibility. Since the lookup is done on a local
    subvolume, we receive linkto files as well as regular files. This
    requires us to do two lookups.
    
    first: do a lookup on the file to figure out whether it is a linkto file
    second: do a lookup on the file to figure out if it should be migrated
    
    Note: The migrator thread also does one lookup for the file before
    migration.
    
    Optimization: Remove the lookups done by the crawler and offload these
    tasks to the migrator threads. For linkto file verification, get the
    stat and xattr information from readdirp.
    
    So in total we have one lookup instead of three for each entry.
    
    Performance numbers:
    Created a two-node, two-brick setup with 100000 files and started
    rebalance. Since there is no add-brick, no files will be migrated, so
    the run time reflects crawler performance alone.
    
    Without patch:
    [root@gprfs039 ~]# grs
         Node  Rebalanced-files    size  scanned  failures  skipped     status  run time in h:m:s
    ---------  ----------------  ------  -------  --------  -------  ---------  -----------------
    localhost                 0  0Bytes    50070         0        0  completed             0:0:48
      server2                 0  0Bytes    49930         0        0  completed             0:0:44
    volume rebalance: test1: success
    
    Total: 48 seconds
    
    With the current patch:
    [root@gprfs039 mnt]# gluster v rebalance test1 status
         Node  Rebalanced-files    size  scanned  failures  skipped     status  run time in h:m:s
    ---------  ----------------  ------  -------  --------  -------  ---------  -----------------
    localhost                 0  0Bytes    50070         0        0  completed             0:0:12
      server2                 0  0Bytes    49930         0        0  completed             0:0:12
    volume rebalance: test1: success
    
    Total: 12 seconds
    
    That's a 4X speed gain. :)
    
    Updates glusterfs#155
    Change-Id: Idc8e5b366e76c54aa40d698876ae62fe1630b6cc
    BUG: 1439571
    Signed-off-by: Susant Palai <spalai>
    Reviewed-on: https://review.gluster.org/15781
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Raghavendra G <rgowdapp>
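
The speedup quoted in the commit message can be reproduced from the status output with a few lines of Python. The figures are copied from the two runs above; treating the total as the run time of the slowest node is an assumption (it matches the "Total" lines reported), and the function names are illustrative only.

```python
# node: (entries scanned, run time in seconds), taken from the status output.
before = {"localhost": (50070, 48), "server2": (49930, 44)}
after = {"localhost": (50070, 12), "server2": (49930, 12)}


def wall_time(run):
    # Assumption: nodes crawl in parallel, so the slowest node sets the total.
    return max(seconds for _, seconds in run.values())


def scan_rate(run):
    # Entries scanned per second across the whole volume.
    total_scanned = sum(scanned for scanned, _ in run.values())
    return total_scanned / wall_time(run)


speedup = wall_time(before) / wall_time(after)
print(f"speedup: {speedup:.1f}x")
print(f"scan rate: {scan_rate(before):.0f} -> {scan_rate(after):.0f} entries/s")
```

Because no files are migrated in this test, the ratio reflects crawl cost only; it says nothing about migration throughput.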

--- Additional comment from Shyamsundar on 2017-05-30 23:49:22 MVT ---

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.11.0, please open a new bug report.

glusterfs-3.11.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-May/000073.html
[2] https://www.gluster.org/pipermail/gluster-users/

Comment 1 Worker Ant 2017-07-20 06:40:01 UTC
REVIEW: https://review.gluster.org/17830 (dht/rebalance: Crawler performance improvement) posted (#1) for review on release-3.10 by Susant Palai (spalai)

Comment 2 Worker Ant 2017-07-20 10:09:16 UTC
REVIEW: https://review.gluster.org/17830 (dht/rebalance: Crawler performance improvement) posted (#2) for review on release-3.10 by Susant Palai (spalai)

Comment 3 Worker Ant 2017-08-11 11:16:50 UTC
REVIEW: https://review.gluster.org/17830 (dht/rebalance: Crawler performance improvement) posted (#3) for review on release-3.10 by Shyamsundar Ranganathan (srangana)

Comment 4 Worker Ant 2017-08-11 11:32:40 UTC
REVIEW: https://review.gluster.org/17830 (dht/rebalance: Crawler performance improvement) posted (#4) for review on release-3.10 by Shyamsundar Ranganathan (srangana)

Comment 5 Worker Ant 2017-08-11 11:48:58 UTC
COMMIT: https://review.gluster.org/17830 committed in release-3.10 by Shyamsundar Ranganathan (srangana) 
------
commit a97d446dd52d739cbfc2e7689155ca1f6641ced6
Author: Susant Palai <spalai>
Date:   Mon Nov 7 12:00:13 2016 +0530

    dht/rebalance: Crawler performance improvement
    
    The job of the crawler in rebalance is to fetch files from each
    local subvolume and push them to the migration queue if they are
    eligible for migration. We do a lookup on the entries received to
    figure out the eligibility. Since the lookup is done on a local
    subvolume, we receive linkto files as well as regular files. This
    requires us to do two lookups.
    
    first: do a lookup on the file to figure out whether it is a linkto file
    second: do a lookup on the file to figure out if it should be migrated
    
    Note: The migrator thread also does one lookup for the file before
    migration.
    
    Optimization: Remove the lookups done by the crawler and offload these
    tasks to the migrator threads. For linkto file verification, get the
    stat and xattr information from readdirp.
    
    So in total we have one lookup instead of three for each entry.
    
    Performance numbers:
    Created a two-node, two-brick setup with 100000 files and started
    rebalance. Since there is no add-brick, no files will be migrated, so
    the run time reflects crawler performance alone.
    
    Without patch:
    [root@gprfs039 ~]# grs
         Node  Rebalanced-files    size  scanned  failures  skipped     status  run time in h:m:s
    ---------  ----------------  ------  -------  --------  -------  ---------  -----------------
    localhost                 0  0Bytes    50070         0        0  completed             0:0:48
      server2                 0  0Bytes    49930         0        0  completed             0:0:44
    volume rebalance: test1: success
    
    Total: 48 seconds
    
    With the current patch:
    [root@gprfs039 mnt]# gluster v rebalance test1 status
         Node  Rebalanced-files    size  scanned  failures  skipped     status  run time in h:m:s
    ---------  ----------------  ------  -------  --------  -------  ---------  -----------------
    localhost                 0  0Bytes    50070         0        0  completed             0:0:12
      server2                 0  0Bytes    49930         0        0  completed             0:0:12
    volume rebalance: test1: success
    
    Total: 12 seconds
    
    That's a 4X speed gain. :)
    
    > Updates glusterfs#155
    > Change-Id: Idc8e5b366e76c54aa40d698876ae62fe1630b6cc
    > BUG: 1439571
    > Signed-off-by: Susant Palai <spalai>
    > Reviewed-on: https://review.gluster.org/15781
    > Smoke: Gluster Build System <jenkins.org>
    > NetBSD-regression: NetBSD Build System <jenkins.org>
    > CentOS-regression: Gluster Build System <jenkins.org>
    > Reviewed-by: Raghavendra G <rgowdapp>
    
    Updates glusterfs#155
    Change-Id: Idc8e5b366e76c54aa40d698876ae62fe1630b6cc
    BUG: 1473129
    Signed-off-by: Susant Palai <spalai>
    Reviewed-on: https://review.gluster.org/17830
    CentOS-regression: Gluster Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Shyamsundar Ranganathan <srangana>

Comment 6 Shyamsundar 2017-08-21 13:41:22 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.10.5, please open a new bug report.

glusterfs-3.10.5 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-August/000079.html
[2] https://www.gluster.org/pipermail/gluster-users/

