1420166 – The rebal-throttle setting does not work as expected

Summary: The rebal-throttle setting does not work as expected

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	distribute
Sub Component:
Version:	mainline
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	urgent
Target Milestone:	---
Assignee:	Susant Kumar Palai
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:	1381142
Blocks:	1473134

Reported:	2017-02-08 02:11 UTC by Susant Kumar Palai
Modified:	2017-07-20 05:54 UTC
CC List:	14 users
Fixed In Version:	glusterfs-3.11.0
Clone Of:	1381142
Clones:	1473134
Environment:
Last Closed:	2017-05-30 18:40:54 UTC
Regression:	---
Mount Type:	---
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:

Comment 1 Worker Ant 2017-02-08 04:04:04 UTC

REVIEW: https://review.gluster.org/16427 (dht/cluster: rebalance perf enhancement) posted (#2) for review on master by Susant Palai (spalai)

Comment 2 Worker Ant 2017-02-08 04:19:18 UTC

REVIEW: https://review.gluster.org/16427 (dht/cluster: rebalance perf enhancement) posted (#3) for review on master by Susant Palai (spalai)

Comment 3 Worker Ant 2017-02-10 06:16:58 UTC

REVIEW: https://review.gluster.org/16427 (cluster/dht: rebalance perf enhancement) posted (#4) for review on master by Susant Palai (spalai)

Comment 4 Worker Ant 2017-02-20 07:12:52 UTC

REVIEW: https://review.gluster.org/16427 (cluster/dht: rebalance perf enhancement) posted (#5) for review on master by Susant Palai (spalai)

Comment 5 Worker Ant 2017-02-20 07:13:54 UTC

REVIEW: https://review.gluster.org/16427 (cluster/dht: rebalance perf enhancement) posted (#6) for review on master by Susant Palai (spalai)

Comment 6 Worker Ant 2017-02-24 07:35:47 UTC

REVIEW: https://review.gluster.org/16427 (cluster/dht: rebalance perf enhancement) posted (#7) for review on master by Susant Palai (spalai)

Comment 7 Worker Ant 2017-02-28 06:56:14 UTC

REVIEW: https://review.gluster.org/16427 (cluster/dht: rebalance perf enhancement) posted (#8) for review on master by Susant Palai (spalai)

Comment 8 Worker Ant 2017-02-28 07:01:14 UTC

REVIEW: https://review.gluster.org/16427 (cluster/dht: rebalance perf enhancement) posted (#9) for review on master by Susant Palai (spalai)

Comment 9 Worker Ant 2017-03-10 08:48:33 UTC

REVIEW: https://review.gluster.org/16427 (cluster/dht: rebalance perf enhancement) posted (#10) for review on master by Susant Palai (spalai)

Comment 10 Worker Ant 2017-04-21 07:23:40 UTC

REVIEW: https://review.gluster.org/16427 (cluster/dht: rebalance perf enhancement) posted (#11) for review on master by Susant Palai (spalai)

Comment 11 Worker Ant 2017-04-21 07:27:47 UTC

REVIEW: https://review.gluster.org/16427 (cluster/dht: rebalance perf enhancement) posted (#12) for review on master by Susant Palai (spalai)

Comment 12 Worker Ant 2017-04-25 14:18:55 UTC

REVIEW: https://review.gluster.org/16427 (cluster/dht: rebalance perf enhancement) posted (#13) for review on master by Susant Palai (spalai)

Comment 13 Worker Ant 2017-04-27 06:55:52 UTC

REVIEW: https://review.gluster.org/16427 (cluster/dht: rebalance perf enhancement) posted (#14) for review on master by Susant Palai (spalai)

Comment 14 Worker Ant 2017-04-27 10:41:52 UTC

REVIEW: https://review.gluster.org/16427 (cluster/dht: rebalance perf enhancement) posted (#15) for review on master by Susant Palai (spalai)

Comment 15 Worker Ant 2017-04-28 06:24:26 UTC

REVIEW: https://review.gluster.org/16427 (cluster/dht: rebalance perf enhancement) posted (#16) for review on master by Susant Palai (spalai)

Comment 16 Worker Ant 2017-04-29 14:28:23 UTC

COMMIT: https://review.gluster.org/16427 committed in master by Raghavendra G (rgowdapp) 
------
commit bff6b7b1d75b55bfdc11a6aac613b51bdafee989
Author: Susant Palai <spalai>
Date:   Tue Jan 10 16:11:50 2017 +0530

    cluster/dht: rebalance perf enhancement
    
    Problem: The rebalance throttle settings "normal" and "aggressive"
    showed no performance difference.
    
    Normal mode spawns $(no. of cores - 4)/2 threads and aggressive mode
    spawns $(no. of cores - 4) threads. Although aggressive mode has twice
    as many threads as normal mode, there was no performance gain when
    switching from normal mode to aggressive mode.
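
The thread-count formula above can be sketched in a few lines. This is only an illustration of the arithmetic quoted in the commit message, not GlusterFS code: the function name is hypothetical, and the floor of one thread on small machines is an assumption that the commit message does not state.

```python
def migrator_thread_count(cores: int, mode: str) -> int:
    """Thread count per the formula quoted in the commit message.

    normal     -> (cores - 4) / 2
    aggressive -> (cores - 4)
    """
    spare = max(cores - 4, 0)
    if mode == "aggressive":
        count = spare
    elif mode == "normal":
        count = spare // 2
    else:
        raise ValueError(f"unknown throttle mode: {mode!r}")
    # Assumption (not from the source): never go below one thread.
    return max(count, 1)


if __name__ == "__main__":
    for mode in ("normal", "aggressive"):
        print(mode, migrator_thread_count(24, mode))
```

For the 24-core test machines described in this commit message, this gives the 10 and 20 threads shown in the results table.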
    
    RCA:
    While debugging the above problem, we tried assigning migration jobs
    to migration threads spawned by rebalance rather than to synctasks
    (managing the synctask queue and threads adds overhead). This gave a
    significant improvement over rebalance under synctasks. This patch does
    not guarantee a clear performance difference between normal and
    aggressive mode, but it did maximize disk utilization in the run with
    1 GB files.
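
The general idea of handing migration jobs directly to dedicated worker threads can be sketched as below. This is a generic worker-pool illustration under stated assumptions, not the DHT implementation (which is written in C): `run_migrators` and `migrate_one` are hypothetical names, and Python threads stand in for the migrator threads spawned by rebalance.

```python
import queue
import threading


def run_migrators(files, migrate_one, num_threads):
    """Drain a shared job queue with a fixed set of worker threads.

    migrate_one is a hypothetical callable that migrates one file and
    returns an outcome; it stands in for the real migration routine.
    """
    jobs = queue.Queue()
    for path in files:
        jobs.put(path)

    results = []
    results_lock = threading.Lock()

    def worker():
        # Each thread pulls jobs itself until the queue is empty, so no
        # separate task scheduler sits between the job and the thread.
        while True:
            try:
                path = jobs.get_nowait()
            except queue.Empty:
                return
            outcome = migrate_one(path)
            with results_lock:
                results.append((path, outcome))

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With this shape, the throttle setting only has to choose `num_threads`; each additional thread can take work straight from the queue.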
    
    Results:
    
    Test environment:
    Gluster Config:
    Number of Bricks: 2 (one brick per disk (RAID-6, 12 disks))
    Bricks:
    Brick1: server1:/brick/test1/1
    Brick2: server2:/brick/test1/1
    Options Reconfigured:
    performance.readdir-ahead: on
    server.event-threads: 4
    client.event-threads: 4
    
    1000 files of 1 GB each were created/renamed so that every file had
    server1 as cached and server2 as hashed, forcing all files to be migrated.
    
    Test machines had 24 cores each.
    
    Results with/without synctask-based migration:
    -----------------------------------------------
    
    mode                               normal (10 threads)    aggressive (20 threads)
    
    time taken with synctask           0:55:30 (h:m:s)        0:56:03 (h:m:s)
    time taken with migrator threads   0:38:03 (h:m:s)        0:23:41 (h:m:s)
    
    From the table above, rebalance with migrator threads is about 1.5x
    faster than rebalance with synctasks in normal mode and about 2.4x
    faster in aggressive mode.
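
As a quick check of the gain, the ratios can be computed from the timings in the results table. The helper names here are hypothetical; the timing values are the ones reported in this commit message.

```python
def to_seconds(hms: str) -> int:
    """Convert an h:m:s string such as '0:55:30' to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Timings copied from the results table in the commit message.
TIMINGS = {
    "normal": {"synctask": "0:55:30", "migrator": "0:38:03"},
    "aggressive": {"synctask": "0:56:03", "migrator": "0:23:41"},
}


def speedup(mode: str) -> float:
    """Ratio of synctask time to migrator-thread time for one mode."""
    row = TIMINGS[mode]
    return to_seconds(row["synctask"]) / to_seconds(row["migrator"])


if __name__ == "__main__":
    for mode in TIMINGS:
        print(f"{mode}: {speedup(mode):.2f}x")
```

The ratio comes out near 1.46 for normal mode and 2.37 for aggressive mode, so the gain from migrator threads is larger at the higher thread count.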
    
    Additionally, this patch modifies the code so that the caller gets the exact
    error number returned by dht_migrate_file (earlier, the errno meaning was
    overloaded). This helps avoid scenarios where a migration failure due to
    ENOENT results in a rebalance abort/failure.
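
A minimal sketch of why an exact error number helps the caller, assuming a hypothetical `migrate_file` callable that returns 0 on success or a positive errno value on failure. This is not the signature of dht_migrate_file, and the choice to treat every other error as fatal is an assumption made only for the illustration.

```python
import errno
import os


def migrate_all(files, migrate_file):
    """Migrate each file; skip files that vanished, abort on other errors.

    migrate_file is a hypothetical callable returning 0 on success or a
    positive errno value on failure.
    """
    migrated, skipped = [], []
    for path in files:
        err = migrate_file(path)
        if err == 0:
            migrated.append(path)
        elif err == errno.ENOENT:
            # The file no longer exists; this should not abort the
            # whole rebalance run.
            skipped.append(path)
        else:
            # Assumption: any other error is treated as fatal here.
            raise OSError(err, os.strerror(err), path)
    return migrated, skipped
```

With an overloaded return value the caller cannot distinguish the ENOENT case from a genuine failure, which is the situation the paragraph above describes.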
    
    Change-Id: I8904e2fb147419d4a51c1267be11a08ffd52168e
    BUG: 1420166
    Signed-off-by: Susant Palai <spalai>
    Reviewed-on: https://review.gluster.org/16427
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: N Balachandran <nbalacha>
    Reviewed-by: Raghavendra G <rgowdapp>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>

Comment 17 Shyamsundar 2017-05-30 18:40:54 UTC

This bug is being closed because a release that should address the reported issue is now available. If the problem is still not fixed with glusterfs-3.11.0, please open a new bug report.

glusterfs-3.11.0 has been announced on the Gluster mailing lists [1]. Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-May/000073.html
[2] https://www.gluster.org/pipermail/gluster-users/
