1171954 – [RFE] Rebalance Performance Improvements

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	distribute
Sub Component:
Version:	mainline
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Susant Kumar Palai
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1175214 1217381

Reported:	2014-12-09 05:16 UTC by Nithya Balachandran
Modified:	2016-07-11 14:24 UTC
CC List:	4 users
Fixed In Version:	glusterfs-3.8.0
Clone Of:
Clones:	1175214 1217381
Environment:
Last Closed:	2016-07-11 14:24:23 UTC
Regression:	---
Mount Type:	---
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:

Description Nithya Balachandran 2014-12-09 05:16:25 UTC

Description of problem:

DHT Rebalance performance improvements

This is an umbrella bug for the DHT rebalance performance improvements in 3.7.

Comment 1 Nagaprasad Sathyanarayana 2015-04-13 09:32:21 UTC

http://review.gluster.org/9657

Comment 2 Anand Avati 2015-04-23 12:40:12 UTC

REVIEW: http://review.gluster.org/9657 (rebalance: Introducing local crawl and parallel migration) posted (#18) for review on master by Susant Palai (spalai)

Comment 3 Anand Avati 2015-04-27 19:24:07 UTC

REVIEW: http://review.gluster.org/9657 (rebalance: Introducing local crawl and parallel migration) posted (#19) for review on master by Shyamsundar Ranganathan (srangana)

Comment 4 Anand Avati 2015-04-28 21:13:29 UTC

REVIEW: http://review.gluster.org/9657 (rebalance: Introducing local crawl and parallel migration) posted (#20) for review on master by Susant Palai (spalai)

Comment 5 Anand Avati 2015-04-29 13:48:03 UTC

COMMIT: http://review.gluster.org/9657 committed in master by Shyamsundar Ranganathan (srangana) 
------
commit b3a966c241b5d5b8117f06a4c744c18b6a59bb18
Author: Susant Palai <spalai>
Date:   Sun Apr 12 15:55:02 2015 +0530

    rebalance: Introducing local crawl and parallel migration
    
    The current patch addresses two parts of the proposed design.
    1. Rebalance multiple files in parallel
    2. Crawl only bricks that belong to the current node
    
    Brief design explanation for the above two points.
    
    1. Rebalance multiple files in parallel:
       -------------------------------------
    The existing rebalance engine is single-threaded. This patch
    therefore introduces multiple threads that run in parallel with the
    crawler, converting rebalance migration to a "Producer-Consumer"
    framework.
    
    Where Producer is : Crawler
          Consumer is : Migrating Threads
    
    Crawler: The crawler is the main thread. Its job is now limited to
    running fix-layout on each directory and adding the files that are
    eligible for migration to a global queue in a round-robin manner,
    so that all disk resources are used efficiently. Hence, the crawler
    is not "blocked" by the migration process.
    
    Consumer: Each consumer (migration thread) monitors the global
    queue. When a file is added to the queue, a consumer dequeues that
    entry and migrates the file. Currently 20 migration threads are
    spawned at the beginning of the rebalance process. Hence, multiple
    file migrations happen in parallel.
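
The queue hand-off described above, where the crawler enqueues eligible files and a pool of migration threads dequeues and migrates them, can be sketched roughly as follows. This is an illustration in Python, not the GlusterFS code (which is written in C inside the DHT translator). All names here (`rebalance`, `crawl`, `migrator`, `migrate_file`) are hypothetical, the "migration" only records the path, and the round-robin ordering mentioned in the commit message is not modeled.

```python
import queue
import threading

NUM_MIGRATORS = 20  # thread count quoted in the commit message
_STOP = object()    # sentinel that tells one migrator to exit


def migrate_file(path, migrated, lock):
    """Hypothetical stand-in for a real migration: just record the path."""
    with lock:
        migrated.append(path)


def migrator(work_queue, migrated, lock):
    """Consumer: dequeue entries and migrate them until the sentinel arrives."""
    while True:
        entry = work_queue.get()
        if entry is _STOP:
            break
        migrate_file(entry, migrated, lock)


def crawl(directories, work_queue):
    """Producer: enqueue every eligible file; never waits for a migration."""
    for files in directories.values():
        for path in files:
            work_queue.put(path)


def rebalance(directories, num_migrators=NUM_MIGRATORS):
    """Start the migrators, run the crawler, then shut the migrators down."""
    work_queue = queue.Queue()
    migrated = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=migrator, args=(work_queue, migrated, lock))
        for _ in range(num_migrators)
    ]
    for thread in threads:
        thread.start()
    crawl(directories, work_queue)
    # One sentinel per migrator, queued after all files (FIFO order).
    for _ in threads:
        work_queue.put(_STOP)
    for thread in threads:
        thread.join()
    return migrated
```

Because the queue is unbounded in this sketch, the crawler never waits on a slow migration; a bounded queue would trade that property for a cap on memory use.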
    
    2. Crawl only bricks that belong to the current node:
       --------------------------------------------------
    Because a rebalance process is spawned per node, it migrates only
    the files that belong to its own node, for the sake of load
    balancing. However, it also reads entries from the whole cluster,
    which is unnecessary because readdir then hits other nodes.
    
    New Design:
            As part of the new design, the rebalancer determines which
    subvols are local to the rebalancer node by checking the node-uuid
    of the root directory before the crawler starts. Hence, readdir
    won't hit the whole cluster, as the rebalancer already has the
    context of the local subvols, and the node-uuid request for each
    file can also be avoided. This makes the rebalance process
    "more scalable".
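
The local-crawl idea above (look up the node-uuid of each subvol's root directory once, then read directory entries only from the matching subvols) can be sketched roughly as follows. This is a simplified Python model, not the GlusterFS implementation: the dictionaries stand in for the node-uuid lookup and for readdir results, and the names `find_local_subvols` and `crawl_local` are hypothetical.

```python
def find_local_subvols(root_node_uuids, local_uuid):
    """Return the subvols whose root directory reports this node's uuid.

    root_node_uuids maps subvol name -> node-uuid of its root directory
    (assumed to be gathered once, before the crawl starts).
    """
    return [
        subvol
        for subvol, node_uuid in root_node_uuids.items()
        if node_uuid == local_uuid
    ]


def crawl_local(listings, local_subvols):
    """Yield (subvol, entry) pairs, reading only from local subvols.

    listings maps subvol name -> directory entries, standing in for
    the result of a readdir on that subvol.
    """
    for subvol in local_subvols:
        for entry in listings.get(subvol, []):
            yield subvol, entry
```

For example, with three bricks of which two report the local uuid, only those two are read, and no per-file node-uuid query is needed during the crawl.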
    
    Change-Id: I73ed6ff807adea15086eabbb8d9883e88571ebc1
    BUG: 1171954
    Signed-off-by: Susant Palai <spalai>
    Reviewed-on: http://review.gluster.org/9657
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: N Balachandran <nbalacha>
    Reviewed-by: Shyamsundar Ranganathan <srangana>

Comment 6 Anand Avati 2015-04-30 16:18:38 UTC

REVIEW: http://review.gluster.org/10478 (dht: Fix invalid dereference in gf_defrag_start_crawl) posted (#1) for review on master by Ravishankar N (ravishankar)

Comment 7 Nithya Balachandran 2015-05-08 10:30:28 UTC

http://review.gluster.org/#/c/10478/ has been abandoned, and the issue it addresses is handled in BZ 1217949.

Moving this BZ to Modified.

Comment 8 Niels de Vos 2016-06-23 08:15:37 UTC

Hi Nithya, could you check and correct the status of this bug?

Thanks!
Niels

Comment 9 Nithya Balachandran 2016-07-11 14:11:54 UTC

Hi Niels,

The BZ status is correct, or maybe it should be closed, as the release with this fix went out a while ago. The fix went in with http://review.gluster.org/9657

Thanks,
Nithya

Comment 10 Niels de Vos 2016-07-11 14:24:23 UTC

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
