+++ This bug was initially created as a clone of Bug #1171954 +++

Description of problem:
DHT Rebalance performance improvements

This bug is an umbrella bug for the DHT rebalance performance improvements in 3.7.

--- Additional comment from Nagaprasad Sathyanarayana on 2015-04-13 14:32:21 MVT ---

http://review.gluster.org/9657

--- Additional comment from Anand Avati on 2015-04-23 17:40:12 MVT ---

REVIEW: http://review.gluster.org/9657 (rebalance: Introducing local crawl and parallel migration) posted (#18) for review on master by Susant Palai (spalai)

--- Additional comment from Anand Avati on 2015-04-28 00:24:07 MVT ---

REVIEW: http://review.gluster.org/9657 (rebalance: Introducing local crawl and parallel migration) posted (#19) for review on master by Shyamsundar Ranganathan (srangana)

--- Additional comment from Anand Avati on 2015-04-29 02:13:29 MVT ---

REVIEW: http://review.gluster.org/9657 (rebalance: Introducing local crawl and parallel migration) posted (#20) for review on master by Susant Palai (spalai)

--- Additional comment from Anand Avati on 2015-04-29 18:48:03 MVT ---

COMMIT: http://review.gluster.org/9657 committed in master by Shyamsundar Ranganathan (srangana)
------
commit b3a966c241b5d5b8117f06a4c744c18b6a59bb18
Author: Susant Palai <spalai>
Date: Sun Apr 12 15:55:02 2015 +0530

rebalance: Introducing local crawl and parallel migration

The current patch addresses two parts of the proposed design:
1. Rebalance multiple files in parallel
2. Crawl only bricks that belong to the current node

Brief design explanation for the above two points:

1. Rebalance multiple files in parallel:
----------------------------------------
The existing rebalance engine is single-threaded. This patch therefore introduces multiple threads that run in parallel with the crawler. Rebalance migration is converted to a producer-consumer framework, where:

Producer: the crawler
Consumer: the migration threads

Crawler: The crawler is the main thread. Its job is now limited to running fix-layout on each directory and adding the files that are eligible for migration to a global queue in a round-robin manner, so that all disk resources are used efficiently. Hence, the crawler is not "blocked" by the migration process.

Consumer: The migration threads monitor the global queue. Whenever a file is added to the queue, a thread dequeues that entry and migrates the file. Currently, 20 migration threads are spawned at the beginning of the rebalance process, so multiple files are migrated in parallel.

2. Crawl only bricks that belong to the current node:
------------------------------------------------------
A rebalance process is spawned per node, and for the sake of load balancing it migrates only the files that belong to its own node. However, it also reads entries from the whole cluster, which is unnecessary because readdir then hits other nodes.

New design: Before the crawler starts, the rebalancer determines which subvolumes are local to its node by checking the node-uuid of the root directory. Because it already knows the local subvolumes, readdir no longer hits the whole cluster, and the per-file node-uuid request can be avoided. This makes the rebalance process more scalable, since each node's crawl reads only its own bricks.

Change-Id: I73ed6ff807adea15086eabbb8d9883e88571ebc1
BUG: 1171954
Signed-off-by: Susant Palai <spalai>
Reviewed-on: http://review.gluster.org/9657
Tested-by: Gluster Build System <jenkins.com>
Reviewed-by: N Balachandran <nbalacha>
Reviewed-by: Shyamsundar Ranganathan <srangana>
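The producer-consumer split and round-robin queueing described in the commit message can be illustrated with a small POSIX-threads sketch. It is a hypothetical, simplified model, not the GlusterFS code: all identifiers (mig_queue, rr_order, run_rebalance) are invented for illustration, "migration" is only a counter standing in for real data movement, the crawler interleaves file lists it already has instead of calling readdir, and fix-layout is omitted.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* HYPOTHETICAL sketch: every name here is illustrative and is NOT taken
 * from the GlusterFS sources. */

#define MAX_THREADS 64
#define MAX_FILES 1024   /* assumption: the toy crawler buffers its entries */

typedef struct file_entry {
    const char *path;
    struct file_entry *next;
} file_entry;

typedef struct {
    file_entry *head, *tail;
    int crawl_done;            /* set once the crawler has queued everything */
    int migrated;              /* files handled by the migration threads */
    pthread_mutex_t lock;
    pthread_cond_t cond;
} mig_queue;

/* Interleave per-subvolume file lists, taking one file from each subvolume
 * in turn, so consecutive queue entries land on different disks. */
static int rr_order(const char **subvols[], const int counts[], int nsubvols,
                    const char **out, int cap) {
    int n = 0, max = 0;
    for (int s = 0; s < nsubvols; s++)
        if (counts[s] > max) max = counts[s];
    for (int i = 0; i < max; i++)
        for (int s = 0; s < nsubvols; s++)
            if (i < counts[s] && n < cap) out[n++] = subvols[s][i];
    return n;
}

static void enqueue(mig_queue *q, const char *path) {
    file_entry *e = calloc(1, sizeof(*e));
    if (!e) abort();
    e->path = path;
    pthread_mutex_lock(&q->lock);
    if (q->tail) q->tail->next = e; else q->head = e;
    q->tail = e;
    pthread_cond_signal(&q->cond);          /* wake one idle migration thread */
    pthread_mutex_unlock(&q->lock);
}

/* Consumer: dequeue entries until the queue is empty AND the crawl is done. */
static void *migration_thread(void *arg) {
    mig_queue *q = arg;
    for (;;) {
        pthread_mutex_lock(&q->lock);
        while (!q->head && !q->crawl_done)
            pthread_cond_wait(&q->cond, &q->lock);
        file_entry *e = q->head;
        if (!e) {                           /* drained and crawler finished */
            pthread_mutex_unlock(&q->lock);
            return NULL;
        }
        q->head = e->next;
        if (!q->head) q->tail = NULL;
        pthread_mutex_unlock(&q->lock);

        /* Placeholder for the actual data movement of e->path. */
        pthread_mutex_lock(&q->lock);
        q->migrated++;
        pthread_mutex_unlock(&q->lock);
        free(e);
    }
}

/* The producer (crawler) runs on the calling thread while nthreads
 * consumers run alongside it. Returns the number of files migrated. */
static int run_rebalance(const char **subvols[], const int counts[],
                         int nsubvols, int nthreads) {
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    mig_queue q = {0};
    pthread_t tids[MAX_THREADS];
    const char *order[MAX_FILES];
    pthread_mutex_init(&q.lock, NULL);
    pthread_cond_init(&q.cond, NULL);
    for (int t = 0; t < nthreads; t++)
        if (pthread_create(&tids[t], NULL, migration_thread, &q) != 0) abort();

    int n = rr_order(subvols, counts, nsubvols, order, MAX_FILES);
    for (int i = 0; i < n; i++) enqueue(&q, order[i]);  /* never waits on migration */

    pthread_mutex_lock(&q.lock);
    q.crawl_done = 1;
    pthread_cond_broadcast(&q.cond);        /* release consumers blocked on an empty queue */
    pthread_mutex_unlock(&q.lock);

    for (int t = 0; t < nthreads; t++) pthread_join(tids[t], NULL);
    pthread_mutex_destroy(&q.lock);
    pthread_cond_destroy(&q.cond);
    return q.migrated;
}
```

For three subvolumes whose files are a1-a3, b1 and c1-c2, rr_order produces a1, b1, c1, a2, c2, a3, and run_rebalance with 20 threads (the count given in the commit message) returns 6. Idle migration threads sleep on a condition variable rather than polling the queue, and the final broadcast lets them exit once the crawler has finished and the queue is empty.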
REVIEW: http://review.gluster.org/10466 (rebalance: Introducing local crawl and parallel migration) posted (#1) for review on release-3.7 by Susant Palai (spalai)
REVIEW: http://review.gluster.org/10466 (rebalance: Introducing local crawl and parallel migration) posted (#2) for review on release-3.7 by Susant Palai (spalai)
REVIEW: http://review.gluster.org/10466 (rebalance: Introducing local crawl and parallel migration) posted (#3) for review on release-3.7 by Susant Palai (spalai)
REVIEW: http://review.gluster.org/10466 (rebalance: Introducing local crawl and parallel migration) posted (#4) for review on release-3.7 by Vijay Bellur (vbellur)
COMMIT: http://review.gluster.org/10466 committed in release-3.7 by Vijay Bellur (vbellur)
------
commit 579186aeba940e3ec73093c48e17b5f6f94910d0
Author: Susant Palai <spalai>
Date: Sun Apr 12 15:55:02 2015 +0530

rebalance: Introducing local crawl and parallel migration

The current patch addresses two parts of the proposed design:
1. Rebalance multiple files in parallel
2. Crawl only bricks that belong to the current node

Brief design explanation for the above two points:

1. Rebalance multiple files in parallel:
----------------------------------------
The existing rebalance engine is single-threaded. This patch therefore introduces multiple threads that run in parallel with the crawler. Rebalance migration is converted to a producer-consumer framework, where:

Producer: the crawler
Consumer: the migration threads

Crawler: The crawler is the main thread. Its job is now limited to running fix-layout on each directory and adding the files that are eligible for migration to a global queue in a round-robin manner, so that all disk resources are used efficiently. Hence, the crawler is not "blocked" by the migration process.

Consumer: The migration threads monitor the global queue. Whenever a file is added to the queue, a thread dequeues that entry and migrates the file. Currently, 20 migration threads are spawned at the beginning of the rebalance process, so multiple files are migrated in parallel.

2. Crawl only bricks that belong to the current node:
------------------------------------------------------
A rebalance process is spawned per node, and for the sake of load balancing it migrates only the files that belong to its own node. However, it also reads entries from the whole cluster, which is unnecessary because readdir then hits other nodes.

New design: Before the crawler starts, the rebalancer determines which subvolumes are local to its node by checking the node-uuid of the root directory. Because it already knows the local subvolumes, readdir no longer hits the whole cluster, and the per-file node-uuid request can be avoided. This makes the rebalance process more scalable, since each node's crawl reads only its own bricks.

Change-Id: I6f1b44086a09df8ca23935fd213509c70cc0c050
BUG: 1217381
Signed-off-by: Susant Palai <spalai>
Reviewed-on: http://review.gluster.org/10466
Tested-by: Gluster Build System <jenkins.com>
Tested-by: NetBSD Build System
Reviewed-by: N Balachandran <nbalacha>
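The local-crawl step, deciding once before the crawl which subvolumes belong to this node, can be sketched as a simple filter. This is a hypothetical illustration: subvol_info and select_local_subvols are invented names, each subvolume's node-uuid is supplied as a string rather than fetched from the root directory, and the sketch assumes each subvolume maps to exactly one owning node (for example, a pure distribute volume).

```c
#include <assert.h>
#include <string.h>

/* HYPOTHETICAL sketch: names are illustrative, not GlusterFS internals.
 * Assumption: each subvolume has exactly one owning node-uuid, already
 * obtained (in the real design, by querying the root directory). */

#define MAX_SUBVOLS 128

typedef struct {
    const char *name;        /* illustrative subvolume name */
    const char *node_uuid;   /* uuid of the node hosting this subvolume */
} subvol_info;

/* Record the indices of the subvolumes owned by local_uuid. This runs once,
 * before the crawl starts, so the crawler can restrict readdir to these
 * subvolumes and skip per-file node-uuid lookups. Returns the count. */
static int select_local_subvols(const subvol_info *subvols, int n,
                                const char *local_uuid, int *local_idx) {
    assert(n >= 0 && n <= MAX_SUBVOLS);
    int nlocal = 0;
    for (int i = 0; i < n; i++)
        if (strcmp(subvols[i].node_uuid, local_uuid) == 0)
            local_idx[nlocal++] = i;
    return nlocal;
}
```

The crawler would then issue readdir only on the subvolumes listed in local_idx, so other nodes' bricks are never read and no per-file node-uuid request is needed.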
This bug is being closed because GlusterFS 3.7 has reached its end-of-life.

Note: This bug is being closed using a script. No verification has been performed to check whether it still exists in newer releases of GlusterFS. If this bug still exists in newer GlusterFS releases, please reopen it against the newer release.