REVIEW: https://review.gluster.org/17084 (cluster/afr: Return the list of node_uuids for the subvolume) posted (#1) for review on master by Karthik U S (ksubrahm)
REVIEW: https://review.gluster.org/17084 (cluster/afr: Return the list of node_uuids for the subvolume) posted (#2) for review on master by Karthik U S (ksubrahm)
REVIEW: https://review.gluster.org/17084 (cluster/afr: Return the list of node_uuids for the subvolume) posted (#3) for review on master by Karthik U S (ksubrahm)
REVIEW: https://review.gluster.org/17084 (cluster/afr: Return the list of node_uuids for the subvolume) posted (#4) for review on master by Karthik U S (ksubrahm)
REVIEW: https://review.gluster.org/17084 (cluster/afr: Return the list of node_uuids for the subvolume) posted (#5) for review on master by N Balachandran (nbalacha)
REVIEW: https://review.gluster.org/17144 (cluster/dht: rebalance fixes) posted (#1) for review on master by N Balachandran (nbalacha)
REVIEW: https://review.gluster.org/17084 (cluster/afr: Return the list of node_uuids for the subvolume) posted (#6) for review on master by Karthik U S (ksubrahm)
REVIEW: https://review.gluster.org/17084 (cluster/afr: Return the list of node_uuids for the subvolume) posted (#7) for review on master by Karthik U S (ksubrahm)
REVIEW: https://review.gluster.org/17239 (cluster/dht: Rebalance on all nodes migrate files) posted (#1) for review on master by N Balachandran (nbalacha)
REVIEW: https://review.gluster.org/17239 (cluster/dht: Rebalance on all nodes should migrate files) posted (#2) for review on master by N Balachandran (nbalacha)
REVIEW: https://review.gluster.org/17084 (cluster/afr: Return the list of node_uuids for the subvolume) posted (#8) for review on master by Karthik U S (ksubrahm)
REVIEW: https://review.gluster.org/17239 (cluster/dht: Rebalance on all nodes should migrate files) posted (#3) for review on master by N Balachandran (nbalacha)
What are the requisites of the returned list of UUIDs? I see that they must be separated by a space, but is the order important? If a brick is down or known to be in an inconsistent state, should it be returned? I see that AFR returns a NULL UUID for down bricks. Is this required for bad and/or down bricks, or can they simply be omitted?
Before answering Xavi's questions, here is an overview of the new approach.

The current DHT rebalance design is as follows:
1. DHT rebalance runs on all nodes, each of which has a node-uuid.
2. On startup, each rebalance process determines the list of local subvols for its node by getting the node-uuid for each brick root and comparing it against its own (each process migrates only files on the same node, which reduces network reads). As only the node-uuid of the first brick is returned, only the node on which the first brick exists will actually have any local subvols.
3. Each rebalance process lists files on its local subvols and migrates them if required.

The new approach is as follows:
1. DHT rebalance runs on all nodes, each of which has a node-uuid.
2. On startup, each rebalance process determines the list of local subvols in the same way. As the node-uuids of every brick in the replica set are now returned, all the nodes will have local subvols. The node-uuids for each local subvol are saved in an array.
3. Each rebalance process:
   - lists files on its local subvols,
   - hashes the gfid of each file,
   - computes hash % (count of node-uuids) to get an index into the node-uuid array,
   - migrates the file if the node-uuid at that index matches that of the current process.

This requires that all node-uuids be saved in the same order in DHT on all nodes; otherwise there is no guarantee that a particular file will always be assigned to the same node (we do not want multiple nodes trying to migrate the same file). It would also help if AFR/EC returned the uuids in the same order as that of the bricks, so this can later be extended to use notifications to determine when a brick goes down.
If a brick is down but the node is up, I would say return the node-uuid if available, as that node can still migrate files. I am not sure how easy it will be to determine whether a node is up. A node-uuid must be returned for bad bricks in order to keep the selection algorithm consistent and allow that node to migrate files.
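The per-file node selection described above can be sketched as follows. This is a minimal illustration, not the actual DHT code: the real implementation uses DHT's own hash of the gfid, and the `pick_migrator` name and the byte-fold hash here are assumptions made for the sketch.

```python
import uuid

def pick_migrator(gfid, node_uuids):
    """Pick which node migrates a file: hash its gfid and use the hash
    modulo the number of node-uuids as an index into the array. The hash
    must be deterministic so every rebalance process computes the same
    index; here we simply fold the 16 raw gfid bytes (an assumption --
    the actual DHT code uses its own hash function)."""
    h = sum(gfid.bytes)
    return node_uuids[h % len(node_uuids)]

# Example: a replica set spanning three nodes, node-uuids in brick order.
# The same ordered array must be used on every node.
nodes = [uuid.UUID(int=1), uuid.UUID(int=2), uuid.UUID(int=3)]
gfid = uuid.UUID("3fa85f64-5717-4562-b3fc-2c963f66afa6")
owner = pick_migrator(gfid, nodes)
# Every rebalance process computes the same owner for this gfid, so
# exactly one process migrates the file.
```

Because the index depends only on the gfid and the shared, ordered node-uuid array, no coordination between the rebalance processes is needed at migration time.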
REVIEW: https://review.gluster.org/17239 (cluster/dht: Rebalance on all nodes should migrate files) posted (#4) for review on master by N Balachandran (nbalacha)
(In reply to Nithya Balachandran from comment #14)

> If a brick is down, but the node is up, I would say return the node-uuid if
> available as the other node can still migrate files. I am not sure of how
> easy it will be to determine if a node is up.

At the moment we just return an all-zero uuid if the getxattr from that brick fails or the brick is down from the beginning.

> A node-uuid must be returned for bad bricks in order to keep the selection
> algo consistent and allow that node to migrate files.

Thanks for the detailed explanation Nithya, I don't think I could have given such a good response :-).

Xavi, we are trying to get this in by 17th May, so one of us will pick this up for implementation. I am pretty sure you can send this patch in an hour, so if you have the time to send it, just update the bz that you are taking it up and we will be happy to get this in.

Pranith
REVIEW: https://review.gluster.org/17084 (cluster/afr: Return the list of node_uuids for the subvolume) posted (#9) for review on master by Karthik U S (ksubrahm)
Pranith, sorry for the late answer. It has been a really busy weekend... I'll try to make the patch this morning.
REVIEW: https://review.gluster.org/17239 (cluster/dht: Rebalance on all nodes should migrate files) posted (#5) for review on master by N Balachandran (nbalacha)
REVIEW: https://review.gluster.org/17297 (cluster/ec: return all node uuids from all subvolumes) posted (#1) for review on master by Xavier Hernandez (xhernandez)
REVIEW: https://review.gluster.org/17297 (cluster/ec: return all node uuids from all subvolumes) posted (#2) for review on master by Xavier Hernandez (xhernandez)
REVIEW: https://review.gluster.org/17297 (cluster/ec: return all node uuids from all subvolumes) posted (#3) for review on master by Xavier Hernandez (xhernandez)
REVIEW: https://review.gluster.org/17297 (cluster/ec: return all node uuids from all subvolumes) posted (#4) for review on master by Xavier Hernandez (xhernandez)
REVIEW: https://review.gluster.org/17297 (cluster/ec: return all node uuids from all subvolumes) posted (#5) for review on master by Xavier Hernandez (xhernandez)
REVIEW: https://review.gluster.org/17297 (cluster/ec: return all node uuids from all subvolumes) posted (#6) for review on master by Xavier Hernandez (xhernandez)
REVIEW: https://review.gluster.org/17297 (cluster/ec: return all node uuids from all subvolumes) posted (#7) for review on master by Xavier Hernandez (xhernandez)
REVIEW: https://review.gluster.org/17297 (cluster/ec: return all node uuids from all subvolumes) posted (#8) for review on master by Xavier Hernandez (xhernandez)
COMMIT: https://review.gluster.org/17239 committed in master by Shyamsundar Ranganathan (srangana)
------
commit b23bd3dbc2c153171d0bb1205e6804afe022a55f
Author: N Balachandran <nbalacha>
Date: Wed May 10 21:26:28 2017 +0530

cluster/dht: Rebalance on all nodes should migrate files

Problem: Rebalance compares the node-uuid of a file against its own and migrates a file only if they match. However, the current behaviour in both AFR and EC is to return the node-uuid of the first brick in a replica set for all files. This means a single node ends up migrating all the files if the first brick of every replica set is on the same node.

Fix: AFR and EC will return all node-uuids for the replica set. The rebalance process will divide the files to be migrated among all the nodes by hashing the gfid of the file and using that value to select a node to perform the migration.

This patch makes the required DHT and tiering changes. Some tests in rebal-all-nodes-migrate.t will need to be uncommented once the AFR and EC changes are merged.

Change-Id: I5ce41600f5ba0e244ddfd986e2ba8fa23329ff0c
BUG: 1366817
Signed-off-by: N Balachandran <nbalacha>
Reviewed-on: https://review.gluster.org/17239
Smoke: Gluster Build System <jenkins.org>
NetBSD-regression: NetBSD Build System <jenkins.org>
CentOS-regression: Gluster Build System <jenkins.org>
Reviewed-by: Amar Tumballi <amarts>
Reviewed-by: Jeff Darcy <jeff.us>
Reviewed-by: Shyamsundar Ranganathan <srangana>
REVIEW: https://review.gluster.org/17084 (cluster/afr: Return the list of node_uuids for the subvolume) posted (#10) for review on master by Ravishankar N (ravishankar)
REVIEW: https://review.gluster.org/17084 (cluster/afr: Return the list of node_uuids for the subvolume) posted (#11) for review on master by Ravishankar N (ravishankar)
COMMIT: https://review.gluster.org/17297 committed in master by Pranith Kumar Karampuri (pkarampu)
------
commit bcc34ce05c1be76dae42838d55c15d3af5f80e48
Author: Xavier Hernandez <xhernandez>
Date: Fri May 12 09:23:47 2017 +0200

cluster/ec: return all node uuids from all subvolumes

EC was returning the UUID of the brick with the smaller value. This had the side effect of not evenly balancing the load between bricks on rebalance operations.

This patch modifies the common functions that combine multiple subvolume values into a single result to take into account the subvolume order and, optionally, other subvolumes that could be damaged. This makes it easier to add future features where brick order is important. It also makes it possible to easily identify the originating brick of each answer, in case some brick will have a special meaning in the future.

Change-Id: Iee0a4da710b41224a6dc8e13fa8dcddb36c73a2f
BUG: 1366817
Signed-off-by: Xavier Hernandez <xhernandez>
Reviewed-on: https://review.gluster.org/17297
Smoke: Gluster Build System <jenkins.org>
NetBSD-regression: NetBSD Build System <jenkins.org>
CentOS-regression: Gluster Build System <jenkins.org>
Reviewed-by: Ashish Pandey <aspandey>
Reviewed-by: Pranith Kumar Karampuri <pkarampu>
COMMIT: https://review.gluster.org/17084 committed in master by Pranith Kumar Karampuri (pkarampu)
------
commit 0a50167c0a8f950f5a1c76442b6c9abea466200d
Author: karthik-us <ksubrahm>
Date: Wed Apr 19 18:04:46 2017 +0530

cluster/afr: Return the list of node_uuids for the subvolume

Problem: AFR was returning the node uuid of the first node for every file if the replica set was healthy, which was resulting in only one node migrating all the files.

Fix: With this patch AFR returns the list of node_uuids to the upper layer, so that they can decide on which node to migrate which files, resulting in improved performance. Ordering of node uuids will be maintained based on the ordering of the bricks. If a brick is down, then the node uuid for that will be set to all zeros.

Change-Id: I73ee0f9898ae473584fdf487a2980d7a6db22f31
BUG: 1366817
Signed-off-by: karthik-us <ksubrahm>
Reviewed-on: https://review.gluster.org/17084
Reviewed-by: Pranith Kumar Karampuri <pkarampu>
Tested-by: Pranith Kumar Karampuri <pkarampu>
Smoke: Gluster Build System <jenkins.org>
NetBSD-regression: NetBSD Build System <jenkins.org>
CentOS-regression: Gluster Build System <jenkins.org>
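Per the AFR commit message and the earlier discussion, the combined node-uuid value is a space-separated list in brick order, with an all-zero uuid standing in for a down brick so that indices stay consistent across nodes. A rough sketch of that combination step (the `combine_node_uuids` helper and the sample uuids are hypothetical, not the actual AFR code):

```python
ZERO_UUID = "00000000-0000-0000-0000-000000000000"

def combine_node_uuids(brick_uuids):
    """Join per-brick node-uuids with a single space, preserving brick
    order. A down brick (represented here as None) contributes an
    all-zero uuid instead of being omitted, so the position of every
    other brick's uuid in the list is unchanged."""
    return " ".join(u if u is not None else ZERO_UUID for u in brick_uuids)

# Example: replica 3 with the second brick down.
uuids = ["11111111-1111-1111-1111-111111111111",
         None,
         "33333333-3333-3333-3333-333333333333"]
value = combine_node_uuids(uuids)
# The result still has three entries, with the zero uuid in slot 2.
```

Keeping the zero-uuid placeholder rather than dropping the entry is what keeps the hash-modulo selection in DHT consistent on every node, as Nithya's comment required.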
REVIEW: https://review.gluster.org/17320 (tests: Uncomment multi-node rebalance tests) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)
This bug is being closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.12.0, please open a new bug report. glusterfs-3.12.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution. [1] http://lists.gluster.org/pipermail/announce/2017-September/000082.html [2] https://www.gluster.org/pipermail/gluster-users/