Red Hat Bugzilla – Bug 763990
enhance gluster volume rebalance
Last modified: 2015-12-01 11:45:32 EST
Need to achieve two things:
1. Separate 'step 1' (fixing layout), and 'step 2' (data migration).
2. Make sure data migration is done in parallel from all the involved servers, instead of the current implementation where only one node does all the work.
Currently, if a rebalance command is stopped after step 1 (fixing layout) has completed, it starts again from step 1. Ideally we should be able to run each step individually.
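To illustrate the requested behaviour, here is a minimal Python sketch of a rebalance job that remembers which steps have already finished, so that a restart after step 1 goes straight to data migration. `RebalanceJob` and its callbacks are hypothetical and not part of glusterd. Persisting the `completed` list across glusterd restarts is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical step names, mirroring the two rebalance steps discussed above.
FIX_LAYOUT = "fix-layout"
MIGRATE_DATA = "migrate-data"


@dataclass
class RebalanceJob:
    """Tracks which rebalance steps already finished so a restart can skip them."""
    fix_layout: Callable[[], None]
    migrate_data: Callable[[], None]
    completed: List[str] = field(default_factory=list)

    def run(self, steps=(FIX_LAYOUT, MIGRATE_DATA)) -> List[str]:
        """Run the requested steps in order, skipping ones already completed.

        Returns the names of the steps actually executed in this call.
        """
        actions = {FIX_LAYOUT: self.fix_layout, MIGRATE_DATA: self.migrate_data}
        executed = []
        for step in steps:
            if step in self.completed:
                continue  # finished in an earlier run; do not repeat it
            actions[step]()
            self.completed.append(step)
            executed.append(step)
        return executed
```

Calling `job.run((FIX_LAYOUT,))` and later `job.run()` executes the layout fix once and then only the data migration.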
See also bug 763844, which contains suggestions for how to address point 2.
(In reply to comment #2)
> See also bug763844, which contains suggestions for how to address point 2.
Thanks for the suggestion. Yes, we are aware of the current issue in rebalance: all the data that needs to be migrated passes through the node that issued the command, which is clearly not advisable. We will work on it separately. First we will enable the CLI to issue commands for the two steps separately, then take up the internal enhancement of rebalance in glusterd.
PS: We will use the 'trusted.distribute.linkinfo' or 'trusted.glusterfs.pathinfo' keys in getxattr to find the link file's location.
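A minimal Python sketch of querying those keys through `os.getxattr` (Linux only). The helper name `query_location` is hypothetical, and which key is available, and the format of its value, depend on the GlusterFS version, so treat the returned string as opaque. On a non-GlusterFS filesystem the lookup simply fails and the helper returns `None`.

```python
import os
from typing import Optional

# Keys named in the comment above; availability and value format are
# version-dependent assumptions.
PATHINFO_KEY = "trusted.glusterfs.pathinfo"
LINKINFO_KEY = "trusted.distribute.linkinfo"


def query_location(path: str, key: str = PATHINFO_KEY) -> Optional[str]:
    """Return the value of a location xattr for `path`, or None if unavailable."""
    getxattr = getattr(os, "getxattr", None)  # os.getxattr exists on Linux only
    if getxattr is None:
        return None
    try:
        raw = getxattr(path, key)
    except OSError:
        # ENODATA (attribute absent), ENOTSUP (no xattr support), EPERM, etc.
        return None
    # Drop any trailing NUL and decode leniently; the value is treated as text.
    return raw.decode("utf-8", errors="replace").rstrip("\x00")
```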
*** Bug 2384 has been marked as a duplicate of this bug. ***
PATCH: http://patches.gluster.com/patch/6298 in master (gluster rebalance: give option to split the command)
(In reply to comment #1)
> Need to achieve two things:
> 1. Separate 'step 1' (fixing layout), and 'step 2' (data migration).
Fixed with the patch http://patches.gluster.com/patch/6298. This is the fix planned for 3.1.3, hence changing the target milestone of the bug.
The documentation needs to be updated with the new options (available at the link above). We also have to add tests for the new rebalance options.
> 2. Make sure data migration is done parallally from all the involved servers,
> instead of current implementation where only one node is getting busy.
This will be brought in the 3.2.x releases, with further enhancements to glusterd through which rebalance will not even require a FUSE mount. We are also planning to add a delay option to rebalance, so that the load does not spike when rebalancing a large volume.
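The planned delay option is not specified here, so the following is only a hypothetical Python sketch of the idea: migrate files one at a time and sleep between them to spread out the I/O. `migrate_with_delay`, `migrate` and `delay_seconds` are illustrative names, not glusterd options. The `sleep` parameter is injectable so the throttling can be tested without waiting.

```python
import time
from typing import Callable, Iterable, List


def migrate_with_delay(
    files: Iterable[str],
    migrate: Callable[[str], bool],
    delay_seconds: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Migrate files one at a time, pausing between them to smooth the load.

    `migrate` returns True if the file was moved. Returns the moved files.
    """
    moved = []
    for i, name in enumerate(files):
        if i > 0 and delay_seconds > 0:
            sleep(delay_seconds)  # throttle: pause before each file after the first
        if migrate(name):
            moved.append(name)
    return moved
```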
Added the following information in "Bug 2258: Fixed the layout issue occurred during rebalance volume." in Issues Resolved... section and "Rebalancing Volume command is enhanced with two new options: gluster volume rebalance <VOLNAME> fix-layout start and gluster volume rebalance <VOLNAME> migrate-data start ..." in What is New... section of the 3.1.3 Release Notes.
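For scripting the two steps separately, a small Python sketch that builds the two command lines quoted from the release notes above. The helper names are hypothetical; only the `gluster volume rebalance <VOLNAME> fix-layout start` and `migrate-data start` forms come from this bug. By default nothing is executed; the command line is only returned.

```python
import shlex
import subprocess
from typing import List

# The two sub-commands quoted from the 3.1.3 release notes.
STEPS = ("fix-layout", "migrate-data")


def rebalance_argv(volname: str, step: str) -> List[str]:
    """Build the argv for one rebalance step of volume `volname`."""
    if step not in STEPS:
        raise ValueError(f"unknown rebalance step: {step!r}")
    return ["gluster", "volume", "rebalance", volname, step, "start"]


def run_step(volname: str, step: str, dry_run: bool = True) -> str:
    """Return the command line; execute it only when dry_run is False."""
    argv = rebalance_argv(volname, step)
    if not dry_run:
        subprocess.run(argv, check=True)  # requires the gluster CLI on PATH
    return shlex.join(argv)
```

Running `run_step(vol, "fix-layout", dry_run=False)` first and `migrate-data` later reproduces the split workflow.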
PATCH: http://patches.gluster.com/patch/6722 in master (gluster rebalance: don't move a hardlinked file.)
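The idea behind patch 6722 can be sketched with `os.lstat`: a regular file whose link count is above one is skipped, because moving one of its names to another brick would separate it from its other names. This is a simplified illustration, not the patch itself; `safe_to_migrate` is a hypothetical name.

```python
import os
import stat


def safe_to_migrate(path: str) -> bool:
    """Return True only for regular files that have exactly one hard link."""
    st = os.lstat(path)
    # Directories, symlinks and hard-linked files are all left in place.
    return stat.S_ISREG(st.st_mode) and st.st_nlink == 1
```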
PATCH: http://patches.gluster.com/patch/7257 in master (statfs(): honor the 'inode' on which the statfs() call is made)
PATCH: http://patches.gluster.com/patch/7320 in master (gluster rebalance: prevent data migration from higher disk space to lower)
PATCH: http://patches.gluster.com/patch/7354 in master (gluster rebalance: fix the mount command string)
PATCH: http://patches.gluster.com/patch/7523 in master (gluster rebalance: bring in a 'force' option)
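The free-space guard (patch 7320) and the 'force' option (patch 7523) can be pictured as one predicate. This is a simplified Python sketch under stated assumptions, not the code in those patches: the name `should_migrate`, the exact comparison (free space left on the destination after the move must not fall below the source's free space after the move), and the assumption that `force` bypasses only that comparison are all illustrative. Sizes are in any consistent unit, such as blocks.

```python
def should_migrate(file_size: int, src_free: int, dst_free: int, force: bool = False) -> bool:
    """Decide whether to move a file of `file_size` from source to destination."""
    if file_size > dst_free:
        return False  # the file cannot fit, forced or not
    if force:
        return True  # assumed: force skips the free-space comparison only
    # Refuse moves that leave the destination with less free space than the source.
    return dst_free - file_size >= src_free + file_size
```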
PATCH: http://patches.gluster.com/patch/7520 in master (gluster rebalance: handle the migration of files with 'holes'.)
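A common way to migrate a sparse file without filling its holes is to skip writing all-zero blocks and seek past them instead, then set the final size explicitly. The Python sketch below shows that technique; it is not the code from patch 7520, and `copy_preserving_holes` and the 4096-byte block size are assumptions. Whether the skipped ranges actually become unallocated holes depends on the destination filesystem.

```python
import os

BLOCK = 4096  # assumed block size for hole detection


def copy_preserving_holes(src_path: str, dst_path: str, block_size: int = BLOCK) -> None:
    """Copy a file, skipping all-zero blocks so they stay holes in the copy."""
    zero = bytes(block_size)
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(block_size)
            if not chunk:
                break
            if chunk == zero[: len(chunk)]:
                dst.seek(len(chunk), os.SEEK_CUR)  # leave a hole instead of writing zeros
            else:
                dst.write(chunk)
        # A trailing hole does not extend the file by itself, so set the size.
        dst.truncate(src.tell())
```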
PATCH: http://patches.gluster.com/patch/7537 in release-3.1 (glusterd-volgen: fix rdma volume file path in case of 'tcp, rdma' transport.)
PATCH: http://patches.gluster.com/patch/7538 in release-3.1 (rpc-transport/rdma: don't return '0' in case of un-initiated rdma_connect())
PATCH: http://patches.gluster.com/patch/7539 in release-3.1 (nfs:command to change the transport type of nfs server for volumes of transport tcp, rdma)
PATCH: http://patches.gluster.com/patch/7540 in release-3.1 (fix multiple transport related portmap issues in client handshake)
(In reply to comment #17)
> PATCH: http://patches.gluster.com/patch/7540 in release-3.1 (fix multiple
> transport related portmap issues in client handshake)
The previous 4 commits were sent with the wrong BUG ID, my mistake. They are meant for bug 763989. Please ignore them.
Btw, rebalance is a critical part of maintaining a glusterfs volume (mainly when scale-out and scale-down operations are involved), hence all the enhancements required for rebalance should be considered critical, and we plan to work on them for 3.3.0.
PATCH: http://patches.gluster.com/patch/7709 in master (cluster/distribute: handle layout overlaps while giving a new fix)
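In the distribute translator, each subvolume is assigned an inclusive range of the 32-bit hash space for a directory, and a correct layout covers that space exactly once. The hypothetical Python helper below reports overlaps and holes in such a layout. It illustrates the condition patch 7709 deals with and is not the DHT code. Zero-sized ranges and subvolumes in error are not modelled.

```python
from typing import Dict, List, Tuple

HASH_MAX = 0xFFFFFFFF  # top of the 32-bit hash space


def check_layout(
    ranges: Dict[str, Tuple[int, int]],
) -> Tuple[List[Tuple[str, str]], List[Tuple[int, int]]]:
    """Return (overlaps, holes) for a layout mapping subvolume -> (start, stop)."""
    overlaps: List[Tuple[str, str]] = []
    holes: List[Tuple[int, int]] = []
    covered_to, owner = -1, None  # highest hash covered so far, and its owner
    for name, (start, stop) in sorted(ranges.items(), key=lambda kv: kv[1]):
        if start <= covered_to:
            overlaps.append((owner, name))  # range starts inside an earlier one
        elif start > covered_to + 1:
            holes.append((covered_to + 1, start - 1))  # gap before this range
        if stop > covered_to:
            covered_to, owner = stop, name
    if covered_to < HASH_MAX:
        holes.append((covered_to + 1, HASH_MAX))  # gap at the top of the space
    return overlaps, holes
```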
CHANGE: http://review.gluster.com/15 (in case of layout 'creation', layout->err == ENOSPC should be ignored) merged in master by Anand Avati (email@example.com)
This bug was kept open because, during the 3.2.x timeframe, we found a lot of issues with the 'rebalance' command and wanted to keep one bug for all the commits. Now we have a separate bug for each individual issue, so this generic bug doesn't make sense anymore.
The main pending improvement in rebalance is bug 764803, for which a patch has already been submitted. All other pending items are mostly minor.