Bug 763990 (GLUSTER-2258)

Summary: enhance gluster volume rebalance
Product: [Community] GlusterFS
Reporter: Amar Tumballi <amarts>
Component: cli
Assignee: Amar Tumballi <amarts>
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: low
Version: mainline
CC: divya, gluster-bugs, jdarcy, saurabh, vijay, vraman
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Bug Depends On: 764801, 764802, 764803, 764807, 764813, 764977    
Bug Blocks: 763844    

Description Amar Tumballi 2010-12-29 09:34:15 UTC
Need to achieve two things:

1. Separate 'step 1' (fixing layout), and 'step 2' (data migration).

2. Make sure data migration is done in parallel from all the involved servers, instead of the current implementation where only one node is kept busy.

Comment 1 Amar Tumballi 2010-12-29 12:08:41 UTC
Currently, if a rebalance command is stopped after step 1 (fixing layout) is complete, it will start from step 1 again. Ideally we should have a way to handle each step individually.

Comment 2 Jeff Darcy 2011-01-03 15:22:31 UTC
See also bug 763844, which contains suggestions for how to address point 2.

Comment 3 Amar Tumballi 2011-01-04 01:43:55 UTC
(In reply to comment #2)
> See also bug 763844, which contains suggestions for how to address point 2.

Hi Jeff,

Thanks for the suggestion. Yes, we are aware of the current issue in rebalance: all the data that needs to be migrated passes through the node that issued the command, which is certainly not advisable. We will work on that separately. First we will enable the CLI to issue commands for the two steps separately, and then take up the internal enhancements to rebalance in glusterd.

PS: We will use the 'trusted.distribute.linkinfo' or 'trusted.glusterfs.pathinfo' keys with getxattr to find a link file's location.
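
For illustration only, here is a minimal Python sketch of querying those two keys through getxattr. The key names come from the comment above; the helper name `query_location`, the order in which the keys are tried, and the injectable `getxattr` parameter are assumptions made for this example, and nothing is assumed about the format of the returned value.

```python
import os

# Key names are taken from the comment above; the value format is not assumed.
LOCATION_KEYS = ("trusted.distribute.linkinfo", "trusted.glusterfs.pathinfo")


def query_location(path, getxattr=None):
    """Return (key, value) for the first key that can be read, else (None, None).

    `getxattr` is injectable for testing; it defaults to os.getxattr (Linux only).
    """
    if getxattr is None:
        getxattr = os.getxattr
    for key in LOCATION_KEYS:
        try:
            raw = getxattr(path, key)
        except OSError:
            # Key missing, not permitted, or not supported on this filesystem.
            continue
        return key, raw.decode("utf-8", errors="replace")
    return None, None
```

On a mounted volume this might be called as `query_location("/mnt/glusterfs/some/file")`, where the mount path is hypothetical. Reading attributes in the `trusted.` namespace on Linux generally requires root privileges.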

Comment 4 Amar Tumballi 2011-02-23 09:31:46 UTC
*** Bug 2384 has been marked as a duplicate of this bug. ***

Comment 5 Anand Avati 2011-03-01 20:10:44 UTC
PATCH: http://patches.gluster.com/patch/6298 in master (gluster rebalance: give option to split the command)

Comment 6 Amar Tumballi 2011-03-02 02:45:25 UTC
(In reply to comment #1)
> Need to achieve two things:
> 
> 1. Separate 'step 1' (fixing layout), and 'step 2' (data migration).

Fixed with the patch http://patches.gluster.com/patch/6298. This is the part planned for 3.1.3, hence I am changing the target milestone of the bug.

The documentation needs to be updated with the new options (available in the above link). We also have to add tests to check the new rebalance options.


> 
> 2. Make sure data migration is done in parallel from all the involved servers,
> instead of the current implementation where only one node is kept busy.

This will be brought in the 3.2.x releases, with more enhancements to glusterd, after which rebalance will not even require a fuse mount. We are also planning to bring in a delay option for rebalance, so the load won't spike when people rebalance a large volume.

Comment 7 Divya 2011-03-21 04:23:56 UTC
Added the following to the 3.1.3 Release Notes: "Bug 2258: Fixed the layout issue occurred during rebalance volume." in the Issues Resolved... section, and "Rebalancing Volume command is enhanced with two new options: gluster volume rebalance <VOLNAME> fix-layout start and gluster volume rebalance <VOLNAME> migrate-data start ..." in the What is New... section.
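
As a rough sketch of how the two new options could be driven from a script, the Python below builds the two command lines quoted in the release-note text. The helper names `rebalance_command` and `start_phase` and the volume name `test-vol` are hypothetical; only the `fix-layout start` and `migrate-data start` forms shown above are used, and waiting for step 1 to finish before starting step 2 is not covered here.

```python
import subprocess

# Phases named in the release-note text above.
PHASES = ("fix-layout", "migrate-data")


def rebalance_command(volname, phase):
    """Build the argv for starting one rebalance phase on `volname`."""
    if phase not in PHASES:
        raise ValueError(f"unknown phase: {phase!r}")
    return ["gluster", "volume", "rebalance", volname, phase, "start"]


def start_phase(volname, phase, run=subprocess.run):
    """Start one phase; `run` is injectable so the sketch works without gluster."""
    return run(rebalance_command(volname, phase), check=True)
```

For example, `start_phase("test-vol", "fix-layout")` would issue only step 1, so an interrupted run need not repeat the layout fix before `start_phase("test-vol", "migrate-data")` is issued.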

Comment 8 Vijay Bellur 2011-04-08 05:19:04 UTC
PATCH: http://patches.gluster.com/patch/6722 in master (gluster rebalance: don't move a hardlinked file.)

Comment 9 Anand Avati 2011-05-31 09:11:06 UTC
PATCH: http://patches.gluster.com/patch/7257 in master (statfs(): honor the 'inode' on which the statfs() call is made)

Comment 10 Anand Avati 2011-06-01 02:52:07 UTC
PATCH: http://patches.gluster.com/patch/7320 in master (gluster rebalance: prevent data migration from higher disk space to lower)

Comment 11 Anand Avati 2011-06-14 06:38:11 UTC
PATCH: http://patches.gluster.com/patch/7354 in master (gluster rebalance: fix the mount command string)

Comment 12 Anand Avati 2011-06-17 02:02:05 UTC
PATCH: http://patches.gluster.com/patch/7523 in master (gluster rebalance: bring in a 'force' option)

Comment 13 Anand Avati 2011-06-17 02:02:11 UTC
PATCH: http://patches.gluster.com/patch/7520 in master (gluster rebalance: handle the migration of files with 'holes'.)

Comment 14 Anand Avati 2011-06-19 06:16:22 UTC
PATCH: http://patches.gluster.com/patch/7537 in release-3.1 (glusterd-volgen: fix rdma volume file path in case of 'tcp, rdma' transport.)

Comment 15 Anand Avati 2011-06-19 06:16:27 UTC
PATCH: http://patches.gluster.com/patch/7538 in release-3.1 (rpc-transport/rdma: don't return '0' in case of un-initiated rdma_connect())

Comment 16 Anand Avati 2011-06-19 06:16:33 UTC
PATCH: http://patches.gluster.com/patch/7539 in release-3.1 (nfs:command to change the transport type of nfs server for volumes of transport tcp, rdma)

Comment 17 Anand Avati 2011-06-19 06:16:38 UTC
PATCH: http://patches.gluster.com/patch/7540 in release-3.1 (fix multiple transport related portmap issues in client handshake)

Comment 18 Amar Tumballi 2011-06-22 03:56:13 UTC
(In reply to comment #17)
> PATCH: http://patches.gluster.com/patch/7540 in release-3.1 (fix multiple
> transport related portmap issues in client handshake)

The previous four commits (comments 14 to 17) were sent with the wrong bug ID; my mistake. They belong to bug 763989. Please ignore them.

Btw, rebalance is a very critical part of maintaining a glusterfs volume (mainly when scale-out and scale-down operations are involved), hence all the enhancements required for rebalance should be considered critical, and we plan to work on them for 3.3.0.

Comment 19 Anand Avati 2011-07-14 05:01:38 UTC
PATCH: http://patches.gluster.com/patch/7709 in master (cluster/distribute: handle layout overlaps while giving a new fix)

Comment 20 Anand Avati 2011-07-27 03:48:19 UTC
CHANGE: http://review.gluster.com/15 (in case of layout 'creation', layout->err == ENOSPC should be ignored) merged in master by Anand Avati (avati)

Comment 21 Amar Tumballi 2011-09-09 10:00:47 UTC
This bug was kept open because during the 3.2.x time frame we found a lot of issues with the 'rebalance' command and wanted to keep one bug for all the commits. But now we have a separate bug for each of the individual issues. So this generic bug doesn't make sense anymore.

The main improvement in rebalance that is still pending is bug 764803, for which a patch is already submitted. The other pending items are mostly minor.