Bug 1476852

Summary: DHT layer must dynamically load balance rebalance activity instead of hard presetting entries for each node
Product: [Red Hat Storage] Red Hat Gluster Storage
Component: distribute
Version: rhgs-3.3
Status: CLOSED DEFERRED
Severity: medium
Priority: low
Reporter: Nag Pavan Chilakam <nchilaka>
Assignee: Susant Kumar Palai <spalai>
QA Contact: Prasad Desala <tdesala>
CC: bperkins, rhinduja, rhs-bugs, storage-qa-internal
Keywords: Triaged
Hardware: Unspecified
OS: Unspecified
Last Closed: 2020-01-16 07:31:06 UTC
Type: Bug

Description Nag Pavan Chilakam 2017-07-31 15:24:20 UTC
Description of problem:
======================
With the fix for BZ#1315781 ("AFR returns the node uuid of the same node for every file in the replica"), AFR now returns the UUIDs of all replica nodes, so DHT can rebalance files from both nodes.
However, DHT effectively hard-presets the set of files each node migrates. Say I have files f{1..10000}: DHT partitions the work so that roughly 5000 files are rebalanced by n1 and the remainder by n2.
If n1 is unable to rebalance for some reason (say brick b1 went down, or n1 itself went down as described in BZ#1476676), then n2 should take over the files that were n1's responsibility.
In particular, once the setup is healthy again (n1 is back up), and assuming n2 finished its share before n1 returned, a retriggered rebalance is done by n1 alone; n2 does no rebalancing at all.

Hence load balancing is lost. (A sketch of the static assignment follows below.)
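To make the complaint concrete: the work split behaves like a static partition, where each rebalance process decides locally, from a file's identity and its own fixed index, whether that file is its responsibility. A minimal bash sketch of that idea (purely illustrative; the hash function, node indices, and file names are made up and are not DHT's actual selection code):

#!/bin/bash
# Every node runs the same computation with a different NODE_INDEX.
# The partition is a pure function of file name and node index, so if
# the node with index 0 never runs, nothing reassigns its share.
NODE_INDEX=${1:-0}   # 0 on n1, 1 on n2 (hypothetical)
NODE_COUNT=2

for f in f{1..10}; do
    h=$(printf '%s' "$f" | cksum | cut -d' ' -f1)
    if [ $((h % NODE_COUNT)) -eq "$NODE_INDEX" ]; then
        echo "node $NODE_INDEX migrates $f"
    fi
done

A dynamic scheme would instead treat the pending files as a shared queue, so whichever node is alive and idle picks up the next file; that is what this bug asks for.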


Version-Release number of selected component (if applicable):
=======
3.8.4-36

How reproducible:
==========
always

Steps to Reproduce:
1. create a 1x2 volume with b1 on n1 and b2 on n2
2. create files dir{1..10}/f{1..10000}
3. add-brick b3 on n1 and b4 on n2
4. trigger rebalance
5. rebalance status shows both n1 and n2 participating
6. now bring down n1;
   n2 goes ahead with rebalancing its preallotted files
7. after rebalance is completed (by n2), bring n1 back online and trigger heal
8. after the heal, retrigger rebalance (see the command transcript below)
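For reference, a command transcript matching the steps above (hostnames n1/n2, brick paths, and the volume name "testvol" are assumptions, not from the original report):

gluster volume create testvol replica 2 n1:/bricks/b1 n2:/bricks/b2
gluster volume start testvol
# mount the volume and create dir{1..10}/f{1..10000} on the mount point
gluster volume add-brick testvol replica 2 n1:/bricks/b3 n2:/bricks/b4
gluster volume rebalance testvol start
gluster volume rebalance testvol status     # both n1 and n2 report progress
# bring down n1 (step 6); n2 completes its preallotted share
# bring n1 back online (step 7), then:
gluster volume heal testvol
gluster volume rebalance testvol start      # step 8: only n1 migrates files
gluster volume rebalance testvol status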

Actual results:
=========
Only n1 rebalances files; n2 is just a passive watcher.

Expected results:
=======
n2 must also pitch in and rebalance files, so that the load is balanced across the nodes.