Bug 1476852 - DHT layer must dynamically load balance rebalance activity instead of hard presetting entries for each node
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: distribute
Version: rhgs-3.3
Hardware: Unspecified
OS: Unspecified
low
medium
Target Milestone: ---
: ---
Assignee: Susant Kumar Palai
QA Contact: Prasad Desala
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-07-31 15:24 UTC by Nag Pavan Chilakam
Modified: 2020-01-16 07:31 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-01-16 07:31:06 UTC
Embargoed:



Description Nag Pavan Chilakam 2017-07-31 15:24:20 UTC
Description of problem:
======================
With BZ#1315781 ("AFR returns the node uuid of the same node for every file in the replica") fixed, AFR returns both node UUIDs, and hence DHT can rebalance files from both nodes.
However, DHT effectively hard-assigns the set of files to be migrated by each node.
Say I have files f{1..10000}. DHT splits them so that about 5000 files are rebalanced by n1 and the remaining ones by n2.
Now if for some reason n1 is not able to rebalance (say brick b1 went down, or n1 went down as mentioned in BZ#1476676), then n2 must be able to take care of rebalancing the files that were n1's responsibility.
In particular, once the setup is healthy again, i.e. n1 is up, and assuming n2 completed its share of the rebalance before n1 came up, if we now trigger rebalance, only n1 participates in rebalancing and n2 does not rebalance anything.

Hence load balancing is lost.
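
To illustrate the concern, here is a minimal Python sketch. It is not GlusterFS code: the hash-based split, the round-robin sharing, and the names static_owner, static_plan and dynamic_plan are all assumptions made up for this example. It only models the behaviour described above, where each file has a preset migrator node, and compares it with sharing the pending files among whichever nodes are up.

```python
import zlib
from collections import Counter


def static_owner(name, nodes):
    """Hypothetical preset split: a file's migrator is fixed by a hash of its name."""
    return nodes[zlib.crc32(name.encode()) % len(nodes)]


def static_plan(files, all_nodes, up_nodes):
    """Count files migrated per node when ownership is preset.

    Files whose preset owner is down are not migrated by anyone.
    """
    done = Counter()
    skipped = []
    for f in files:
        owner = static_owner(f, all_nodes)
        if owner in up_nodes:
            done[owner] += 1
        else:
            skipped.append(f)
    return done, skipped


def dynamic_plan(files, up_nodes):
    """Hypothetical dynamic split: pending files are shared among the nodes that are up."""
    done = Counter()
    for i, _ in enumerate(files):
        done[up_nodes[i % len(up_nodes)]] += 1
    return done


files = [f"f{i}" for i in range(1, 10001)]
nodes = ["n1", "n2"]

# Run 1: n1 is down, so only n2's preset share gets migrated.
run1, leftover = static_plan(files, nodes, ["n2"])

# Run 2: both nodes are up, but every leftover file is still preset to n1.
run2, _ = static_plan(leftover, nodes, nodes)

# With dynamic sharing, the same leftover work is split across n1 and n2.
run2_dynamic = dynamic_plan(leftover, nodes)

print("static, second run: ", dict(run2))
print("dynamic, second run:", dict(run2_dynamic))
```

In the static second run all leftover files land on n1 and n2 gets none, which matches the "passive watcher" behaviour reported below; in the dynamic variant the leftover files are split evenly between the two nodes.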


Version-Release number of selected component (if applicable):
=======
3.8.4-36

How reproducible:
==========
always

Steps to Reproduce:
1. Create a 1x2 volume with b1 on n1 and b2 on n2
2. Create files dir{1..10}/f{1..10000}
3. add-brick b3 on n1 and b4 on n2
4. Trigger rebalance
5. Rebalance status shows both n1 and n2 participating in the rebalance
6. Now bring down n1;
   n2 goes ahead with the rebalance of its preallotted files
7. After the rebalance is completed (by n2), bring n1 back online and trigger heal
8. After heal, retrigger rebalance

Actual results:
=========
Only n1 rebalances files, while n2 is just a passive watcher.

Expected results:
=======
n2 must also pitch in to rebalance files, to help balance the load.

