Bug 1476852 - DHT layer must dynamically load balance rebalance activity instead of hard presetting entries for each node
Status: NEW
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: distribute
Version: 3.3
Hardware: Unspecified  OS: Unspecified
Priority: low  Severity: unspecified
Target Milestone: ---
Target Release: ---
Assigned To: Nithya Balachandran
QA Contact: Prasad Desala
Depends On:
Blocks:
 
Reported: 2017-07-31 11:24 EDT by nchilaka
Modified: 2017-08-29 01:05 EDT (History)
3 users

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description nchilaka 2017-07-31 11:24:20 EDT
Description of problem:
======================
With BZ#1315781 (AFR returns the node uuid of the same node for every file in the replica) fixed, AFR now returns both node UUIDs, so DHT can rebalance files from both nodes.
However, DHT hard-sets the set of files to be migrated by each node.
Say I have files f{1..10000}. DHT splits the work so that about 5000 files are rebalanced by n1 and the remainder by n2.
Now if for some reason n1 is unable to rebalance (say brick b1 went down, or n1 itself went down as described in BZ#1476676), then n2 must be able to take over the files that were n1's responsibility.
In particular, once the setup is healthy again, i.e. n1 is up, and assuming n2 completed its share of the rebalance before n1 came up, then if we trigger rebalance again, only n1 participates in rebalancing and n2 does no rebalance work at all.

Hence load balancing is lost.
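To illustrate the difference being requested (this is a standalone sketch, not GlusterFS code; the node names n1/n2 and file counts are just the example from the description), compare a static preassignment of migration work with a dynamic shared queue that any live node can pull from:

```python
# Sketch: static preassignment vs. dynamic work queue for file migration.
# Hypothetical model only -- not the actual DHT rebalance implementation.
from collections import deque

files = [f"f{i}" for i in range(1, 101)]

# Static split: each node is hard-assigned half the files up front.
static_plan = {"n1": files[::2], "n2": files[1::2]}
# If n1 is down for the whole pass, its share is simply never migrated.
migrated_static = set(static_plan["n2"])  # only n2's half completes

# Dynamic queue: any live node pulls the next pending file.
queue = deque(files)
migrated_dynamic = set()
live_nodes = ["n2"]  # n1 is down for the whole pass
while queue:
    for node in live_nodes:
        if not queue:
            break
        migrated_dynamic.add(queue.popleft())

print(len(migrated_static))   # static: half the work is stranded
print(len(migrated_dynamic))  # dynamic: the surviving node drains the queue
```

With the static plan, n1's share is stranded until n1 returns and a new rebalance is triggered, at which point only n1 has work left, which is exactly the loss of load balancing described above.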


Version-Release number of selected component (if applicable):
=======
3.8.4-36

How reproducible:
==========
always

Steps to Reproduce:
1. Create a 1x2 volume with b1 on n1 and b2 on n2.
2. Create files dir{1..10}/f{1..10000}.
3. Add bricks: b3 on n1 and b4 on n2.
4. Trigger rebalance.
5. Rebalance status shows both n1 and n2 participating.
6. Now bring down n1; n2 goes ahead with the rebalance of its preallotted files.
7. After the rebalance is completed (by n2), bring n1 back online and trigger heal.
8. After the heal, retrigger rebalance.
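For reference, the steps above correspond roughly to the following CLI sequence (volume name testvol, brick paths under /bricks, and the mount point are assumed for illustration):

```shell
# 1. Create and start a 1x2 replicate volume: b1 on n1, b2 on n2
gluster volume create testvol replica 2 n1:/bricks/b1 n2:/bricks/b2
gluster volume start testvol

# 2. Mount the volume and create the data set
mount -t glusterfs n1:/testvol /mnt/testvol
mkdir -p /mnt/testvol/dir{1..10}
for d in /mnt/testvol/dir{1..10}; do touch "$d"/f{1..10000}; done

# 3. Add a second replica pair: b3 on n1, b4 on n2
gluster volume add-brick testvol n1:/bricks/b3 n2:/bricks/b4

# 4-5. Trigger rebalance and confirm both nodes participate
gluster volume rebalance testvol start
gluster volume rebalance testvol status

# 6. Bring down n1 (e.g. power it off); n2 continues with its share

# 7-8. After n2 finishes, bring n1 back, heal, then retrigger rebalance
gluster volume heal testvol
gluster volume rebalance testvol start
gluster volume rebalance testvol status
```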

Actual results:
=========
Now you can see that only n1 rebalances files, while n2 is just a passive observer.

Expected results:
=======
n2 must also pitch in and rebalance files, to help balance the load.
