Bug 853258

Summary: improve dht_fix_layout_of_directory for better rebalance
Product: [Community] GlusterFS
Reporter: Anand Avati <aavati>
Component: distribute
Assignee: shishir gowda <sgowda>
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: unspecified
Version: mainline
CC: chrisw, gluster-bugs, jdarcy, jochen_klein, nsathyan
Hardware: Unspecified
OS: Unspecified
Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Last Closed: 2013-07-24 13:32:55 EDT
Type: Bug
Bug Blocks: 895528

Description Anand Avati 2012-08-30 17:22:11 EDT
Jeff Darcy wrote:
    > AFAICT, the fix-layout code doesn't do the same rotation that the
    > new-directory code does. Therefore, the new bricks always claim
    > completely predictable hash ranges for every directory, leading to
    > either a 0-1-2-3 pattern or a 1-0-2-3 pattern.  In other words, a
    > file whose hash falls into the second quarter of the range will always
    > be assigned to brick 2, and a file whose hash falls into the fourth
    > quarter will always be assigned to brick 3.  The rest will be split
    > according to the original pattern.  Put still another way, instead of
    > same-named files in different directories being spread across N bricks,
    > they might be spread across only two bricks (bad) or totally
    > concentrated on one brick (worse) regardless of N.
    
    The current dht_fix_layout_of_directory() code tries to maximize the
    overlap between the new layout and the existing layout (to minimize
    data movement). In doing so it fails to randomize the new assignment,
    even in cases where it could do better. For example, when we expand
    from 2 nodes to 4 nodes, the possible outcomes are limited as
    follows:
    
    (theoretical hash range: 00 - 99)
    
    OLD 1
    -----
    server1: 00 - 49
    server2: 50 - 99
    
    NEW 1
    -----
    server1: 00 - 24
    server2: 50 - 74
    server3: 25 - 49
    server4: 75 - 99
    
    OLD 2
    -----
    server1: 50 - 99
    server2: 00 - 49
    
    NEW 2
    -----
    server1: 50 - 74
    server2: 00 - 24
    server3: 25 - 49
    server4: 75 - 99
    
    The above shows that when add-brick expands a volume from 2 bricks to
    4 bricks, server3 and server4 always get the _same_ hash ranges, no
    matter what the original hash range assignment was.
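
The example above can be reproduced with a small model. This is only an illustrative sketch, not the actual dht_fix_layout_of_directory() code: the function names (naive_fix_layout, brick_for) and the "split each old range in half, hand the upper half to the next new server" rule are assumptions chosen because they yield exactly the NEW 1 and NEW 2 tables shown above.

```python
# Illustrative model only; NOT the GlusterFS implementation.
# Ranges are inclusive (start, end) pairs over the theoretical
# hash space 00-99 used in the example above.


def naive_fix_layout(old_layout, new_servers):
    """Assumed rule: halve each old range; the old server keeps the
    lower half and the upper half goes to the next new server.

    Old ranges are visited in hash order, so the new servers are
    placed by position in the hash space, regardless of which old
    server owned each range.
    """
    new_layout = {}
    ordered = sorted(old_layout.items(), key=lambda item: item[1][0])
    for (server, (start, end)), new_server in zip(ordered, new_servers):
        mid = start + (end - start + 1) // 2
        new_layout[server] = (start, mid - 1)
        new_layout[new_server] = (mid, end)
    return new_layout


def brick_for(layout, hash_value):
    """Return the server whose range contains hash_value."""
    for server, (start, end) in layout.items():
        if start <= hash_value <= end:
            return server
    raise ValueError("hash value outside the layout")


OLD_1 = {"server1": (0, 49), "server2": (50, 99)}
OLD_2 = {"server1": (50, 99), "server2": (0, 49)}

NEW_1 = naive_fix_layout(OLD_1, ["server3", "server4"])
NEW_2 = naive_fix_layout(OLD_2, ["server3", "server4"])

# The new servers end up with identical ranges in both directories.
print("NEW 1:", sorted(NEW_1.items()))
print("NEW 2:", sorted(NEW_2.items()))

# A same-named file (same hash, here assumed to be 30) lands on the
# same new brick in both directories.
print(brick_for(NEW_1, 30), brick_for(NEW_2, 30))
```

In this model the two directories start with opposite layouts, yet server3 and server4 receive the same ranges in both, so any file whose hash falls in 25 - 49 or 75 - 99 goes to the same brick in every directory.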
Comment 1 shishir gowda 2012-09-13 03:38:56 EDT
A fix for this has been merged upstream.
commit 4f87fd0ae2ce629576ca5f647a99888d31a46815.
Comment 2 shishir gowda 2012-09-26 00:27:16 EDT
*** Bug 848123 has been marked as a duplicate of this bug. ***
Comment 3 Vijay Bellur 2013-02-07 11:27:55 EST
CHANGE: http://review.gluster.org/3908 (dht: better layout-optimization algorithm) merged in master by Anand Avati (avati@redhat.com)