Bug 853258 - improve dht_fix_layout_of_directory for better rebalance
Summary: improve dht_fix_layout_of_directory for better rebalance
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: distribute
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: shishir gowda
QA Contact:
URL:
Whiteboard:
Duplicates: 848123
Depends On:
Blocks: 895528
 
Reported: 2012-08-30 21:22 UTC by Anand Avati
Modified: 2015-09-01 23:06 UTC
CC List: 5 users

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-24 17:32:55 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Anand Avati 2012-08-30 21:22:11 UTC
Jeff Darcy wrote:
    > AFAICT, the fix-layout code doesn't do the same rotation that the
    > new-directory code does. Therefore, the new bricks always claim
    > completely predictable hash ranges for every directory, leading to
    > either a 0-1-2-3 pattern or a 1-0-2-3 pattern.  In other words, a
    > file whose hash falls into the second quarter of the range will always
    > be assigned to brick 2, and a file whose hash falls into the fourth
    > quarter will always be assigned to brick 3.  The rest will be split
    > according to the original pattern.  Put still another way, instead of
    > same-named files in different directories being spread across N bricks,
    > they might be spread across only two bricks (bad) or totally
    > concentrated on one brick (worse) regardless of N.
    
    The current dht_fix_layout_of_directory() code, in an attempt to
    maximize the overlap of the new layout with the existing layout (to
    minimize data movement), fails to randomize the new assignments even
    when it could do a better job. For example, when expanding from 2
    nodes to 4 nodes, the possible outcomes are limited in the following
    way -
    
    (theoretical hash range: 00 - 99)
    
    OLD 1
    -----
    server1: 00 - 49
    server2: 50 - 99
    
    NEW 1
    -----
    server1: 00 - 24
    server2: 50 - 74
    server3: 25 - 49
    server4: 75 - 99
    
    OLD 2
    -----
    server1: 50 - 99
    server2: 00 - 49
    
    NEW 2
    ------
    server1: 50 - 74
    server2: 00 - 24
    server3: 25 - 49
    server4: 75 - 99
    
    The above shows that when add-brick expands from 2 bricks to 4 bricks,
    server3 and server4 always get the _same_ hash ranges, no matter what
    the original hash range assignment was.
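
To make the predictability concrete, here is a small Python sketch (illustrative only, not the actual dht_fix_layout_of_directory() code; the function name, toy 00-99 hash space, and layout representation are made up for this example). It mimics the behaviour described above: split the hash space into equal chunks, let each existing brick keep the chunk its old range starts in (maximum overlap), and hand the leftover chunks to the new bricks in ascending order.

HASH_MAX = 100  # toy hash space 00-99, as in the example above

def fix_layout(old_layout, new_bricks):
    # old_layout: {brick: (start, end)} for the existing bricks
    total = len(old_layout) + len(new_bricks)
    chunk = HASH_MAX // total
    chunks = [(i * chunk, (i + 1) * chunk - 1) for i in range(total)]

    layout = {}
    taken = set()
    # Existing bricks keep the chunk their old range begins in (max overlap).
    for brick, (start, _end) in old_layout.items():
        idx = start // chunk
        layout[brick] = chunks[idx]
        taken.add(idx)

    # New bricks get the leftover chunks in ascending order -- always the
    # same ranges, no matter how the old layout was rotated.
    leftovers = [c for i, c in enumerate(chunks) if i not in taken]
    for brick, rng in zip(new_bricks, leftovers):
        layout[brick] = rng
    return layout

print(fix_layout({"server1": (0, 49), "server2": (50, 99)}, ["server3", "server4"]))
print(fix_layout({"server1": (50, 99), "server2": (0, 49)}, ["server3", "server4"]))
# In both cases server3 ends up with (25, 49) and server4 with (75, 99).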

Comment 1 shishir gowda 2012-09-13 07:38:56 UTC
A fix for this has been merged upstream in commit
4f87fd0ae2ce629576ca5f647a99888d31a46815.

Comment 2 shishir gowda 2012-09-26 04:27:16 UTC
*** Bug 848123 has been marked as a duplicate of this bug. ***

Comment 3 Vijay Bellur 2013-02-07 16:27:55 UTC
CHANGE: http://review.gluster.org/3908 (dht: better layout-optimization algorithm) merged in master by Anand Avati (avati)
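
For context, one way to remove the predictability described in the report is to rotate the starting point of the layout by a per-directory value, the way the new-directory code path does, so that same-named files in different directories hash to different bricks. The sketch below only illustrates that idea under assumed names (rotated_layout, with crc32 as a stand-in directory hash); it is not the code merged in http://review.gluster.org/3908.

import zlib

HASH_MAX = 100  # same toy hash space as the example in the description

def rotated_layout(bricks, dirname):
    chunk = HASH_MAX // len(bricks)
    # Which brick gets the first chunk depends on a per-directory hash,
    # so the assignment rotates from one directory to the next.
    start = zlib.crc32(dirname.encode()) % len(bricks)
    layout = {}
    for i, brick in enumerate(bricks):
        idx = (start + i) % len(bricks)
        layout[brick] = (idx * chunk, (idx + 1) * chunk - 1)
    return layout

for d in ("/a", "/b", "/c"):
    print(d, rotated_layout(["server1", "server2", "server3", "server4"], d))
# server3 and server4 now claim different ranges in different directories.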

