Bug 853258 - improve dht_fix_layout_of_directory for better rebalance
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: distribute
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Assigned To: shishir gowda
Duplicates: 848123
Blocks: 895528
Reported: 2012-08-30 17:22 EDT by Anand Avati
Modified: 2015-09-01 19:06 EDT

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Last Closed: 2013-07-24 13:32:55 EDT
Type: Bug

Description Anand Avati 2012-08-30 17:22:11 EDT
Jeff Darcy wrote:
    > AFAICT, the fix-layout code doesn't do the same rotation that the
    > new-directory code does. Therefore, the new bricks always claim
    > completely predictable hash ranges for every directory, leading to
    > either a 0-1-2-3 pattern or a 1-0-2-3 pattern.  In other words, a
    > file whose hash falls into the second quarter of the range will always
    > be assigned to brick 2, and a file whose hash falls into the fourth
    > quarter will always be assigned to brick 3.  The rest will be split
    > according to the original pattern.  Put still another way, instead of
    > same-named files in different directories being spread across N bricks,
    > they might be spread across only two bricks (bad) or totally
    > concentrated on one brick (worse) regardless of N.
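
    (Aside, for illustration: a minimal sketch of the rotation Jeff is
    referring to - NOT the actual GlusterFS source. The toy hash, the
    00-99 hash space, and the function names here are assumptions; the
    real code derives the rotation from a hash of the directory.)

    #include <stdio.h>

    #define HASH_MAX 100U  /* toy hash space 00-99, as in the example below */

    /* toy string hash standing in for the real directory hash */
    static unsigned toy_hash(const char *s)
    {
        unsigned h = 5381;
        while (*s)
            h = h * 33 + (unsigned char)*s++;
        return h;
    }

    /* Carve the hash space into n equal chunks, but rotate which brick
     * gets the first chunk by a per-directory hash, so brick k does not
     * own the same chunk in every directory. */
    static void layout_new_directory(const char *dir, unsigned n)
    {
        unsigned chunk = HASH_MAX / n;
        unsigned start = toy_hash(dir) % n;  /* per-directory rotation */
        unsigned i;

        for (i = 0; i < n; i++) {
            unsigned brick = (start + i) % n;
            printf("%s: server%u gets %02u - %02u\n", dir, brick + 1,
                   i * chunk,
                   (i == n - 1) ? HASH_MAX - 1 : (i + 1) * chunk - 1);
        }
    }

    int main(void)
    {
        layout_new_directory("/music", 4);   /* one rotation ... */
        layout_new_directory("/photos", 4);  /* ... likely a different one */
        return 0;
    }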
    
    The current dht_fix_layout_of_directory() code, in an attempt to
    maximize the overlap of the new layout with the existing one (to
    minimize data movement), fails to randomize the new assignment even
    when it could. For example, when expanding from 2 nodes to 4, the
    possible outcomes are limited in the following way -
    
    (theoretical hash range: 00 - 99)
    
    OLD 1
    -----
    server1: 00 - 49
    server2: 50 - 99
    
    NEW 1
    -----
    server1: 00 - 24
    server2: 50 - 74
    server3: 25 - 49
    server4: 75 - 99
    
    OLD 2
    -----
    server1: 50 - 99
    server2: 00 - 49
    
    NEW 2
    ------
    server1: 50 - 74
    server2: 00 - 24
    server3: 25 - 49
    server4: 75 - 99
    
    The above shows that when expanding from 2 bricks to 4 via
    add-brick, server3 and server4 always get the _same_ hash ranges no
    matter what the original assignment was.
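
    The determinism is easy to reproduce. Below is a minimal sketch of
    the splitting rule implied by the tables above - an assumption for
    illustration, NOT the actual GlusterFS source: each old brick keeps
    the lower half of its range, and the freed upper halves go to the
    new bricks lowest-first, so server3 and server4 end up with 25-49
    and 75-99 in both cases.

    #include <stdio.h>

    struct range { unsigned start, stop; };

    /* Overlap-maximizing 2->4 split (assumed rule): old bricks keep the
     * lower halves of their ranges; the freed upper halves become
     * "leftovers" for the new bricks. */
    static void fix_layout_2_to_4(const struct range old[2],
                                  struct range new[4])
    {
        struct range left[2], tmp;
        int i;

        for (i = 0; i < 2; i++) {
            unsigned half = (old[i].stop - old[i].start + 1) / 2;
            new[i].start = old[i].start;
            new[i].stop = old[i].start + half - 1;
            left[i].start = old[i].start + half;
            left[i].stop = old[i].stop;
        }
        /* Hand the leftovers to the new bricks lowest range first - a
         * fixed rule, so the new bricks' ranges never vary. */
        if (left[0].start > left[1].start) {
            tmp = left[0]; left[0] = left[1]; left[1] = tmp;
        }
        new[2] = left[0];
        new[3] = left[1];
    }

    int main(void)
    {
        struct range old1[2] = { { 0, 49 }, { 50, 99 } };  /* OLD 1 */
        struct range old2[2] = { { 50, 99 }, { 0, 49 } };  /* OLD 2 */
        struct range out[4];
        int i;

        fix_layout_2_to_4(old1, out);
        for (i = 0; i < 4; i++)
            printf("server%d: %02u - %02u\n", i + 1,
                   out[i].start, out[i].stop);
        printf("\n");
        fix_layout_2_to_4(old2, out);
        for (i = 0; i < 4; i++)
            printf("server%d: %02u - %02u\n", i + 1,
                   out[i].start, out[i].stop);
        return 0;
    }

    Folding a per-directory rotation (like the one sketched after Jeff's
    quote above) into the leftover assignment is one way to break this
    determinism.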
Comment 1 shishir gowda 2012-09-13 03:38:56 EDT
A fix for this has been merged upstream as commit 4f87fd0ae2ce629576ca5f647a99888d31a46815.
Comment 2 shishir gowda 2012-09-26 00:27:16 EDT
*** Bug 848123 has been marked as a duplicate of this bug. ***
Comment 3 Vijay Bellur 2013-02-07 11:27:55 EST
CHANGE: http://review.gluster.org/3908 (dht: better layout-optimization algorithm) merged in master by Anand Avati (avati@redhat.com)
