853258 – improve dht_fix_layout_of_directory for better rebalance

Summary: improve dht_fix_layout_of_directory for better rebalance

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	distribute
Sub Component:
Version:	mainline
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	shishir gowda
QA Contact:
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	848123
Depends On:
Blocks:	895528

Reported:	2012-08-30 21:22 UTC by Anand Avati
Modified:	2015-09-01 23:06 UTC
CC List:	5 users
Fixed In Version:	glusterfs-3.4.0
Clone Of:
Environment:
Last Closed:	2013-07-24 17:32:55 UTC
Regression:	---
Mount Type:	---
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:

Description Anand Avati 2012-08-30 21:22:11 UTC

Jeff Darcy wrote:
    > AFAICT, the fix-layout code doesn't do the same rotation that the
    > new-directory code does. Therefore, the new bricks always claim
    > completely predictable hash ranges for every directory, leading to
    > either a 0-1-2-3 pattern or a 1-0-2-3 pattern.  In other words, a
    > file whose hash falls into the second quarter of the range will always
    > be assigned to brick 2, and a file whose hash falls into the fourth
    > quarter will always be assigned to brick 3.  The rest will be split
    > according to the original pattern.  Put still another way, instead of
    > same-named files in different directories being spread across N bricks,
    > they might be spread across only two bricks (bad) or totally
    > concentrated on one brick (worse) regardless of N.
    
    The current dht_fix_layout_of_directory() code tries to maximize the
    overlap between the new layout and the existing one (to minimize data
    movement), but in doing so it randomizes the new assignment poorly,
    even when it could do better. In an example where we expand from
    2 nodes to 4 nodes, the possible outcomes are limited as follows:
    
    (theoretical hash range: 00 - 99)
    
    OLD 1
    -----
    server1: 00 - 49
    server2: 50 - 99
    
    NEW 1
    -----
    server1: 00 - 24
    server2: 50 - 74
    server3: 25 - 49
    server4: 75 - 99
    
    OLD 2
    -----
    server1: 50 - 99
    server2: 00 - 49
    
    NEW 2
    -----
    server1: 50 - 74
    server2: 00 - 24
    server3: 25 - 49
    server4: 75 - 99
    
    The above shows that when add-brick expands a volume from 2 bricks to
    4 bricks, server3 and server4 always get the _same_ hash ranges, no
    matter what the original hash range assignment was.
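
The 2-to-4 example above can be reproduced with a small toy model. This is only a sketch under stated assumptions, not the actual dht_fix_layout_of_directory() logic: the helper names (fix_layout_max_overlap, brick_for) are hypothetical, the hash space is the theoretical 00 - 99 range used above, and "maximize overlap" is modeled as each old brick keeping the first equal-sized chunk that lies inside its old range.

```python
# Toy model of the 2 -> 4 brick example above; NOT the GlusterFS implementation.
# All function names here are hypothetical and exist only for illustration.
HASH_SPACE = 100  # theoretical hash range 00 - 99


def fix_layout_max_overlap(old_layout, new_bricks):
    """Return a new layout that keeps each old brick inside its old range.

    old_layout maps brick name -> (start, end), inclusive.
    new_bricks lists the bricks being added, in order.
    """
    total = len(old_layout) + len(new_bricks)
    chunk = HASH_SPACE // total
    chunks = [(i * chunk, (i + 1) * chunk - 1) for i in range(total)]

    layout = {}
    free = list(chunks)
    # Each old brick keeps the first free chunk that lies within its old
    # range, so files hashed into that chunk do not have to move.
    for brick, (start, end) in old_layout.items():
        kept = next(c for c in free if start <= c[0] and c[1] <= end)
        layout[brick] = kept
        free.remove(kept)
    # New bricks take the leftover chunks, always in the same order.
    for brick, c in zip(new_bricks, free):
        layout[brick] = c
    return layout


def brick_for(layout, hash_value):
    """Return the brick whose range contains hash_value."""
    for brick, (start, end) in layout.items():
        if start <= hash_value <= end:
            return brick
    raise ValueError("hash outside layout")


old1 = {"server1": (0, 49), "server2": (50, 99)}
old2 = {"server1": (50, 99), "server2": (0, 49)}
new1 = fix_layout_max_overlap(old1, ["server3", "server4"])
new2 = fix_layout_max_overlap(old2, ["server3", "server4"])

for name, layout in (("NEW 1", new1), ("NEW 2", new2)):
    print(name, sorted(layout.items()))
```

In this model a file whose hash is 30 lands on server3 under both NEW 1 and NEW 2, which is the concentration Jeff describes, while a file whose hash is 10 can only ever land on server1 or server2.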

Comment 1 shishir gowda 2012-09-13 07:38:56 UTC

A fix for this has been merged upstream.
commit 4f87fd0ae2ce629576ca5f647a99888d31a46815.

Comment 2 shishir gowda 2012-09-26 04:27:16 UTC

*** Bug 848123 has been marked as a duplicate of this bug. ***

Comment 3 Vijay Bellur 2013-02-07 16:27:55 UTC

CHANGE: http://review.gluster.org/3908 (dht: better layout-optimization algorithm) merged in master by Anand Avati (avati)
