Bug 853258 - improve dht_fix_layout_of_directory for better rebalance
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: distribute
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Assigned To: shishir gowda
Duplicates: 848123
Blocks: 895528
Reported: 2012-08-30 17:22 EDT by Anand Avati
Modified: 2015-09-01 19:06 EDT

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Last Closed: 2013-07-24 13:32:55 EDT
Type: Bug

Description Anand Avati 2012-08-30 17:22:11 EDT
Jeff Darcy wrote:
    > AFAICT, the fix-layout code doesn't do the same rotation that the
    > new-directory code does. Therefore, the new bricks always claim
    > completely predictable hash ranges for every directory, leading to
    > either a 0-1-2-3 pattern or a 1-0-2-3 pattern.  In other words, a
    > file whose hash falls into the second quarter of the range will always
    > be assigned to brick 2, and a file whose hash falls into the fourth
    > quarter will always be assigned to brick 3.  The rest will be split
    > according to the original pattern.  Put still another way, instead of
    > same-named files in different directories being spread across N bricks,
    > they might be spread across only two bricks (bad) or totally
    > concentrated on one brick (worse) regardless of N.
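
    (Aside, for illustration: a minimal sketch of the rotation Jeff is
    referring to - NOT the actual GlusterFS source. The toy hash, the
    00-99 hash space, and the function names here are assumptions; the
    real code derives the rotation from a hash of the directory.)

    #include <stdio.h>

    #define HASH_MAX 100U  /* toy hash space 00-99, as in the example below */

    /* toy string hash standing in for the real directory hash */
    static unsigned toy_hash(const char *s)
    {
        unsigned h = 5381;
        while (*s)
            h = h * 33 + (unsigned char)*s++;
        return h;
    }

    /* Carve the hash space into n equal chunks, but rotate which brick
     * gets the first chunk by a per-directory hash, so brick k does not
     * own the same chunk in every directory. */
    static void layout_new_directory(const char *dir, unsigned n)
    {
        unsigned chunk = HASH_MAX / n;
        unsigned start = toy_hash(dir) % n;  /* per-directory rotation */
        unsigned i;

        for (i = 0; i < n; i++) {
            unsigned brick = (start + i) % n;
            printf("%s: server%u gets %02u - %02u\n", dir, brick + 1,
                   i * chunk,
                   (i == n - 1) ? HASH_MAX - 1 : (i + 1) * chunk - 1);
        }
    }

    int main(void)
    {
        layout_new_directory("/music", 4);   /* one rotation ... */
        layout_new_directory("/photos", 4);  /* ... likely a different one */
        return 0;
    }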
    
    The current dht_fix_layout_of_directory() code, in an attempt to
    maximize the overlap of the new layout with the existing one (to
    minimize data movement), fails to randomize the new assignment even
    when it could. For example, when expanding from 2 nodes to 4, the
    possible outcomes are limited in the following way -
    
    (theoretical hash range: 00 - 99)
    
    OLD 1
    -----
    server1: 00 - 49
    server2: 50 - 99
    
    NEW 1
    -----
    server1: 00 - 24
    server2: 50 - 74
    server3: 25 - 49
    server4: 75 - 99
    
    OLD 2
    -----
    server1: 50 - 99
    server2: 00 - 49
    
    NEW 2
    ------
    server1: 50 - 74
    server2: 00 - 24
    server3: 25 - 49
    server4: 75 - 99
    
    The above shows that when expanding from 2 bricks to 4 via
    add-brick, server3 and server4 always get the _same_ hash ranges no
    matter what the original assignment was.
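
    The determinism is easy to reproduce. Below is a minimal sketch of
    the splitting rule implied by the tables above - an assumption for
    illustration, NOT the actual GlusterFS source: each old brick keeps
    the lower half of its range, and the freed upper halves go to the
    new bricks lowest-first, so server3 and server4 end up with 25-49
    and 75-99 in both cases.

    #include <stdio.h>

    struct range { unsigned start, stop; };

    /* Overlap-maximizing 2->4 split (assumed rule): old bricks keep the
     * lower halves of their ranges; the freed upper halves become
     * "leftovers" for the new bricks. */
    static void fix_layout_2_to_4(const struct range old[2],
                                  struct range new[4])
    {
        struct range left[2], tmp;
        int i;

        for (i = 0; i < 2; i++) {
            unsigned half = (old[i].stop - old[i].start + 1) / 2;
            new[i].start = old[i].start;
            new[i].stop = old[i].start + half - 1;
            left[i].start = old[i].start + half;
            left[i].stop = old[i].stop;
        }
        /* Hand the leftovers to the new bricks lowest range first - a
         * fixed rule, so the new bricks' ranges never vary. */
        if (left[0].start > left[1].start) {
            tmp = left[0]; left[0] = left[1]; left[1] = tmp;
        }
        new[2] = left[0];
        new[3] = left[1];
    }

    int main(void)
    {
        struct range old1[2] = { { 0, 49 }, { 50, 99 } };  /* OLD 1 */
        struct range old2[2] = { { 50, 99 }, { 0, 49 } };  /* OLD 2 */
        struct range out[4];
        int i;

        fix_layout_2_to_4(old1, out);
        for (i = 0; i < 4; i++)
            printf("server%d: %02u - %02u\n", i + 1,
                   out[i].start, out[i].stop);
        printf("\n");
        fix_layout_2_to_4(old2, out);
        for (i = 0; i < 4; i++)
            printf("server%d: %02u - %02u\n", i + 1,
                   out[i].start, out[i].stop);
        return 0;
    }

    Folding a per-directory rotation (like the one sketched after Jeff's
    quote above) into the leftover assignment is one way to break this
    determinism.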
Comment 1 shishir gowda 2012-09-13 03:38:56 EDT
A fix for this has been merged upstream as commit 4f87fd0ae2ce629576ca5f647a99888d31a46815.
Comment 2 shishir gowda 2012-09-26 00:27:16 EDT
*** Bug 848123 has been marked as a duplicate of this bug. ***
Comment 3 Vijay Bellur 2013-02-07 11:27:55 EST
CHANGE: http://review.gluster.org/3908 (dht: better layout-optimization algorithm) merged in master by Anand Avati (avati@redhat.com)
