Bug 993125 - [RHOS-RHS] Uneven distribution of image files on glusterfs back end.
Summary: [RHOS-RHS] Uneven distribution of image files on glusterfs back end.
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterfs
Version: 2.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Raghavendra G
QA Contact: Ben Turner
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-08-05 16:28 UTC by Ben Turner
Modified: 2014-04-03 04:20 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
virt rhos cinder rhs integration
Last Closed: 2014-04-03 04:20:54 UTC
Embargoed:



Description Ben Turner 2013-08-05 16:28:07 UTC
Description of problem:

When I create image files on my glusterfs back end through cinder, the image files are not evenly distributed across the replicated pairs.  Here is an example distribution from a 6x2 distributed-replicate volume:

Bricks      # of images
0           4
1           4

2           1
3           1

4           0
5           0

6           1
7           1

8           2
9           2

10          0
11          0 

As you can see, there is one replicated pair with 4 images, one with 2, two with 1, and two with none.  I would expect the files to be distributed a bit more evenly.

I decided to start scaling things up to see if I was just getting unlucky with how the file names were hashed.  Here is the distribution with 24 volumes:

Bricks      # of images
0           8
1           8

2           4
3           4

4           3
5           3

6           3
7           3

8           5
9           5

10          1
11          1

And here is the distribution with 100 cinder volumes:

Bricks      # of images
0           20
1           20

2           19
3           19

4           19
5           19

6           12
7           12

8           22
9           22

10          8
11          8

In the distribution of 100 I would expect to see roughly 16-17 images per replicated pair.  Every time I ran through this, the last replicated pair seemed to have the lowest number of files.

Version-Release number of selected component (if applicable):

glusterfs-3.4.0.15rhs-1.el6rhs.x86_64

How reproducible:

Every time I have tested this, although the distribution never ends up exactly the same.  The only commonality I have seen is that the last replicated pair seems to get the fewest files.

Steps to Reproduce:
I did this by running "cinder create --display-name=test$i 1" 100 times.  I think this could be reproduced just as easily outside OpenStack with:

1.  On the glusterfs mount point, run: for i in $(seq 1 100); do touch volume-$(uuidgen); done
2.  Run "ls volume* | wc -l" on each brick to count the files placed there (a small counting helper is sketched below).
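
For convenience, here is a rough helper for step 2 (not part of the original report); the brick paths are hypothetical placeholders and must be replaced with the actual brick directories shown by "gluster volume info":

    #!/usr/bin/env python
    # Count "volume-*" files under each brick directory.
    # NOTE: the brick paths below are placeholders; substitute the real
    # brick directories reported by "gluster volume info".
    import glob
    import os

    BRICK_DIRS = ["/rhs/brick%d" % i for i in range(0, 12)]

    for brick in BRICK_DIRS:
        count = len(glob.glob(os.path.join(brick, "volume-*")))
        print("%-20s %d" % (brick, count))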

Actual results:

An uneven distribution of files.

Expected results:

A more even distribution of files, especially on the last replicated pair.

Additional info:

I can see this being problematic if we have too many VMs/volumes stacked on the same replicated pair.  The pair that has 22 volumes will see significantly more IOPS than the pair with 8.

Comment 2 Ben Turner 2013-08-05 18:45:19 UTC
I ran this with 10,000 files using "for i in $(seq 1 10000); do touch volume-$(uuidgen); done" and the distribution looked much better:

Bricks      # of images
0           1633
1           1633

2           1706
3           1706

4           1643
5           1643

6           1687
7           1687

8           1672
9           1672

10          1659
11          1659

Could I just be having an unfortunate string of bad luck with the smaller numbers of files?  Is there any way of ensuring an even distribution with a smaller number of files?

Comment 3 Scott Haines 2013-08-06 02:14:18 UTC
Changing component to glusterfs and re-assigning to amarts for review.

Comment 4 Deepak C Shetty 2014-03-28 11:15:29 UTC
Raghavendra,
   Can you please take a look and close this if it is not an issue?

Comment 5 Rachana Patel 2014-04-02 13:05:58 UTC
The DHT translator (responsible for file distribution) assigns a hash range to each brick.
In simple terms, if you have 3 bricks, the ranges are:
brick1: 00000000 to 55555554
brick2: 55555555 to aaaaaaa9
brick3: aaaaaaaa to FFFFFFFF

DHT calculates a hash value from the file name; that value falls between 00000000 and FFFFFFFF, and the file is placed on the brick whose range contains it.

So a perfectly even distribution is not always possible.  The file names may look random to us, but placement depends entirely on their hash values.

If we used a round-robin mechanism, or any algorithm that first compares the number of files on each brick and then chooses where to store the file, we might end up with an even data distribution, but that is not how the current implementation works, so an even distribution is not always guaranteed.
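
For illustration only (this is not GlusterFS source code, and the hash below is a stand-in rather than the hash DHT actually uses), a rough sketch of the idea: the 32-bit hash space is split into equal ranges, one per brick, and the file goes to the brick whose range contains the hash of its name:

    import hashlib

    def hash32(name):
        # Stand-in 32-bit hash of the file name (NOT the real DHT hash).
        return int(hashlib.md5(name.encode()).hexdigest()[:8], 16)

    def pick_brick(name, num_bricks):
        # Each brick owns an equal slice of the 00000000-FFFFFFFF range.
        step = (0xFFFFFFFF // num_bricks) + 1
        return min(hash32(name) // step, num_bricks - 1)

    for name in ("volume-aaa", "volume-bbb", "volume-ccc"):
        print("%s -> brick%d" % (name, pick_brick(name, 3) + 1))

With only a handful of names there is nothing forcing the hash values to land in different ranges, which is why small file counts can look skewed.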

Hope this helps.

Comment 6 Deepak C Shetty 2014-04-02 13:16:56 UTC
Per the current implementation of DHT, this is how it works.

Raghavendra,
     Can we close it as NAB?
Does it make sense to capture this as an RFE for any future DHT enhancements?

Comment 7 Raghavendra G 2014-04-03 04:20:35 UTC
Hash-based distributions tend to look uneven for small numbers of files. The observations in this bug are in line with that (the distribution becomes much more uniform for a large number of files), hence closing this bug.

Also, as far as uniformity of distribution goes, the hash function does not depend on any particular naming scheme, so the names we choose shouldn't really affect the uniformity of the distribution.
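
To illustrate (a quick simulation, not GlusterFS code): hashing uuid-style names into 6 buckets with an ideal uniform hash still looks skewed for 12 files but evens out for 10,000, much like the counts reported above:

    import hashlib
    import uuid
    from collections import Counter

    def bucket(name, buckets=6):
        # Ideal uniform 32-bit hash mapped onto `buckets` equal ranges.
        h = int(hashlib.md5(name.encode()).hexdigest()[:8], 16)
        return h * buckets // 2**32

    for n in (12, 100, 10000):
        counts = Counter(bucket("volume-%s" % uuid.uuid4()) for _ in range(n))
        print("%6d files -> per-pair counts %s"
              % (n, [counts.get(b, 0) for b in range(6)]))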

