Description of problem:
When I create image files on my glusterfs back end through Cinder, the image files are not evenly distributed across replicated pairs. Here is an example distribution in a 6x2 distributed-replicated volume:

Brick   # of images
0       4
1       4
2       1
3       1
4       0
5       0
6       1
7       1
8       2
9       2
10      0
11      0

As you can see, one replicated pair has 4 volumes, one has 2, two have 1 each, and two have none. I would expect the files to be distributed a bit more evenly. I decided to scale things up to see whether I was just getting unlucky with how the file names were hashed. Here is the distribution with 24 volumes:

Brick   # of images
0       8
1       8
2       4
3       4
4       3
5       3
6       3
7       3
8       5
9       5
10      1
11      1

And the distribution with 100 Cinder volumes:

Brick   # of images
0       20
1       20
2       19
3       19
4       19
5       19
6       12
7       12
8       22
9       22
10      8
11      8

With 100 volumes I would expect to see roughly 16 images per replicated pair (100 files across 6 pairs). Every time I ran through this, the last set of pairs had the lowest number of files.

Version-Release number of selected component (if applicable):
glusterfs-3.4.0.15rhs-1.el6rhs.x86_64

How reproducible:
Every time I have tested this, although the distribution never ends up exactly the same. The only commonality I have seen is that the last set of replicated pairs gets the fewest files.

Steps to Reproduce:
I did this by running "cinder create --display-name=test$i 1" 100 times. I think this could be done just as easily outside OpenStack with:
1. for i in `seq 1 100`; do touch volume-$(uuidgen); done
2. Run "ls volume* | wc -l" on each brick.

Actual results:
An uneven distribution of files.

Expected results:
A more even distribution of files, especially on the last replicated pair.

Additional info:
I can see this being problematic if too many VMs/volumes are stacked on the same replicated pair. The pair that has 22 volumes will have significantly more IOPS than the pair with 8.
I ran this with 10,000 files using:

for i in `seq 1 10000`; do touch volume-$(uuidgen); done

and the distribution looked much better:

Brick   # of images
0       1633
1       1633
2       1706
3       1706
4       1643
5       1643
6       1687
7       1687
8       1672
9       1672
10      1659
11      1659

Could I just be having an unfortunate string of bad luck with the smaller numbers of files? Is there any way of ensuring an even distribution with a smaller number of files?
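The "bad luck at small counts" question above can be checked with a quick simulation. This is a hedged sketch, not GlusterFS's actual DHT hash (it uses a generic MD5-derived 32-bit value as a stand-in): it buckets random `volume-<uuid>` names into 6 hash-range "pairs" and shows how the max-min spread shrinks, relative to the mean, as the file count grows.

```python
# Illustrative simulation only; the hash here is a stand-in, not the
# hash function GlusterFS DHT actually uses.
import hashlib
import uuid

def bucket_counts(num_files, num_pairs=6):
    """Bucket random volume-<uuid> names into equal 32-bit hash ranges."""
    counts = [0] * num_pairs
    for _ in range(num_files):
        name = "volume-%s" % uuid.uuid4()
        # Take 32 bits of an MD5 digest, analogous to a 00000000..ffffffff
        # hash space split evenly across the pairs.
        h = int(hashlib.md5(name.encode()).hexdigest()[:8], 16)
        counts[h * num_pairs // 2**32] += 1
    return counts

for n in (100, 10000):
    counts = bucket_counts(n)
    spread = max(counts) - min(counts)
    print(n, counts,
          "max-min spread as %.1f%% of mean" % (100.0 * spread / (n / 6.0)))
```

Running this repeatedly shows the same pattern reported in this bug: at 100 files the spread between the busiest and emptiest pair is routinely a large fraction of the mean, while at 10,000 files it is comparatively small.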
Changing component to glusterfs and re-assigning to amarts for review.
Raghavendra, can you please take a look and close this if it is not an issue?
The DHT translator (responsible for file distribution) assigns a hash range to each brick. In simple terms, if you have 3 bricks, the ranges are:

brick1: 00000000 to 55555554
brick2: 55555555 to aaaaaaa9
brick3: aaaaaaaa to ffffffff

DHT calculates a hash value from the file name; that value falls somewhere between 00000000 and ffffffff, and the file goes to the brick whose range contains it. So a perfectly even distribution is not always possible. We may think we are choosing random file names, but placement depends entirely on the hash value. If we used a round-robin mechanism, or an algorithm that compared the number of files on each brick before choosing one, we could end up with an even distribution, but that is not possible with the current implementation. Hope this helps.
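The range-based placement described above can be sketched as follows. This is a hypothetical simplification, not the real DHT code: the actual layout lives in per-directory extended attributes, and the hash function here (CRC32) is only a stand-in for the one GlusterFS uses. The 3-brick layout it computes matches the example ranges above.

```python
# Sketch of hash-range placement: each brick owns an equal slice of the
# 32-bit hash space; a file lands on the brick whose slice contains the
# hash of its name. CRC32 is a stand-in hash, not GlusterFS's own.
import zlib

HASH_SPACE = 2**32

def make_layout(num_bricks):
    """Split 00000000..ffffffff into contiguous per-brick ranges."""
    step = HASH_SPACE // num_bricks
    return [(i * step,
             HASH_SPACE - 1 if i == num_bricks - 1 else (i + 1) * step - 1)
            for i in range(num_bricks)]

def pick_brick(name, layout):
    """Return the index of the brick whose range contains hash(name)."""
    h = zlib.crc32(name.encode()) & 0xffffffff
    for brick, (lo, hi) in enumerate(layout):
        if lo <= h <= hi:
            return brick
    raise AssertionError("hash outside layout")

layout = make_layout(3)
print([("%08x" % lo, "%08x" % hi) for lo, hi in layout])
# → [('00000000', '55555554'), ('55555555', 'aaaaaaa9'), ('aaaaaaaa', 'ffffffff')]
print(pick_brick("volume-abc123", layout))
```

Because placement is a pure function of the name's hash, two bricks can receive very different file counts at small scale even though each owns an equal share of the hash space.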
Per the current implementation of DHT, this is how it works. Raghavendra, can we close this as NAB? Does it make sense to capture this as an RFE for any future DHT enhancements?
Hash-based distributions tend to appear non-uniform for small numbers of files. The observations in this bug are in line with this (the distribution is more uniform for a large number of files), hence closing this bug. Also, as far as uniformity of distribution goes, our hash function is name-agnostic: the names we choose shouldn't really affect the uniformity of the distribution.