Description of problem:
Today, to promote files from the cold tier to the hot tier in a distributed hot tier, we generate a query file for each sub-volume and promote files round-robin from each sub-vol. Every cycle starts with sub-vol1 and proceeds through the subsequent sub-vols.

Consider the case below, where the files eligible for promotion from each sub-vol are as follows.

cycle:1
========
sub-vol1: file-1, file-2
sub-vol2: file-3, file-4

Each file is 5GB in size.

In cycle 1, file-1 is promoted, and since that hits max.mb (default: 4000MB) we break out of cycle 1 and move on to cycle 2.

cycle:2
========
sub-vol1: file-2, file-5
sub-vol2: file-3, file-4

Again we start promotion from sub-vol1 and promote file-2, which is also 5GB. If this goes on, we might never promote files to sub-vol2. We will end up with sub-vol1 being over-utilized, and files will not be evenly distributed across the hot tier. The issue becomes more evident as the number of sub-vols grows.

To avoid this, we must choose a random sub-vol to start the promotion from instead of always starting from sub-vol1.

Version-Release number of selected component (if applicable):
glusterfs-3.7.9-1.el7rhgs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. create a vol
2. create 100 4GB files - file{1..100}
3. attach a dist hot tier - 10 bricks
4. start heating files

Actual results:
Files will only be present in subvol1 of the hot tier

Expected results:
Files should be evenly distributed across the hot tier

Additional info:
No logs shall be attached
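The starvation described above can be reproduced with a small toy model. This is only a sketch, not GlusterFS code: the names run_cycle, simulate and pick_start are hypothetical, each sub-vol is modelled as a plain list of file sizes, and the only values taken from this report are the 4000MB max.mb default and the 5GB file size.

```python
import random
from collections import Counter

MAX_MB = 4000        # default max.mb quoted in this report
FILE_MB = 5 * 1024   # each file is 5GB, as in the example above


def run_cycle(queues, start, max_mb=MAX_MB):
    """Promote round-robin, beginning at sub-vol index `start`.

    Returns the indexes of the sub-vols a file was promoted from.
    The cycle ends as soon as the promoted total reaches max_mb.
    """
    promoted_from = []
    total_mb = 0
    n = len(queues)
    for step in range(n):
        idx = (start + step) % n
        if not queues[idx]:
            continue  # nothing eligible on this sub-vol
        total_mb += queues[idx].pop(0)
        promoted_from.append(idx)
        if total_mb >= max_mb:
            break
    return promoted_from


def simulate(num_subvols, files_per_subvol, cycles, pick_start):
    """Run several cycles and count promotions per sub-vol."""
    queues = [[FILE_MB] * files_per_subvol for _ in range(num_subvols)]
    counts = Counter()
    for _ in range(cycles):
        counts.update(run_cycle(queues, pick_start(num_subvols)))
    return counts


# Current behaviour: every cycle starts from the first sub-vol.
fixed = simulate(4, 10, 10, lambda n: 0)

# Proposed behaviour: every cycle starts from a random sub-vol.
rng = random.Random(0)  # seeded only to make the sketch repeatable
randomized = simulate(4, 10, 10, lambda n: rng.randrange(n))

print("fixed start: ", dict(fixed))
print("random start:", dict(randomized))
```

With a fixed start, every promotion in this model comes from index 0, because a single 5GB file already exceeds max.mb and ends the cycle. With a random start, the same number of promotions is spread over the sub-vols.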
Please note that this will also be seen with demotion.
Upstream patch: http://review.gluster.org/14068
Downstream patch: https://code.engineering.redhat.com/gerrit/#/c/73415/
Verified the fix in build glusterfs-server-3.7.9-4.el7rhgs.x86_64.

Steps followed to verify:
1) Had 20 x 5 GB files in an EC volume
2) Attached a 4x1 hot tier; max.mb and max.files were left as default
3) Constantly heated all 20 files
4) Files are no longer promoted to only one sub-volume.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1240