Description of problem:
When 100 files were continuously heated by repeated writes, promotions were aggressive enough to cross the high water mark: with the high water mark set at 60%, promotions continued until the hot tier was almost 90% full.

Total hot tier size: 20G
High water mark at 60%: 12G
Low water mark at 30%: 6G

Promotions continued till 90%, i.e., 18G, and the hot tier stayed at 90% for at least 30 minutes. However, when the system was left in the same state with overnight IO, disk usage on the hot tier reduced to 53%, i.e., 11G.

[root@dhcp43-19 ~]# gluster vol info

Volume Name: tier-vol-01
Type: Tier
Volume ID: 78d1fc18-5d0a-452d-ab26-b78c15655c60
Status: Started
Number of Bricks: 13
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distribute
Number of Bricks: 1
Brick1: 10.70.42.47:/rhs/brick4/leg2
Cold Tier:
Cold Tier Type : Distributed-Disperse
Number of Bricks: 2 x (4 + 2) = 12
Brick2: 10.70.42.47:/rhs/brick1/leg1
Brick3: 10.70.43.19:/rhs/brick1/leg1
Brick4: 10.70.42.177:/rhs/brick1/leg1
Brick5: 10.70.42.10:/rhs/brick1/leg1
Brick6: 10.70.43.140:/rhs/brick1/leg1
Brick7: 10.70.42.87:/rhs/brick1/leg1
Brick8: 10.70.42.228:/rhs/brick1/leg1
Brick9: 10.70.42.183:/rhs/brick1/leg1
Brick10: 10.70.42.47:/rhs/brick2/leg1
Brick11: 10.70.43.19:/rhs/brick2/leg1
Brick12: 10.70.42.177:/rhs/brick2/leg1
Brick13: 10.70.42.10:/rhs/brick2/leg1
Options Reconfigured:
cluster.tier-mode: cache
features.ctr-enabled: on
cluster.tier-promote-frequency: 120
cluster.tier-demote-frequency: 120
features.record-counters: on
cluster.watermark-hi: 60
cluster.watermark-low: 10
performance.readdir-ahead: on

Files that were written on the mount were /mnt/tier-vol-01/dd/file-{901..999}.

Version-Release number of selected component (if applicable):
glusterfs-3.7.5-11.el7rhgs.x86_64

How reproducible:
Yet to determine whether this issue is always reproducible; have seen this behavior at least twice.

Steps to Reproduce:
1. Keep the system idle with multiple files on the cold tier.
2. While current disk usage on the hot tier is below the low water mark, keep writing to 100 files continuously (see the sketch after the sosreports link below).
3. Monitor disk usage of the hot tier.

Actual results:
Promotions continue well above the high water mark setting.

Expected results:
Promotions should stop when the high water mark is hit.

Additional info:
No new files were created during this test; existing files were written to in order to mark them for promotion.
sosreports are available here --> http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1291969/
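A minimal sketch of the write workload from step 2, assuming the ~100Mb file sizes and the file names noted above; the dd parameters are illustrative, not the exact commands used.

# Repeatedly overwrite the existing files to keep them hot.
# conv=notrunc writes in place, so no new files are created.
while :; do
    for i in {901..999}; do
        dd if=/dev/zero of=/mnt/tier-vol-01/dd/file-$i \
           bs=1M count=100 conv=notrunc 2>/dev/null
    done
done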
Hi Karthick. What is the file size you are using?
File sizes used were in the range of 100Mb for the initial test when the total hot tier was 20Gb.
Each node may independently pick a file to promote. There are 12 nodes on the cold tier, so no more than 1.2G should be moved in parallel in a single direction. This could lead to an overshoot of 1.2G (I think), but you are seeing more than that. What each node should be doing is: (1) check free space on the entire hot tier, and abort if there is not enough space to move a particular 100Mb file; (2) if there is space, move the file; (3) go back to (1). On code inspection, it looks like step (1) is improperly implemented. I'll see if I can recreate this and write a fix if my inspection proves to be correct.
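For illustration, here is the intended loop from steps (1)-(3) sketched in shell; the brick path and watermark come from the vol info above, and promote_one_file is a hypothetical stand-in for the tier daemon's actual migration logic, not a real helper.

# Hypothetical stand-in for migrating one file to the hot tier.
promote_one_file() { :; }

HOT_BRICK=/rhs/brick4/leg2   # hot tier brick from the vol info
HI=60                        # cluster.watermark-hi

while :; do
    # Step (1): re-check hot tier usage before every single migration.
    used=$(df --output=pcent "$HOT_BRICK" | tail -n1 | tr -dc '0-9')
    if [ "$used" -ge "$HI" ]; then
        break                # not enough room; stop promoting
    fi
    promote_one_file         # step (2): move one file
done                         # step (3): loop back to the check in (1)

If that check is skipped or evaluated only once per promotion cycle rather than before each move, every node keeps migrating files past the watermark, which would match the overshoot described above.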
Patch 64888 submitted for this problem.
Verified the fix on build glusterfs-3.7.5-15. The following tests were performed to verify the fix.

Test 1:
1) Created a dist-rep volume (300Gb) and enabled quota.
2) Fuse mounted the volume and created 1000 files of 10Mb each.
3) Attached a replicated tier brick (50Gb).
4) Set the low and high watermarks to 45% and 70%.
5) Heated all 1000 files created in step 2; no new files were written.
Result: Disk usage hit 70% and never went beyond it.

Test 2: After Test 1, increased the high watermark to 80%.
Result: Disk usage hit 80% and never went beyond it.

Test 3: After Test 2, decreased the high watermark back to 70%.
Result: Disk usage reached 70% and never went beyond it.

Test 4: Stopped heating files and left the system idle.
Result: Disk usage dropped to 45% (the low watermark) and never fell below it.

Test 5: Repeated Test 1 and Test 4 with the promotion cycle at 300s and the demotion cycle at 1800s. Results were the same.

Moving the bug to verified state.
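The watermark and frequency changes in Tests 1-5 map onto the same volume options shown in the vol info in the description. A sketch of the likely commands, with <volname> standing in for the test volume, which is not named in this comment:

gluster volume set <volname> cluster.watermark-low 45              # low watermark, Tests 1-5
gluster volume set <volname> cluster.watermark-hi 70               # Test 1
gluster volume set <volname> cluster.watermark-hi 80               # Test 2
gluster volume set <volname> cluster.watermark-hi 70               # Test 3
gluster volume set <volname> cluster.tier-promote-frequency 300    # Test 5
gluster volume set <volname> cluster.tier-demote-frequency 1800    # Test 5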
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0193.html