Description of problem:
On a tiered volume with the options reconfigured as shown in the output below, watermark levels are breached during promotions and demotions. Before a file is promoted or demoted, hot tier disk usage should be validated against the watermark levels. This does not work as expected: the watermark levels are currently breached.

gluster vol info

Volume Name: regression-test
Type: Tier
Volume ID: 8d55374f-2d67-427d-98eb-49d7ace0db67
Status: Started
Number of Bricks: 20
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 4 x 2 = 8
Brick1: 10.70.42.45:/rhs/brick6/leg2
Brick2: 10.70.43.141:/rhs/brick6/leg2
Brick3: 10.70.43.3:/rhs/brick6/leg2
Brick4: 10.70.42.149:/rhs/brick6/leg2
Brick5: 10.70.37.140:/rhs/brick14/leg2
Brick6: 10.70.37.77:/rhs/brick13/leg2
Brick7: 10.70.37.121:/rhs/brick14/leg2
Brick8: 10.70.37.132:/rhs/brick13/leg2
Cold Tier:
Cold Tier Type : Distributed-Disperse
Number of Bricks: 2 x (4 + 2) = 12
Brick9: 10.70.37.132:/rhs/brick11/leg1
Brick10: 10.70.37.121:/rhs/brick11/leg1
Brick11: 10.70.37.77:/rhs/brick11/leg1
Brick12: 10.70.37.140:/rhs/brick11/leg1
Brick13: 10.70.42.149:/rhs/brick3/leg1
Brick14: 10.70.43.3:/rhs/brick3/leg1
Brick15: 10.70.43.141:/rhs/brick3/leg1
Brick16: 10.70.42.45:/rhs/brick3/leg1
Brick17: 10.70.37.132:/rhs/brick12/leg1
Brick18: 10.70.37.121:/rhs/brick12/leg1
Brick19: 10.70.37.77:/rhs/brick12/leg1
Brick20: 10.70.37.140:/rhs/brick12/leg1
Options Reconfigured:
features.barrier: disable
cluster.tier-demote-frequency: 120
cluster.tier-max-files: 100000
cluster.tier-max-mb: 100000
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
cluster.watermark-hi: 45
cluster.watermark-low: 10
cluster.read-freq-threshold: 5
features.record-counters: on
cluster.tier-mode: cache
features.ctr-enabled: on
performance.readdir-ahead: on

Version-Release number of selected component (if applicable):
glusterfs-server-3.7.5-15.el7rhgs.x86_64

How reproducible:
Seen once; not yet retried.

Steps to Reproduce:
1) On an EC (2 x (4+2)) cold tier volume, created multiple files.
2) Attached a hot tier dist-rep (4 x 2) volume.
   Note: Each brick of the cold tier is 250G and each brick of the hot tier is 50G. That makes the cold tier capacity 2TB and the hot tier capacity 200GB.
3) Heated 200 files, each of size 1GB, by continuous writes from the cold tier [for i in {1..200}; do echo "ee" >> file-$i; done]
4) While promotions were ongoing, changed the watermark levels from their default values to 20% and 65%.
5) As hot tier used capacity reached 30%, changed the demote frequency to 1800.
7) When hot tier used capacity reached 50%, changed the watermarks to 10% and 45%, expecting demotions to happen in the next cycle.
8) Files kept getting promoted although hot tier usage was above the high watermark. This behavior continued for at least 2-3 demotion cycles, i.e., 1 hour, and hot tier disk usage rose above 65%.
9) Stopped heating files.
10) Demotions were seen only after usage crossed 73%.
11) Demotions continued until hot tier disk usage reached 1%, although the low watermark was set to 10%.

Actual results:
Watermark levels are breached.

Expected results:
Watermark levels and disk usage should be validated before each file transfer.

Additional info:
sosreports shall be attached.
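
For illustration, the expected per-file check can be sketched as below. This is only a sketch of the behavior this report expects, not gluster's actual tiering code: the function name tier_action is hypothetical, the treatment of usage exactly equal to a watermark is an assumption, and the band between the two watermarks is simplified to "both directions allowed". The percentages and the 200GB hot tier size are taken from the steps above.

```shell
# Hypothetical helper (bash), not part of gluster: report which migrations a
# per-file watermark check would allow. All arguments are integer percentages.
tier_action() {
    local used_pct=$1 low=$2 hi=$3
    if (( used_pct < low )); then
        echo "promote-only"        # below low watermark: demotions should stop
    elif (( used_pct > hi )); then
        echo "demote-only"         # above high watermark: promotions should stop
    else
        echo "promote-and-demote"  # simplification for the band in between
    fi
}

# Values from this report: 200GB hot tier, watermarks 10% and 45%.
hot_gb=200
echo "low watermark  = $(( hot_gb * 10 / 100 ))GB"
echo "high watermark = $(( hot_gb * 45 / 100 ))GB"

# Usage levels observed in steps 5, 7, 10 and 11.
for used in 30 50 73 1; do
    echo "hot tier at ${used}%: $(tier_action "$used" 10 45)"
done
```

Under these assumptions, promotions at 50% usage and above (steps 7 and 8) and demotions down to 1% usage (step 11) both fall outside what such a check would allow.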
sosreports are available here --> http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1298470/