Bug 1298470 - [Tiering]: watermark levels are breached when a huge number of files are to be promoted or demoted
Summary: [Tiering]: watermark levels are breached when a huge number of files are to be promoted or demoted
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: tier
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Nithya Balachandran
QA Contact: Nag Pavan Chilakam
URL:
Whiteboard: tier-migration
Depends On:
Blocks:
 
Reported: 2016-01-14 08:29 UTC by krishnaram Karthick
Modified: 2016-09-17 15:38 UTC
CC List: 4 users

Fixed In Version: glusterfs-3.7.5-19
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-07-28 06:22:01 UTC
Embargoed:


Attachments

Description krishnaram Karthick 2016-01-14 08:29:07 UTC
Description of problem:

On a tiered volume with the options reconfigured as shown in the output below, watermark levels are breached during promotions and demotions. Before a file is promoted or demoted, hot tier disk usage should be validated against the configured watermark levels, but this does not appear to happen and the watermarks are currently being breached (a rough sketch of such a check follows the volume info below).

gluster vol info
 
Volume Name: regression-test
Type: Tier
Volume ID: 8d55374f-2d67-427d-98eb-49d7ace0db67
Status: Started
Number of Bricks: 20
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 4 x 2 = 8
Brick1: 10.70.42.45:/rhs/brick6/leg2
Brick2: 10.70.43.141:/rhs/brick6/leg2
Brick3: 10.70.43.3:/rhs/brick6/leg2
Brick4: 10.70.42.149:/rhs/brick6/leg2
Brick5: 10.70.37.140:/rhs/brick14/leg2
Brick6: 10.70.37.77:/rhs/brick13/leg2
Brick7: 10.70.37.121:/rhs/brick14/leg2
Brick8: 10.70.37.132:/rhs/brick13/leg2
Cold Tier:
Cold Tier Type : Distributed-Disperse
Number of Bricks: 2 x (4 + 2) = 12
Brick9: 10.70.37.132:/rhs/brick11/leg1
Brick10: 10.70.37.121:/rhs/brick11/leg1
Brick11: 10.70.37.77:/rhs/brick11/leg1
Brick12: 10.70.37.140:/rhs/brick11/leg1
Brick13: 10.70.42.149:/rhs/brick3/leg1
Brick14: 10.70.43.3:/rhs/brick3/leg1
Brick15: 10.70.43.141:/rhs/brick3/leg1
Brick16: 10.70.42.45:/rhs/brick3/leg1
Brick17: 10.70.37.132:/rhs/brick12/leg1
Brick18: 10.70.37.121:/rhs/brick12/leg1
Brick19: 10.70.37.77:/rhs/brick12/leg1
Brick20: 10.70.37.140:/rhs/brick12/leg1
Options Reconfigured:
features.barrier: disable
cluster.tier-demote-frequency: 120
cluster.tier-max-files: 100000
cluster.tier-max-mb: 100000
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
cluster.watermark-hi: 45
cluster.watermark-low: 10
cluster.read-freq-threshold: 5
features.record-counters: on
cluster.tier-mode: cache
features.ctr-enabled: on
performance.readdir-ahead: on
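
For reference, the kind of pre-migration check this report expects can be approximated from the shell, using the volume name and a hot tier brick path from the output above. This is a rough sketch only, not the tier daemon's actual code path, and it assumes 'gluster volume get' is available on this glusterfs build:

#!/bin/bash
# Rough sketch only -- not the tier daemon's code path.
VOL=regression-test
BRICK=/rhs/brick6/leg2   # any hot tier brick mount point from the vol info above

# Hot tier brick usage as an integer percentage
used=$(df -P "$BRICK" | awk 'NR==2 {sub(/%/, "", $5); print $5}')

# Configured watermarks (assumes 'gluster volume get' exists on this build)
hi=$(gluster volume get "$VOL" cluster.watermark-hi | awk '$1 == "cluster.watermark-hi" {print $2}')
low=$(gluster volume get "$VOL" cluster.watermark-low | awk '$1 == "cluster.watermark-low" {print $2}')

if [ "$used" -ge "$hi" ]; then
    echo "usage ${used}% >= watermark-hi ${hi}%: promotions should stop and demotions should start"
elif [ "$used" -le "$low" ]; then
    echo "usage ${used}% <= watermark-low ${low}%: demotions should stop"
else
    echo "usage ${used}% is between the watermarks: promotions and demotions may both occur"
fi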

Version-Release number of selected component (if applicable):
glusterfs-server-3.7.5-15.el7rhgs.x86_64

How reproducible:
Seen this behavior once, yet to try again

Steps to Reproduce:
1) On an EC (2 x (4+2)) cold tier volume, created multiple files.
2) Attached a dist-rep (4 x 2) hot tier.
Note: Each cold tier brick is 250G and each hot tier brick is 50G, giving roughly 2 TB of usable cold tier capacity (2 x 4 x 250G) and 200 GB of usable hot tier capacity (4 x 50G with replica 2).
3) Heated 200 files residing on the cold tier, each of size 1 GB, by continuous writes [for i in {1..200}; do echo "ee" >> file-$i; done]
4) While promotions were ongoing, changed the watermark levels from the defaults to 20% (low) and 65% (high) [the gluster commands for steps 4-6 are sketched after these steps]
5) When hot tier used capacity reached 30%, changed the demote frequency to 1800 seconds
6) When hot tier used capacity reached 50%, changed the watermarks to 10% (low) and 45% (high), expecting demotions to happen in the next cycle
7) Files kept getting promoted even though hot tier usage was above the high watermark. This continued for at least 2-3 demotion cycles (about an hour), and hot tier disk usage went above 65%
8) Stopped heating files
9) Demotions were seen only after hot tier usage crossed 73%
10) Demotions continued until hot tier disk usage reached 1%, even though the low watermark was set to 10%
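
For clarity, the option changes in steps 4-6 correspond to the following commands (a sketch only; the volume name is taken from the vol info above, and the heat loop is the one from step 3, run against the files on the client mount):

# Step 3: heat 200 existing ~1 GB files with continuous appends (run on the client mount)
for i in {1..200}; do echo "ee" >> file-$i; done

# Step 4: while promotions are ongoing, set the watermarks to low=20 / high=65
gluster volume set regression-test cluster.watermark-low 20
gluster volume set regression-test cluster.watermark-hi 65

# Step 5: at ~30% hot tier usage, slow down demotions (value is in seconds)
gluster volume set regression-test cluster.tier-demote-frequency 1800

# Step 6: at ~50% hot tier usage, tighten the watermarks back to low=10 / high=45
gluster volume set regression-test cluster.watermark-low 10
gluster volume set regression-test cluster.watermark-hi 45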

Actual results:
watermark levels are breached

Expected results:
Watermark levels and disk usage should be validated before each file transfer

Additional info:
sosreports shall be attached

Comment 2 krishnaram Karthick 2016-01-14 08:34:42 UTC
sosreports are available here --> http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1298470/

