Bug 1303406 - [tiering]: No demotions seen in the system after a series of failure injection
Status: CLOSED DUPLICATE of bug 1294790
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: tier
Version: 3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Assigned To: Bug Updates Notification Mailing List
QA Contact: nchilaka
Keywords: ZStream
Depends On:
Blocks:
Reported: 2016-01-31 08:47 EST by krishnaram Karthick
Modified: 2016-09-17 11:35 EDT
CC: 3 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-02-01 00:45:05 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description krishnaram Karthick 2016-01-31 08:47:46 EST
Description of problem:
No demotions are seen on a 16-node setup after a series of failure injections. Refer to the steps to reproduce for the sequence that led to this state. No errors related to demotion are seen in the system.

[root@dhcp37-101 glusterfs]# ll /var/run/gluster/krk-vol-tier-dht/
total 508
-rw-r--r--. 1 root root 364593 Jan 29 14:56 demotequeryfile-krk-vol-tier-dht.err
-rw-r--r--. 1 root root 149210 Jan 30 14:30 promotequeryfile-krk-vol-tier-dht.err

Volume Name: krk-vol
Type: Tier
Volume ID: 192655ce-4ef6-4ada-8e0c-6f137e2721e1
Status: Started
Number of Bricks: 36
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 6 x 2 = 12
Brick1: 10.70.37.101:/rhs/brick6/krkvol
Brick2: 10.70.35.163:/rhs/brick6/krkvol
Brick3: 10.70.35.173:/rhs/brick6/krkvol
Brick4: 10.70.35.232:/rhs/brick6/krkvol
Brick5: 10.70.35.176:/rhs/brick6/krkvol
Brick6: 10.70.35.231:/rhs/brick6/krkvol
Brick7: 10.70.35.44:/rhs/brick6/krkvol
Brick8: 10.70.37.195:/rhs/brick6/krkvol
Brick9: 10.70.37.202:/rhs/brick6/krkvol
Brick10: 10.70.37.120:/rhs/brick6/krkvol
Brick11: 10.70.37.60:/rhs/brick6/krkvol
Brick12: 10.70.37.69:/rhs/brick6/krkvol
Cold Tier:
Cold Tier Type : Distributed-Disperse
Number of Bricks: 2 x (8 + 4) = 24
Brick13: 10.70.35.176:/rhs/brick5/krkvol
Brick14: 10.70.35.232:/rhs/brick5/krkvol
Brick15: 10.70.35.173:/rhs/brick5/krkvol
Brick16: 10.70.35.163:/rhs/brick5/krkvol
Brick17: 10.70.37.101:/rhs/brick5/krkvol
Brick18: 10.70.37.69:/rhs/brick5/krkvol
Brick19: 10.70.37.60:/rhs/brick5/krkvol
Brick20: 10.70.37.120:/rhs/brick5/krkvol
Brick21: 10.70.37.202:/rhs/brick4/krkvol
Brick22: 10.70.37.195:/rhs/brick4/krkvol
Brick23: 10.70.35.155:/rhs/brick4/krkvol
Brick24: 10.70.35.222:/rhs/brick4/krkvol
Brick25: 10.70.35.108:/rhs/brick4/krkvol
Brick26: 10.70.35.44:/rhs/brick4/krkvol
Brick27: 10.70.35.89:/rhs/brick4/krkvol
Brick28: 10.70.35.231:/rhs/brick4/krkvol
Brick29: 10.70.35.176:/rhs/brick4/krkvol
Brick30: 10.70.35.232:/rhs/brick4/krkvol
Brick31: 10.70.35.173:/rhs/brick4/krkvol
Brick32: 10.70.35.163:/rhs/brick4/krkvol
Brick33: 10.70.37.101:/rhs/brick4/krkvol
Brick34: 10.70.37.69:/rhs/brick4/krkvol
Brick35: 10.70.37.60:/rhs/brick4/krkvol
Brick36: 10.70.37.120:/rhs/brick4/krkvol
Options Reconfigured:
cluster.tier-demote-frequency: 300
cluster.watermark-hi: 60
cluster.watermark-low: 50
cluster.min-free-disk: 20
performance.write-behind: off
performance.open-behind: off
performance.read-ahead: off
performance.io-cache: off
features.quota-deem-statfs: off
features.inode-quota: on
features.quota: on
performance.readdir-ahead: on
features.record-counters: on
cluster.write-freq-threshold: 1
cluster.read-freq-threshold: 1
cluster.tier-max-files: 10000
diagnostics.client-log-level: INFO
features.ctr-enabled: on
cluster.tier-mode: cache
features.uss: on
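
The non-default tiering options above would typically have been applied with `gluster volume set`. A hedged sketch of the key cache-mode settings (volume name taken from this report; this is a reconstruction, not the captured command history):

```shell
# Cache-mode tiering with watermarks: demotions are expected once disk
# usage on the hot tier crosses cluster.watermark-hi.
gluster volume set krk-vol cluster.tier-mode cache
gluster volume set krk-vol cluster.watermark-hi 60
gluster volume set krk-vol cluster.watermark-low 50
gluster volume set krk-vol cluster.tier-demote-frequency 300
```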

Version-Release number of selected component (if applicable):
glusterfs-3.7.5-17.el7rhgs.x86_64

How reproducible:
Yet to determine

Steps to Reproduce:
1. On a tiered volume, promote enough files that current disk usage is near the high watermark, then leave the system in that state with continuous file heating for 12+ hrs --> started at approx Jan 30 15:30:00 IST
2. Kill all brick processes in the hot tier --> induced at approx Jan 31 09:30:00 IST 2016
3. Restart glusterd on all nodes hosting the hot tier --> induced at approx Jan 31 17:00:00 IST
4. Restart the tiered volume
5. Reduce the high watermark well below the current disk-usage level
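
Steps 2-5 above can be sketched as the following commands. This is a hedged reconstruction (it assumes a live RHGS cluster and the volume name from this report; the brick PIDs are placeholders, not captured values), not the exact history from the test setup:

```shell
# 2. Find and kill the hot-tier brick processes on each hot-tier node
gluster volume status krk-vol          # note the PIDs of hot-tier bricks
kill -9 <hot-tier-brick-pids>          # placeholder; substitute real PIDs

# 3. Restart glusterd on every node hosting a hot-tier brick
systemctl restart glusterd

# 4. Restart the tiered volume
gluster volume stop krk-vol
gluster volume start krk-vol

# 5. Drop the high watermark well below current disk usage
gluster volume set krk-vol cluster.watermark-hi 5
```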

Actual results:
No demotions are seen

Expected results:
Demotions should happen immediately after the high watermark is breached

Additional info:
sosreports will be attached
Comment 1 Mohammed Rafi KC 2016-01-31 09:31:31 EST
After the volume is restarted, the tier daemon also restarts, which triggers a fix-layout. Depending on volume size, a fix-layout can take hours. Here the tier daemon is still doing the fix-layout; once it finishes, promotion/demotion should resume.
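
One way to confirm whether demotions have resumed is to total the "Demoted files" column from `gluster volume tier <vol> status`. A minimal sketch, assuming that command's tabular output; the sample below is illustrative, not captured from this setup:

```shell
# Illustrative sample of `gluster volume tier krk-vol status` output
# (not real data from this bug); zero demotions matches this report.
sample_status='Node                 Promoted files       Demoted files        Status
---------            ---------            ---------            ---------
localhost            0                    0                    in progress
10.70.35.163         0                    0                    in progress'

# Sum the "Demoted files" column (3rd field), skipping the two header rows
echo "$sample_status" | awk 'NR > 2 { demoted += $3 } END { print demoted }'
```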
Comment 3 Mohammed Rafi KC 2016-02-01 00:45:05 EST

*** This bug has been marked as a duplicate of bug 1294790 ***
