Bug 1303406

Summary: [tiering]: No demotions seen in the system after a series of failure injection
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: krishnaram Karthick <kramdoss>
Component: tier
Assignee: Bug Updates Notification Mailing List <rhs-bugs>
Status: CLOSED DUPLICATE
QA Contact: Nag Pavan Chilakam <nchilaka>
Severity: high
Priority: unspecified
Version: rhgs-3.1
CC: rhs-bugs, rkavunga, storage-qa-internal
Target Milestone: ---
Keywords: ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Story Points: ---
Last Closed: 2016-02-01 05:45:05 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---

Description krishnaram Karthick 2016-01-31 13:47:46 UTC
Description of problem:
No demotions are seen on a 16-node setup after a series of failure injections. Refer to the 'Steps to Reproduce' section below for the exact sequence that led to this state. No errors related to demotion are seen in the system.

[root@dhcp37-101 glusterfs]# ll /var/run/gluster/krk-vol-tier-dht/
total 508
-rw-r--r--. 1 root root 364593 Jan 29 14:56 demotequeryfile-krk-vol-tier-dht.err
-rw-r--r--. 1 root root 149210 Jan 30 14:30 promotequeryfile-krk-vol-tier-dht.err
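
For reference, a minimal way to look for tier-migration activity on a node running the tier daemon (the query-file path matches this setup; the log file name is an assumption based on the usual <volname>-tier.log convention and should be verified on the affected build):

# Size/mtime of the tier query files; a demote query file that stops being
# refreshed suggests the migrator is no longer scanning for demotion candidates
ls -l /var/run/gluster/krk-vol-tier-dht/

# Tier daemon log (path assumed from the <volname>-tier.log convention)
grep -i -E 'demot|watermark|fix.?layout' /var/log/glusterfs/krk-vol-tier.log | tail -n 20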

Volume Name: krk-vol
Type: Tier
Volume ID: 192655ce-4ef6-4ada-8e0c-6f137e2721e1
Status: Started
Number of Bricks: 36
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 6 x 2 = 12
Brick1: 10.70.37.101:/rhs/brick6/krkvol
Brick2: 10.70.35.163:/rhs/brick6/krkvol
Brick3: 10.70.35.173:/rhs/brick6/krkvol
Brick4: 10.70.35.232:/rhs/brick6/krkvol
Brick5: 10.70.35.176:/rhs/brick6/krkvol
Brick6: 10.70.35.231:/rhs/brick6/krkvol
Brick7: 10.70.35.44:/rhs/brick6/krkvol
Brick8: 10.70.37.195:/rhs/brick6/krkvol
Brick9: 10.70.37.202:/rhs/brick6/krkvol
Brick10: 10.70.37.120:/rhs/brick6/krkvol
Brick11: 10.70.37.60:/rhs/brick6/krkvol
Brick12: 10.70.37.69:/rhs/brick6/krkvol
Cold Tier:
Cold Tier Type : Distributed-Disperse
Number of Bricks: 2 x (8 + 4) = 24
Brick13: 10.70.35.176:/rhs/brick5/krkvol
Brick14: 10.70.35.232:/rhs/brick5/krkvol
Brick15: 10.70.35.173:/rhs/brick5/krkvol
Brick16: 10.70.35.163:/rhs/brick5/krkvol
Brick17: 10.70.37.101:/rhs/brick5/krkvol
Brick18: 10.70.37.69:/rhs/brick5/krkvol
Brick19: 10.70.37.60:/rhs/brick5/krkvol
Brick20: 10.70.37.120:/rhs/brick5/krkvol
Brick21: 10.70.37.202:/rhs/brick4/krkvol
Brick22: 10.70.37.195:/rhs/brick4/krkvol
Brick23: 10.70.35.155:/rhs/brick4/krkvol
Brick24: 10.70.35.222:/rhs/brick4/krkvol
Brick25: 10.70.35.108:/rhs/brick4/krkvol
Brick26: 10.70.35.44:/rhs/brick4/krkvol
Brick27: 10.70.35.89:/rhs/brick4/krkvol
Brick28: 10.70.35.231:/rhs/brick4/krkvol
Brick29: 10.70.35.176:/rhs/brick4/krkvol
Brick30: 10.70.35.232:/rhs/brick4/krkvol
Brick31: 10.70.35.173:/rhs/brick4/krkvol
Brick32: 10.70.35.163:/rhs/brick4/krkvol
Brick33: 10.70.37.101:/rhs/brick4/krkvol
Brick34: 10.70.37.69:/rhs/brick4/krkvol
Brick35: 10.70.37.60:/rhs/brick4/krkvol
Brick36: 10.70.37.120:/rhs/brick4/krkvol
Options Reconfigured:
cluster.tier-demote-frequency: 300
cluster.watermark-hi: 60
cluster.watermark-low: 50
cluster.min-free-disk: 20
performance.write-behind: off
performance.open-behind: off
performance.read-ahead: off
performance.io-cache: off
features.quota-deem-statfs: off
features.inode-quota: on
features.quota: on
performance.readdir-ahead: on
features.record-counters: on
cluster.write-freq-threshold: 1
cluster.read-freq-threshold: 1
cluster.tier-max-files: 10000
diagnostics.client-log-level: INFO
features.ctr-enabled: on
cluster.tier-mode: cache
features.uss: on
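
For context on the settings above: in cache mode the watermarks are percentages of used capacity on the hot tier; above cluster.watermark-hi (60) promotion stops and demotion is expected, and below cluster.watermark-low (50) demotion stops. A minimal check of where the hot tier sits relative to that 60/50 band, run on any hot-tier node (brick path taken from the volume info above):

# Used% on a hot-tier brick vs. cluster.watermark-hi (60) / watermark-low (50)
df -h /rhs/brick6/krkvol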

Version-Release number of selected component (if applicable):
glusterfs-3.7.5-17.el7rhgs.x86_64

How reproducible:
Yet to be determined

Steps to Reproduce (a rough shell sketch of steps 2-5 follows the list):
1. On a tiered volume, promote enough files that hot-tier disk usage is near the high watermark, then leave the system in that state with continuous file heating for 12+ hrs --> started at approx Jan 30 15:30:00 IST
2. Kill all brick processes in the hot tier --> induced at approx Jan 31 09:30:00 IST 2016
3. Restart glusterd on all nodes hosting the hot tier --> induced at approx Jan 31 17:00:00 IST
4. Restart the tier volume
5. Reduce the high watermark to well below the current disk usage level
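
A rough shell sketch of steps 2-5, assuming the krk-vol volume above (the brick PIDs are placeholders to be read from volume status, and the watermark value 10 is only an illustrative number well below current usage):

# 2. Kill the hot-tier brick processes on each hot-tier node
gluster volume status krk-vol          # read the hot-tier brick PIDs from here
kill -9 <hot-tier-brick-pids>          # placeholder; substitute the PIDs listed above

# 3. Restart glusterd on every node hosting a hot-tier brick
systemctl restart glusterd

# 4. Restart the tier volume
gluster volume stop krk-vol
gluster volume start krk-vol

# 5. Drop the high watermark well below current hot-tier usage (10 is illustrative)
gluster volume set krk-vol cluster.watermark-hi 10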

Actual results:
No demotions are seen

Expected results:
Demotions should start immediately once the high watermark is breached

Additional info:
sosreports will be attached

Comment 1 Mohammed Rafi KC 2016-01-31 14:31:31 UTC
After the volume is restarted, the tier daemon also restarts, which triggers a fix-layout. Depending on the volume size, the fix-layout can take hours. Here the tier daemon is still running the fix-layout; once it finishes, promotion/demotion should resume.
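
A minimal way to confirm this on the affected cluster (the rebalance/tier status syntax below matches the glusterfs 3.7 CLI documented for RHGS 3.1, but verify it on the installed build; the log path again assumes the <volname>-tier.log convention):

# Tier daemon status; shows whether the fix-layout is still in progress
gluster volume rebalance krk-vol tier status

# Look for fix-layout start/completion messages in the tier daemon log
grep -i 'fix.*layout' /var/log/glusterfs/krk-vol-tier.log | tail -n 10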

Comment 3 Mohammed Rafi KC 2016-02-01 05:45:05 UTC

*** This bug has been marked as a duplicate of bug 1294790 ***