Bug 1303894

Summary: promotions not happening when space is created on previously full hot tier
Product: [Red Hat Storage] Red Hat Gluster Storage
Component: tier
Version: rhgs-3.1
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: unspecified
Priority: unspecified
Keywords: ZStream
Target Release: RHGS 3.1.2
Fixed In Version: glusterfs-3.7.5-19
Doc Type: Bug Fix
Type: Bug
Reporter: Manoj Pillai <mpillai>
Assignee: Nithya Balachandran <nbalacha>
QA Contact: krishnaram Karthick <kramdoss>
CC: byarlaga, kramdoss, nbalacha, nchilaka, rcyriac, rhs-bugs, sankarshan, smohan, storage-qa-internal
Flags: kramdoss: needinfo-
Clones: 1303895
Bug Blocks: 1303895, 1306129
Last Closed: 2016-03-01 06:08:57 UTC

Description Manoj Pillai 2016-02-02 11:08:57 UTC
Description of problem:
Tests in which space is created on a previously full hot tier show erratic promotion behaviour. In some runs, promotions happen as expected; in most runs, no promotions happen at all.

Version-Release number of selected component (if applicable):
glusterfs*-3.7.5-17.el7.x86_64
kernel: 3.10.0-327.el7.x86_64 (RHEL 7.2)

How reproducible:
Consistently, with the steps below.

Steps to Reproduce:

1. Create a 2 x (8+4) base volume (about 15 TB capacity), attach 2 x 2 SAS SSDs as the hot tier (about 360 GB capacity), and FUSE-mount the volume on a set of clients.

2. Create the directory smf_init in the mount point, and within smf_init create a 480 GB data set of large files, each 256 MB in size. This fills the hot tier up to the maximum allowed.

3. Create the directory smf_data in the mount point, and within smf_data create a 32 GB data set of small files, each 64 KB in size. Then run rm -rf <mnt-pt>/smf_init. This deletes all files created in step 2 and frees space in the hot tier.

4. Read the files in <mnt-pt>/smf_data. The read phase takes longer than the promote frequency of 120 s, so at least one promotion cycle runs during it.
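The data sets in steps 2 to 4 can be sketched in Python as follows. This is a minimal, hypothetical helper and not the workload tool used in the original runs. It assumes binary units (GiB, MiB, KiB), because the report does not say which units it means. Under that assumption, 480 GiB of 256 MiB files is 1920 files, and 32 GiB of 64 KiB files is 524288 files. The function names, the file naming scheme, and the zero-filled contents are illustrative assumptions. `read_dataset` times a full read so the result can be compared with the 120 s promote frequency.

```python
import os
import tempfile
import time

# Binary units are an assumption; the report only says "GB", "MB", "KB".
KiB = 1024
MiB = 1024 * KiB
GiB = 1024 * MiB

PROMOTE_FREQUENCY_S = 120  # cluster.tier-promote-frequency used in the report


def file_count(total_bytes: int, file_bytes: int) -> int:
    """Number of whole files of size file_bytes in a data set of total_bytes."""
    return total_bytes // file_bytes


def make_dataset(directory: str, nfiles: int, file_bytes: int, chunk: int = MiB) -> None:
    """Hypothetical helper: create nfiles zero-filled files of file_bytes each."""
    os.makedirs(directory, exist_ok=True)
    block = b"\0" * chunk
    for i in range(nfiles):
        path = os.path.join(directory, f"file_{i:07d}")
        with open(path, "wb") as f:
            remaining = file_bytes
            while remaining > 0:
                n = min(remaining, chunk)
                f.write(block[:n])
                remaining -= n


def read_dataset(directory: str, chunk: int = MiB) -> tuple[int, int, float]:
    """Read every file in a flat directory; return (files, bytes, elapsed seconds)."""
    files = total = 0
    start = time.monotonic()
    for name in sorted(os.listdir(directory)):
        with open(os.path.join(directory, name), "rb") as f:
            while True:
                data = f.read(chunk)
                if not data:
                    break
                total += len(data)
        files += 1
    return files, total, time.monotonic() - start


if __name__ == "__main__":
    print(file_count(480 * GiB, 256 * MiB))  # smf_init: 1920 files
    print(file_count(32 * GiB, 64 * KiB))    # smf_data: 524288 files
    # Tiny local demo; on a real setup the directory would be under <mnt-pt>.
    with tempfile.TemporaryDirectory() as d:
        make_dataset(d, nfiles=4, file_bytes=64 * KiB)
        files, nbytes, elapsed = read_dataset(d)
        print(files, nbytes, elapsed > PROMOTE_FREQUENCY_S)
```

In the reproducer, the useful check is the last comparison. The read phase over smf_data has to outlast at least one promotion cycle, or no promotion would be expected even with a working tier daemon.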

Actual results:
Files read in step 4 are not promoted to the hot tier.

Expected results:
Files should get promoted.

Additional info:

Comment 8 Nag Pavan Chilakam 2016-02-12 12:51:57 UTC
I have verified the fix on 3.7.5-19, and it works:
[root@inception glusterfs]# rpm -qa|grep gluster
nfs-ganesha-gluster-2.2.0-12.el7rhgs.x86_64
glusterfs-fuse-3.7.5-19.el7rhgs.x86_64
glusterfs-libs-3.7.5-19.el7rhgs.x86_64
glusterfs-rdma-3.7.5-19.el7rhgs.x86_64
gluster-nagios-addons-0.2.5-1.el7rhgs.x86_64
glusterfs-cli-3.7.5-19.el7rhgs.x86_64
glusterfs-debuginfo-3.7.5-15.el7rhgs.x86_64
glusterfs-client-xlators-3.7.5-19.el7rhgs.x86_64
glusterfs-server-3.7.5-19.el7rhgs.x86_64
glusterfs-geo-replication-3.7.5-19.el7rhgs.x86_64
glusterfs-api-3.7.5-19.el7rhgs.x86_64
gluster-nagios-common-0.2.2-1.el7rhgs.noarch
glusterfs-3.7.5-19.el7rhgs.x86_64
python-gluster-3.7.1-16.el7rhgs.x86_64



Steps for validation:
1) Created a 4+2 EC (disperse) volume on a 4-node setup with 15 GB per disk (hence 60 GB effective).
2) Created about ten 750 MB files on the volume.
3) Attached a 2 x 2 hot tier with 15 GB per brick (so the hot tier is 30 GB), and enabled USS and quota.
4) Created data until the hot tier was 90-95% full, so that new files were created on the cold tier.
5) Heated a legacy file on the cold tier; it was not promoted due to lack of space. (The database was capturing heat as expected, and the size of the binary file in /var/run/gluster could be seen increasing.)
6) Removed some files from the hot tier to free up space, and reheated the cold files.
7) The cold files were promoted.

Tried this scenario twice, and it worked both times. Hence moving to VERIFIED.
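The capacities quoted in steps 1 and 3 of the validation follow from the volume layouts. A disperse subvolume of k+m bricks stores user data in k bricks' worth of space. A replica set stores one brick's worth of data, however many copies it keeps. Below is a minimal Python sketch of that arithmetic. The class names are hypothetical, and it ignores filesystem and metadata overhead, which is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Disperse:
    """N x (data + redundancy) disperse layout (hypothetical helper)."""
    subvols: int
    data: int
    redundancy: int
    brick_gb: float

    @property
    def bricks(self) -> int:
        return self.subvols * (self.data + self.redundancy)

    @property
    def usable_gb(self) -> float:
        # Redundancy bricks hold erasure-coded fragments, not extra user capacity.
        return self.subvols * self.data * self.brick_gb


@dataclass(frozen=True)
class DistReplicate:
    """N x replica distributed-replicate layout (hypothetical helper)."""
    subvols: int
    replica: int
    brick_gb: float

    @property
    def bricks(self) -> int:
        return self.subvols * self.replica

    @property
    def usable_gb(self) -> float:
        # Each brick in a replica set holds a full copy, so a set adds one brick of space.
        return self.subvols * self.brick_gb


cold = Disperse(subvols=1, data=4, redundancy=2, brick_gb=15)
hot = DistReplicate(subvols=2, replica=2, brick_gb=15)
print(cold.bricks, cold.usable_gb)  # 6 60
print(hot.bricks, hot.usable_gb)    # 4 30
print(cold.bricks + hot.bricks)     # 10
```

The brick counts match the `gluster v info` output in this comment: "1 x (4 + 2) = 6" for the cold tier, "2 x 2 = 4" for the hot tier, and 10 bricks in total.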



[root@inception ~]# gluster v info
 
Volume Name: nagbug
Type: Tier
Volume ID: 02612ca6-59b0-4b93-8def-c181e509d6bc
Status: Started
Number of Bricks: 10
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: 10.70.36.4:/bricks/brick6/smallbricks/smallbrickmount.5/nagbug_hot
Brick2: 10.70.36.3:/bricks/brick6/smallbricks/smallbrickmount.5/nagbug_hot
Brick3: 10.70.36.2:/bricks/brick6/smallbricks/smallbrickmount.5/nagbug_hot
Brick4: 10.70.34.50:/bricks/brick6/smallbricks/smallbrickmount.5/nagbug_hot
Cold Tier:
Cold Tier Type : Disperse
Number of Bricks: 1 x (4 + 2) = 6
Brick5: 10.70.34.50:/bricks/brick6/smallbricks/smallbrickmount.1/nagbug
Brick6: 10.70.36.2:/bricks/brick6/smallbricks/smallbrickmount.1/nagbug
Brick7: 10.70.36.3:/bricks/brick6/smallbricks/smallbrickmount.1/nagbug
Brick8: 10.70.36.4:/bricks/brick6/smallbricks/smallbrickmount.1/nagbug
Brick9: 10.70.34.50:/bricks/brick6/smallbricks/smallbrickmount.2/nagbug
Brick10: 10.70.36.2:/bricks/brick6/smallbricks/smallbrickmount.2/nagbug
Options Reconfigured:
cluster.tier-mode: cache
features.ctr-enabled: on
features.uss: on
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
performance.readdir-ahead: on
cluster.enable-shared-storage: disable
[root@inception ~]# gluster v status nagbug
Status of volume: nagbug
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Hot Bricks:
Brick 10.70.36.4:/bricks/brick6/smallbricks
/smallbrickmount.5/nagbug_hot               49167     0          Y       5812 
Brick 10.70.36.3:/bricks/brick6/smallbricks
/smallbrickmount.5/nagbug_hot               49163     0          Y       17778
Brick 10.70.36.2:/bricks/brick6/smallbricks
/smallbrickmount.5/nagbug_hot               49170     0          Y       6501 
Brick 10.70.34.50:/bricks/brick6/smallbrick
s/smallbrickmount.5/nagbug_hot              49166     0          Y       21702
Cold Bricks:
Brick 10.70.34.50:/bricks/brick6/smallbrick
s/smallbrickmount.1/nagbug                  49163     0          Y       15345
Brick 10.70.36.2:/bricks/brick6/smallbricks
/smallbrickmount.1/nagbug                   49167     0          Y       32444
Brick 10.70.36.3:/bricks/brick6/smallbricks
/smallbrickmount.1/nagbug                   49161     0          Y       11446
Brick 10.70.36.4:/bricks/brick6/smallbricks
/smallbrickmount.1/nagbug                   49165     0          Y       31775
Brick 10.70.34.50:/bricks/brick6/smallbrick
s/smallbrickmount.2/nagbug                  49164     0          Y       15364
Brick 10.70.36.2:/bricks/brick6/smallbricks
/smallbrickmount.2/nagbug                   49168     0          Y       32463
Snapshot Daemon on localhost                49165     0          Y       15533
NFS Server on localhost                     2049      0          Y       21723
Self-heal Daemon on localhost               N/A       N/A        Y       21731
Quota Daemon on localhost                   N/A       N/A        Y       21739
Snapshot Daemon on rhs-arch-srv2.lab.eng.bl
r.redhat.com                                49169     0          Y       32568
NFS Server on rhs-arch-srv2.lab.eng.blr.red
hat.com                                     2049      0          Y       6522 
Self-heal Daemon on rhs-arch-srv2.lab.eng.b
lr.redhat.com                               N/A       N/A        Y       6530 
Quota Daemon on rhs-arch-srv2.lab.eng.blr.r
edhat.com                                   N/A       N/A        Y       6538 
Snapshot Daemon on rhs-arch-srv3.lab.eng.bl
r.redhat.com                                49162     0          Y       11559
NFS Server on rhs-arch-srv3.lab.eng.blr.red
hat.com                                     2049      0          Y       17798
Self-heal Daemon on rhs-arch-srv3.lab.eng.b
lr.redhat.com                               N/A       N/A        Y       17806
Quota Daemon on rhs-arch-srv3.lab.eng.blr.r
edhat.com                                   N/A       N/A        Y       17814
Snapshot Daemon on rhs-arch-srv4.lab.eng.bl
r.redhat.com                                49166     0          Y       31883
NFS Server on rhs-arch-srv4.lab.eng.blr.red
hat.com                                     2049      0          Y       5832 
Self-heal Daemon on rhs-arch-srv4.lab.eng.b
lr.redhat.com                               N/A       N/A        Y       5840 
Quota Daemon on rhs-arch-srv4.lab.eng.blr.r
edhat.com                                   N/A       N/A        Y       5848 
 
Task Status of Volume nagbug
------------------------------------------------------------------------------
Task                 : Tier migration      
ID                   : 0b7f3da4-46ee-4e2f-acf4-f8fad31b3cb6
Status               : in progress         
 
[root@inception ~]# gluster v get nagvol all|grep water
volume get option: failed: Volume nagvol does not exist
[root@inception ~]# gluster v get nagbug all|grep water
cluster.watermark-hi                    90                                      
cluster.watermark-low                   75                                      
[root@inception ~]# gluster v get nagbug all|grep freq
cluster.write-freq-threshold            0                                       
cluster.read-freq-threshold             0                                       
cluster.tier-promote-frequency          120                                     
cluster.tier-demote-frequency           3600                                    
features.scrub-freq                     biweekly                                
[root@inception ~]# 




Note: used a glusterfs-fuse mount.
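The output above shows cluster.watermark-hi 90 and cluster.watermark-low 75, which govern promotion in cache mode. Going by the `cluster.watermark-hi` and `cluster.watermark-low` option descriptions in glusterfs, the rules are:
- Above watermark-hi: no promotions happen, and demotions happen with high probability.
- Below watermark-low: promotions happen and demotions do not.
- In between: both happen with a probability related to how full the hot tier is.

The Python sketch below is a simplified model of that rule, not the tier daemon's code. The handling of values exactly on a watermark is an assumption, and so is leaving out the actual probabilities in the middle band. It shows why promotions are expected to resume in validation steps 4 to 7 once deletions bring hot-tier usage back down.

```python
from enum import Enum


class TierMode(Enum):
    PROMOTE_ONLY = "promote, no demote"
    PROBABILISTIC = "promote/demote with a probability based on fullness"
    DEMOTE_ONLY = "no promote, demote with high probability"


def hot_tier_usage_pct(used_gb: float, capacity_gb: float) -> float:
    """Percentage of the hot tier in use."""
    return 100.0 * used_gb / capacity_gb


def watermark_mode(usage_pct: float, wm_low: float = 75, wm_hi: float = 90) -> TierMode:
    """Simplified cache-mode watermark model; boundary handling is an assumption."""
    if usage_pct < wm_low:
        return TierMode.PROMOTE_ONLY
    if usage_pct <= wm_hi:
        return TierMode.PROBABILISTIC
    return TierMode.DEMOTE_ONLY


# Hypothetical usage figures for the 30 GB hot tier used in the validation.
print(watermark_mode(hot_tier_usage_pct(28, 30)))  # ~93% full: no promotions
print(watermark_mode(hot_tier_usage_pct(15, 30)))  # 50% full after deletions
```

With 28 GB used (a hypothetical value inside the 90-95% range from step 4), the model blocks promotions. After files are removed, usage drops below watermark-low and promotions should be allowed again. That is the behaviour the reproducer found missing before the fix.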

Comment 11 errata-xmlrpc 2016-03-01 06:08:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0193.html

Comment 12 Nag Pavan Chilakam 2016-05-09 12:04:34 UTC
Changed the needinfo assignee to Karthick, as he works on tiering.