Bug 1283961
Summary: Data Tiering: Change the default tiering values to optimize tiering settings
Product: [Red Hat Storage] Red Hat Gluster Storage
Component: tier
Status: CLOSED ERRATA
Severity: high
Priority: urgent
Version: rhgs-3.1
Target Release: RHGS 3.1.2
Hardware: Unspecified
OS: Unspecified
Fixed In Version: glusterfs-3.7.5-17
Doc Type: Bug Fix
Reporter: Nag Pavan Chilakam <nchilaka>
Assignee: Bug Updates Notification Mailing List <rhs-bugs>
QA Contact: RajeshReddy <rmekala>
CC: asrivast, dlambrig, mpillai, mzywusko, rcyriac, rhs-bugs, sankarshan, storage-qa-internal
Keywords: ZStream
Clones: 1300412 (view as bug list)
Type: Bug
Last Closed: 2016-03-01 05:56:49 UTC
Bug Blocks: 1260783, 1300412, 1306302
Description
Nag Pavan Chilakam
2015-11-20 11:48:39 UTC
Most of these tiering tunable parameters are migration-related. Migration had problems in the early builds; see, for example, https://bugzilla.redhat.com/show_bug.cgi?id=1293967#c2. glusterfs*-3.7.5-14.el7.x86_64 is showing better migration behaviour, and migration speed and impact on application I/O are currently being analyzed with the 3.7.5-14 build. Based on those results we can revisit the migration-related tunable parameters.

I had a meeting with the tiering team where I suggested the following default values for the tiering parameters, instead of the current values:

cluster.tier-demote-frequency  3600
cluster.tier-max-mb            4000
cluster.tier-max-files         10000
[others unchanged]

I'll add a more detailed comment on the discussions and the rationale for these values in a bit.

Explanation for comment #4: I was able to complete only some of the migration tests referred to in comment #3. These are fairly complex tests, and migration in gluster-tier is probabilistic in some cases, so it is not yet clear whether this is a problem with migration functionality. The suggestions here are therefore based on reasoning, not on actual test results (which is also true of the current values). I have elaborated on the reasoning behind these suggested values; the tiering team will have another round of discussion among themselves, and may alter or ignore them.

There is also a need to revisit migration and migration-related parameters in future releases to allow more control, particularly between promotion and demotion. Currently the same parameters, e.g. max-files and read/write-freq-threshold, are used to control both promotion and demotion. Since this is late in the 3.1.2 release cycle, we want to keep changes to a minimum.

The changes to cluster.tier-max-mb and cluster.tier-max-files are intended to mitigate problems such as those reported in bz #1290667, where migration of files selected as candidates in one cycle is not completed in that cycle.
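The suggested defaults above can be applied per volume with the standard `gluster volume set` command; a minimal sketch, assuming an existing tiered volume (VOLNAME is a placeholder, the values are the ones proposed in comment #4):

```shell
# Apply the proposed tiering defaults to one volume (VOLNAME is a placeholder).
gluster volume set VOLNAME cluster.tier-demote-frequency 3600   # demote cycle: 1 hour
gluster volume set VOLNAME cluster.tier-max-mb 4000             # per-cycle migration cap, in MB
gluster volume set VOLNAME cluster.tier-max-files 10000         # per-cycle migration cap, in files

# Confirm the values in effect:
gluster volume get VOLNAME cluster.tier-demote-frequency
```

Changing the shipped defaults (as this bug requests) makes these values apply to every volume without per-volume `volume set` calls.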
The appropriate values for these parameters will depend on the particular configuration and how fast migration happens on that configuration. But in general, migration in this release of gluster-tier is slow, and the default values have been lowered to account for that.

The other change suggested in comment #4 was increasing the demote-frequency to an hour. Currently promote-frequency and demote-frequency are both set to 120, i.e. 2 minutes, but they work differently. Candidates for promotion are files on the cold tier that were accessed (enough times to meet the threshold) in the last migration cycle, a set that shrinks with a smaller promote-frequency value; in contrast, candidates for demotion are files on the hot tier that have _not_ been accessed in the last cycle, a set that grows with a smaller demote-frequency value. With the current demote-frequency of 2 minutes, the list of candidate files to be demoted could be huge, resulting in too many files getting demoted too soon.

Tested with 3.7.5-17, with the default values of cluster.tier-demote-frequency, cluster.tier-max-mb and cluster.tier-max-files changed as mentioned in comment #4, so marking this bug as verified:

[root@dhcp35-231 ~]# rpm -qa | grep glusterfs
glusterfs-client-xlators-3.7.5-17.el7rhgs.x86_64
glusterfs-server-3.7.5-17.el7rhgs.x86_64
glusterfs-3.7.5-17.el7rhgs.x86_64
glusterfs-api-3.7.5-17.el7rhgs.x86_64
glusterfs-cli-3.7.5-17.el7rhgs.x86_64
glusterfs-geo-replication-3.7.5-17.el7rhgs.x86_64
glusterfs-libs-3.7.5-17.el7rhgs.x86_64
glusterfs-fuse-3.7.5-17.el7rhgs.x86_64
glusterfs-rdma-3.7.5-17.el7rhgs.x86_64
[root@dhcp35-231 ~]# gluster vol get delete all | grep cluster.tier-demote-frequency
cluster.tier-demote-frequency  3600
[root@dhcp35-231 ~]# gluster vol get delete all | grep cluster.tier-max-mb
cluster.tier-max-mb  4000
[root@dhcp35-231 ~]# gluster vol get delete all | grep cluster.tier-max-files
cluster.tier-max-files  10000

Since the problem described in this bug report should be resolved in a recent
advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0193.html
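The promote/demote asymmetry discussed in the analysis above can be illustrated with a small shell experiment. This is a toy model only, not gluster's actual candidate selection: the 60-file set and the 2-minute vs 30-minute cycle lengths are made up, and "idle longer than one cycle" stands in for "demotion candidate".

```shell
# Toy model: files whose last modification is older than one cycle length
# count as demotion candidates; a shorter cycle marks more files idle.
demo=$(mktemp -d)
for i in $(seq 1 60); do
    # file$i was last modified $i minutes ago (GNU touch relative date)
    touch -d "-$i minutes" "$demo/file$i"
done

short=$(find "$demo" -name 'file*' -mmin +2 | wc -l)    # idle through a 2-minute cycle
long=$(find "$demo" -name 'file*' -mmin +30 | wc -l)    # idle through a 30-minute cycle

echo "2-minute cycle marks $short of 60 files for demotion"
echo "30-minute cycle marks $long of 60 files for demotion"
```

The short cycle flags nearly every file, the long cycle only the genuinely cold ones, which is the rationale for raising the default demote-frequency from 120 to 3600 seconds.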