Red Hat Bugzilla – Bug 1283961
Data Tiering: Change the default tiering values to optimize tiering settings
Last modified: 2016-09-17 11:35:34 EDT
Description of problem:
Currently, tiering exposes many options: promote/demote frequency, read/write frequency thresholds, max files, max size, and the high and low watermarks.
The current default values for these parameters are unlikely to match what customers would actually use.
We need to set these defaults close to a typical user's desired settings.
E.g., the demote/promote frequency is currently 120 s, which would be too aggressive; no real-world user would want a file demoted that fast.
Hence we need these values to be corrected:
Following are the current default values:
[root@zod distrep]# gluster v get olala all|grep tier
[root@zod distrep]# gluster v get olala all|grep ctr
[root@zod distrep]# gluster v get olala all|grep thres
Version-Release number of selected component (if applicable):
[root@zod distrep]# rpm -qa|grep gluster|grep server
Most of these tiering tunable parameters are migration-related. Migration had problems in the early builds; e.g. see https://bugzilla.redhat.com/show_bug.cgi?id=1293967#c2.
glusterfs*-3.7.5-14.el7.x86_64 is showing better migration behaviour. Migration speed and impact on application I/O is currently being analyzed with the 3.7.5-14 build. Based on those results we can revisit the migration-related tunable parameters.
I had a meeting with the tiering team where I suggested the following settings for the tiering parameter default values, instead of the current values:
I'll add a more detailed comment on the discussions and the rationale for these values in a bit.
Explanation for comment #4:
I was able to complete only some of the migration tests referred
to in comment #3. These are fairly complex tests, and migration in
gluster-tier is probabilistic in some cases, so it is not yet clear
whether this is a problem with migration functionality. The
suggestions here are therefore based on reasoning, not on actual
test results (which is also true of the current values). I have
elaborated on the reasoning behind these suggested values. The
tiering team will have another round of discussion among
themselves, and may alter or ignore these suggested values.
There is also a need to revisit migration and migration-related
parameters in future releases to allow more control particularly
between promotion and demotion. Currently, the same parameters
e.g. max-files, read/write-freq-threshold are used to control
both promotion and demotion.
Since this is late in the 3.1.2 release cycle, we want to keep
changes to a minimum.
The changes to cluster.tier-max-mb and cluster.tier-max-files are
intended to mitigate problems as reported in bz #1290667, where
migration of files selected as candidates in one cycle is not
completed within that cycle. The appropriate values for these
parameters will depend on the particular configuration and how
fast migration happens on that configuration. But in general,
migration in this release of gluster-tier is slow, and the
default values have been lowered to account for that.
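To illustrate the role of these caps, here is a minimal Python sketch of how per-cycle limits like cluster.tier-max-mb and cluster.tier-max-files bound the migration work attempted in one cycle. This is not Gluster source code; the function name, candidate list, and cap values are all hypothetical.

```python
# Hedged sketch (not Gluster internals): per-cycle migration caps.
# A lower cap keeps each cycle's workload small enough to finish
# in time on a configuration where migration is slow.
def select_for_migration(candidates, max_files, max_mb):
    """Take (name, size_mb) candidates in order until either cap is reached."""
    chosen, total_mb = [], 0
    for name, size_mb in candidates:
        if len(chosen) >= max_files or total_mb + size_mb > max_mb:
            break  # cap reached; remaining files wait for a later cycle
        chosen.append(name)
        total_mb += size_mb
    return chosen

candidates = [("f1", 100), ("f2", 400), ("f3", 800), ("f4", 50)]
print(select_for_migration(candidates, max_files=10, max_mb=1000))  # ['f1', 'f2']
```

With lower defaults, the candidate list selected in a cycle is more likely to be fully migrated before the next cycle starts, which is the failure mode bz #1290667 describes.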
The other change suggested in comment #4 was increasing the
demote-frequency to an hour. Currently, promote-frequency and
demote-frequency are both set to 120, i.e. 2 min, but they work differently.
Candidates for promotion are all files on the cold tier that were
accessed (enough times to meet the threshold) in the last
migration cycle, which will be a smaller set with a smaller
promote-frequency value; in contrast, candidates for demotion are
files on the hot tier that have _not_ been accessed in the last
cycle, which will be larger with a smaller demote-frequency
value. With the current demote-frequency of 2 min, the list of
candidate files to be demoted could be huge, resulting in too
many files getting demoted too soon.
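The asymmetry described above can be sketched in a few lines of Python. This is illustrative only, not Gluster's actual scanner; the file names and timestamps are made up.

```python
# Illustrative sketch of the promotion/demotion asymmetry (not Gluster code).
def promotion_candidates(cold_access_counts, threshold):
    """Cold-tier files accessed at least `threshold` times in the last cycle."""
    return {f for f, hits in cold_access_counts.items() if hits >= threshold}

def demotion_candidates(hot_last_access, now, cycle_seconds):
    """Hot-tier files NOT accessed within the last `cycle_seconds`."""
    return {f for f, t in hot_last_access.items() if now - t > cycle_seconds}

now = 10_000  # current time, seconds
hot = {"a": 9_950, "b": 9_000, "c": 5_000, "d": 9_990}  # last-access times

two_min = demotion_candidates(hot, now, 120)    # short cycle: big candidate set
one_hour = demotion_candidates(hot, now, 3600)  # long cycle: only long-idle files
assert one_hour <= two_min  # a longer demote-frequency can only shrink the set
print(sorted(two_min), sorted(one_hour))  # ['b', 'c'] ['c']
```

A shorter demote-frequency widens the "not accessed in the last cycle" window, so raising the default to an hour keeps the demotion set to genuinely idle files.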
Tested with 3.7.5-17, with the default values of cluster.tier-demote-frequency, cluster.tier-max-mb, and cluster.tier-max-files changed as mentioned in comment #4, so marking this bug as verified.
[root@dhcp35-231 ~]# rpm -qa | grep glusterfs
[root@dhcp35-231 ~]# gluster vol get delete all | grep cluster.tier-demote-frequency
[root@dhcp35-231 ~]# gluster vol get delete all | grep cluster.tier-max-mb
[root@dhcp35-231 ~]# gluster vol get delete all | grep cluster.tier-max-files
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.