Red Hat Bugzilla – Bug 1283961
Data Tiering: Change the default tiering values to optimize tiering settings
Last modified: 2016-09-17 11:35:34 EDT
Description of problem:
Currently, tiering exposes many options: promote/demote frequency, read/write frequency thresholds, max files, max size, and the high and low watermarks.
The current default values for these parameters are unlikely to match what customers would actually use.
We need to set these defaults close to a typical user's desired settings.
E.g., the demote/promote frequency is currently 120 s, which would be too aggressive; no real-world user would want a file demoted that fast.
Hence we need these values to be corrected:
Following are the current default values:
[root@zod distrep]# gluster v get olala all|grep tier
[root@zod distrep]# gluster v get olala all|grep ctr
[root@zod distrep]# gluster v get olala all|grep thres
Version-Release number of selected component (if applicable):
[root@zod distrep]# rpm -qa|grep gluster|grep server
Most of these tiering tunable parameters are migration-related. Migration had problems in the early builds; e.g. see https://bugzilla.redhat.com/show_bug.cgi?id=1293967#c2.
glusterfs*-3.7.5-14.el7.x86_64 is showing better migration behaviour. Migration speed and impact on application I/O is currently being analyzed with the 3.7.5-14 build. Based on those results we can revisit the migration-related tunable parameters.
I had a meeting with the tiering team where I suggested the following settings for the tiering parameter default values, instead of the current values:
I'll add a more detailed comment on the discussions and the rationale for these values in a bit.
Explanation for comment #4:
I was able to complete only some of the migration tests referred
to in comment #3. These are fairly complex tests, and migration in
gluster-tier is probabilistic in some cases, so it is not yet clear
whether this is a problem with migration functionality. The
suggestions here are therefore based on reasoning, not on actual
test results (which is also true of the current values). I have
elaborated on the reasoning behind these suggested values. The
tiering team will have another round of discussion among
themselves, and may alter or ignore these suggested values.
There is also a need to revisit migration and migration-related
parameters in future releases to allow more control particularly
between promotion and demotion. Currently, the same parameters
e.g. max-files, read/write-freq-threshold are used to control
both promotion and demotion.
Since this is late in the 3.1.2 release cycle, we want to keep
changes to a minimum.
The changes to cluster.tier-max-mb and cluster.tier-max-files are
intended to mitigate problems as reported in bz #1290667, where
migration of files selected as candidates in one cycle is not
completed within that cycle. The appropriate values for these
parameters will depend on the particular configuration and how
fast migration happens on that configuration. But in general,
migration in this release of gluster-tier is slow, and the
default values have been lowered to account for that.
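To illustrate the role of these caps, here is a minimal Python sketch of how per-cycle limits like cluster.tier-max-mb and cluster.tier-max-files bound the migration work attempted in one cycle. This is not Gluster source code; the function name, candidate list, and cap values are all hypothetical.

```python
# Hedged sketch (not Gluster internals): per-cycle migration caps.
# A lower cap keeps each cycle's workload small enough to finish
# in time on a configuration where migration is slow.
def select_for_migration(candidates, max_files, max_mb):
    """Take (name, size_mb) candidates in order until either cap is reached."""
    chosen, total_mb = [], 0
    for name, size_mb in candidates:
        if len(chosen) >= max_files or total_mb + size_mb > max_mb:
            break  # cap reached; remaining files wait for a later cycle
        chosen.append(name)
        total_mb += size_mb
    return chosen

candidates = [("f1", 100), ("f2", 400), ("f3", 800), ("f4", 50)]
print(select_for_migration(candidates, max_files=10, max_mb=1000))  # ['f1', 'f2']
```

With lower defaults, the candidate list selected in a cycle is more likely to be fully migrated before the next cycle starts, which is the failure mode bz #1290667 describes.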
The other change suggested in comment #4 was increasing the
demote-frequency to an hour. Currently, promote-frequency and
demote-frequency are both set to 120, i.e. 2 min, but they work differently.
Candidates for promotion are all files on the cold tier that were
accessed (enough times to meet the threshold) in the last
migration cycle, which will be a smaller set with a smaller
promote-frequency value; in contrast, candidates for demotion are
files on the hot tier that have _not_ been accessed in the last
cycle, which will be larger with a smaller demote-frequency
value. With the current demote-frequency of 2 min, the list of
candidate files to be demoted could be huge, resulting in too
many files getting demoted too soon.
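The asymmetry described above can be sketched in a few lines of Python. This is illustrative only, not Gluster's actual scanner; the file names and timestamps are made up.

```python
# Illustrative sketch of the promotion/demotion asymmetry (not Gluster code).
def promotion_candidates(cold_access_counts, threshold):
    """Cold-tier files accessed at least `threshold` times in the last cycle."""
    return {f for f, hits in cold_access_counts.items() if hits >= threshold}

def demotion_candidates(hot_last_access, now, cycle_seconds):
    """Hot-tier files NOT accessed within the last `cycle_seconds`."""
    return {f for f, t in hot_last_access.items() if now - t > cycle_seconds}

now = 10_000  # current time, seconds
hot = {"a": 9_950, "b": 9_000, "c": 5_000, "d": 9_990}  # last-access times

two_min = demotion_candidates(hot, now, 120)    # short cycle: big candidate set
one_hour = demotion_candidates(hot, now, 3600)  # long cycle: only long-idle files
assert one_hour <= two_min  # a longer demote-frequency can only shrink the set
print(sorted(two_min), sorted(one_hour))  # ['b', 'c'] ['c']
```

A shorter demote-frequency widens the "not accessed in the last cycle" window, so raising the default to an hour keeps the demotion set to genuinely idle files.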
Tested with 3.7.5-17, with the default values of cluster.tier-demote-frequency, cluster.tier-max-mb, and cluster.tier-max-files changed as mentioned in comment #4, so marking this bug as verified.
[root@dhcp35-231 ~]# rpm -qa | grep glusterfs
[root@dhcp35-231 ~]# gluster vol get delete all | grep cluster.tier-demote-frequency
[root@dhcp35-231 ~]# gluster vol get delete all | grep cluster.tier-max-mb
[root@dhcp35-231 ~]# gluster vol get delete all | grep cluster.tier-max-files
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.