+++ This bug was initially created as a clone of Bug #1272450 +++

Description of problem:
========================
I observed that sometimes the heat of a file is not reset in the next cycle. It also appears that internal operations such as xattr changes are heating files, which is not acceptable.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
==================
1. Created a 2x2 volume and started it.
2. Attached a tier with a pure-distribute hot layer of 4 bricks, each disk only 1GB in size (also tried with a 2x2 hot layer).
3. Enabled CTR.
4. Created a file f1 of size 700MB, which hashed to brick1 of the hot tier.
5. When idle, the file got demoted.
6. Created a 700MB file f2 such that it hashed to the same brick as f1, i.e. brick1 of the hot tier.
7. Waited for it to get demoted.
8. Touched both f1 and f2 to heat them; since the space in the hot tier would be insufficient, I wanted to observe the behavior.
9. f1 got promoted, but f2 failed, with the tier log reporting insufficient disk space, which is perfectly valid.
10. However, the heat measure was still showing up in the sqldb query.
11. Waited for f1 to get demoted in the next cycle. While f1 did get demoted, f2 got promoted, because the heat counters had not been reset. A newly set read heat counter was also seen.
Expected results:
===============
> Heat counters should get reset.
> Also, internal metadata reads/writes or operations should not heat files.
> Read counters should not get set in this case.

CLI LOGS:
=======
[root@zod glusterfs]# tail -f portugal-tier.log
[2015-10-16 11:50:00.715894] E [MSGID: 109023] [dht-rebalance.c:699:__dht_check_free_space] 0-portugal-tier-dht: data movement attempted from node (portugal-cold-dht) to node (portugal-hot-dht) which does not have required free space for (/lisbon.2)
[2015-10-16 11:50:00.716661] E [MSGID: 109037] [tier.c:492:tier_migrate_using_query_file] 0-portugal-tier-dht: ERROR -28 in current migration lisbon.2 /lisbon.2
[2015-10-16 11:50:00.716820] E [MSGID: 109037] [tier.c:1454:tier_start] 0-portugal-tier-dht: Promotion failed
[2015-10-16 11:52:00.728929] I [MSGID: 109038] [tier.c:1008:tier_build_migration_qfile] 0-portugal-tier-dht: Failed to remove /var/run/gluster/portugal-tier-dht/promotequeryfile-portugal-tier-dht
[2015-10-16 11:52:00.734278] I [MSGID: 109038] [tier.c:476:tier_migrate_using_query_file] 0-portugal-tier-dht: Tier 1 src_subvol portugal-cold-dht file lisbon.2
[2015-10-16 11:52:00.736125] I [dht-rebalance.c:1103:dht_migrate_file] 0-portugal-tier-dht: /lisbon.2: attempting to move from portugal-cold-dht to portugal-hot-dht
[2015-10-16 11:52:22.250989] I [MSGID: 109022] [dht-rebalance.c:1430:dht_migrate_file] 0-portugal-tier-dht: completed migration of /lisbon.2 from subvolume portugal-cold-dht to portugal-hot-dht
[2015-10-16 12:17:16.194105] I [MSGID: 109028] [dht-rebalance.c:3327:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress.
Time taken is 83897.00 secs
[2015-10-16 12:17:16.194140] I [MSGID: 109028] [dht-rebalance.c:3331:gf_defrag_status_get] 0-glusterfs: Files migrated: 15, size: 0, lookups: 21, failures: 6, skipped: 0

Status of volume: portugal
Gluster process                            TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Hot Bricks:
Brick yarrow:/dummy/brick108/portugal_hot  49240     0          Y       30546
Brick zod:/dummy/brick108/portugal_hot     49240     0          Y       4923
Brick yarrow:/dummy/brick107/portugal_hot  49239     0          Y       30524
Brick zod:/dummy/brick107/portugal_hot     49239     0          Y       4903
Cold Bricks:
Brick zod:/rhs/brick1/portugal             49237     0          Y       2557
Brick yarrow:/rhs/brick1/portugal          49237     0          Y       28413
Brick zod:/rhs/brick2/portugal             49238     0          Y       2575
Brick yarrow:/rhs/brick2/portugal          49238     0          Y       28433
NFS Server on localhost                    2049      0          Y       11729
Self-heal Daemon on localhost              N/A       N/A        Y       11852
Quota Daemon on localhost                  N/A       N/A        Y       11748
NFS Server on yarrow                       2049      0          Y       32441
Self-heal Daemon on yarrow                 N/A       N/A        Y       32646
Quota Daemon on yarrow                     N/A       N/A        Y       32537

Task Status of Volume portugal
------------------------------------------------------------------------------
Task    : Tier migration
ID      : 931de257-0dcd-4125-87a8-0cce35caca38
Status  : in progress

[root@zod ~]# gluster v tier portugal status
Node       Promoted files  Demoted files  Status
---------  ---------       ---------      ---------
localhost  7               8              in progress
yarrow     0               19             in progress
volume rebalance: portugal: success:
[root@zod ~]#

--- Additional comment from nchilaka on 2015-10-16 08:29:02 EDT ---

####### before start of f1 or f2 promote #######
[root@zod ~]# gluster v tier portugal status
Node       Promoted files  Demoted files  Status
---------  ---------       ---------      ---------
localhost  7               8              in progress
yarrow     0               19             in progress
volume rebalance: portugal: success:
[root@zod ~]#

####### after f1 got promoted #######
[root@zod ~]# gluster v tier portugal status
Node       Promoted files  Demoted files  Status
---------  ---------       ---------      ---------
localhost  8               8              in progress
yarrow     0               20             in progress
volume rebalance: portugal: success:
[root@zod ~]#

####### after f1 got demoted and f2 promoted #######
[root@zod ~]# gluster v tier portugal status
Node       Promoted files  Demoted files  Status
---------  ---------       ---------      ---------
localhost  9               8              in progress
yarrow     0               21             in progress
volume rebalance: portugal: success:
[root@zod ~]#

--- Additional comment from nchilaka on 2015-10-16 08:29 EDT ---

--- Additional comment from nchilaka on 2015-10-16 08:33:58 EDT ---

sosreport.eng.blr.redhat.com:/home/repo/sosreports/nchilaka/bug.1272450
https://code.engineering.redhat.com/gerrit/60169
Patch 61017 fixed a bug in which the heat measure was not cleared until 3 promotion cycles had elapsed, which accounts for the failure at step 10. Tiering makes a best effort to promote and demote files such that, statistically and in aggregate, a large number of files migrate to the hot or cold tier over time based on access frequency. The feature does not guarantee that individual files will be promoted or demoted based on single accesses.
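The fixed behavior can be modeled with a small sketch. This is a hypothetical Python model, not GlusterFS source: the `FileHeat` class, `run_cycle` helper, and the one-access hotness threshold are all illustrative assumptions. It only shows why resetting counters every cycle prevents a failed promotion from carrying stale heat into the next cycle.

```python
# Hypothetical model of per-cycle heat accounting (not GlusterFS code).
class FileHeat:
    """Tracks read/write hits for one file within a migration cycle."""

    def __init__(self):
        self.read_hits = 0
        self.write_hits = 0

    def record_read(self):
        self.read_hits += 1

    def record_write(self):
        self.write_hits += 1

    def is_hot(self, threshold=1):
        # Assumed threshold: any access in the current cycle marks the file hot.
        return self.read_hits + self.write_hits >= threshold

    def reset(self):
        # The crux of the bug: this reset must run every cycle, even for
        # files whose promotion failed (e.g. no free space on the hot tier).
        self.read_hits = 0
        self.write_hits = 0


def run_cycle(files, promote):
    """One migration cycle: promote hot files, then reset ALL counters."""
    promoted = [name for name, heat in files.items()
                if heat.is_hot() and promote(name)]
    for heat in files.values():
        heat.reset()
    return promoted
```

With the reset in place, a file like f2 whose promotion failed in one cycle is not spuriously promoted in the next cycle unless it is accessed again; in the buggy version, skipping `reset()` for failed files produced exactly the step-11 behavior.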
I have tried the above scenario in test mode, and I don't see heating of files in the db query. There are other related issues where files are getting spilled over, and we have bugs filed for those, such as bug 1290667. Hence closing this bug as verified.

[root@yarrow ~]# date
Fri Dec 18 23:24:42 IST 2015
[root@yarrow ~]# echo "===========Date=====================";date; echo "=============ColdBrick#1 =========" ; echo "select * from gf_file_tb; select * from gf_flink_tb;" | sqlite3 /rhs/brick1/pepsi/.glusterfs/pepsi.db;echo "=============ColdBrick#2 =========" ; echo "select * from gf_file_tb; select * from gf_flink_tb;" | sqlite3 /rhs/brick1/pepsi/.glusterfs/pepsi.db; echo ">>>>>>>>>>>> HOTBRICK#1 <<<<<<<<==";echo "select * from gf_file_tb; select * from gf_flink_tb;" | sqlite3 /dummy/brick108/pepsi_hot/.glusterfs/pepsi_hot.db;echo "###############################";date;ll /*/brick*/pepsi*;du -sh /var/run/gluster/pepsi-tier-dht/*
===========Date=====================
Fri Dec 18 23:24:46 IST 2015
=============ColdBrick#1 =========
d84025cb-b0ec-482d-9baf-185b79504d07|0|0|0|0|0|0|0|0|1|1
d84025cb-b0ec-482d-9baf-185b79504d07|00000000-0000-0000-0000-000000000001|file3|/file3|0|0
=============ColdBrick#2 =========
d84025cb-b0ec-482d-9baf-185b79504d07|0|0|0|0|0|0|0|0|1|1
d84025cb-b0ec-482d-9baf-185b79504d07|00000000-0000-0000-0000-000000000001|file3|/file3|0|0
>>>>>>>>>>>> HOTBRICK#1 <<<<<<<<==
1111f560-71c3-45ee-a669-05de8d04338d|1450461121|38775|0|0|0|0|0|0|0|0
31c2dd29-ec1a-4ca6-b3b1-af5224109a77|1450461132|81252|0|0|0|0|0|0|0|0
ab1279eb-cc2b-4efe-8edb-fe537daa0de7|1450461143|385949|0|0|0|0|0|0|0|0
dec578c2-3e44-452b-8cea-b84bc2c523b2|1450461154|714885|0|0|0|0|0|0|0|0
1111f560-71c3-45ee-a669-05de8d04338d|00000000-0000-0000-0000-000000000001|file8|/file8|0|0
31c2dd29-ec1a-4ca6-b3b1-af5224109a77|00000000-0000-0000-0000-000000000001|file5|/file5|0|0
ab1279eb-cc2b-4efe-8edb-fe537daa0de7|00000000-0000-0000-0000-000000000001|file6|/file6|0|0
dec578c2-3e44-452b-8cea-b84bc2c523b2|00000000-0000-0000-0000-000000000001|file2|/file2|0|0
###############################
Fri Dec 18 23:24:46 IST 2015
/dummy/brick108/pepsi_hot:
total 900032
---------T. 2 root root         0 Dec 18 23:22 file1
---------T. 2 root root         0 Dec 18 23:22 file2
-rw-r--r--. 2 root root 307200003 Dec 18 23:20 file5
-rw-r--r--. 2 root root 307200003 Dec 18 23:20 file6
-rw-r--r--. 2 root root 307200003 Dec 18 23:20 file8

/rhs/brick1/pepsi:
total 4
-rw-r--r--. 2 root root 3 Dec 18 23:20 file3

/rhs/brick2/pepsi:
total 300004
---------T. 2 root root         0 Dec 18 23:22 file1
-rw-r--r--. 2 root root 307200003 Dec 18 23:20 file2
---------T. 2 root root         0 Dec 18 23:22 file5
---------T. 2 root root         0 Dec 18 23:22 file6
---------T. 2 root root         0 Dec 18 23:22 file8
0	/var/run/gluster/pepsi-tier-dht/demotequeryfile-pepsi-tier-dht
0	/var/run/gluster/pepsi-tier-dht/promotequeryfile-pepsi-tier-dht
[root@yarrow ~]# date
Fri Dec 18 23:27:16 IST 2015
[root@yarrow ~]# echo "===========Date=====================";date; echo "=============ColdBrick#1 =========" ; echo "select * from gf_file_tb; select * from gf_flink_tb;" | sqlite3 /rhs/brick1/pepsi/.glusterfs/pepsi.db;echo "=============ColdBrick#2 =========" ; echo "select * from gf_file_tb; select * from gf_flink_tb;" | sqlite3 /rhs/brick1/pepsi/.glusterfs/pepsi.db; echo ">>>>>>>>>>>> HOTBRICK#1 <<<<<<<<==";echo "select * from gf_file_tb; select * from gf_flink_tb;" | sqlite3 /dummy/brick108/pepsi_hot/.glusterfs/pepsi_hot.db;echo "###############################";date;ll /*/brick*/pepsi*;du -sh /var/run/gluster/pepsi-tier-dht/*
===========Date=====================
Fri Dec 18 23:27:17 IST 2015
=============ColdBrick#1 =========
d84025cb-b0ec-482d-9baf-185b79504d07|0|0|0|0|0|0|0|0|0|0
d84025cb-b0ec-482d-9baf-185b79504d07|00000000-0000-0000-0000-000000000001|file3|/file3|0|0
=============ColdBrick#2 =========
d84025cb-b0ec-482d-9baf-185b79504d07|0|0|0|0|0|0|0|0|0|0
d84025cb-b0ec-482d-9baf-185b79504d07|00000000-0000-0000-0000-000000000001|file3|/file3|0|0
>>>>>>>>>>>> HOTBRICK#1 <<<<<<<<==
dec578c2-3e44-452b-8cea-b84bc2c523b2|1450461154|714885|0|0|0|0|0|0|0|0
dec578c2-3e44-452b-8cea-b84bc2c523b2|00000000-0000-0000-0000-000000000001|file2|/file2|0|0
###############################
Fri Dec 18 23:27:17 IST 2015
/dummy/brick108/pepsi_hot:
total 4
---------T. 2 root root 0 Dec 18 23:22 file2

/rhs/brick1/pepsi:
total 4
-rw-r--r--. 2 root root 3 Dec 18 23:20 file3

/rhs/brick2/pepsi:
total 1500020
-rw-r--r--. 2 root root 307200003 Dec 18 23:20 file1
-rw-r--r--. 2 root root 307200003 Dec 18 23:20 file2
-rw-r--r--. 2 root root 307200003 Dec 18 23:20 file5
-rw-r--r--. 2 root root 307200003 Dec 18 23:20 file6
-rw-r--r--. 2 root root 307200003 Dec 18 23:20 file8
4.0K	/var/run/gluster/pepsi-tier-dht/demotequeryfile-pepsi-tier-dht
0	/var/run/gluster/pepsi-tier-dht/promotequeryfile-pepsi-tier-dht
[root@yarrow ~]# echo "===========Date=====================";date; echo "=============ColdBrick#1 =========" ; echo "select * from gf_file_tb; select * from gf_flink_tb;" | sqlite3 /rhs/brick1/pepsi/.glusterfs/pepsi.db;echo "=============ColdBrick#2 =========" ; echo "select * from gf_file_tb; select * from gf_flink_tb;" | sqlite3 /rhs/brick1/pepsi/.glusterfs/pepsi.db; echo ">>>>>>>>>>>> HOTBRICK#1 <<<<<<<<==";echo "select * from gf_file_tb; select * from gf_flink_tb;" | sqlite3 /dummy/brick108/pepsi_hot/.glusterfs/pepsi_hot.db;echo "###############################";date;ll /*/brick*/pepsi*;du -sh /var/run/gluster/pepsi-tier-dht/*
===========Date=====================
Fri Dec 18 23:27:57 IST 2015
=============ColdBrick#1 =========
=============ColdBrick#2 =========
>>>>>>>>>>>> HOTBRICK#1 <<<<<<<<==
dec578c2-3e44-452b-8cea-b84bc2c523b2|1450461154|714885|0|0|0|0|0|0|0|0
dec578c2-3e44-452b-8cea-b84bc2c523b2|00000000-0000-0000-0000-000000000001|file2|/file2|0|0
###############################
Fri Dec 18 23:27:57 IST 2015
/dummy/brick108/pepsi_hot:
total 4
---------T. 2 root root 0 Dec 18 23:22 file2

/rhs/brick1/pepsi:
total 0

/rhs/brick2/pepsi:
total 1500020
-rw-r--r--. 2 root root 307200003 Dec 18 23:20 file1
-rw-r--r--. 2 root root 307200003 Dec 18 23:20 file2
-rw-r--r--. 2 root root 307200003 Dec 18 23:20 file5
-rw-r--r--. 2 root root 307200003 Dec 18 23:20 file6
-rw-r--r--. 2 root root 307200003 Dec 18 23:20 file8
4.0K	/var/run/gluster/pepsi-tier-dht/demotequeryfile-pepsi-tier-dht
0	/var/run/gluster/pepsi-tier-dht/promotequeryfile-pepsi-tier-dht
[root@yarrow ~]#

[root@zod ~]# rpm -qa|grep gluster
glusterfs-fuse-3.7.5-12.el7rhgs.x86_64
glusterfs-server-3.7.5-12.el7rhgs.x86_64
glusterfs-client-xlators-3.7.5-12.el7rhgs.x86_64
glusterfs-cli-3.7.5-12.el7rhgs.x86_64
glusterfs-libs-3.7.5-12.el7rhgs.x86_64
glusterfs-api-3.7.5-12.el7rhgs.x86_64
glusterfs-debuginfo-3.7.5-12.el7rhgs.x86_64
glusterfs-3.7.5-12.el7rhgs.x86_64
[root@zod ~]#
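The kind of db check run in the sessions above can be mimicked in a self-contained way with Python's stdlib sqlite3 module. This is a toy sketch: the table name `gf_file_tb` is borrowed from the logs, but the two counter columns (`write_hits`, `read_hits`), the short gfid value, and the `reset_heat` helper are simplified assumptions, not the real CTR schema or code.

```python
# Toy sketch of the verification check (assumed, simplified schema --
# NOT the real CTR database layout used by GlusterFS tiering).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE gf_file_tb (
        gfid       TEXT PRIMARY KEY,  -- file identifier
        write_hits INTEGER,           -- heat accumulated this cycle
        read_hits  INTEGER
    )
""")
# One file that was touched this cycle (non-zero heat counters).
conn.execute("INSERT INTO gf_file_tb VALUES ('d84025cb', 1, 1)")
conn.commit()

def heat_rows(db):
    """Files still carrying heat, as the manual sqlite3 query would list them."""
    return db.execute(
        "SELECT gfid FROM gf_file_tb WHERE write_hits > 0 OR read_hits > 0"
    ).fetchall()

def reset_heat(db):
    """What each migration cycle should do, whether or not migration succeeded."""
    db.execute("UPDATE gf_file_tb SET write_hits = 0, read_hits = 0")
    db.commit()
```

Before the reset, `heat_rows` reports the touched file, matching the `...|1|1` row seen on the cold bricks in the first session; after `reset_heat`, the counters read as zeros, matching the `...|0|0` rows in the later sessions.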
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0193.html