Bug 1272450 - Data Tiering: heat counters not getting reset and also internal ops seem to be heating the files
Status: CLOSED EOL
Product: GlusterFS
Classification: Community
Component: tiering
Version: 3.7.5
Hardware: Unspecified Unspecified
Priority: urgent Severity: urgent
Assigned To: Dan Lambright
QA Contact: bugs@gluster.org
Keywords: Triaged
Depends On:
Blocks: 1272452 1275483 1275524 1276943
Reported: 2015-10-16 08:26 EDT by nchilaka
Modified: 2017-03-08 05:57 EST

Doc Type: Bug Fix
Clones: 1272452 1276943
Last Closed: 2017-03-08 05:57:24 EST
Type: Bug


Attachments
brick logs like sql query while bug filing (139.67 KB, text/plain)
2015-10-16 08:29 EDT, nchilaka

Description nchilaka 2015-10-16 08:26:32 EDT
Description of problem:
========================
I observed that the heat of a file is sometimes not reset in the next cycle. It also seems that internal operations, such as xattr changes, are heating files, which is not acceptable.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
==================
1. Created a 2x2 volume and started it.
2. Attached a pure-distribute hot tier of 4 bricks, each disk only 1 GB in size (also tried with a 2x2 hot layer).
3. Enabled CTR.
4. Created a 700 MB file f1, which hashed to brick1 of the hot tier.
5. Once idle, the file got demoted.
6. Created a 700 MB file f2 such that it hashed to the same brick as f1, i.e. brick1 of the hot tier.
7. Waited for it to get demoted.
8. Touched both f1 and f2 to heat them. Since the space in the hot tier would be insufficient for both, I wanted to see the behavior.
9. f1 got promoted, but f2 failed, with the tier log saying the disk space was not sufficient, which is perfectly valid.
10. However, the heat measure was still showing up in the sqldb query.
11. Waited for f1 to get demoted in the next cycle. While f1 got demoted, f2 got promoted, because the heat counters were not reset.

Also, a new read heat counter was seen.


Expected results:
===============
>Heat counter should get reset
>Also, internal metadata read/writes or operations should not heat files
>read counters should not get set in this case

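The expected behavior listed above can be modelled with a small sketch. This is only an illustration of the semantics requested in this report, not the CTR or libgfdb implementation: the `HeatTable` class, its method names, and the choice to count a user touch as a write are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class HeatTable:
    """Hypothetical per-file heat counters (not the real CTR database schema)."""
    write_heat: dict = field(default_factory=dict)
    read_heat: dict = field(default_factory=dict)

    def record(self, path, op, internal=False):
        # Internal operations (for example xattr updates issued by the
        # migrator) are ignored, so they never heat a file.
        if internal:
            return
        table = self.write_heat if op == "write" else self.read_heat
        table[path] = table.get(path, 0) + 1

    def hot_files(self):
        # Any file with a non-zero counter is a promotion candidate.
        return sorted(set(self.write_heat) | set(self.read_heat))

    def end_cycle(self):
        # Counters are cleared at the end of every cycle, whether or not
        # the promotion of a given file succeeded.
        self.write_heat.clear()
        self.read_heat.clear()


table = HeatTable()
table.record("/f1", "write")                 # user touch (assumed to count as a write)
table.record("/f2", "write")                 # user touch
table.record("/f1", "read", internal=True)   # internal read, must be ignored
candidates = table.hot_files()               # both files are candidates this cycle
table.end_cycle()
next_cycle = table.hot_files()               # nothing carries over to the next cycle
```

Under this model, f2 would not be promoted in the following cycle unless it was accessed again, and no read counter would appear for f1.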








CLI LOGS:
=======
[root@zod glusterfs]# tail -f portugal-tier.log 
[2015-10-16 11:50:00.715894] E [MSGID: 109023] [dht-rebalance.c:699:__dht_check_free_space] 0-portugal-tier-dht: data movement attempted from node (portugal-cold-dht) to node (portugal-hot-dht) which does not have required free space for (/lisbon.2)
[2015-10-16 11:50:00.716661] E [MSGID: 109037] [tier.c:492:tier_migrate_using_query_file] 0-portugal-tier-dht: ERROR -28 in current migration lisbon.2 /lisbon.2

[2015-10-16 11:50:00.716820] E [MSGID: 109037] [tier.c:1454:tier_start] 0-portugal-tier-dht: Promotion failed
[2015-10-16 11:52:00.728929] I [MSGID: 109038] [tier.c:1008:tier_build_migration_qfile] 0-portugal-tier-dht: Failed to remove /var/run/gluster/portugal-tier-dht/promotequeryfile-portugal-tier-dht
[2015-10-16 11:52:00.734278] I [MSGID: 109038] [tier.c:476:tier_migrate_using_query_file] 0-portugal-tier-dht: Tier 1 src_subvol portugal-cold-dht file lisbon.2
[2015-10-16 11:52:00.736125] I [dht-rebalance.c:1103:dht_migrate_file] 0-portugal-tier-dht: /lisbon.2: attempting to move from portugal-cold-dht to portugal-hot-dht
[2015-10-16 11:52:22.250989] I [MSGID: 109022] [dht-rebalance.c:1430:dht_migrate_file] 0-portugal-tier-dht: completed migration of /lisbon.2 from subvolume portugal-cold-dht to portugal-hot-dht
[2015-10-16 12:17:16.194105] I [MSGID: 109028] [dht-rebalance.c:3327:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 83897.00 secs
[2015-10-16 12:17:16.194140] I [MSGID: 109028] [dht-rebalance.c:3331:gf_defrag_status_get] 0-glusterfs: Files migrated: 15, size: 0, lookups: 21, failures: 6, skipped: 0
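For triage, the failed migrations can be pulled out of a tier log like the one above with a short script. This is a sketch that assumes the line layout seen in these excerpts; the regular expressions and the `failed_migrations` helper are illustrative, not part of GlusterFS. The error value 28 is mapped through Python's `errno` table, where it is `ENOSPC` on Linux, which matches the free-space message logged just before it.

```python
import errno
import re

# Layout assumed from the excerpts: [timestamp] LEVEL [MSGID: n] [source] xlator: message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"(?:\[MSGID: (?P<msgid>\d+)\] )?"
    r"\[(?P<src>[^\]]+)\] (?P<xlator>\S+): (?P<msg>.*)$"
)
ERR_RE = re.compile(r"ERROR (-?\d+) in current migration (\S+)")


def failed_migrations(log_text):
    """Return (timestamp, file name, errno name) for each failed migration."""
    failures = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m or m.group("level") != "E":
            continue
        err = ERR_RE.search(m.group("msg"))
        if err:
            code = abs(int(err.group(1)))
            name = errno.errorcode.get(code, str(code))
            failures.append((m.group("ts"), err.group(2), name))
    return failures


SAMPLE = """\
[2015-10-16 11:50:00.716661] E [MSGID: 109037] [tier.c:492:tier_migrate_using_query_file] 0-portugal-tier-dht: ERROR -28 in current migration lisbon.2 /lisbon.2
[2015-10-16 11:52:22.250989] I [MSGID: 109022] [dht-rebalance.c:1430:dht_migrate_file] 0-portugal-tier-dht: completed migration of /lisbon.2 from subvolume portugal-cold-dht to portugal-hot-dht
"""
failures = failed_migrations(SAMPLE)
```

On the two sample lines, only the 11:50:00 entry for lisbon.2 is reported. The later successful migration of the same file is the promotion that should not have happened if the heat counters had been reset.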





Status of volume: portugal
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Hot Bricks:
Brick yarrow:/dummy/brick108/portugal_hot   49240     0          Y       30546
Brick zod:/dummy/brick108/portugal_hot      49240     0          Y       4923 
Brick yarrow:/dummy/brick107/portugal_hot   49239     0          Y       30524
Brick zod:/dummy/brick107/portugal_hot      49239     0          Y       4903 
Cold Bricks:
Brick zod:/rhs/brick1/portugal              49237     0          Y       2557 
Brick yarrow:/rhs/brick1/portugal           49237     0          Y       28413
Brick zod:/rhs/brick2/portugal              49238     0          Y       2575 
Brick yarrow:/rhs/brick2/portugal           49238     0          Y       28433
NFS Server on localhost                     2049      0          Y       11729
Self-heal Daemon on localhost               N/A       N/A        Y       11852
Quota Daemon on localhost                   N/A       N/A        Y       11748
NFS Server on yarrow                        2049      0          Y       32441
Self-heal Daemon on yarrow                  N/A       N/A        Y       32646
Quota Daemon on yarrow                      N/A       N/A        Y       32537
 
Task Status of Volume portugal
------------------------------------------------------------------------------
Task                 : Tier migration      
ID                   : 931de257-0dcd-4125-87a8-0cce35caca38
Status               : in progress         
 
[root@zod ~]# gluster v tier portugal status
Node                 Promoted files       Demoted files        Status              
---------            ---------            ---------            ---------           
localhost            7                    8                    in progress         
yarrow               0                    19                   in progress         
volume rebalance: portugal: success: 
[root@zod ~]#
Comment 1 nchilaka 2015-10-16 08:29:02 EDT
  #######before start of f1 or f2 promote 
[root@zod ~]# gluster v tier portugal status
Node                 Promoted files       Demoted files        Status              
---------            ---------            ---------            ---------           
localhost            7                    8                    in progress         
yarrow               0                    19                   in progress         
volume rebalance: portugal: success: 
[root@zod ~]# 
[root@zod ~]# 
[root@zod ~]# 
[root@zod ~]# #######after  f1 got promoted#############
[root@zod ~]# gluster v tier portugal status
Node                 Promoted files       Demoted files        Status              
---------            ---------            ---------            ---------           
localhost            8                    8                    in progress         
yarrow               0                    20                   in progress         
volume rebalance: portugal: success: 
[root@zod ~]# 
[root@zod ~]# 
[root@zod ~]# #######after  f1 got demoted and f2 promoted#############
[root@zod ~]# gluster v tier portugal status 
Node                 Promoted files       Demoted files        Status              
---------            ---------            ---------            ---------           
localhost            9                    8                    in progress         
yarrow               0                    21                   in progress         
volume rebalance: portugal: success: 
[root@zod ~]#
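The three status snapshots above can be compared mechanically. The sketch below parses the table printed by `gluster v tier portugal status` as it appears in this comment and computes the per-node change between two snapshots. The helpers `parse_tier_status` and `delta` are hypothetical, and the parsing assumes the column layout shown here.

```python
def parse_tier_status(text):
    """Map node name -> (promoted, demoted) from the status table text."""
    counts = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows look like: <node> <promoted> <demoted> <status words...>
        if len(parts) >= 3 and parts[1].isdigit() and parts[2].isdigit():
            counts[parts[0]] = (int(parts[1]), int(parts[2]))
    return counts


def delta(before, after):
    """Per-node change in (promoted, demoted) between two snapshots."""
    return {
        node: (after[node][0] - before[node][0],
               after[node][1] - before[node][1])
        for node in after if node in before
    }


BEFORE = """\
Node                 Promoted files       Demoted files        Status
---------            ---------            ---------            ---------
localhost            7                    8                    in progress
yarrow               0                    19                   in progress
volume rebalance: portugal: success:
"""
AFTER_F1_PROMOTED = """\
Node                 Promoted files       Demoted files        Status
---------            ---------            ---------            ---------
localhost            8                    8                    in progress
yarrow               0                    20                   in progress
volume rebalance: portugal: success:
"""
change = delta(parse_tier_status(BEFORE), parse_tier_status(AFTER_F1_PROMOTED))
```

For the first two snapshots this gives one additional promotion on localhost and one additional demotion on yarrow.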
Comment 2 nchilaka 2015-10-16 08:29 EDT
Created attachment 1083639
brick logs like sql query while bug filing
Comment 3 nchilaka 2015-10-16 08:33:58 EDT
sosreport@rhsqe-repo.lab.eng.blr.redhat.com:/home/repo/sosreports/nchilaka/bug.1272450
Comment 4 Kaushal 2017-03-08 05:57:24 EST
This bug is being closed because GlusterFS-3.7 has reached its end-of-life.

Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS.
If this bug still exists in newer GlusterFS releases, please reopen this bug against the newer release.
