Bug 1286218 - Data Tiering:Watermark:File continuously trying to demote itself but failing " [dht-rebalance.c:608:__dht_rebalance_create_dst_file] 0-wmrk-tier-dht: chown failed for //AP.BH.avi on wmrk-cold-dht (No such file or directory)"
Summary: Data Tiering:Watermark:File continuously trying to demote itself but failing ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: tier
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.1.2
Assignee: Mohammed Rafi KC
QA Contact: Sweta Anandpara
URL:
Whiteboard:
Depends On:
Blocks: 1292067 1293342
 
Reported: 2015-11-27 13:05 UTC by Nag Pavan Chilakam
Modified: 2019-04-03 09:15 UTC
CC List: 8 users

Fixed In Version: glusterfs-3.7.5-13
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1292067 (view as bug list)
Environment:
Last Closed: 2016-03-01 06:03:46 UTC
Embargoed:


Attachments
Bug reproduced (36.44 KB, application/vnd.oasis.opendocument.text), 2015-12-24 13:13 UTC, Sweta Anandpara
Bug fixed (32.91 KB, application/vnd.oasis.opendocument.text), 2015-12-24 13:14 UTC, Sweta Anandpara
Server and client logs (32.01 KB, application/vnd.oasis.opendocument.text), 2015-12-28 12:41 UTC, Sweta Anandpara


Links
Red Hat Product Errata RHBA-2016:0193 (normal, SHIPPED_LIVE): Red Hat Gluster Storage 3.1 update 2, last updated 2016-03-01 10:20:36 UTC

Description Nag Pavan Chilakam 2015-11-27 13:05:32 UTC
Description of problem:
======================
With watermarks enabled, I was experimenting with the hi and low watermarks.
Following is my configuration:
Hot Tier :
Hot Tier Type : Replicate
Number of Bricks: 1 x 2 = 2
Brick1: yarrow:/rhs/brick5/1/brick1008/wmrk
Brick2: zod:/rhs/brick5/1/brick1008/wmrk
Cold Tier:
Cold Tier Type : Distributed-Disperse
Number of Bricks: 2 x (4 + 2) = 12


Volume size before and after attach-tier:
Before:
[root@localhost wmrk]# df -h ../wmrk
Filesystem      Size  Used Avail Use% Mounted on
zod:wmrk        153G  4.3G  149G   3% /mnt/wmrk
After:
[root@localhost wmrk]# df -h ../wmrk
Filesystem      Size  Used Avail Use% Mounted on
zod:wmrk        172G  4.3G  168G   3% /mnt/wmrk

That means my hot tier is about 20 GB in size.

Now I set the hi and low watermarks as below:
cluster.watermark-hi: 99
cluster.watermark-low: 75

Created about 10 files of ~700 MB each; then, while they were idle, changed the watermarks to the following:
cluster.watermark-hi: 5
cluster.watermark-low: 1

This means that if there is more than ~1 GB of data in the hot tier (i.e. above the hi watermark), only demotes will happen.
Hence one ~700 MB file (AP.BH.avi) remained in the hot tier, even though it was idle.
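
For reference, the watermark changes above correspond to volume-set commands along these lines (a sketch against this volume, with the values used in the test):

# first keep the watermarks high, fill the hot tier, then lower them
gluster volume set wmrk cluster.watermark-hi 99
gluster volume set wmrk cluster.watermark-low 75
# ...create ~10 files of ~700 MB each on the mount and leave them idle...
gluster volume set wmrk cluster.watermark-hi 5
gluster volume set wmrk cluster.watermark-low 1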

I then created a 1 GB file, which went to the hot tier; as a result the old file started to demote, but the demotion failed after some time, while the new file got demoted.




Version-Release number of selected component (if applicable):
=======================
3.7.5-7





Volume Name: wmrk
Type: Tier
Volume ID: 4013b73f-8739-48cd-9f19-f4a3b1228462
Status: Started
Number of Bricks: 14
Transport-type: tcp
Hot Tier :
Hot Tier Type : Replicate
Number of Bricks: 1 x 2 = 2
Brick1: yarrow:/rhs/brick5/1/brick1008/wmrk
Brick2: zod:/rhs/brick5/1/brick1008/wmrk
Cold Tier:
Cold Tier Type : Distributed-Disperse
Number of Bricks: 2 x (4 + 2) = 12
Brick3: zod:/rhs/brick5/1/brick1001/wmrk
Brick4: yarrow:/rhs/brick5/1/brick1001/wmrk
Brick5: zod:/rhs/brick5/1/brick1002/wmrk
Brick6: yarrow:/rhs/brick5/1/brick1002/wmrk
Brick7: zod:/rhs/brick5/1/brick1003/wmrk
Brick8: yarrow:/rhs/brick5/1/brick1003/wmrk
Brick9: zod:/rhs/brick5/1/brick1004/wmrk
Brick10: yarrow:/rhs/brick5/1/brick1004/wmrk
Brick11: zod:/rhs/brick5/1/brick1005/wmrk
Brick12: yarrow:/rhs/brick5/1/brick1005/wmrk
Brick13: zod:/rhs/brick5/1/brick1006/wmrk
Brick14: yarrow:/rhs/brick5/1/brick1006/wmrk
Options Reconfigured:
cluster.read-freq-threshold: 3
cluster.write-freq-threshold: 3
features.record-counters: on
cluster.tier-promote-frequency: 120
cluster.tier-demote-frequency: 300
cluster.watermark-hi: 3
cluster.watermark-low: 1
cluster.tier-mode: cache
features.ctr-enabled: on
performance.readdir-ahead: on

Comment 2 Nag Pavan Chilakam 2015-11-27 13:06:49 UTC
[2015-11-27 13:04:00.442970] W [MSGID: 114031] [client-rpc-fops.c:2039:client3_3_fsetattr_cbk] 0-wmrk-client-2: remote operation failed [No such file or directory]
[2015-11-27 13:04:00.442992] W [MSGID: 114031] [client-rpc-fops.c:2039:client3_3_fsetattr_cbk] 0-wmrk-client-0: remote operation failed [No such file or directory]
[2015-11-27 13:04:00.442989] W [MSGID: 114031] [client-rpc-fops.c:2039:client3_3_fsetattr_cbk] 0-wmrk-client-4: remote operation failed [No such file or directory]
[2015-11-27 13:04:00.445067] E [MSGID: 109023] [dht-rebalance.c:608:__dht_rebalance_create_dst_file] 0-wmrk-tier-dht: chown failed for //AP.BH.avi on wmrk-cold-dht (No such file or directory)
The message "W [MSGID: 114031] [client-rpc-fops.c:2039:client3_3_fsetattr_cbk] 0-wmrk-client-1: remote operation failed [No such file or directory]" repeated 2 times between [2015-11-27 13:03:50.355235] and [2015-11-27 13:04:19.948897]
[2015-11-27 13:04:19.948925] W [MSGID: 114031] [client-rpc-fops.c:2039:client3_3_fsetattr_cbk] 0-wmrk-client-2: remote operation failed [No such file or directory]
The message "W [MSGID: 114031] [client-rpc-fops.c:2039:client3_3_fsetattr_cbk] 0-wmrk-client-5: remote operation failed [No such file or directory]" repeated 2 times between [2015-11-27 13:03:50.355332] and [2015-11-27 13:04:19.949000]
[2015-11-27 13:04:19.949018] W [MSGID: 114031] [client-rpc-fops.c:2039:client3_3_fsetattr_cbk] 0-wmrk-client-4: remote operation failed [No such file or directory]
[2015-11-27 13:04:19.949062] W [MSGID: 114031] [client-rpc-fops.c:2039:client3_3_fsetattr_cbk] 0-wmrk-client-0: remote operation failed [No such file or directory]
[2015-11-27 13:04:19.951061] W [MSGID: 109023] [dht-rebalance.c:1418:dht_migrate_file] 0-wmrk-tier-dht: Migrate file failed://AP.BH.avi: failed to perform setattr on wmrk-cold-dht  [No such file or directory]

Comment 3 Nag Pavan Chilakam 2015-11-27 13:53:58 UTC
Below is the state of the CTR databases (queried via sqlite3) as the file starts to migrate and then fails:
[root@zod ~]# 
[root@zod ~]# echo "===========Date=====================";date; echo "=============ColdBrick#1 =========" ;  echo "select * from gf_file_tb; select * from gf_flink_tb;" | sqlite3 /rhs/brick5/1/brick1001/wmrk/.glusterfs/wmrk.db|grep bb063a50-1d92-4f28-955e-1d3c5f1a1971;echo "=============ColdBrick#2 =========" ;  echo "select * from gf_file_tb; select * from gf_flink_tb;" | sqlite3 /rhs/brick5/1/brick1002/wmrk/.glusterfs/wmrk.db|grep bb063a50-1d92-4f28-955e-1d3c5f1a1971;echo "=============ColdBrick#3 =========" ;  echo "select * from gf_file_tb; select * from gf_flink_tb;" | sqlite3 /rhs/brick5/1/brick1003/wmrk/.glusterfs/wmrk.db|grep bb063a50-1d92-4f28-955e-1d3c5f1a1971; echo "=============ColdBrick#4 =========" ;  echo "select * from gf_file_tb; select * from gf_flink_tb;" | sqlite3 /rhs/brick5/1/brick1004/wmrk/.glusterfs/wmrk.db|grep bb063a50-1d92-4f28-955e-1d3c5f1a1971; echo "=============ColdBrick#5 =========" ;  echo "select * from gf_file_tb; select * from gf_flink_tb;" | sqlite3 /rhs/brick5/1/brick1005/wmrk/.glusterfs/wmrk.db|grep bb063a50-1d92-4f28-955e-1d3c5f1a1971; echo "=============ColdBrick#6 =========" ;  echo "select * from gf_file_tb; select * from gf_flink_tb;" | sqlite3 /rhs/brick5/1/brick1006/wmrk/.glusterfs/wmrk.db|grep bb063a50-1d92-4f28-955e-1d3c5f1a1971; echo ">>>>>>>>>>>> HOTBRICK#1 <<<<<<<<==";echo "select * from gf_file_tb; select * from gf_flink_tb;" | sqlite3 /rhs/brick5/1/brick1008/wmrk/.glusterfs/wmrk.db|grep bb063a50-1d92-4f28-955e-1d3c5f1a1971;echo ">>>>>>>>>>>> HOTBRICK#2 <<<<<<<<==";echo "select * from gf_file_tb; select * from gf_flink_tb;" | sqlite3 /rhs/brick5/1/brick1009/wmrk/.glusterfs/wmrk.db|grep bb063a50-1d92-4f28-955e-1d3c5f1a1971;ll /rhs/brick5/1/brick100*/wmrk/AP*;du -sh  /rhs/brick5/1/brick100*/wmrk
===========Date=====================
Fri Nov 27 18:48:17 IST 2015
=============ColdBrick#1 =========
bb063a50-1d92-4f28-955e-1d3c5f1a1971|00000000-0000-0000-0000-000000000001|AP.BH.avi|/AP.BH.avi|0|0
=============ColdBrick#2 =========
bb063a50-1d92-4f28-955e-1d3c5f1a1971|00000000-0000-0000-0000-000000000001|AP.BH.avi|/AP.BH.avi|0|0
=============ColdBrick#3 =========
bb063a50-1d92-4f28-955e-1d3c5f1a1971|00000000-0000-0000-0000-000000000001|AP.BH.avi|/AP.BH.avi|0|0
=============ColdBrick#4 =========
bb063a50-1d92-4f28-955e-1d3c5f1a1971|0|0|0|0|0|0|0|0|0|0
bb063a50-1d92-4f28-955e-1d3c5f1a1971|00000000-0000-0000-0000-000000000001|AP.BH.avi|/AP.BH.avi|0|0
=============ColdBrick#5 =========
bb063a50-1d92-4f28-955e-1d3c5f1a1971|0|0|0|0|0|0|0|0|0|0
bb063a50-1d92-4f28-955e-1d3c5f1a1971|00000000-0000-0000-0000-000000000001|AP.BH.avi|/AP.BH.avi|0|0
=============ColdBrick#6 =========
bb063a50-1d92-4f28-955e-1d3c5f1a1971|0|0|0|0|0|0|0|0|0|0
bb063a50-1d92-4f28-955e-1d3c5f1a1971|00000000-0000-0000-0000-000000000001|AP.BH.avi|/AP.BH.avi|0|0
>>>>>>>>>>>> HOTBRICK#1 <<<<<<<<==
bb063a50-1d92-4f28-955e-1d3c5f1a1971|1448604159|503768|0|0|0|0|0|0|0|0
bb063a50-1d92-4f28-955e-1d3c5f1a1971|00000000-0000-0000-0000-000000000001|AP.BH.avi|/AP.BH.avi|0|0
>>>>>>>>>>>> HOTBRICK#2 <<<<<<<<==
Error: unable to open database "/rhs/brick5/1/brick1009/wmrk/.glusterfs/wmrk.db": unable to open database file
---------T. 2 root root 183465472 Nov 27 18:48 /rhs/brick5/1/brick1001/wmrk/AP.BH.avi
---------T. 2 root root 183465472 Nov 27 18:48 /rhs/brick5/1/brick1002/wmrk/AP.BH.avi
---------T. 2 root root 183465472 Nov 27 18:48 /rhs/brick5/1/brick1003/wmrk/AP.BH.avi
---------T. 2 root root         0 Nov 27 11:34 /rhs/brick5/1/brick1004/wmrk/AP.BH.avi
---------T. 2 root root         0 Nov 27 11:34 /rhs/brick5/1/brick1005/wmrk/AP.BH.avi
---------T. 2 root root         0 Nov 27 11:34 /rhs/brick5/1/brick1006/wmrk/AP.BH.avi
-rw-r-Sr-T. 2 root root 733861888 Nov 27 11:32 /rhs/brick5/1/brick1008/wmrk/AP.BH.avi
3.3G	/rhs/brick5/1/brick1001/wmrk
3.3G	/rhs/brick5/1/brick1002/wmrk
3.3G	/rhs/brick5/1/brick1003/wmrk
3.9G	/rhs/brick5/1/brick1004/wmrk
3.9G	/rhs/brick5/1/brick1005/wmrk
3.9G	/rhs/brick5/1/brick1006/wmrk
704M	/rhs/brick5/1/brick1008/wmrk
[root@zod ~]# echo "===========Date=====================";date; echo "=============ColdBrick#1 =========" ;  echo "select * from gf_file_tb; select * from gf_flink_tb;" | sqlite3 /rhs/brick5/1/brick1001/wmrk/.glusterfs/wmrk.db|grep bb063a50-1d92-4f28-955e-1d3c5f1a1971;echo "=============ColdBrick#2 =========" ;  echo "select * from gf_file_tb; select * from gf_flink_tb;" | sqlite3 /rhs/brick5/1/brick1002/wmrk/.glusterfs/wmrk.db|grep bb063a50-1d92-4f28-955e-1d3c5f1a1971;echo "=============ColdBrick#3 =========" ;  echo "select * from gf_file_tb; select * from gf_flink_tb;" | sqlite3 /rhs/brick5/1/brick1003/wmrk/.glusterfs/wmrk.db|grep bb063a50-1d92-4f28-955e-1d3c5f1a1971; echo "=============ColdBrick#4 =========" ;  echo "select * from gf_file_tb; select * from gf_flink_tb;" | sqlite3 /rhs/brick5/1/brick1004/wmrk/.glusterfs/wmrk.db|grep bb063a50-1d92-4f28-955e-1d3c5f1a1971; echo "=============ColdBrick#5 =========" ;  echo "select * from gf_file_tb; select * from gf_flink_tb;" | sqlite3 /rhs/brick5/1/brick1005/wmrk/.glusterfs/wmrk.db|grep bb063a50-1d92-4f28-955e-1d3c5f1a1971; echo "=============ColdBrick#6 =========" ;  echo "select * from gf_file_tb; select * from gf_flink_tb;" | sqlite3 /rhs/brick5/1/brick1006/wmrk/.glusterfs/wmrk.db|grep bb063a50-1d92-4f28-955e-1d3c5f1a1971; echo ">>>>>>>>>>>> HOTBRICK#1 <<<<<<<<==";echo "select * from gf_file_tb; select * from gf_flink_tb;" | sqlite3 /rhs/brick5/1/brick1008/wmrk/.glusterfs/wmrk.db|grep bb063a50-1d92-4f28-955e-1d3c5f1a1971;echo ">>>>>>>>>>>> HOTBRICK#2 <<<<<<<<==";echo "select * from gf_file_tb; select * from gf_flink_tb;" | sqlite3 /rhs/brick5/1/brick1009/wmrk/.glusterfs/wmrk.db|grep bb063a50-1d92-4f28-955e-1d3c5f1a1971;ll /rhs/brick5/1/brick100*/wmrk/AP*;du -sh  /rhs/brick5/1/brick100*/wmrk
===========Date=====================
Fri Nov 27 18:48:19 IST 2015
=============ColdBrick#1 =========
bb063a50-1d92-4f28-955e-1d3c5f1a1971|00000000-0000-0000-0000-000000000001|AP.BH.avi|/AP.BH.avi|0|0
=============ColdBrick#2 =========
bb063a50-1d92-4f28-955e-1d3c5f1a1971|00000000-0000-0000-0000-000000000001|AP.BH.avi|/AP.BH.avi|0|0
=============ColdBrick#3 =========
bb063a50-1d92-4f28-955e-1d3c5f1a1971|00000000-0000-0000-0000-000000000001|AP.BH.avi|/AP.BH.avi|0|0
=============ColdBrick#4 =========
bb063a50-1d92-4f28-955e-1d3c5f1a1971|0|0|0|0|0|0|0|0|0|0
bb063a50-1d92-4f28-955e-1d3c5f1a1971|00000000-0000-0000-0000-000000000001|AP.BH.avi|/AP.BH.avi|0|0
=============ColdBrick#5 =========
bb063a50-1d92-4f28-955e-1d3c5f1a1971|0|0|0|0|0|0|0|0|0|0
bb063a50-1d92-4f28-955e-1d3c5f1a1971|00000000-0000-0000-0000-000000000001|AP.BH.avi|/AP.BH.avi|0|0
=============ColdBrick#6 =========
bb063a50-1d92-4f28-955e-1d3c5f1a1971|0|0|0|0|0|0|0|0|0|0
bb063a50-1d92-4f28-955e-1d3c5f1a1971|00000000-0000-0000-0000-000000000001|AP.BH.avi|/AP.BH.avi|0|0
>>>>>>>>>>>> HOTBRICK#1 <<<<<<<<==
bb063a50-1d92-4f28-955e-1d3c5f1a1971|1448604159|503768|0|0|0|0|0|0|0|0
bb063a50-1d92-4f28-955e-1d3c5f1a1971|00000000-0000-0000-0000-000000000001|AP.BH.avi|/AP.BH.avi|0|0
>>>>>>>>>>>> HOTBRICK#2 <<<<<<<<==
Error: unable to open database "/rhs/brick5/1/brick1009/wmrk/.glusterfs/wmrk.db": unable to open database file
---------T. 2 root root         0 Nov 27 18:48 /rhs/brick5/1/brick1001/wmrk/AP.BH.avi
---------T. 2 root root         0 Nov 27 18:48 /rhs/brick5/1/brick1002/wmrk/AP.BH.avi
---------T. 2 root root         0 Nov 27 18:48 /rhs/brick5/1/brick1003/wmrk/AP.BH.avi
---------T. 2 root root         0 Nov 27 11:34 /rhs/brick5/1/brick1004/wmrk/AP.BH.avi
---------T. 2 root root         0 Nov 27 11:34 /rhs/brick5/1/brick1005/wmrk/AP.BH.avi
---------T. 2 root root         0 Nov 27 11:34 /rhs/brick5/1/brick1006/wmrk/AP.BH.avi
-rw-r--r--. 2 root root 733861888 Nov 27 11:32 /rhs/brick5/1/brick1008/wmrk/AP.BH.avi
3.1G	/rhs/brick5/1/brick1001/wmrk
3.1G	/rhs/brick5/1/brick1002/wmrk
3.1G	/rhs/brick5/1/brick1003/wmrk
3.9G	/rhs/brick5/1/brick1004/wmrk
3.9G	/rhs/brick5/1/brick1005/wmrk
3.9G	/rhs/brick5/1/brick1006/wmrk
704M	/rhs/brick5/1/brick1008/wmrk
[root@zod ~]# 


This cycle goes on indefinitely.
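
For readability, the monitoring one-liner used above is roughly equivalent to the following loop (a sketch using the same brick paths and GFID as in this run; brick1009, queried in the original one-liner, is not part of the volume):

GFID=bb063a50-1d92-4f28-955e-1d3c5f1a1971
for b in 1001 1002 1003 1004 1005 1006 1008; do
    echo "========= brick$b ========="
    echo "select * from gf_file_tb; select * from gf_flink_tb;" \
        | sqlite3 /rhs/brick5/1/brick$b/wmrk/.glusterfs/wmrk.db | grep $GFID
done
ls -l /rhs/brick5/1/brick100*/wmrk/AP*
du -sh /rhs/brick5/1/brick100*/wmrk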

Comment 4 Nithya Balachandran 2015-11-27 16:59:58 UTC
There are several different errors in the log files:

1. No such file or directory:
[2015-11-26 11:22:12.336273] E [MSGID: 109040] [dht-helper.c:1020:dht_migration_complete_check_task] 0-wmrk-cold-dht: //fiftygb: failed to lookup the file on wmrk-cold-dht [No such file or directory]
[2015-11-26 11:22:12.336618] W [MSGID: 109023] [dht-rebalance.c:1532:dht_migrate_file] 0-wmrk-tier-dht: //fiftygb: failed to do a stat on wmrk-cold-dht (No such file or directory)
[2015-11-26 11:22:14.039588] E [ec-inode-write.c:395:ec_manager_setattr] (-->/usr/lib64/glusterfs/3.7.5/xlator/cluster/disperse.so(ec_resume+0x88) [0x7f9c3ebed628] -->/usr/lib64/glusterfs/3.7.5/xlator/cluster/disperse.so(__ec_manager+0x5b) [0x7f9c3ebed47b] -->/usr/lib64/glusterfs/3.7.5/xlator/cluster/disperse.so(ec_manager_setattr+0x1f3) [0x7f9c3ec05593] ) 0-: Assertion failed: ec_get_inode_size(fop, fop->locks[0].lock->loc.inode, &cbk->iatt[0].ia_size)



From the posix logs:
rhs-brick5-1-brick1004-wmrk.log:256:[2015-11-26 11:22:12.088596] E [MSGID: 113001] [posix.c:4151:posix_getxattr] 0-wmrk-posix: listxattr failed on /rhs/brick5/1/brick1004/wmrk/.glusterfs/b4/c0/b4c0fba4-0031-4eef-bae5-2f970ff4e6cd [No such file or directory]
rhs-brick5-1-brick1004-wmrk.log:257:[2015-11-26 11:22:12.334106] E [MSGID: 113001] [posix.c:4101:posix_getxattr] 0-wmrk-posix: getxattr failed on /rhs/brick5/1/brick1004/wmrk/.glusterfs/b4/c0/b4c0fba4-0031-4eef-bae5-2f970ff4e6cd: trusted.glusterfs.dht.linkto  [No such file or directory]



It seems that the gfid link in .glusterfs did not exist at that point though it was present when I checked the brick.




2. GFID mismatch errors and EINVAL:

[2015-11-27 09:38:20.179977] E [MSGID: 109037] [tier.c:420:tier_migrate_using_query_file] 0-wmrk-tier-dht: Failed to do lookup on file cfile1 [Invalid argument]
[2015-11-27 09:38:20.184138] E [MSGID: 109037] [tier.c:1479:tier_start] 0-wmrk-tier-dht: Promotion failed
The message "W [MSGID: 122019] [ec-helpers.c:348:ec_loc_gfid_check] 0-wmrk-disperse-0: Mismatching GFID's in loc" repeated 2 times between [2015-11-27 09:38:20.179889] and [2015-11-27 09:38:20.184004]
The message "E [MSGID: 109037] [tier.c:420:tier_migrate_using_query_file] 0-wmrk-tier-dht: Failed to do lookup on file cfile1
 [Invalid argument]" repeated 2 times between [2015-11-27 09:38:20.179977] and [2015-11-27 09:38:20.184030]
[2015-11-27 09:40:00.205299] W [MSGID: 122019] [ec-helpers.c:348:ec_loc_gfid_check] 0-wmrk-disperse-0: Mismatching GFID's in loc
[2015-11-27 09:40:00.205400] E [MSGID: 109037] [tier.c:420:tier_migrate_using_query_file] 0-wmrk-tier-dht: Failed to do lookup on file cfile1 [Invalid argument]

The .glusterfs link for cfile1 does not exist.

# file: brick1001/wmrk/cfile1
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.bit-rot.version=0x02000000000000005656aed2000acb50
trusted.ec.config=0x0000080602000200
trusted.ec.size=0x00000000002cac00
trusted.ec.version=0x000000000002ee61000000000002ee61
trusted.gfid=0x08434bcbd7a94946a28726b98dd763b3


[root@yarrow 84]# pwd
/rhs/brick5/1/brick1001/wmrk/.glusterfs/84
[root@yarrow 84]# ls
ac
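
For context, the .glusterfs hardlink for a file lives at .glusterfs/<first two hex chars of the gfid>/<next two>/<full gfid> on the brick, so a check along these lines (using the cfile1 gfid shown above) would confirm whether the link exists:

# trusted.gfid=0x08434bcb... -> 08434bcb-d7a9-4946-a287-26b98dd763b3
GFID=08434bcb-d7a9-4946-a287-26b98dd763b3
ls -l /rhs/brick5/1/brick1001/wmrk/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID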


There are a lot of null gfid errors in the brick logs as well.

rhs-brick5-1-brick1001-wmrk.log:311:[2015-11-27 07:41:57.733948] E [MSGID: 113091] [posix.c:156:posix_lookup] 0-wmrk-posix: null gfid for path (null)
rhs-brick5-1-brick1001-wmrk.log:312:[2015-11-27 07:41:57.733984] E [MSGID: 113018] [posix.c:174:posix_lookup] 0-wmrk-posix: lstat on null failed [Invalid argument]



Other errors seen are :
[2015-11-27 06:00:00.845488] E [MSGID: 101044] [ec.c:1041:ec_gf_fsetxattr] 0-wmrk-disperse-1: attempt to set internal xattr: trusted.ec.* [Operation not permitted]
[2015-11-27 06:00:00.845838] W [MSGID: 109023] [dht-rebalance.c:592:__dht_rebalance_create_dst_file] 0-wmrk-tier-dht: //nn: failed to set xattr on wmrk-cold-dht (Operation not permitted)


I think the EC folks might need to take a look at this.

Comment 5 Ashish Pandey 2015-11-30 05:34:11 UTC
Just started looking into this. Observed one thing in the logs provided above:

Why are we getting an error from a brick that is not part of the volume?

1d3c5f1a1971|00000000-0000-0000-0000-000000000001|AP.BH.avi|/AP.BH.avi|0|0
>>>>>>>>>>>> HOTBRICK#2 <<<<<<<<==
Error: unable to open database "/rhs/brick5/1/brick1009/wmrk/.glusterfs/wmrk.db": unable to open database file

I don't see brick1009 in the given volume configuration -

Volume Name: wmrk
Type: Tier
Volume ID: 4013b73f-8739-48cd-9f19-f4a3b1228462
Status: Started
Number of Bricks: 14
Transport-type: tcp
Hot Tier :
Hot Tier Type : Replicate
Number of Bricks: 1 x 2 = 2
Brick1: yarrow:/rhs/brick5/1/brick1008/wmrk
Brick2: zod:/rhs/brick5/1/brick1008/wmrk
Cold Tier:
Cold Tier Type : Distributed-Disperse
Number of Bricks: 2 x (4 + 2) = 12
Brick3: zod:/rhs/brick5/1/brick1001/wmrk
Brick4: yarrow:/rhs/brick5/1/brick1001/wmrk
Brick5: zod:/rhs/brick5/1/brick1002/wmrk
Brick6: yarrow:/rhs/brick5/1/brick1002/wmrk
Brick7: zod:/rhs/brick5/1/brick1003/wmrk
Brick8: yarrow:/rhs/brick5/1/brick1003/wmrk
Brick9: zod:/rhs/brick5/1/brick1004/wmrk
Brick10: yarrow:/rhs/brick5/1/brick1004/wmrk
Brick11: zod:/rhs/brick5/1/brick1005/wmrk
Brick12: yarrow:/rhs/brick5/1/brick1005/wmrk
Brick13: zod:/rhs/brick5/1/brick1006/wmrk
Brick14: yarrow:/rhs/brick5/1/brick1006/wmrk


Investigating the logs and trying to reproduce this on my system.

Comment 6 Ashish Pandey 2015-12-01 05:05:44 UTC
I am able to recreate this issue partially.

[2015-12-01 04:54:00.627216] E [MSGID: 101044] [ec.c:1043:ec_gf_fsetxattr] 0-vol-disperse-0: attempt to set internal xattr: trusted.ec.* [Operation not permitted]
[2015-12-01 04:54:00.627390] W [MSGID: 109023] [dht-rebalance.c:589:__dht_rebalance_create_dst_file] 0-vol-tier-dht: //file-8: failed to set xattr on vol-cold-dht (Operation not permitted)
[2015-12-01 04:54:15.281500] E [MSGID: 101044] [ec.c:1026:ec_gf_setxattr] 0-vol-disperse-0: attempt to set internal xattr: trusted.ec.* [Operation not permitted]
[2015-12-01 04:54:15.281671] W [MSGID: 109023] [dht-rebalance.c:1461:dht_migrate_file] 0-vol-tier-dht: Migrate file failed://file-8: failed to set xattr on vol-cold-dht (Operation not permitted)
[2015-12-01 04:54:15.407963] E [MSGID: 101044] [ec.c:1043:ec_gf_fsetxattr] 0-vol-disperse-0: attempt to set internal xattr: trusted.ec.* [Operation not permitted]
[2015-12-01 04:54:15.408123] W [MSGID: 109023] [dht-rebalance.c:589:__dht_rebalance_create_dst_file] 0-vol-tier-dht: //file-4: failed to set xattr on vol-cold-dht (Operation not permitted)
[2015-12-01 04:54:34.044056] E [MSGID: 101044] [ec.c:1026:ec_gf_setxattr] 0-vol-disperse-0: attempt to set internal xattr: trusted.ec.* [Operation not permitted]
[2015-12-01 04:54:34.044214] W [MSGID: 109023] [dht-rebalance.c:1461:dht_migrate_file] 0-vol-tier-dht: Migrate file failed://file-4: failed to set xattr on vol-cold-dht (Operation not permitted)
[2015-12-01 04:54:34.167878] E [MSGID: 101044] [ec.c:1043:ec_gf_fsetxattr] 0-vol-disperse-0: attempt to set internal xattr: trusted.ec.* [Operation not permitted]
[2015-12-01 04:54:34.168012] W [MSGID: 109023] [dht-rebalance.c:589:__dht_rebalance_create_dst_file] 0-vol-tier-dht: //file-10: failed to set xattr on vol-cold-dht (Operation not permitted)
[2015-12-01 04:54:51.907258] E [MSGID: 101044] [ec.c:1026:ec_gf_setxattr] 0-vol-disperse-0: attempt to set internal xattr: trusted.ec.* [Operation not permitted]


I followed the same steps, but I am seeing all the files getting demoted.
Investigating further to find out why the same steps are not giving the same result.

Comment 7 Nithya Balachandran 2015-12-01 09:31:31 UTC
The log messages in comment #6 are due to the issue reported in bug 1286927.

Comment 8 Nag Pavan Chilakam 2015-12-01 10:04:22 UTC
Kindly have a look at the setup, which is still available.
The demote attempts have increased drastically:
[root@zod ~]# gluster v tier wmrk status
Node                 Promoted files       Demoted files        Status              
---------            ---------            ---------            ---------           
localhost            9                    0                    in progress         
10.70.34.43          0                    11648                in progress         
Tiering Migration Functionality: wmrk: success
[root@zod ~]#

Comment 9 Nag Pavan Chilakam 2015-12-03 09:07:30 UTC
Volume Name: wmrk
Type: Tier
Volume ID: 4013b73f-8739-48cd-9f19-f4a3b1228462
Status: Started
Number of Bricks: 14
Transport-type: tcp
Hot Tier :
Hot Tier Type : Replicate
Number of Bricks: 1 x 2 = 2
Brick1: yarrow:/rhs/brick5/1/brick1008/wmrk
Brick2: zod:/rhs/brick5/1/brick1008/wmrk
Cold Tier:
Cold Tier Type : Distributed-Disperse
Number of Bricks: 2 x (4 + 2) = 12
Brick3: zod:/rhs/brick5/1/brick1001/wmrk
Brick4: yarrow:/rhs/brick5/1/brick1001/wmrk
Brick5: zod:/rhs/brick5/1/brick1002/wmrk
Brick6: yarrow:/rhs/brick5/1/brick1002/wmrk
Brick7: zod:/rhs/brick5/1/brick1003/wmrk
Brick8: yarrow:/rhs/brick5/1/brick1003/wmrk
Brick9: zod:/rhs/brick5/1/brick1004/wmrk
Brick10: yarrow:/rhs/brick5/1/brick1004/wmrk
Brick11: zod:/rhs/brick5/1/brick1005/wmrk
Brick12: yarrow:/rhs/brick5/1/brick1005/wmrk
Brick13: zod:/rhs/brick5/1/brick1006/wmrk
Brick14: yarrow:/rhs/brick5/1/brick1006/wmrk
Options Reconfigured:
cluster.read-freq-threshold: 3
cluster.write-freq-threshold: 3
features.record-counters: on
cluster.tier-promote-frequency: 120
cluster.tier-demote-frequency: 300
cluster.watermark-hi: 3
cluster.watermark-low: 1
cluster.tier-mode: cache
features.ctr-enabled: on
performance.readdir-ahead: on
[root@zod ~]# df -h /rhs/brick5/1/brick1001/wmrk*
Filesystem      Size  Used Avail Use% Mounted on
/dev/loop10      20G  5.4G   14G  29% /rhs/brick5/1/brick1001
[root@zod ~]# df -h /rhs/brick5/1/brick100*/wmrk*
Filesystem      Size  Used Avail Use% Mounted on
/dev/loop10      20G  5.4G   14G  29% /rhs/brick5/1/brick1001
/dev/loop11      20G  4.3G   15G  23% /rhs/brick5/1/brick1002
/dev/loop12      20G  3.2G   16G  17% /rhs/brick5/1/brick1003
/dev/loop13      20G  4.0G   16G  21% /rhs/brick5/1/brick1004
/dev/loop14      20G  4.0G   16G  21% /rhs/brick5/1/brick1005
/dev/loop15      20G  4.0G   16G  21% /rhs/brick5/1/brick1006
/dev/loop17      20G  1.9G   18G  10% /rhs/brick5/1/brick1008
[root@zod ~]# 





Sosreports available at: rhsqe-repo.lab.eng.blr.redhat.com:/home/repo/sosreports/nchilaka/bug.1286218

Comment 10 Nithya Balachandran 2015-12-04 12:42:27 UTC
Analysis of why demotion fails consistently for AP.BH.avi:

On checking the bricks:

On yarrow (similar steps return similar results on zod):
-------------------------------------------
[root@yarrow 1]# l brick100*/wmrk/AP*
---------T. 2 root root         0 Dec  4 13:16 brick1001/wmrk/AP.BH.avi
---------T. 2 root root         0 Dec  4 13:16 brick1002/wmrk/AP.BH.avi
---------T. 2 root root         0 Dec  4 13:16 brick1003/wmrk/AP.BH.avi
---------T. 2 root root         0 Nov 27 11:34 brick1004/wmrk/AP.BH.avi
---------T. 2 root root         0 Nov 27 11:34 brick1005/wmrk/AP.BH.avi
---------T. 2 root root         0 Nov 27 11:34 brick1006/wmrk/AP.BH.avi
-rw-r--r--. 2 root root 733861888 Nov 27 11:32 brick1008/wmrk/AP.BH.avi  <-- Hot tier brick



[root@yarrow 1]# getfattr -e hex -m . -d  brick100*/wmrk/AP*
# file: brick1001/wmrk/AP.BH.avi
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.bit-rot.version=0x02000000000000005656aed2000acb50
trusted.ec.config=0x0000080602000200
trusted.ec.dirty=0x00000000000000000000000000000000
trusted.ec.size=0x0000000000000000
trusted.ec.version=0x00000000068bf55a00000000068bf55b
trusted.gfid=0xbb063a501d924f28955e1d3c5f1a1971
trusted.tier.tier-dht.linkto=0x776d726b2d686f742d64687400  

# file: brick1002/wmrk/AP.BH.avi
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.bit-rot.version=0x02000000000000005656aed2000d8f47
trusted.ec.config=0x0000080602000200
trusted.ec.dirty=0x00000000000000000000000000000000
trusted.ec.size=0x0000000000000000
trusted.ec.version=0x00000000068bf55a00000000068bf55b
trusted.gfid=0xbb063a501d924f28955e1d3c5f1a1971
trusted.tier.tier-dht.linkto=0x776d726b2d686f742d64687400

# file: brick1003/wmrk/AP.BH.avi
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.bit-rot.version=0x02000000000000005656aed3000147f3
trusted.ec.config=0x0000080602000200
trusted.ec.dirty=0x00000000000000000000000000000000
trusted.ec.size=0x0000000000000000
trusted.ec.version=0x00000000068bf55a00000000068bf55b
trusted.gfid=0xbb063a501d924f28955e1d3c5f1a1971
trusted.tier.tier-dht.linkto=0x776d726b2d686f742d64687400

# file: brick1004/wmrk/AP.BH.avi
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.ec.config=0x0000080602000200
trusted.ec.size=0x0000000000000000
trusted.ec.version=0x00000000000000000000000000000001
trusted.gfid=0xbb063a501d924f28955e1d3c5f1a1971
trusted.glusterfs.dht.linkto=0x776d726b2d64697370657273652d3000

# file: brick1005/wmrk/AP.BH.avi
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.ec.config=0x0000080602000200
trusted.ec.size=0x0000000000000000
trusted.ec.version=0x00000000000000000000000000000001
trusted.gfid=0xbb063a501d924f28955e1d3c5f1a1971
trusted.glusterfs.dht.linkto=0x776d726b2d64697370657273652d3000

# file: brick1006/wmrk/AP.BH.avi
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.ec.config=0x0000080602000200
trusted.ec.size=0x0000000000000000
trusted.ec.version=0x00000000000000000000000000000001
trusted.gfid=0xbb063a501d924f28955e1d3c5f1a1971
trusted.glusterfs.dht.linkto=0x776d726b2d64697370657273652d3000

# file: brick1008/wmrk/AP.BH.avi
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x02000000000000005656dab7000b70a5
trusted.ec.config=0x0000080602000200
trusted.ec.size=0x0000000000000000
trusted.ec.version=0x00000000000000000000000000000000
trusted.gfid=0xbb063a501d924f28955e1d3c5f1a1971


brick100{1..3} are part of disperse-0, brick100{4..6} are part of disperse-1.
The file on disperse-1 is a dht linkto file, and that on disperse-0 is a tier linkto file.
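
The linkto xattr values shown above decode to the subvolume names; a quick way to check (using xxd to reverse the hex dump):

# tier linkto on the disperse-0 bricks points to the hot tier
echo 776d726b2d686f742d64687400 | xxd -r -p         # -> wmrk-hot-dht
# dht linkto on the disperse-1 bricks points back to disperse-0
echo 776d726b2d64697370657273652d3000 | xxd -r -p   # -> wmrk-disperse-0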

disperse-1 is the hashed subvol for AP.BH.avi in the cold-dht layer.

In __dht_rebalance_create_dst_file:

        ret = syncop_create (to, loc, O_RDWR, DHT_LINKFILE_MODE, fd,
                             &new_stbuf, dict, NULL);


This call finds an existing file named AP.BH.avi on disperse-1 and opens an fd on it, even though the file there is not a tier linkto file.

In the same function, a lookup is sent on the cold tier which correctly sets the cached-subvol to disperse-0.

So now we have the cached subvol set to disperse-0 but the fd on the supposedly same file has been opened on disperse-1.

On calling syncop_fsetxattr:
        ret = syncop_fsetxattr (to, fd, xattr, 0, NULL, NULL);


dht_setfattr winds the call to the cached subvol (disperse-0). As the fd has not been opened on this subvol, an anonymous fd is sent instead and the call fails.

Comment 12 Nithya Balachandran 2015-12-07 04:33:14 UTC
Vivek, this issue is unrelated to 1286927.

Comment 13 Nag Pavan Chilakam 2015-12-11 05:58:09 UTC
Not able to reproduce this on 3.7.5-9. Closing this as not seen, but note that this can be a "POTENTIAL RISK" for the release.

Comment 16 Mohammed Rafi KC 2015-12-21 13:50:29 UTC
Steps to reproduce:

1. Create a file in a tiered volume (2x2 on both tiers).
2. Let the file demote.
3. Rename the file so that its hash range falls into the other subvolume.
4. Promote the file.
5. Wait for it to demote; it should demote successfully.
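
A rough shell sketch of these steps (the volume name tvol, hostnames, and brick paths are placeholders; syntax assumes the 3.7-era attach-tier CLI, and the tier status command is the one used in comment 8):

# 2x2 cold tier, then a 2x2 hot tier attached on top
gluster volume create tvol replica 2 host1:/bricks/c1 host2:/bricks/c1 host1:/bricks/c2 host2:/bricks/c2
gluster volume start tvol
gluster volume attach-tier tvol replica 2 host1:/bricks/h1 host2:/bricks/h1 host1:/bricks/h2 host2:/bricks/h2
gluster volume set tvol cluster.tier-mode cache

# create a file on the mount, let it demote, rename it so it hashes to the
# other subvolume, access it to trigger promotion, then wait for the next
# demote cycle
dd if=/dev/urandom of=/mnt/tvol/file1 bs=1M count=100
# ...wait for demotion; check 'gluster volume tier tvol status'...
mv /mnt/tvol/file1 /mnt/tvol/file1.renamed
cat /mnt/tvol/file1.renamed > /dev/null
# ...wait for the next demote cycle and check the tier status again...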

Comment 19 Sweta Anandpara 2015-12-24 13:09:55 UTC
Reproduced this using the steps mentioned in comment 16 on build glusterfs-3.7.5-12, and tested it again on build glusterfs-3.7.5-13.

Created a file on a tiered volume, waited for it to get demoted, renamed it so that it shifts to the other subvolume, accessed the file again so that the file gets promoted and waited for it to get demoted again. 

File demotion did *not* happen on build 3.7.5-12, and it did happen on build 3.7.5-13. Detailed logs (of the steps executed on both builds) are attached.

Note: I did not see any errors in /var/log/glusterfs/<volname>-tier.log as mentioned in the description. Tried out a couple of other combinations as well, but to no avail. I would not be able to confidently call this bug verified.

Would the above steps suffice for successful verification? Could you suggest scenarios that would make me hit the above errors in the logs?

Comment 20 Sweta Anandpara 2015-12-24 13:13:33 UTC
Created attachment 1109203 [details]
Bug reproduced

Comment 21 Sweta Anandpara 2015-12-24 13:14:13 UTC
Created attachment 1109204 [details]
Bug fixed

Comment 22 Mohammed Rafi KC 2015-12-24 14:53:20 UTC
A couple of suggestions to reproduce the exact error message.

First, create a distributed-disperse cold tier. Second, instead of renaming, try to reproduce the same scenario with min-free-disk: when min-free-disk is reached on the hashed subvol, the data file gets created on another subvol.


I cannot confidently give the exact steps to reproduce this error message; the bug was filed a couple of weeks ago, and a lot of fixes have since gone in for other problems. Those might have fixed some of the partial issues, which could be why you are no longer able to see the exact error message.

I haven't looked into the logs or error messages; the ultimate problem was the demotion failure, and the RCA for both problems was the same. That is why I reopened this bug and fixed it.
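
For the min-free-disk variant suggested above, a hedged sketch (the option name is the standard DHT one; the threshold value is arbitrary):

gluster volume set <volname> cluster.min-free-disk 90%
# fill the hashed cold subvolume close to that limit, then let a hot-tier file
# demote: DHT should create the data file on the other cold subvolume, leaving
# a linkto file on the hashed one, i.e. the scenario described above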

Comment 23 Sweta Anandpara 2015-12-28 12:41:16 UTC
Verified this again on the build glusterfs-3.7.5-13.el7rhgs.x86_64

Followed the steps as written in comment 22; did not see any errors/warnings related to this bug in the tier logs.

Recreated the setup as written in the description above and followed the steps mentioned there. Did not hit any errors there either.

Moving this bug to verified in 3.1.2, with the validation done as per comments 19 and 23, and with the dev input as per comments 16 and 22.

Please find the logs attached.

Comment 24 Sweta Anandpara 2015-12-28 12:41:49 UTC
Created attachment 1110005 [details]
Server and client logs

Comment 26 errata-xmlrpc 2016-03-01 06:03:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0193.html

