Description of problem:
======================
With watermarks enabled, I was experimenting with the hi and low watermarks. Following is my configuration:

Hot Tier :
Hot Tier Type : Replicate
Number of Bricks: 1 x 2 = 2
Brick1: yarrow:/rhs/brick5/1/brick1008/wmrk
Brick2: zod:/rhs/brick5/1/brick1008/wmrk
Cold Tier:
Cold Tier Type : Distributed-Disperse
Number of Bricks: 2 x (4 + 2) = 12

Volume size before and after attach-tier:

Before:
[root@localhost wmrk]# df -h ../wmrk
Filesystem  Size  Used  Avail  Use%  Mounted on
zod:wmrk    153G  4.3G  149G   3%    /mnt/wmrk

After:
[root@localhost wmrk]# df -h ../wmrk
Filesystem  Size  Used  Avail  Use%  Mounted on
zod:wmrk    172G  4.3G  168G   3%    /mnt/wmrk

That means my hot tier is about 20GB in size.

I first set the watermarks as below:
cluster.watermark-hi: 99
cluster.watermark-low: 75

I created some 10 files of about 700MB each, and once they had been left idle I moved the watermarks to:
cluster.watermark-hi: 5
cluster.watermark-low: 1

This means that if there is more than about 1GB of data in the hot tier (i.e. usage above hi), only demotes will happen. At this point one 700MB file (AP.BH.avi) was still remaining in the hot tier, even though it was idle. I then created a 1GB file, which went to the hot tier, and the old file started to demote; the demote failed after some time, while the new file got demoted.
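For reference, the cache-mode watermark behaviour being exercised here can be sketched as follows. This is an illustrative model of the semantics described above, not the actual tier.c logic; the function name and return strings are made up:

```python
def tier_action(used_bytes, total_bytes, low_pct, hi_pct):
    """Decide which migrations the tiering daemon may run, given hot-tier usage."""
    pct = 100.0 * used_bytes / total_bytes
    if pct > hi_pct:
        return "demote-only"         # hot tier too full: only demotions run
    if pct < low_pct:
        return "promote-only"        # plenty of headroom: only promotions run
    return "promote-and-demote"      # in between: both may run

# With the ~20GB hot tier above and watermark-hi=5: a single idle 700MB file
# (~3.5% used) sits between the watermarks; adding a 1GB file pushes usage
# past hi, and demotions start.
print(tier_action(700e6, 20e9, 1, 5))   # "promote-and-demote"
print(tier_action(1.7e9, 20e9, 1, 5))   # "demote-only"
```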
Version-Release number of selected component (if applicable):
=======================
3.7.5-7

Volume Name: wmrk
Type: Tier
Volume ID: 4013b73f-8739-48cd-9f19-f4a3b1228462
Status: Started
Number of Bricks: 14
Transport-type: tcp
Hot Tier :
Hot Tier Type : Replicate
Number of Bricks: 1 x 2 = 2
Brick1: yarrow:/rhs/brick5/1/brick1008/wmrk
Brick2: zod:/rhs/brick5/1/brick1008/wmrk
Cold Tier:
Cold Tier Type : Distributed-Disperse
Number of Bricks: 2 x (4 + 2) = 12
Brick3: zod:/rhs/brick5/1/brick1001/wmrk
Brick4: yarrow:/rhs/brick5/1/brick1001/wmrk
Brick5: zod:/rhs/brick5/1/brick1002/wmrk
Brick6: yarrow:/rhs/brick5/1/brick1002/wmrk
Brick7: zod:/rhs/brick5/1/brick1003/wmrk
Brick8: yarrow:/rhs/brick5/1/brick1003/wmrk
Brick9: zod:/rhs/brick5/1/brick1004/wmrk
Brick10: yarrow:/rhs/brick5/1/brick1004/wmrk
Brick11: zod:/rhs/brick5/1/brick1005/wmrk
Brick12: yarrow:/rhs/brick5/1/brick1005/wmrk
Brick13: zod:/rhs/brick5/1/brick1006/wmrk
Brick14: yarrow:/rhs/brick5/1/brick1006/wmrk
Options Reconfigured:
cluster.read-freq-threshold: 3
cluster.write-freq-threshold: 3
features.record-counters: on
cluster.tier-promote-frequency: 120
cluster.tier-demote-frequency: 300
cluster.watermark-hi: 3
cluster.watermark-low: 1
cluster.tier-mode: cache
features.ctr-enabled: on
performance.readdir-ahead: on
not permitted)
[2015-11-27 13:04:00.442970] W [MSGID: 114031] [client-rpc-fops.c:2039:client3_3_fsetattr_cbk] 0-wmrk-client-2: remote operation failed [No such file or directory]
[2015-11-27 13:04:00.442992] W [MSGID: 114031] [client-rpc-fops.c:2039:client3_3_fsetattr_cbk] 0-wmrk-client-0: remote operation failed [No such file or directory]
[2015-11-27 13:04:00.442989] W [MSGID: 114031] [client-rpc-fops.c:2039:client3_3_fsetattr_cbk] 0-wmrk-client-4: remote operation failed [No such file or directory]
[2015-11-27 13:04:00.445067] E [MSGID: 109023] [dht-rebalance.c:608:__dht_rebalance_create_dst_file] 0-wmrk-tier-dht: chown failed for //AP.BH.avi on wmrk-cold-dht (No such file or directory)
The message "W [MSGID: 114031] [client-rpc-fops.c:2039:client3_3_fsetattr_cbk] 0-wmrk-client-1: remote operation failed [No such file or directory]" repeated 2 times between [2015-11-27 13:03:50.355235] and [2015-11-27 13:04:19.948897]
[2015-11-27 13:04:19.948925] W [MSGID: 114031] [client-rpc-fops.c:2039:client3_3_fsetattr_cbk] 0-wmrk-client-2: remote operation failed [No such file or directory]
The message "W [MSGID: 114031] [client-rpc-fops.c:2039:client3_3_fsetattr_cbk] 0-wmrk-client-5: remote operation failed [No such file or directory]" repeated 2 times between [2015-11-27 13:03:50.355332] and [2015-11-27 13:04:19.949000]
[2015-11-27 13:04:19.949018] W [MSGID: 114031] [client-rpc-fops.c:2039:client3_3_fsetattr_cbk] 0-wmrk-client-4: remote operation failed [No such file or directory]
[2015-11-27 13:04:19.949062] W [MSGID: 114031] [client-rpc-fops.c:2039:client3_3_fsetattr_cbk] 0-wmrk-client-0: remote operation failed [No such file or directory]
[2015-11-27 13:04:19.951061] W [MSGID: 109023] [dht-rebalance.c:1418:dht_migrate_file] 0-wmrk-tier-dht: Migrate file failed://AP.BH.avi: failed to perform setattr on wmrk-cold-dht [No such file or directory]
Below is the log of the db, where the file starts to migrate and then fails:

[root@zod ~]#
[root@zod ~]# echo "===========Date=====================";date; echo "=============ColdBrick#1 =========" ; echo "select * from gf_file_tb; select * from gf_flink_tb;" | sqlite3 /rhs/brick5/1/brick1001/wmrk/.glusterfs/wmrk.db|grep bb063a50-1d92-4f28-955e-1d3c5f1a1971;echo "=============ColdBrick#2 =========" ; echo "select * from gf_file_tb; select * from gf_flink_tb;" | sqlite3 /rhs/brick5/1/brick1002/wmrk/.glusterfs/wmrk.db|grep bb063a50-1d92-4f28-955e-1d3c5f1a1971;echo "=============ColdBrick#3 =========" ; echo "select * from gf_file_tb; select * from gf_flink_tb;" | sqlite3 /rhs/brick5/1/brick1003/wmrk/.glusterfs/wmrk.db|grep bb063a50-1d92-4f28-955e-1d3c5f1a1971; echo "=============ColdBrick#4 =========" ; echo "select * from gf_file_tb; select * from gf_flink_tb;" | sqlite3 /rhs/brick5/1/brick1004/wmrk/.glusterfs/wmrk.db|grep bb063a50-1d92-4f28-955e-1d3c5f1a1971; echo "=============ColdBrick#5 =========" ; echo "select * from gf_file_tb; select * from gf_flink_tb;" | sqlite3 /rhs/brick5/1/brick1005/wmrk/.glusterfs/wmrk.db|grep bb063a50-1d92-4f28-955e-1d3c5f1a1971; echo "=============ColdBrick#6 =========" ; echo "select * from gf_file_tb; select * from gf_flink_tb;" | sqlite3 /rhs/brick5/1/brick1006/wmrk/.glusterfs/wmrk.db|grep bb063a50-1d92-4f28-955e-1d3c5f1a1971; echo ">>>>>>>>>>>> HOTBRICK#1 <<<<<<<<==";echo "select * from gf_file_tb; select * from gf_flink_tb;" | sqlite3 /rhs/brick5/1/brick1008/wmrk/.glusterfs/wmrk.db|grep bb063a50-1d92-4f28-955e-1d3c5f1a1971;echo ">>>>>>>>>>>> HOTBRICK#2 <<<<<<<<==";echo "select * from gf_file_tb; select * from gf_flink_tb;" | sqlite3 /rhs/brick5/1/brick1009/wmrk/.glusterfs/wmrk.db|grep bb063a50-1d92-4f28-955e-1d3c5f1a1971;ll /rhs/brick5/1/brick100*/wmrk/AP*;du -sh /rhs/brick5/1/brick100*/wmrk
===========Date=====================
Fri Nov 27 18:48:17 IST 2015
=============ColdBrick#1 =========
bb063a50-1d92-4f28-955e-1d3c5f1a1971|00000000-0000-0000-0000-000000000001|AP.BH.avi|/AP.BH.avi|0|0
=============ColdBrick#2 =========
bb063a50-1d92-4f28-955e-1d3c5f1a1971|00000000-0000-0000-0000-000000000001|AP.BH.avi|/AP.BH.avi|0|0
=============ColdBrick#3 =========
bb063a50-1d92-4f28-955e-1d3c5f1a1971|00000000-0000-0000-0000-000000000001|AP.BH.avi|/AP.BH.avi|0|0
=============ColdBrick#4 =========
bb063a50-1d92-4f28-955e-1d3c5f1a1971|0|0|0|0|0|0|0|0|0|0
bb063a50-1d92-4f28-955e-1d3c5f1a1971|00000000-0000-0000-0000-000000000001|AP.BH.avi|/AP.BH.avi|0|0
=============ColdBrick#5 =========
bb063a50-1d92-4f28-955e-1d3c5f1a1971|0|0|0|0|0|0|0|0|0|0
bb063a50-1d92-4f28-955e-1d3c5f1a1971|00000000-0000-0000-0000-000000000001|AP.BH.avi|/AP.BH.avi|0|0
=============ColdBrick#6 =========
bb063a50-1d92-4f28-955e-1d3c5f1a1971|0|0|0|0|0|0|0|0|0|0
bb063a50-1d92-4f28-955e-1d3c5f1a1971|00000000-0000-0000-0000-000000000001|AP.BH.avi|/AP.BH.avi|0|0
>>>>>>>>>>>> HOTBRICK#1 <<<<<<<<==
bb063a50-1d92-4f28-955e-1d3c5f1a1971|1448604159|503768|0|0|0|0|0|0|0|0
bb063a50-1d92-4f28-955e-1d3c5f1a1971|00000000-0000-0000-0000-000000000001|AP.BH.avi|/AP.BH.avi|0|0
>>>>>>>>>>>> HOTBRICK#2 <<<<<<<<==
Error: unable to open database "/rhs/brick5/1/brick1009/wmrk/.glusterfs/wmrk.db": unable to open database file
---------T. 2 root root 183465472 Nov 27 18:48 /rhs/brick5/1/brick1001/wmrk/AP.BH.avi
---------T. 2 root root 183465472 Nov 27 18:48 /rhs/brick5/1/brick1002/wmrk/AP.BH.avi
---------T. 2 root root 183465472 Nov 27 18:48 /rhs/brick5/1/brick1003/wmrk/AP.BH.avi
---------T. 2 root root 0 Nov 27 11:34 /rhs/brick5/1/brick1004/wmrk/AP.BH.avi
---------T. 2 root root 0 Nov 27 11:34 /rhs/brick5/1/brick1005/wmrk/AP.BH.avi
---------T. 2 root root 0 Nov 27 11:34 /rhs/brick5/1/brick1006/wmrk/AP.BH.avi
-rw-r-Sr-T. 2 root root 733861888 Nov 27 11:32 /rhs/brick5/1/brick1008/wmrk/AP.BH.avi
3.3G /rhs/brick5/1/brick1001/wmrk
3.3G /rhs/brick5/1/brick1002/wmrk
3.3G /rhs/brick5/1/brick1003/wmrk
3.9G /rhs/brick5/1/brick1004/wmrk
3.9G /rhs/brick5/1/brick1005/wmrk
3.9G /rhs/brick5/1/brick1006/wmrk
704M /rhs/brick5/1/brick1008/wmrk
[root@zod ~]# echo "===========Date=====================";date; echo "=============ColdBrick#1 =========" ; echo "select * from gf_file_tb; select * from gf_flink_tb;" | sqlite3 /rhs/brick5/1/brick1001/wmrk/.glusterfs/wmrk.db|grep bb063a50-1d92-4f28-955e-1d3c5f1a1971;echo "=============ColdBrick#2 =========" ; echo "select * from gf_file_tb; select * from gf_flink_tb;" | sqlite3 /rhs/brick5/1/brick1002/wmrk/.glusterfs/wmrk.db|grep bb063a50-1d92-4f28-955e-1d3c5f1a1971;echo "=============ColdBrick#3 =========" ; echo "select * from gf_file_tb; select * from gf_flink_tb;" | sqlite3 /rhs/brick5/1/brick1003/wmrk/.glusterfs/wmrk.db|grep bb063a50-1d92-4f28-955e-1d3c5f1a1971; echo "=============ColdBrick#4 =========" ; echo "select * from gf_file_tb; select * from gf_flink_tb;" | sqlite3 /rhs/brick5/1/brick1004/wmrk/.glusterfs/wmrk.db|grep bb063a50-1d92-4f28-955e-1d3c5f1a1971; echo "=============ColdBrick#5 =========" ; echo "select * from gf_file_tb; select * from gf_flink_tb;" | sqlite3 /rhs/brick5/1/brick1005/wmrk/.glusterfs/wmrk.db|grep bb063a50-1d92-4f28-955e-1d3c5f1a1971; echo "=============ColdBrick#6 =========" ; echo "select * from gf_file_tb; select * from gf_flink_tb;" | sqlite3 /rhs/brick5/1/brick1006/wmrk/.glusterfs/wmrk.db|grep bb063a50-1d92-4f28-955e-1d3c5f1a1971; echo ">>>>>>>>>>>> HOTBRICK#1 <<<<<<<<==";echo "select * from gf_file_tb; select * from gf_flink_tb;" | sqlite3 /rhs/brick5/1/brick1008/wmrk/.glusterfs/wmrk.db|grep bb063a50-1d92-4f28-955e-1d3c5f1a1971;echo ">>>>>>>>>>>> HOTBRICK#2 <<<<<<<<==";echo "select * from gf_file_tb; select * from gf_flink_tb;" | sqlite3 /rhs/brick5/1/brick1009/wmrk/.glusterfs/wmrk.db|grep bb063a50-1d92-4f28-955e-1d3c5f1a1971;ll /rhs/brick5/1/brick100*/wmrk/AP*;du -sh /rhs/brick5/1/brick100*/wmrk
===========Date=====================
Fri Nov 27 18:48:19 IST 2015
=============ColdBrick#1 =========
bb063a50-1d92-4f28-955e-1d3c5f1a1971|00000000-0000-0000-0000-000000000001|AP.BH.avi|/AP.BH.avi|0|0
=============ColdBrick#2 =========
bb063a50-1d92-4f28-955e-1d3c5f1a1971|00000000-0000-0000-0000-000000000001|AP.BH.avi|/AP.BH.avi|0|0
=============ColdBrick#3 =========
bb063a50-1d92-4f28-955e-1d3c5f1a1971|00000000-0000-0000-0000-000000000001|AP.BH.avi|/AP.BH.avi|0|0
=============ColdBrick#4 =========
bb063a50-1d92-4f28-955e-1d3c5f1a1971|0|0|0|0|0|0|0|0|0|0
bb063a50-1d92-4f28-955e-1d3c5f1a1971|00000000-0000-0000-0000-000000000001|AP.BH.avi|/AP.BH.avi|0|0
=============ColdBrick#5 =========
bb063a50-1d92-4f28-955e-1d3c5f1a1971|0|0|0|0|0|0|0|0|0|0
bb063a50-1d92-4f28-955e-1d3c5f1a1971|00000000-0000-0000-0000-000000000001|AP.BH.avi|/AP.BH.avi|0|0
=============ColdBrick#6 =========
bb063a50-1d92-4f28-955e-1d3c5f1a1971|0|0|0|0|0|0|0|0|0|0
bb063a50-1d92-4f28-955e-1d3c5f1a1971|00000000-0000-0000-0000-000000000001|AP.BH.avi|/AP.BH.avi|0|0
>>>>>>>>>>>> HOTBRICK#1 <<<<<<<<==
bb063a50-1d92-4f28-955e-1d3c5f1a1971|1448604159|503768|0|0|0|0|0|0|0|0
bb063a50-1d92-4f28-955e-1d3c5f1a1971|00000000-0000-0000-0000-000000000001|AP.BH.avi|/AP.BH.avi|0|0
>>>>>>>>>>>> HOTBRICK#2 <<<<<<<<==
Error: unable to open database "/rhs/brick5/1/brick1009/wmrk/.glusterfs/wmrk.db": unable to open database file
---------T. 2 root root 0 Nov 27 18:48 /rhs/brick5/1/brick1001/wmrk/AP.BH.avi
---------T. 2 root root 0 Nov 27 18:48 /rhs/brick5/1/brick1002/wmrk/AP.BH.avi
---------T. 2 root root 0 Nov 27 18:48 /rhs/brick5/1/brick1003/wmrk/AP.BH.avi
---------T. 2 root root 0 Nov 27 11:34 /rhs/brick5/1/brick1004/wmrk/AP.BH.avi
---------T. 2 root root 0 Nov 27 11:34 /rhs/brick5/1/brick1005/wmrk/AP.BH.avi
---------T. 2 root root 0 Nov 27 11:34 /rhs/brick5/1/brick1006/wmrk/AP.BH.avi
-rw-r--r--. 2 root root 733861888 Nov 27 11:32 /rhs/brick5/1/brick1008/wmrk/AP.BH.avi
3.1G /rhs/brick5/1/brick1001/wmrk
3.1G /rhs/brick5/1/brick1002/wmrk
3.1G /rhs/brick5/1/brick1003/wmrk
3.9G /rhs/brick5/1/brick1004/wmrk
3.9G /rhs/brick5/1/brick1005/wmrk
3.9G /rhs/brick5/1/brick1006/wmrk
704M /rhs/brick5/1/brick1008/wmrk
[root@zod ~]#

This cycle goes on indefinitely.
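The repeated one-liner above can also be expressed as a small helper using Python's sqlite3 module. The table names gf_file_tb/gf_flink_tb and db paths are taken from the transcript above; the function name is mine:

```python
import glob
import sqlite3

def ctr_rows(db_path, gfid):
    """Return all rows from the CTR tables that mention the given GFID."""
    conn = sqlite3.connect(db_path)
    try:
        rows = []
        for table in ("gf_file_tb", "gf_flink_tb"):
            for row in conn.execute("SELECT * FROM %s" % table):
                if any(gfid in str(col) for col in row):
                    rows.append(row)
        return rows
    finally:
        conn.close()

# Walk every brick's db (paths as in the setup above):
# for db in sorted(glob.glob("/rhs/brick5/1/brick100*/wmrk/.glusterfs/wmrk.db")):
#     print(db, ctr_rows(db, "bb063a50-1d92-4f28-955e-1d3c5f1a1971"))
```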
There are several different errors in the log files:

1. No such file or directory:

[2015-11-26 11:22:12.336273] E [MSGID: 109040] [dht-helper.c:1020:dht_migration_complete_check_task] 0-wmrk-cold-dht: //fiftygb: failed to lookup the file on wmrk-cold-dht [No such file or directory]
[2015-11-26 11:22:12.336618] W [MSGID: 109023] [dht-rebalance.c:1532:dht_migrate_file] 0-wmrk-tier-dht: //fiftygb: failed to do a stat on wmrk-cold-dht (No such file or directory)
[2015-11-26 11:22:14.039588] E [ec-inode-write.c:395:ec_manager_setattr] (-->/usr/lib64/glusterfs/3.7.5/xlator/cluster/disperse.so(ec_resume+0x88) [0x7f9c3ebed628] -->/usr/lib64/glusterfs/3.7.5/xlator/cluster/disperse.so(__ec_manager+0x5b) [0x7f9c3ebed47b] -->/usr/lib64/glusterfs/3.7.5/xlator/cluster/disperse.so(ec_manager_setattr+0x1f3) [0x7f9c3ec05593] ) 0-: Assertion failed: ec_get_inode_size(fop, fop->locks[0].lock->loc.inode, &cbk->iatt[0].ia_size)

From the posix logs:
rhs-brick5-1-brick1004-wmrk.log:256:[2015-11-26 11:22:12.088596] E [MSGID: 113001] [posix.c:4151:posix_getxattr] 0-wmrk-posix: listxattr failed on /rhs/brick5/1/brick1004/wmrk/.glusterfs/b4/c0/b4c0fba4-0031-4eef-bae5-2f970ff4e6cd [No such file or directory]
rhs-brick5-1-brick1004-wmrk.log:257:[2015-11-26 11:22:12.334106] E [MSGID: 113001] [posix.c:4101:posix_getxattr] 0-wmrk-posix: getxattr failed on /rhs/brick5/1/brick1004/wmrk/.glusterfs/b4/c0/b4c0fba4-0031-4eef-bae5-2f970ff4e6cd: trusted.glusterfs.dht.linkto [No such file or directory]

It seems that the gfid link in .glusterfs did not exist at that point, though it was present when I checked the brick.

2. GFID mismatch errors and EINVAL:

[2015-11-27 09:38:20.179977] E [MSGID: 109037] [tier.c:420:tier_migrate_using_query_file] 0-wmrk-tier-dht: Failed to do lookup on file cfile1 [Invalid argument]
[2015-11-27 09:38:20.184138] E [MSGID: 109037] [tier.c:1479:tier_start] 0-wmrk-tier-dht: Promotion failed
The message "W [MSGID: 122019] [ec-helpers.c:348:ec_loc_gfid_check] 0-wmrk-disperse-0: Mismatching GFID's in loc" repeated 2 times between [2015-11-27 09:38:20.179889] and [2015-11-27 09:38:20.184004]
The message "E [MSGID: 109037] [tier.c:420:tier_migrate_using_query_file] 0-wmrk-tier-dht: Failed to do lookup on file cfile1 [Invalid argument]" repeated 2 times between [2015-11-27 09:38:20.179977] and [2015-11-27 09:38:20.184030]
[2015-11-27 09:40:00.205299] W [MSGID: 122019] [ec-helpers.c:348:ec_loc_gfid_check] 0-wmrk-disperse-0: Mismatching GFID's in loc
[2015-11-27 09:40:00.205400] E [MSGID: 109037] [tier.c:420:tier_migrate_using_query_file] 0-wmrk-tier-dht: Failed to do lookup on file cfile1 [Invalid argument]

The .glusterfs link for cfile1 does not exist.

# file: brick1001/wmrk/cfile1
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.bit-rot.version=0x02000000000000005656aed2000acb50
trusted.ec.config=0x0000080602000200
trusted.ec.size=0x00000000002cac00
trusted.ec.version=0x000000000002ee61000000000002ee61
trusted.gfid=0x08434bcbd7a94946a28726b98dd763b3

[root@yarrow 84]# pwd
/rhs/brick5/1/brick1001/wmrk/.glusterfs/84
[root@yarrow 84]# ls
ac

There are a lot of null gfid errors in the brick logs as well.

rhs-brick5-1-brick1001-wmrk.log:311:[2015-11-27 07:41:57.733948] E [MSGID: 113091] [posix.c:156:posix_lookup] 0-wmrk-posix: null gfid for path (null)
rhs-brick5-1-brick1001-wmrk.log:312:[2015-11-27 07:41:57.733984] E [MSGID: 113018] [posix.c:174:posix_lookup] 0-wmrk-posix: lstat on null failed [Invalid argument]

Other errors seen are:

[2015-11-27 06:00:00.845488] E [MSGID: 101044] [ec.c:1041:ec_gf_fsetxattr] 0-wmrk-disperse-1: attempt to set internal xattr: trusted.ec.* [Operation not permitted]
[2015-11-27 06:00:00.845838] W [MSGID: 109023] [dht-rebalance.c:592:__dht_rebalance_create_dst_file] 0-wmrk-tier-dht: //nn: failed to set xattr on wmrk-cold-dht (Operation not permitted)

I think the EC folks might need to take a look at this.
Just started looking into this. Observed one thing in the logs provided above: why are we getting an error from a brick which is not part of the volume?

1d3c5f1a1971|00000000-0000-0000-0000-000000000001|AP.BH.avi|/AP.BH.avi|0|0
>>>>>>>>>>>> HOTBRICK#2 <<<<<<<<==
Error: unable to open database "/rhs/brick5/1/brick1009/wmrk/.glusterfs/wmrk.db": unable to open database file

I don't see brick1009 in the given volume configuration:

Volume Name: wmrk
Type: Tier
Volume ID: 4013b73f-8739-48cd-9f19-f4a3b1228462
Status: Started
Number of Bricks: 14
Transport-type: tcp
Hot Tier :
Hot Tier Type : Replicate
Number of Bricks: 1 x 2 = 2
Brick1: yarrow:/rhs/brick5/1/brick1008/wmrk
Brick2: zod:/rhs/brick5/1/brick1008/wmrk
Cold Tier:
Cold Tier Type : Distributed-Disperse
Number of Bricks: 2 x (4 + 2) = 12
Brick3: zod:/rhs/brick5/1/brick1001/wmrk
Brick4: yarrow:/rhs/brick5/1/brick1001/wmrk
Brick5: zod:/rhs/brick5/1/brick1002/wmrk
Brick6: yarrow:/rhs/brick5/1/brick1002/wmrk
Brick7: zod:/rhs/brick5/1/brick1003/wmrk
Brick8: yarrow:/rhs/brick5/1/brick1003/wmrk
Brick9: zod:/rhs/brick5/1/brick1004/wmrk
Brick10: yarrow:/rhs/brick5/1/brick1004/wmrk
Brick11: zod:/rhs/brick5/1/brick1005/wmrk
Brick12: yarrow:/rhs/brick5/1/brick1005/wmrk
Brick13: zod:/rhs/brick5/1/brick1006/wmrk
Brick14: yarrow:/rhs/brick5/1/brick1006/wmrk

Investigating the logs and trying to reproduce it on my system -
I am able to recreate this issue partially.

[2015-12-01 04:54:00.627216] E [MSGID: 101044] [ec.c:1043:ec_gf_fsetxattr] 0-vol-disperse-0: attempt to set internal xattr: trusted.ec.* [Operation not permitted]
[2015-12-01 04:54:00.627390] W [MSGID: 109023] [dht-rebalance.c:589:__dht_rebalance_create_dst_file] 0-vol-tier-dht: //file-8: failed to set xattr on vol-cold-dht (Operation not permitted)
[2015-12-01 04:54:15.281500] E [MSGID: 101044] [ec.c:1026:ec_gf_setxattr] 0-vol-disperse-0: attempt to set internal xattr: trusted.ec.* [Operation not permitted]
[2015-12-01 04:54:15.281671] W [MSGID: 109023] [dht-rebalance.c:1461:dht_migrate_file] 0-vol-tier-dht: Migrate file failed://file-8: failed to set xattr on vol-cold-dht (Operation not permitted)
[2015-12-01 04:54:15.407963] E [MSGID: 101044] [ec.c:1043:ec_gf_fsetxattr] 0-vol-disperse-0: attempt to set internal xattr: trusted.ec.* [Operation not permitted]
[2015-12-01 04:54:15.408123] W [MSGID: 109023] [dht-rebalance.c:589:__dht_rebalance_create_dst_file] 0-vol-tier-dht: //file-4: failed to set xattr on vol-cold-dht (Operation not permitted)
[2015-12-01 04:54:34.044056] E [MSGID: 101044] [ec.c:1026:ec_gf_setxattr] 0-vol-disperse-0: attempt to set internal xattr: trusted.ec.* [Operation not permitted]
[2015-12-01 04:54:34.044214] W [MSGID: 109023] [dht-rebalance.c:1461:dht_migrate_file] 0-vol-tier-dht: Migrate file failed://file-4: failed to set xattr on vol-cold-dht (Operation not permitted)
[2015-12-01 04:54:34.167878] E [MSGID: 101044] [ec.c:1043:ec_gf_fsetxattr] 0-vol-disperse-0: attempt to set internal xattr: trusted.ec.* [Operation not permitted]
[2015-12-01 04:54:34.168012] W [MSGID: 109023] [dht-rebalance.c:589:__dht_rebalance_create_dst_file] 0-vol-tier-dht: //file-10: failed to set xattr on vol-cold-dht (Operation not permitted)
[2015-12-01 04:54:51.907258] E [MSGID: 101044] [ec.c:1026:ec_gf_setxattr] 0-vol-disperse-0: attempt to set internal xattr: trusted.ec.* [Operation not permitted]

Followed the same steps, but I am seeing all the files getting demoted. Investigating it further to find out why similar steps are not giving the same result.
The log messages in comment#6 are due to the issue reported in 1286927.
Kindly have a look at the setup, which is still available; the demote attempts have increased drastically:

[root@zod ~]# gluster v tier wmrk status
Node          Promoted files  Demoted files  Status
---------     ---------       ---------      ---------
localhost     9               0              in progress
10.70.34.43   0               11648          in progress
Tiering Migration Functionality: wmrk: success
[root@zod ~]#
Volume Name: wmrk
Type: Tier
Volume ID: 4013b73f-8739-48cd-9f19-f4a3b1228462
Status: Started
Number of Bricks: 14
Transport-type: tcp
Hot Tier :
Hot Tier Type : Replicate
Number of Bricks: 1 x 2 = 2
Brick1: yarrow:/rhs/brick5/1/brick1008/wmrk
Brick2: zod:/rhs/brick5/1/brick1008/wmrk
Cold Tier:
Cold Tier Type : Distributed-Disperse
Number of Bricks: 2 x (4 + 2) = 12
Brick3: zod:/rhs/brick5/1/brick1001/wmrk
Brick4: yarrow:/rhs/brick5/1/brick1001/wmrk
Brick5: zod:/rhs/brick5/1/brick1002/wmrk
Brick6: yarrow:/rhs/brick5/1/brick1002/wmrk
Brick7: zod:/rhs/brick5/1/brick1003/wmrk
Brick8: yarrow:/rhs/brick5/1/brick1003/wmrk
Brick9: zod:/rhs/brick5/1/brick1004/wmrk
Brick10: yarrow:/rhs/brick5/1/brick1004/wmrk
Brick11: zod:/rhs/brick5/1/brick1005/wmrk
Brick12: yarrow:/rhs/brick5/1/brick1005/wmrk
Brick13: zod:/rhs/brick5/1/brick1006/wmrk
Brick14: yarrow:/rhs/brick5/1/brick1006/wmrk
Options Reconfigured:
cluster.read-freq-threshold: 3
cluster.write-freq-threshold: 3
features.record-counters: on
cluster.tier-promote-frequency: 120
cluster.tier-demote-frequency: 300
cluster.watermark-hi: 3
cluster.watermark-low: 1
cluster.tier-mode: cache
features.ctr-enabled: on
performance.readdir-ahead: on

[root@zod ~]# df -h /rhs/brick5/1/brick1001/wmrk*
Filesystem   Size  Used  Avail  Use%  Mounted on
/dev/loop10  20G   5.4G  14G    29%   /rhs/brick5/1/brick1001
[root@zod ~]# df -h /rhs/brick5/1/brick100*/wmrk*
Filesystem   Size  Used  Avail  Use%  Mounted on
/dev/loop10  20G   5.4G  14G    29%   /rhs/brick5/1/brick1001
/dev/loop11  20G   4.3G  15G    23%   /rhs/brick5/1/brick1002
/dev/loop12  20G   3.2G  16G    17%   /rhs/brick5/1/brick1003
/dev/loop13  20G   4.0G  16G    21%   /rhs/brick5/1/brick1004
/dev/loop14  20G   4.0G  16G    21%   /rhs/brick5/1/brick1005
/dev/loop15  20G   4.0G  16G    21%   /rhs/brick5/1/brick1006
/dev/loop17  20G   1.9G  18G    10%   /rhs/brick5/1/brick1008
[root@zod ~]#

Sosreports available @: rhsqe-repo.lab.eng.blr.redhat.com:/home/repo/sosreports/nchilaka/bug.1286218
Analysis of why demotion fails consistently for AP.BH.avi:

On checking the bricks on yarrow (similar steps return similar results on zod):
-------------------------------------------
[root@yarrow 1]# l brick100*/wmrk/AP*
---------T. 2 root root 0 Dec 4 13:16 brick1001/wmrk/AP.BH.avi
---------T. 2 root root 0 Dec 4 13:16 brick1002/wmrk/AP.BH.avi
---------T. 2 root root 0 Dec 4 13:16 brick1003/wmrk/AP.BH.avi
---------T. 2 root root 0 Nov 27 11:34 brick1004/wmrk/AP.BH.avi
---------T. 2 root root 0 Nov 27 11:34 brick1005/wmrk/AP.BH.avi
---------T. 2 root root 0 Nov 27 11:34 brick1006/wmrk/AP.BH.avi
-rw-r--r--. 2 root root 733861888 Nov 27 11:32 brick1008/wmrk/AP.BH.avi  <-- Hot tier brick

[root@yarrow 1]# getfattr -e hex -m . -d brick100*/wmrk/AP*
# file: brick1001/wmrk/AP.BH.avi
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.bit-rot.version=0x02000000000000005656aed2000acb50
trusted.ec.config=0x0000080602000200
trusted.ec.dirty=0x00000000000000000000000000000000
trusted.ec.size=0x0000000000000000
trusted.ec.version=0x00000000068bf55a00000000068bf55b
trusted.gfid=0xbb063a501d924f28955e1d3c5f1a1971
trusted.tier.tier-dht.linkto=0x776d726b2d686f742d64687400

# file: brick1002/wmrk/AP.BH.avi
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.bit-rot.version=0x02000000000000005656aed2000d8f47
trusted.ec.config=0x0000080602000200
trusted.ec.dirty=0x00000000000000000000000000000000
trusted.ec.size=0x0000000000000000
trusted.ec.version=0x00000000068bf55a00000000068bf55b
trusted.gfid=0xbb063a501d924f28955e1d3c5f1a1971
trusted.tier.tier-dht.linkto=0x776d726b2d686f742d64687400

# file: brick1003/wmrk/AP.BH.avi
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.bit-rot.version=0x02000000000000005656aed3000147f3
trusted.ec.config=0x0000080602000200
trusted.ec.dirty=0x00000000000000000000000000000000
trusted.ec.size=0x0000000000000000
trusted.ec.version=0x00000000068bf55a00000000068bf55b
trusted.gfid=0xbb063a501d924f28955e1d3c5f1a1971
trusted.tier.tier-dht.linkto=0x776d726b2d686f742d64687400

# file: brick1004/wmrk/AP.BH.avi
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.ec.config=0x0000080602000200
trusted.ec.size=0x0000000000000000
trusted.ec.version=0x00000000000000000000000000000001
trusted.gfid=0xbb063a501d924f28955e1d3c5f1a1971
trusted.glusterfs.dht.linkto=0x776d726b2d64697370657273652d3000

# file: brick1005/wmrk/AP.BH.avi
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.ec.config=0x0000080602000200
trusted.ec.size=0x0000000000000000
trusted.ec.version=0x00000000000000000000000000000001
trusted.gfid=0xbb063a501d924f28955e1d3c5f1a1971
trusted.glusterfs.dht.linkto=0x776d726b2d64697370657273652d3000

# file: brick1006/wmrk/AP.BH.avi
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.ec.config=0x0000080602000200
trusted.ec.size=0x0000000000000000
trusted.ec.version=0x00000000000000000000000000000001
trusted.gfid=0xbb063a501d924f28955e1d3c5f1a1971
trusted.glusterfs.dht.linkto=0x776d726b2d64697370657273652d3000

# file: brick1008/wmrk/AP.BH.avi
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x02000000000000005656dab7000b70a5
trusted.ec.config=0x0000080602000200
trusted.ec.size=0x0000000000000000
trusted.ec.version=0x00000000000000000000000000000000
trusted.gfid=0xbb063a501d924f28955e1d3c5f1a1971

brick100{1..3} are part of disperse-0, and brick100{4..6} are part of disperse-1. The file on disperse-1 is a dht linkto file, and the one on disperse-0 is a tier linkto file. disperse-1 is the hashed subvol for AP.BH.avi in the cold-dht layer.
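The linkto xattrs above are just NUL-terminated subvolume names in hex, so decoding them shows where each copy points. A small snippet (the helper name is mine):

```python
def decode_xattr(hex_value):
    """Turn a `getfattr -e hex` value into a printable string (trailing NUL stripped)."""
    h = hex_value[2:] if hex_value.startswith("0x") else hex_value
    return bytes.fromhex(h).rstrip(b"\x00").decode()

# Values copied from the getfattr output above:
print(decode_xattr("0x776d726b2d686f742d64687400"))        # tier linkto on brick1001..1003 -> "wmrk-hot-dht"
print(decode_xattr("0x776d726b2d64697370657273652d3000"))  # dht linkto on brick1004..1006 -> "wmrk-disperse-0"
```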
In __dht_rebalance_create_dst_file:

ret = syncop_create (to, loc, O_RDWR, DHT_LINKFILE_MODE, fd, &new_stbuf, dict, NULL);

finds a file called AP.BH.avi on disperse-1 and opens an fd on it, even though it is not a tier linkto file. In the same function, a lookup is sent on the cold tier, which correctly sets the cached subvol to disperse-0. So now the cached subvol is set to disperse-0, but the fd for the supposedly same file has been opened on disperse-1. On calling syncop_fsetxattr:

ret = syncop_fsetxattr (to, fd, xattr, 0, NULL, NULL);

dht_setfattr winds the call to the cached subvol (disperse-0). As the fd has not been opened on this subvol, an anonymous fd is sent instead and the call fails.
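The sequence above can be modelled with a toy sketch. The classes and method names here are hypothetical stand-ins; the real logic lives in dht-rebalance.c and dht-helper.c:

```python
class Subvol:
    """Toy stand-in for a DHT subvolume; only tracks which fds were opened here."""
    def __init__(self, name):
        self.name = name
        self.open_fds = set()

    def create(self, path):
        # models syncop_create: an fd is opened on whichever subvol holds the file
        self.open_fds.add(path)
        return (self.name, path)

    def fsetxattr(self, fd, key):
        owner, path = fd
        if owner != self.name:
            # fd was never opened on this subvol, so an anonymous fd is used
            # and the operation on it fails
            raise OSError("anonymous fd: %s refused on %s" % (key, self.name))
        return "ok"

disperse0 = Subvol("wmrk-disperse-0")  # cached subvol after the lookup
disperse1 = Subvol("wmrk-disperse-1")  # hashed subvol holding the stale linkfile

fd = disperse1.create("/AP.BH.avi")    # step 1: fd opened on disperse-1
try:
    # step 2: the fsetxattr is wound to the cached subvol, disperse-0
    disperse0.fsetxattr(fd, "trusted.tier.tier-dht.linkto")
except OSError as e:
    print("migration fails:", e)
```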
Vivek, this issue is unrelated to 1286927.
Not able to reproduce this on 3.7.5-9. Closing this as not seen, but note that this can be a "POTENTIAL RISK" for the release.
Steps to reproduce:
Create a file in a tiered volume (2x2 in both tiers).
Let the file demote.
Rename the file so that its hash range falls into the other subvolume.
Promote the file.
Wait for it to demote --- it should successfully demote (on affected builds, the demotion fails).
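The rename step matters because DHT picks the hashed subvolume from the file name. An illustrative sketch, with crc32 standing in for the 32-bit hash and per-directory layout ranges GlusterFS actually uses:

```python
import zlib

def hashed_subvol(fname, n_subvols=2):
    """Illustrative only: map a file name to a subvolume index by hashing the name."""
    return zlib.crc32(fname.encode()) % n_subvols

# After a rename the name hashes differently, so the file may belong to a
# different subvolume -- leaving a stale linkto file behind on the old hashed
# subvolume, which is what the demotion path later trips over.
```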
Reproduced this using the steps mentioned in comment 16 on the build glusterfs-3.7.5-12, and tested it again on the build glusterfs-3.7.5-13: created a file on a tiered volume, waited for it to get demoted, renamed it so that it shifts to the other subvolume, accessed the file again so that it gets promoted, and waited for it to get demoted again. File demotion did not happen on the build 3.7.5-12, and it did happen on the build 3.7.5-13. Detailed logs (of the steps executed on both builds) are attached.

Note: I did not see any errors in /var/log/glusterfs/<volname>-tier.log as mentioned in the description. Tried out a couple of other combinations as well, but to no avail. I cannot confidently mark this bug as verified. Would the above steps suffice for successful verification? Would you be able to suggest/guide scenarios which would make me hit the above errors in the logs?
Created attachment 1109203 [details] Bug reproduced
Created attachment 1109204 [details] Bug fixed
A couple of suggestions to reproduce the exact error message. First, create a distributed-disperse cold tier. Second, instead of renaming, try to reproduce the same scenario with min_free_disk: when min_free_disk is reached on the hashed subvol, the data file is created on another subvol. I cannot confidently give the direct steps to reproduce the exact error message; the bug was filed a couple of weeks ago, and a lot of fixes have since gone in to solve other problems, which might have fixed some of the partial issues. That could be the reason you are not able to see the exact error message on the preceding build. I haven't looked into the logs or error messages; the ultimate problem was the demotion failure, and the RCA for both problems was the same. That is why I reopened this bug and fixed it.
Verified this again on the build glusterfs-3.7.5-13.el7rhgs.x86_64. Followed the steps as written in comment 22 and did not see any errors/warnings related to this bug in the tier logs. Recreated the setup as written in the description above and followed the steps mentioned there; did not hit any errors there either. Moving this bug to verified in 3.1.2, with the validation done as per comments 19 and 23, and with the dev input as per comments 16 and 22. Please find the logs attached.
Created attachment 1110005 [details] Server and client logs
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0193.html