Description of problem:
=======================
When a file is being promoted to the hot tier but the promotion fails because the disk is full, the promote fails with "disk space not available". However, as part of the migration a link file is created on the hot tier, and this link file is not cleaned up.

Version-Release number of selected component (if applicable):
=============================================================
[root@zod ~]# rpm -qa|grep gluster
glusterfs-server-3.7.5-7.el7rhgs.x86_64
glusterfs-libs-3.7.5-7.el7rhgs.x86_64
glusterfs-client-xlators-3.7.5-7.el7rhgs.x86_64
glusterfs-fuse-3.7.5-7.el7rhgs.x86_64
glusterfs-devel-3.7.5-7.el7rhgs.x86_64
glusterfs-api-3.7.5-7.el7rhgs.x86_64
glusterfs-debuginfo-3.7.5-7.el7rhgs.x86_64
glusterfs-3.7.5-7.el7rhgs.x86_64
glusterfs-cli-3.7.5-7.el7rhgs.x86_64
[root@zod ~]#

Steps to Reproduce:
1. Create a regular volume with two 1 GB files.
2. Attach a tier with a maximum capacity of 1.1 GB.
3. Turn off cache mode.
4. Append to both files so that they get promoted in the next cycle.
5. In the next cycle one file gets promoted, but the other fails as below:

[2015-12-02 05:56:01.006402] E [MSGID: 109023] [dht-rebalance.c:721:__dht_check_free_space] 0-tfile-tier-dht: data movement attempted from node (tfile-cold-dht) to node (tfile-hot-dht) which does not have required free space for (//trans.avi)

6. A link file is left behind on the hot tier.

Implications:
=============
1) Unnecessary inodes are used up.
2) The whole point of having the cold tier as the hashed subvolume is in a way defeated.

=======In the case below I was trying to promote trans.avi=========
[root@yarrow ~]# ll /*/brick*/tfile*
/dummy/brick108/tfile_hot:
total 4
---------T. 2 root root          0 Dec  2 11:26 trans.avi

/rhs/brick1/tfile:
total 1899192
-rw-r--r--. 2 root root  743094942 Dec  2 11:21 hot700.avi
-rw-r--r--. 2 root root 1201674624 Dec  2 11:24 trans.avi

/rhs/brick2/tfile:
total 725680
---------T. 2 root root          0 Dec  2 11:24 mosa700.avi
[root@yarrow ~]#

[root@yarrow ~]# ll /*/brick*/tfile*
/dummy/brick108/tfile_hot:
total 4
---------T. 2 root root          0 Dec  2 11:26 trans.avi

/rhs/brick1/tfile:
total 1899192
-rw-r--r--. 2 root root  743094942 Dec  2 11:21 hot700.avi
-rw-r--r--. 2 root root 1201674624 Dec  2 11:24 trans.avi

/rhs/brick2/tfile:
total 0
---------T. 2 root root          0 Dec  2 11:26 mosa700.avi
[root@yarrow ~]#

[root@zod ~]# gluster v info tfile

Volume Name: tfile
Type: Tier
Volume ID: 664b21d8-f295-4708-90d6-7b1b2195654b
Status: Started
Number of Bricks: 6
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distribute
Number of Bricks: 2
Brick1: zod:/dummy/brick108/tfile_hot
Brick2: yarrow:/dummy/brick108/tfile_hot
Cold Tier:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick3: zod:/rhs/brick1/tfile
Brick4: yarrow:/rhs/brick1/tfile
Brick5: zod:/rhs/brick2/tfile
Brick6: yarrow:/rhs/brick2/tfile
Options Reconfigured:
cluster.tier-mode: test
features.ctr-enabled: on
performance.readdir-ahead: on
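The leftover linkto files above can be spotted on a brick without any gluster tooling: they are zero-byte regular files whose permission bits are exactly ---------T (sticky bit only, octal 1000). A minimal sketch of how one could list candidates (the brick path is taken from the report and purely illustrative; in a real cleanup one should also confirm the trusted.glusterfs.dht.linkto xattr before touching anything):

```shell
# List candidate stale DHT linkto files on a brick: zero-byte regular
# files whose permission bits are exactly 1000 (---------T, sticky bit
# only). Verify the trusted.glusterfs.dht.linkto xattr before removing.
list_stale_linkfiles() {
    find "$1" -type f -perm 1000 -size 0
}

# Example (brick path from the report above):
#   list_stale_linkfiles /dummy/brick108/tfile_hot
```

Note that ordinary data files on the hot tier (e.g. hot700.avi above) have normal rw permission bits and nonzero size, so they never match this filter.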
Also, if we delete such a file and then try to create a new file with the same name, we get the following error on the mount:

[root@rhs-client1 cola]# touch fil1
touch: setting times of ‘fil1’: Stale file handle
[root@rhs-client1 cola]# touch fil1
touch: setting times of ‘fil1’: Stale file handle
[root@rhs-client1 cola]# touch fil1
touch: setting times of ‘fil1’: Stale file handle

mount log:

[root@rhs-client1 cola]# tail /var/log/glusterfs/mnt-cola.log
[2015-12-18 17:56:02.729983] I [MSGID: 109069] [dht-common.c:1159:dht_lookup_unlink_stale_linkto_cbk] 2-cola-tier-dht: Returned with op_ret -1 and op_errno 16 for /fil1
[2015-12-18 17:56:02.731658] I [MSGID: 109070] [dht-common.c:1865:dht_lookup_linkfile_cbk] 2-cola-tier-dht: lookup of /fil1 on cola-hot-dht (following linkfile) reached link,gfid = 00000000-0000-0000-0000-000000000000
[2015-12-18 17:56:02.732169] W [MSGID: 109009] [dht-common.c:1619:dht_lookup_everywhere_cbk] 2-cola-tier-dht: /fil1: gfid differs on subvolume cola-hot-dht, gfid local = 7a9ba3e2-72f2-45c0-ae60-76637e4169a5, gfid node = 32c02986-be79-42b2-8fac-f29abb356856
The message "W [MSGID: 114031] [client-rpc-fops.c:664:client3_3_unlink_cbk] 2-cola-client-4: remote operation failed [Device or resource busy]" repeated 2 times between [2015-12-18 17:56:02.725145] and [2015-12-18 17:56:02.732609]
[2015-12-18 17:56:02.732612] I [MSGID: 109069] [dht-common.c:1069:dht_lookup_unlink_cbk] 2-cola-tier-dht: lookup_unlink returned with op_ret -> -1 and op-errno -> 16 for /fil1
[2015-12-18 17:56:02.733384] W [MSGID: 114031] [client-rpc-fops.c:664:client3_3_unlink_cbk] 2-cola-client-1: remote operation failed [Device or resource busy]
[2015-12-18 17:56:02.733502] W [MSGID: 114031] [client-rpc-fops.c:664:client3_3_unlink_cbk] 2-cola-client-0: remote operation failed [Device or resource busy]
[2015-12-18 17:56:02.733820] I [MSGID: 109069] [dht-common.c:1159:dht_lookup_unlink_stale_linkto_cbk] 2-cola-tier-dht: Returned with op_ret -1 and op_errno 16 for /fil1
[2015-12-18 17:56:02.741391] I [MSGID: 109063] [dht-layout.c:702:dht_layout_normalize] 2-cola-tier-dht: Found anomalies in /fil1 (gfid = 32c02986-be79-42b2-8fac-f29abb356856). Holes=1 overlaps=0
[2015-12-18 17:56:02.741522] W [fuse-bridge.c:1058:fuse_setattr_cbk] 0-glusterfs-fuse: 2884793: SETATTR() /fil1 => -1 (Stale file handle)

After some time I was able to create the file, but saw the following errors in the mount log:

[root@rhs-client1 cola]# tail /var/log/glusterfs/mnt-cola.log
[2015-12-18 17:56:02.732169] W [MSGID: 109009] [dht-common.c:1619:dht_lookup_everywhere_cbk] 2-cola-tier-dht: /fil1: gfid differs on subvolume cola-hot-dht, gfid local = 7a9ba3e2-72f2-45c0-ae60-76637e4169a5, gfid node = 32c02986-be79-42b2-8fac-f29abb356856
The message "W [MSGID: 114031] [client-rpc-fops.c:664:client3_3_unlink_cbk] 2-cola-client-4: remote operation failed [Device or resource busy]" repeated 2 times between [2015-12-18 17:56:02.725145] and [2015-12-18 17:56:02.732609]
[2015-12-18 17:56:02.732612] I [MSGID: 109069] [dht-common.c:1069:dht_lookup_unlink_cbk] 2-cola-tier-dht: lookup_unlink returned with op_ret -> -1 and op-errno -> 16 for /fil1
[2015-12-18 17:56:02.733384] W [MSGID: 114031] [client-rpc-fops.c:664:client3_3_unlink_cbk] 2-cola-client-1: remote operation failed [Device or resource busy]
[2015-12-18 17:56:02.733502] W [MSGID: 114031] [client-rpc-fops.c:664:client3_3_unlink_cbk] 2-cola-client-0: remote operation failed [Device or resource busy]
[2015-12-18 17:56:02.733820] I [MSGID: 109069] [dht-common.c:1159:dht_lookup_unlink_stale_linkto_cbk] 2-cola-tier-dht: Returned with op_ret -1 and op_errno 16 for /fil1
[2015-12-18 17:56:02.741391] I [MSGID: 109063] [dht-layout.c:702:dht_layout_normalize] 2-cola-tier-dht: Found anomalies in /fil1 (gfid = 32c02986-be79-42b2-8fac-f29abb356856). Holes=1 overlaps=0
[2015-12-18 17:56:02.741522] W [fuse-bridge.c:1058:fuse_setattr_cbk] 0-glusterfs-fuse: 2884793: SETATTR() /fil1 => -1 (Stale file handle)
[2015-12-18 17:56:02.732187] I [MSGID: 109045] [dht-common.c:1744:dht_lookup_everywhere_cbk] 2-cola-tier-dht: attempting deletion of stale linkfile /fil1 on cola-hot-dht (hashed subvol is cola-cold-dht)
[2015-12-18 17:56:02.741502] E [MSGID: 109040] [dht-helper.c:1020:dht_migration_complete_check_task] 2-cola-tier-dht: /fil1: failed to lookup the file on cola-tier-dht [Stale file handle]
[root@rhs-client1 cola]#
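The "gfid differs on subvolume" message above can be confirmed directly on the bricks: every file carries its gfid in the trusted.gfid xattr, readable as root with getfattr. A small helper to strip getfattr's hex output down to just the gfid, so the values from two bricks can be string-compared (the brick path in the usage comment is illustrative, not taken from this volume's layout):

```shell
# Extract the gfid from `getfattr -n trusted.gfid -e hex <path>` output,
# which looks like:
#   # file: rhs/brick1/cola/fil1
#   trusted.gfid=0x32c02986be7942b28facf29abb356856
gfid_value() {
    sed -n 's/^trusted\.gfid=//p'
}

# Example (run as root on each brick server; brick path is illustrative):
#   getfattr --absolute-names -n trusted.gfid -e hex /rhs/brick1/cola/fil1 | gfid_value
```

If the value printed on the hot-tier brick differs from the one on the cold-tier brick, the stale linkfile is the likely culprit, matching the "gfid local" vs "gfid node" pair in the log.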
As tier is not being actively developed, I'm closing this bug. Feel free to reopen it if necessary.