Bug 1287436

Summary: Link files being created (and not cleaned up) if hot tier is full during migration
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Nag Pavan Chilakam <nchilaka>
Component: tier
Assignee: hari gowtham <hgowtham>
Status: CLOSED WONTFIX
QA Contact: Nag Pavan Chilakam <nchilaka>
Severity: low
Priority: low
Version: rhgs-3.1
CC: rhs-bugs
Keywords: ZStream
Hardware: Unspecified
OS: Unspecified
Whiteboard: tier-migration
Doc Type: Bug Fix
Last Closed: 2018-11-08 19:19:09 UTC
Type: Bug

Description Nag Pavan Chilakam 2015-12-02 06:03:31 UTC
Description of problem:
=====================
When a file is being promoted to the hot tier but the promotion fails because the hot tier disk is full, the promote fails with a "disk space not available" error.
However, as part of migration a link file is created on the hot tier, and this link file is not cleaned up.
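To make the expected behaviour concrete, here is a simplified, hypothetical Python model of a single promotion attempt. It is not GlusterFS code: `Subvol`, `promote` and the `cleanup` flag are illustrative names, and the ordering (destination entry created before the free-space check) is an assumption inferred from the link file being left behind. With `cleanup=False` it reproduces the reported leftover; with `cleanup=True` it removes the entry on failure, which is what this report expects.

```python
"""Simplified, hypothetical model of one tier promotion attempt (not GlusterFS code)."""
from dataclasses import dataclass, field


@dataclass
class Subvol:
    name: str
    capacity: int                               # total bytes on this subvolume
    files: dict = field(default_factory=dict)   # file name -> size; size 0 models a link file

    def free(self) -> int:
        return self.capacity - sum(self.files.values())


def promote(name: str, cold: Subvol, hot: Subvol, cleanup: bool = True) -> bool:
    """Try to move `name` from cold to hot; return True on success."""
    size = cold.files[name]
    hot.files[name] = 0                 # assumed: destination entry (link file) created first
    if hot.free() < size:               # analogous to the free-space check in the log
        if cleanup:
            del hot.files[name]         # expected behaviour: drop the leftover entry
        return False                    # promotion fails: not enough space
    hot.files[name] = size              # "copy" the data
    del cold.files[name]
    return True


if __name__ == "__main__":
    GB = 10**9
    cold = Subvol("cold", 10 * GB, {"a.avi": GB, "b.avi": GB})
    hot = Subvol("hot", 11 * GB // 10)               # 1.1 GB hot tier, as in the steps below
    print(promote("a.avi", cold, hot))                # True
    print(promote("b.avi", cold, hot, cleanup=False)) # False
    print(hot.files)                                  # {'a.avi': 1000000000, 'b.avi': 0}
```

The zero-size `b.avi` entry left in `hot.files` corresponds to the `---------T` file seen on the hot brick.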

Version-Release number of selected component (if applicable):
=========================================================
[root@zod ~]# rpm -qa|grep gluster
glusterfs-server-3.7.5-7.el7rhgs.x86_64
glusterfs-libs-3.7.5-7.el7rhgs.x86_64
glusterfs-client-xlators-3.7.5-7.el7rhgs.x86_64
glusterfs-fuse-3.7.5-7.el7rhgs.x86_64
glusterfs-devel-3.7.5-7.el7rhgs.x86_64
glusterfs-api-3.7.5-7.el7rhgs.x86_64
glusterfs-debuginfo-3.7.5-7.el7rhgs.x86_64
glusterfs-3.7.5-7.el7rhgs.x86_64
glusterfs-cli-3.7.5-7.el7rhgs.x86_64
[root@zod ~]# 



Steps to Reproduce:
1. Create a regular volume with two 1 GB files.
2. Attach a hot tier with a total capacity of at most 1.1 GB.
3. Turn off cache mode (the volume below runs with cluster.tier-mode: test).
4. Append to both files so that they are promoted in the next cycle.
5. In the next cycle one file gets promoted, but the other fails as below:
[2015-12-02 05:56:01.006402] E [MSGID: 109023] [dht-rebalance.c:721:__dht_check_free_space] 0-tfile-tier-dht: data movement attempted from node (tfile-cold-dht) to node (tfile-hot-dht) which does not have required free space for (//trans.avi)


6. However, a link file is left behind on the hot tier.


Implications:
=============
1) Unnecessary inodes are used up.
2) The whole point of having the cold tier as the hashed subvolume is partly defeated.





======= In the case below, I was trying to promote trans.avi =========
[root@yarrow ~]# ll /*/brick*/tfile*
/dummy/brick108/tfile_hot:
total 4
---------T. 2 root root 0 Dec  2 11:26 trans.avi

/rhs/brick1/tfile:
total 1899192
-rw-r--r--. 2 root root  743094942 Dec  2 11:21 hot700.avi
-rw-r--r--. 2 root root 1201674624 Dec  2 11:24 trans.avi

/rhs/brick2/tfile:
total 725680
---------T. 2 root root 0 Dec  2 11:24 mosa700.avi
[root@yarrow ~]# 
[root@yarrow ~]# 
[root@yarrow ~]# ll /*/brick*/tfile*
/dummy/brick108/tfile_hot:
total 4
---------T. 2 root root 0 Dec  2 11:26 trans.avi

/rhs/brick1/tfile:
total 1899192
-rw-r--r--. 2 root root  743094942 Dec  2 11:21 hot700.avi
-rw-r--r--. 2 root root 1201674624 Dec  2 11:24 trans.avi

/rhs/brick2/tfile:
total 0
---------T. 2 root root 0 Dec  2 11:26 mosa700.avi
[root@yarrow ~]# 
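The leftover entries in the listings above are zero-byte files that `ls` shows as `---------T` (only the sticky bit set). A rough Python sketch for spotting such entries on a brick, assuming this mode/size signature is a good enough heuristic for candidates. GlusterFS DHT marks link files with the `trusted.glusterfs.dht.linkto` extended attribute, which the sketch reads when it can (reading `trusted.*` attributes generally needs root, and `os.getxattr` is Linux-only). The function names are illustrative, and the brick's `.glusterfs` directory is skipped because it holds gfid hard links to the same files.

```python
"""Heuristic scan of a brick directory for DHT link files (sketch, not a Gluster tool)."""
import os
import stat
import sys

LINKTO_XATTR = "trusted.glusterfs.dht.linkto"


def looks_like_linkfile(mode: int, size: int) -> bool:
    """Zero-byte regular file whose only mode bit is the sticky bit (ls: ---------T)."""
    return stat.S_ISREG(mode) and stat.S_IMODE(mode) == stat.S_ISVTX and size == 0


def linkto_target(path: str):
    """Return the linkto xattr value, or None if it cannot be read."""
    try:
        value = os.getxattr(path, LINKTO_XATTR)
    except (AttributeError, OSError):   # non-Linux platform, no permission, or no xattr
        return None
    return value.rstrip(b"\0").decode(errors="replace")


def scan_brick(root: str):
    """Yield (path, linkto target or None) for every link-file candidate under root."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".glusterfs"]  # skip gfid hard-link tree
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if looks_like_linkfile(st.st_mode, st.st_size):
                yield path, linkto_target(path)


if __name__ == "__main__":
    for path, target in scan_brick(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(path, target or "(linkto xattr not readable)")
```

For example, it could be pointed at the hot brick path /dummy/brick108/tfile_hot. A hit is only a candidate; whether it is stale still has to be judged against the file's hashed subvolume.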



[root@zod ~]# gluster v info tfile
 
Volume Name: tfile
Type: Tier
Volume ID: 664b21d8-f295-4708-90d6-7b1b2195654b
Status: Started
Number of Bricks: 6
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distribute
Number of Bricks: 2
Brick1: zod:/dummy/brick108/tfile_hot
Brick2: yarrow:/dummy/brick108/tfile_hot
Cold Tier:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick3: zod:/rhs/brick1/tfile
Brick4: yarrow:/rhs/brick1/tfile
Brick5: zod:/rhs/brick2/tfile
Brick6: yarrow:/rhs/brick2/tfile
Options Reconfigured:
cluster.tier-mode: test
features.ctr-enabled: on
performance.readdir-ahead: on
[root@zod ~]#

Comment 3 Nag Pavan Chilakam 2015-12-18 17:33:11 UTC
Also, if such a file is deleted and a new file with the same name is then created, the following error appears on the mount:
[root@rhs-client1 cola]# touch fil1
touch: setting times of ‘fil1’: Stale file handle
[root@rhs-client1 cola]# 
[root@rhs-client1 cola]# 
[root@rhs-client1 cola]# touch fil1
touch: setting times of ‘fil1’: Stale file handle
[root@rhs-client1 cola]# 
[root@rhs-client1 cola]# touch fil1
touch: setting times of ‘fil1’: Stale file handle

Mount log:
[root@rhs-client1 cola]# tail /var/log/glusterfs/mnt-cola.log 
[2015-12-18 17:56:02.729983] I [MSGID: 109069] [dht-common.c:1159:dht_lookup_unlink_stale_linkto_cbk] 2-cola-tier-dht: Returned with op_ret -1 and op_errno 16 for /fil1
[2015-12-18 17:56:02.731658] I [MSGID: 109070] [dht-common.c:1865:dht_lookup_linkfile_cbk] 2-cola-tier-dht: lookup of /fil1 on cola-hot-dht (following linkfile) reached link,gfid = 00000000-0000-0000-0000-000000000000
[2015-12-18 17:56:02.732169] W [MSGID: 109009] [dht-common.c:1619:dht_lookup_everywhere_cbk] 2-cola-tier-dht: /fil1: gfid differs on subvolume cola-hot-dht, gfid local = 7a9ba3e2-72f2-45c0-ae60-76637e4169a5, gfid node = 32c02986-be79-42b2-8fac-f29abb356856
The message "W [MSGID: 114031] [client-rpc-fops.c:664:client3_3_unlink_cbk] 2-cola-client-4: remote operation failed [Device or resource busy]" repeated 2 times between [2015-12-18 17:56:02.725145] and [2015-12-18 17:56:02.732609]
[2015-12-18 17:56:02.732612] I [MSGID: 109069] [dht-common.c:1069:dht_lookup_unlink_cbk] 2-cola-tier-dht: lookup_unlink returned with op_ret -> -1 and op-errno -> 16 for /fil1
[2015-12-18 17:56:02.733384] W [MSGID: 114031] [client-rpc-fops.c:664:client3_3_unlink_cbk] 2-cola-client-1: remote operation failed [Device or resource busy]
[2015-12-18 17:56:02.733502] W [MSGID: 114031] [client-rpc-fops.c:664:client3_3_unlink_cbk] 2-cola-client-0: remote operation failed [Device or resource busy]
[2015-12-18 17:56:02.733820] I [MSGID: 109069] [dht-common.c:1159:dht_lookup_unlink_stale_linkto_cbk] 2-cola-tier-dht: Returned with op_ret -1 and op_errno 16 for /fil1
[2015-12-18 17:56:02.741391] I [MSGID: 109063] [dht-layout.c:702:dht_layout_normalize] 2-cola-tier-dht: Found anomalies in /fil1 (gfid = 32c02986-be79-42b2-8fac-f29abb356856). Holes=1 overlaps=0
[2015-12-18 17:56:02.741522] W [fuse-bridge.c:1058:fuse_setattr_cbk] 0-glusterfs-fuse: 2884793: SETATTR() /fil1 => -1 (Stale file handle)




After some time I was able to create the file, but saw the following errors in the mount log:


[root@rhs-client1 cola]# tail /var/log/glusterfs/mnt-cola.log 
[2015-12-18 17:56:02.732169] W [MSGID: 109009] [dht-common.c:1619:dht_lookup_everywhere_cbk] 2-cola-tier-dht: /fil1: gfid differs on subvolume cola-hot-dht, gfid local = 7a9ba3e2-72f2-45c0-ae60-76637e4169a5, gfid node = 32c02986-be79-42b2-8fac-f29abb356856
The message "W [MSGID: 114031] [client-rpc-fops.c:664:client3_3_unlink_cbk] 2-cola-client-4: remote operation failed [Device or resource busy]" repeated 2 times between [2015-12-18 17:56:02.725145] and [2015-12-18 17:56:02.732609]
[2015-12-18 17:56:02.732612] I [MSGID: 109069] [dht-common.c:1069:dht_lookup_unlink_cbk] 2-cola-tier-dht: lookup_unlink returned with op_ret -> -1 and op-errno -> 16 for /fil1
[2015-12-18 17:56:02.733384] W [MSGID: 114031] [client-rpc-fops.c:664:client3_3_unlink_cbk] 2-cola-client-1: remote operation failed [Device or resource busy]
[2015-12-18 17:56:02.733502] W [MSGID: 114031] [client-rpc-fops.c:664:client3_3_unlink_cbk] 2-cola-client-0: remote operation failed [Device or resource busy]
[2015-12-18 17:56:02.733820] I [MSGID: 109069] [dht-common.c:1159:dht_lookup_unlink_stale_linkto_cbk] 2-cola-tier-dht: Returned with op_ret -1 and op_errno 16 for /fil1
[2015-12-18 17:56:02.741391] I [MSGID: 109063] [dht-layout.c:702:dht_layout_normalize] 2-cola-tier-dht: Found anomalies in /fil1 (gfid = 32c02986-be79-42b2-8fac-f29abb356856). Holes=1 overlaps=0
[2015-12-18 17:56:02.741522] W [fuse-bridge.c:1058:fuse_setattr_cbk] 0-glusterfs-fuse: 2884793: SETATTR() /fil1 => -1 (Stale file handle)
[2015-12-18 17:56:02.732187] I [MSGID: 109045] [dht-common.c:1744:dht_lookup_everywhere_cbk] 2-cola-tier-dht: attempting deletion of stale linkfile /fil1 on cola-hot-dht (hashed subvol is cola-cold-dht)
[2015-12-18 17:56:02.741502] E [MSGID: 109040] [dht-helper.c:1020:dht_migration_complete_check_task] 2-cola-tier-dht: /fil1: failed to lookup the file on cola-tier-dht [Stale file handle]
[root@rhs-client1 cola]#

Comment 8 hari gowtham 2018-11-08 19:19:09 UTC
As tier is not being actively developed, I'm closing this bug. Feel free to reopen it if necessary.