Bug 1287436 - Link files being created (and not cleaned up) if hot tier is full during migration
Status: NEW
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: tier
Priority: low
Severity: low
Assigned To: hari gowtham
QA Contact: nchilaka
Whiteboard: tier-migration
Keywords: ZStream
Reported: 2015-12-02 01:03 EST by nchilaka
Modified: 2018-01-29 15:09 EST
Doc Type: Bug Fix
Type: Bug

Description nchilaka 2015-12-02 01:03:31 EST
Description of problem:
=====================
When a file is being promoted to the hot tier and the hot tier disk is full, the promotion fails because no disk space is available.
However, the migration creates a link file in the hot tier first, and this link file is not cleaned up after the failure.

Version-Release number of selected component (if applicable):
=========================================================
[root@zod ~]# rpm -qa|grep gluster
glusterfs-server-3.7.5-7.el7rhgs.x86_64
glusterfs-libs-3.7.5-7.el7rhgs.x86_64
glusterfs-client-xlators-3.7.5-7.el7rhgs.x86_64
glusterfs-fuse-3.7.5-7.el7rhgs.x86_64
glusterfs-devel-3.7.5-7.el7rhgs.x86_64
glusterfs-api-3.7.5-7.el7rhgs.x86_64
glusterfs-debuginfo-3.7.5-7.el7rhgs.x86_64
glusterfs-3.7.5-7.el7rhgs.x86_64
glusterfs-cli-3.7.5-7.el7rhgs.x86_64
[root@zod ~]# 



Steps to Reproduce:
1. Create a regular volume with two 1 GB files.
2. Attach a hot tier with a maximum capacity of 1.1 GB.
3. Turn off cache mode (cluster.tier-mode is set to test in the volume info below).
4. Append to both files so that they are promoted in the next cycle.
5. In the next cycle one file is promoted, but the other fails as below:
[2015-12-02 05:56:01.006402] E [MSGID: 109023] [dht-rebalance.c:721:__dht_check_free_space] 0-tfile-tier-dht: data movement attempted from node (tfile-cold-dht) to node (tfile-hot-dht) which does not have required free space for (//trans.avi)


6. The failed promotion still leaves a link file in the hot tier.
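
The capacity arithmetic behind steps 1 to 5 can be sketched as follows. This is a deliberately simplified model, not the actual __dht_check_free_space logic: the function name is hypothetical, sizes are rounded to whole MB (about 1100 MB of hot tier, about 1000 MB per file), and anything else the real check may take into account is ignored.

```shell
# Simplified model: promote files in order while the hot tier still has room.
# First argument: free space on the hot tier in MB; remaining arguments: file sizes in MB.
# Hypothetical helper for illustration only; it does not touch a Gluster volume.
simulate_promotions() {
    local free_mb="$1"
    shift
    local size
    for size in "$@"; do
        if [ "$size" -le "$free_mb" ]; then
            # Enough room: the file moves and consumes hot tier space.
            free_mb=$((free_mb - size))
            echo "promoted ${size}MB, ${free_mb}MB free"
        else
            # Not enough room: the promotion is refused.
            echo "failed ${size}MB, ${free_mb}MB free"
        fi
    done
}
```

Calling `simulate_promotions 1100 1000 1000` reports the first file as promoted and the second as failed, which matches the behaviour seen in step 5.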


Implications:
=============
1) Unnecessary inodes are used up.
2) The whole point of having the cold tier as the hashed subvolume is, in a way, defeated.





======= In the case below I was trying to promote trans.avi =======
[root@yarrow ~]# ll /*/brick*/tfile*
/dummy/brick108/tfile_hot:
total 4
---------T. 2 root root 0 Dec  2 11:26 trans.avi

/rhs/brick1/tfile:
total 1899192
-rw-r--r--. 2 root root  743094942 Dec  2 11:21 hot700.avi
-rw-r--r--. 2 root root 1201674624 Dec  2 11:24 trans.avi

/rhs/brick2/tfile:
total 725680
---------T. 2 root root 0 Dec  2 11:24 mosa700.avi
[root@yarrow ~]# 
[root@yarrow ~]# 
[root@yarrow ~]# ll /*/brick*/tfile*
/dummy/brick108/tfile_hot:
total 4
---------T. 2 root root 0 Dec  2 11:26 trans.avi

/rhs/brick1/tfile:
total 1899192
-rw-r--r--. 2 root root  743094942 Dec  2 11:21 hot700.avi
-rw-r--r--. 2 root root 1201674624 Dec  2 11:24 trans.avi

/rhs/brick2/tfile:
total 0
---------T. 2 root root 0 Dec  2 11:26 mosa700.avi
[root@yarrow ~]# 
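
The zero-byte entries with mode ---------T in the listings above are the link files in question. A minimal sketch for listing such candidates on a brick, assuming that link files are empty regular files whose mode is exactly 1000 and that the brick's internal .glusterfs directory should be skipped. The function names are hypothetical; trusted.glusterfs.dht.linkto is the xattr DHT uses to record the link target, and reading it normally requires root on the brick host.

```shell
# List candidate DHT link files under a brick directory.
# Assumption: a link file is a zero-byte regular file with mode exactly 1000
# (shown by ls as ---------T), as in the listings above.
list_linkfile_candidates() {
    local brick_dir="$1"
    # Skip the brick's internal .glusterfs tree so hard links are not listed twice.
    find "$brick_dir" -name .glusterfs -prune -o \
        -type f -perm 1000 -size 0 -print | sort
}

# Print the link target xattr of one candidate, if getfattr is installed.
# Prints nothing when the tool or the xattr is missing.
show_linkto() {
    local path="$1"
    if command -v getfattr >/dev/null 2>&1; then
        getfattr -n trusted.glusterfs.dht.linkto -e text "$path" 2>/dev/null
    fi
}

# Example (run on a brick host):
#   list_linkfile_candidates /dummy/brick108/tfile_hot
```

Comparing this list before and after a failed promotion would show whether a link file such as trans.avi was left behind.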



[root@zod ~]# gluster v info tfile
 
Volume Name: tfile
Type: Tier
Volume ID: 664b21d8-f295-4708-90d6-7b1b2195654b
Status: Started
Number of Bricks: 6
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distribute
Number of Bricks: 2
Brick1: zod:/dummy/brick108/tfile_hot
Brick2: yarrow:/dummy/brick108/tfile_hot
Cold Tier:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick3: zod:/rhs/brick1/tfile
Brick4: yarrow:/rhs/brick1/tfile
Brick5: zod:/rhs/brick2/tfile
Brick6: yarrow:/rhs/brick2/tfile
Options Reconfigured:
cluster.tier-mode: test
features.ctr-enabled: on
performance.readdir-ahead: on
Comment 3 nchilaka 2015-12-18 12:33:11 EST
Also, if we delete such a file and then try to create a new file with the same name, we get the following error on the mount:
[root@rhs-client1 cola]# touch fil1
touch: setting times of ‘fil1’: Stale file handle
[root@rhs-client1 cola]# 
[root@rhs-client1 cola]# 
[root@rhs-client1 cola]# touch fil1
touch: setting times of ‘fil1’: Stale file handle
[root@rhs-client1 cola]# 
[root@rhs-client1 cola]# touch fil1
touch: setting times of ‘fil1’: Stale file handle

mount log:
[root@rhs-client1 cola]# tail /var/log/glusterfs/mnt-cola.log 
[2015-12-18 17:56:02.729983] I [MSGID: 109069] [dht-common.c:1159:dht_lookup_unlink_stale_linkto_cbk] 2-cola-tier-dht: Returned with op_ret -1 and op_errno 16 for /fil1
[2015-12-18 17:56:02.731658] I [MSGID: 109070] [dht-common.c:1865:dht_lookup_linkfile_cbk] 2-cola-tier-dht: lookup of /fil1 on cola-hot-dht (following linkfile) reached link,gfid = 00000000-0000-0000-0000-000000000000
[2015-12-18 17:56:02.732169] W [MSGID: 109009] [dht-common.c:1619:dht_lookup_everywhere_cbk] 2-cola-tier-dht: /fil1: gfid differs on subvolume cola-hot-dht, gfid local = 7a9ba3e2-72f2-45c0-ae60-76637e4169a5, gfid node = 32c02986-be79-42b2-8fac-f29abb356856
The message "W [MSGID: 114031] [client-rpc-fops.c:664:client3_3_unlink_cbk] 2-cola-client-4: remote operation failed [Device or resource busy]" repeated 2 times between [2015-12-18 17:56:02.725145] and [2015-12-18 17:56:02.732609]
[2015-12-18 17:56:02.732612] I [MSGID: 109069] [dht-common.c:1069:dht_lookup_unlink_cbk] 2-cola-tier-dht: lookup_unlink returned with op_ret -> -1 and op-errno -> 16 for /fil1
[2015-12-18 17:56:02.733384] W [MSGID: 114031] [client-rpc-fops.c:664:client3_3_unlink_cbk] 2-cola-client-1: remote operation failed [Device or resource busy]
[2015-12-18 17:56:02.733502] W [MSGID: 114031] [client-rpc-fops.c:664:client3_3_unlink_cbk] 2-cola-client-0: remote operation failed [Device or resource busy]
[2015-12-18 17:56:02.733820] I [MSGID: 109069] [dht-common.c:1159:dht_lookup_unlink_stale_linkto_cbk] 2-cola-tier-dht: Returned with op_ret -1 and op_errno 16 for /fil1
[2015-12-18 17:56:02.741391] I [MSGID: 109063] [dht-layout.c:702:dht_layout_normalize] 2-cola-tier-dht: Found anomalies in /fil1 (gfid = 32c02986-be79-42b2-8fac-f29abb356856). Holes=1 overlaps=0
[2015-12-18 17:56:02.741522] W [fuse-bridge.c:1058:fuse_setattr_cbk] 0-glusterfs-fuse: 2884793: SETATTR() /fil1 => -1 (Stale file handle)
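
The gfid mismatch is easiest to spot by pulling the MSGID 109009 warnings out of the mount log. A small sketch, assuming the log lines keep the exact wording shown above; the function name is hypothetical.

```shell
# Read a mount log on stdin and print "<path> <local gfid> <node gfid>"
# for every "gfid differs on subvolume" warning (MSGID 109009).
# Assumption: the message text matches the format seen in the log above.
extract_gfid_mismatches() {
    sed -n 's|.*\[MSGID: 109009\].*: \(/[^:]*\): gfid differs on subvolume .*, gfid local = \([0-9a-f-]*\), gfid node = \([0-9a-f-]*\).*|\1 \2 \3|p'
}

# Example:
#   extract_gfid_mismatches < /var/log/glusterfs/mnt-cola.log
```

For the log above this yields /fil1 with the two differing gfids, which makes it clear that the stale link file and the newly created file do not share an identity.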




After some time I was able to create the file, but saw the following errors in the mount log:


[root@rhs-client1 cola]# tail /var/log/glusterfs/mnt-cola.log 
[2015-12-18 17:56:02.732169] W [MSGID: 109009] [dht-common.c:1619:dht_lookup_everywhere_cbk] 2-cola-tier-dht: /fil1: gfid differs on subvolume cola-hot-dht, gfid local = 7a9ba3e2-72f2-45c0-ae60-76637e4169a5, gfid node = 32c02986-be79-42b2-8fac-f29abb356856
The message "W [MSGID: 114031] [client-rpc-fops.c:664:client3_3_unlink_cbk] 2-cola-client-4: remote operation failed [Device or resource busy]" repeated 2 times between [2015-12-18 17:56:02.725145] and [2015-12-18 17:56:02.732609]
[2015-12-18 17:56:02.732612] I [MSGID: 109069] [dht-common.c:1069:dht_lookup_unlink_cbk] 2-cola-tier-dht: lookup_unlink returned with op_ret -> -1 and op-errno -> 16 for /fil1
[2015-12-18 17:56:02.733384] W [MSGID: 114031] [client-rpc-fops.c:664:client3_3_unlink_cbk] 2-cola-client-1: remote operation failed [Device or resource busy]
[2015-12-18 17:56:02.733502] W [MSGID: 114031] [client-rpc-fops.c:664:client3_3_unlink_cbk] 2-cola-client-0: remote operation failed [Device or resource busy]
[2015-12-18 17:56:02.733820] I [MSGID: 109069] [dht-common.c:1159:dht_lookup_unlink_stale_linkto_cbk] 2-cola-tier-dht: Returned with op_ret -1 and op_errno 16 for /fil1
[2015-12-18 17:56:02.741391] I [MSGID: 109063] [dht-layout.c:702:dht_layout_normalize] 2-cola-tier-dht: Found anomalies in /fil1 (gfid = 32c02986-be79-42b2-8fac-f29abb356856). Holes=1 overlaps=0
[2015-12-18 17:56:02.741522] W [fuse-bridge.c:1058:fuse_setattr_cbk] 0-glusterfs-fuse: 2884793: SETATTR() /fil1 => -1 (Stale file handle)
[2015-12-18 17:56:02.732187] I [MSGID: 109045] [dht-common.c:1744:dht_lookup_everywhere_cbk] 2-cola-tier-dht: attempting deletion of stale linkfile /fil1 on cola-hot-dht (hashed subvol is cola-cold-dht)
[2015-12-18 17:56:02.741502] E [MSGID: 109040] [dht-helper.c:1020:dht_migration_complete_check_task] 2-cola-tier-dht: /fil1: failed to lookup the file on cola-tier-dht [Stale file handle]
[root@rhs-client1 cola]#
