Bug 1276227 - Data Tiering: rm -rf does not delete the linkto file (hashed) of files under migration; possible split-brain and disk wastage observed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: tier
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.1.2
Assignee: Mohammed Rafi KC
QA Contact: krishnaram Karthick
URL:
Whiteboard:
Depends On: 1282388 1282390
Blocks: 1260783 1260923 1285237 1289437 1289975
Reported: 2015-10-29 07:04 UTC by Nag Pavan Chilakam
Modified: 2019-04-03 09:15 UTC

Fixed In Version: glusterfs-3.7.5-12
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1282388 1282390
Environment:
Last Closed: 2016-03-01 05:47:32 UTC
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2016:0193 | 0 | normal | SHIPPED_LIVE | Red Hat Gluster Storage 3.1 update 2 | 2016-03-01 10:20:36 UTC

Description Nag Pavan Chilakam 2015-10-29 07:04:06 UTC
Description of problem:
========================
On a tiered volume with files under migration, issuing rm -rf deletes all files, including those under migration, but leaves the link-to file (in the hashed subvolume) undeleted.
The link-to files are later converted to regular files and occupy disk space unnecessarily.

Once the original (cached) file is deleted, I don't see a point in keeping the hashed file any more. The locks need to be removed there too.

Version-Release number of selected component (if applicable):
============================================================
glusterfs-server-3.7.5-0.3.el7rhgs.x86_64


How reproducible:
==================
Always; very easy to reproduce.

Steps to Reproduce:
====================
1. Create, start and mount a tiered volume.
2. Create some files that take a while to get promoted/demoted, so let each file be at least 800MB. Create about 20 such files.
3. Now let the demote cycle start.
4. Once the demote cycle starts, the files being demoted can be seen in the cached and hashed subvolumes, as below (see file is.7):

[root@zod glusterfs]# ll /rhs/brick*/rosa*/
/rhs/brick1/rosa/:
total 1876672
-rw-r--r--. 2 root root 614400000 Oct 29 12:00 is.1
-rw-r--r--. 2 root root 614400000 Oct 29 12:00 is.3
---------T. 2 root root 614400000 Oct 29 12:06 is.7
-rw-r--r--. 2 root root 614400000 Oct 28 19:32 new.14

/rhs/brick2/rosa/:
total 1800000
-rw-r--r--. 2 root root 614400000 Oct 29 12:00 is.2
-rw-r--r--. 2 root root 614400000 Oct 29 12:01 is.4
-rw-r--r--. 2 root root 614400000 Oct 29 12:01 is.6

/rhs/brick6/rosa_hot/:
total 9388992
-rw-r--r--. 2 root root 614400000 Oct 29 12:02 is.10
-rw-r--r--. 2 root root 398327808 Oct 29 12:04 is.22
-rw-r-Sr-T. 2 root root 614400000 Oct 29 12:01 is.7


5. Now, from the fuse mount, before all files are demoted, issue rm -rf to delete all files.
6. All files are deleted except those that were under migration.
7. If you check the backend brick immediately, you can see that what remains undeleted is a link-to file.
After a few seconds this link-to file is converted to a normal read-write file, as below:


[root@zod glusterfs]# ll /rhs/brick*/rosa*/
/rhs/brick1/rosa/:
total 582400
---------T. 2 root root 614400000 Oct 29 12:07 is.7

==after few seconds========
[root@zod glusterfs]# 
[root@zod glusterfs]# ll /rhs/brick*/rosa*/
/rhs/brick1/rosa/:
total 600000
-rw-r--r--. 2 root root 614400000 Oct 29 12:01 is.7


8. If you monitor the client fuse logs, a possible split-brain can be seen:
[2015-10-29 11:41:18.567156] W [MSGID: 114031] [client-rpc-fops.c:1569:client3_3_fstat_cbk] 0-rosa-client-2: remote operation failed [No such file or directory]
[2015-10-29 11:41:18.571387] W [MSGID: 108008] [afr-read-txn.c:250:afr_read_txn] 0-rosa-replicate-1: Unreadable subvolume -1 found with event generation 2 for gfid 360ed98c-d031-4631-a1fc-0fface82400f. (Possible split-brain)
[2015-10-29 11:41:18.575262] E [MSGID: 109040] [dht-helper.c:1020:dht_migration_complete_check_task] 0-rosa-cold-dht: (null): failed to lookup the file on rosa-cold-dht [Stale file handle]
[2015-10-29 11:41:18.578245] W [MSGID: 108008] [afr-read-txn.c:250:afr_read_txn] 0-rosa-replicate-1: Unreadable subvolume -1 found with event generation 2 for gfid 360ed98c-d031-4631-a1fc-0fface82400f. (Possible split-brain)
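
The brick listings in steps 4 and 7 tell the three file states apart by permission bits alone. The sketch below classifies a file from its mode, assuming (inferred from the listings in this report, not taken from the GlusterFS sources) that a sticky-bit-only mode (`---------T`) marks a link-to file and that setgid plus sticky (`-rw-r-Sr-T`) marks a file under migration. The function name `classify_brick_entry` is hypothetical.

```python
import stat


def classify_brick_entry(mode: int) -> str:
    """Classify a brick file by its permission bits (illustrative helper).

    Assumption drawn from the listings in this report:
      ---------T  (sticky bit only)     -> link-to file
      -rw-r-Sr-T  (setgid + sticky set) -> file under migration
    """
    perms = stat.S_IMODE(mode)  # keep only permission and special bits
    if perms == stat.S_ISVTX:
        return "linkto"
    if perms & stat.S_ISVTX and perms & stat.S_ISGID:
        return "under-migration"
    return "regular"


if __name__ == "__main__":
    # Modes corresponding to the three states of is.7 shown above.
    for mode in (0o101000, 0o103644, 0o100644):
        print(stat.filemode(mode), classify_brick_entry(mode))
```

On a real brick the mode would come from `os.stat(path).st_mode`; a file that changes from "linkto" to "regular" after the rm -rf is the symptom described in step 7.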



Actual results:
==============
1) The linkto file gets converted to a regular file.
2) Disk space is wasted as a result.
3) A possible split-brain is seen.
4) Also, later I can see different bit-rot versions on the replicas (I did not enable bitrot):

[root@zod glusterfs]# getfattr -d -m . -e hex /rhs/brick*/rosa*/*
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/rosa/is.7
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x0200000000000000562f7f97000aa25b
trusted.gfid=0x6db6cae40a784af38da9af842243ffe8
trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri=0x00000000249f00000000000000000001
trusted.pgfid.00000000-0000-0000-0000-000000000001=0x00000001
trusted.tier-gfid.linkto=0x726f73612d686f742d64687400



replica:
[root@yarrow glusterfs]# getfattr -d -m . -e hex /rhs/brick*/rosa*/*
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/rosa/is.7
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x0200000000000000562f7f9a0003bc6d
trusted.gfid=0x6db6cae40a784af38da9af842243ffe8
trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri=0x00000000249f00000000000000000001
trusted.pgfid.00000000-0000-0000-0000-000000000001=0x00000001
trusted.tier-gfid.linkto=0x726f73612d686f742d64687400
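
The hex values printed by `getfattr -e hex` above are easier to compare once decoded. The following is a minimal sketch, assuming the linkto value is a NUL-terminated ASCII subvolume name and the gfid is a raw 16-byte UUID; the helper names are hypothetical.

```python
import uuid


def decode_hex_xattr(value: str) -> bytes:
    """Turn a getfattr '-e hex' value such as '0x726f...' into raw bytes."""
    if not value.startswith("0x"):
        raise ValueError("expected a 0x-prefixed hex string")
    return bytes.fromhex(value[2:])


def decode_linkto(value: str) -> str:
    """Decode trusted.tier-gfid.linkto, assumed to be a NUL-terminated name."""
    return decode_hex_xattr(value).rstrip(b"\x00").decode("ascii")


def decode_gfid(value: str) -> str:
    """Decode trusted.gfid, assumed to be a raw 16-byte UUID."""
    return str(uuid.UUID(bytes=decode_hex_xattr(value)))


if __name__ == "__main__":
    # Values copied from the getfattr output for is.7 above.
    print(decode_linkto("0x726f73612d686f742d64687400"))
    print(decode_gfid("0x6db6cae40a784af38da9af842243ffe8"))
```

Decoded this way, the leftover file on the cold brick still points at the hot tier subvolume (`rosa-hot-dht`), even though the hot tier copy has been removed.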




Expected results:
===================
None of the issues should be seen

Comment 2 Nag Pavan Chilakam 2015-10-29 07:59:56 UTC
Following are the xattrs captured during the deletion of files:
[root@zod glusterfs]# head -n 853 /heels.log |tail -n 100 
/rhs/brick1/rosa/:
total 0

/rhs/brick2/rosa/:
total 510080
---------T. 2 root root 614400000 Oct 29 13:16 heaven.3

/rhs/brick6/rosa_hot/:
total 0

/rhs/brick7/rosa_hot/:
total 0
# file: rhs/brick2/rosa/heaven.3
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000010000000000000000
trusted.bit-rot.version=0x0200000000000000562f7f97000d37e6
trusted.gfid=0x644b07152673448f8b29cb3e43940f13
trusted.pgfid.00000000-0000-0000-0000-000000000001=0x00000001
trusted.tier-gfid.linkto=0x726f73612d686f742d64687400

/rhs/brick1/rosa/:
total 0

/rhs/brick2/rosa/:
total 568960
---------T. 2 root root 614400000 Oct 29 13:16 heaven.3

/rhs/brick6/rosa_hot/:
total 0

/rhs/brick7/rosa_hot/:
total 0
# file: rhs/brick2/rosa/heaven.3
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000010000000000000000
trusted.bit-rot.version=0x0200000000000000562f7f97000d37e6
trusted.gfid=0x644b07152673448f8b29cb3e43940f13
trusted.pgfid.00000000-0000-0000-0000-000000000001=0x00000001
trusted.tier-gfid.linkto=0x726f73612d686f742d64687400

/rhs/brick1/rosa/:
total 0

/rhs/brick2/rosa/:
total 600000
-rw-r--r--. 2 root root 614400000 Oct 29 13:13 heaven.3

/rhs/brick6/rosa_hot/:
total 0

/rhs/brick7/rosa_hot/:
total 0
# file: rhs/brick2/rosa/heaven.3
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x0200000000000000562f7f97000d37e6
trusted.gfid=0x644b07152673448f8b29cb3e43940f13
trusted.pgfid.00000000-0000-0000-0000-000000000001=0x00000001
trusted.tier-gfid.linkto=0x726f73612d686f742d64687400

/rhs/brick1/rosa/:
total 0

/rhs/brick2/rosa/:
total 600000
-rw-r--r--. 2 root root 614400000 Oct 29 13:13 heaven.3

/rhs/brick6/rosa_hot/:
total 0

/rhs/brick7/rosa_hot/:
total 0
# file: rhs/brick2/rosa/heaven.3
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x0200000000000000562f7f97000d37e6
trusted.gfid=0x644b07152673448f8b29cb3e43940f13
trusted.pgfid.00000000-0000-0000-0000-000000000001=0x00000001
trusted.tier-gfid.linkto=0x726f73612d686f742d64687400

/rhs/brick1/rosa/:
total 0

/rhs/brick2/rosa/:
total 600000
-rw-r--r--. 2 root root 614400000 Oct 29 13:13 heaven.3

/rhs/brick6/rosa_hot/:
total 0

/rhs/brick7/rosa_hot/:
total 0
# file: rhs/brick2/rosa/heaven.3
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x0200000000000000562f7f97000d37e6
trusted.gfid=0x644b07152673448f8b29cb3e43940f13
trusted.pgfid.00000000-0000-0000-0000-000000000001=0x00000001
trusted.tier-gfid.linkto=0x726f73612d686f742d64687400

[root@zod glusterfs]#
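
In the captures above, `trusted.afr.dirty` is 0x000000010000000000000000 while heaven.3 is still a link-to file and all zeros once it has become a regular file. A minimal sketch for reading that value follows, assuming the xattr holds three big-endian 32-bit counters in the order data, metadata, entry; the function name is hypothetical.

```python
import struct


def decode_afr_counters(value: str) -> dict:
    """Split a 12-byte AFR xattr into its three counters.

    Assumption: the value is three big-endian unsigned 32-bit integers
    in the order (data, metadata, entry).
    """
    raw = bytes.fromhex(value[2:] if value.startswith("0x") else value)
    if len(raw) != 12:
        raise ValueError("expected exactly 12 bytes")
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


if __name__ == "__main__":
    # Values copied from the first and third captures above.
    print(decode_afr_counters("0x000000010000000000000000"))
    print(decode_afr_counters("0x000000000000000000000000"))
```

Under that assumption, the first two captures show one pending data operation on the link-to file, which is cleared by the time the file appears as a regular file.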

Comment 3 Nag Pavan Chilakam 2015-10-29 08:00:21 UTC
tier logs:
===========
[2015-10-29 07:46:23.327109] I [MSGID: 109038] [tier.c:476:tier_migrate_using_query_file] 0-rosa-tier-dht: Tier 0 src_subvol rosa-hot-dht file heaven.3
[2015-10-29 07:46:23.328847] I [dht-rebalance.c:1103:dht_migrate_file] 0-rosa-tier-dht: /heaven.3: attempting to move from rosa-hot-dht to rosa-cold-dht
[2015-10-29 07:46:44.142458] W [dht-rebalance.c:1247:dht_migrate_file] 0-rosa-tier-dht: /heaven.3: failed to fsync on rosa-cold-dht (Structure needs cleaning)
[2015-10-29 07:46:44.144700] W [MSGID: 114031] [client-rpc-fops.c:2971:client3_3_lookup_cbk] 0-rosa-client-7: remote operation failed. Path: <gfid:644b0715-2673-448f-8b29-cb3e43940f13> (644b0715-2673-448f-8b29-cb3e43940f13) [No such file or directory]
[2015-10-29 07:46:44.144923] W [MSGID: 114031] [client-rpc-fops.c:2971:client3_3_lookup_cbk] 0-rosa-client-6: remote operation failed. Path: <gfid:644b0715-2673-448f-8b29-cb3e43940f13> (644b0715-2673-448f-8b29-cb3e43940f13) [No such file or directory]
[2015-10-29 07:46:44.145032] W [MSGID: 109023] [dht-rebalance.c:1317:dht_migrate_file] 0-rosa-tier-dht: Migrate file failed:/heaven.3: failed to get xattr from rosa-hot-dht (No such file or directory)
[2015-10-29 07:46:44.145091] E [MSGID: 108008] [afr-transaction.c:1975:afr_transaction] 0-rosa-replicate-2: Failing FSETATTR on gfid 644b0715-2673-448f-8b29-cb3e43940f13: split-brain observed. [Input/output error]
[2015-10-29 07:46:44.145470] W [MSGID: 109023] [dht-rebalance.c:1356:dht_migrate_file] 0-rosa-tier-dht: Migrate file failed:/heaven.3: failed to perform setattr on rosa-hot-dht  [Input/output error]
[2015-10-29 07:46:44.146381] E [MSGID: 109037] [tier.c:492:tier_migrate_using_query_file] 0-rosa-tier-dht: ERROR -28 in current migration heaven.3 /heaven.3

[2015-10-29 07:46:44.150682] E [MSGID: 109037] [tier.c:442:tier_migrate_using_query_file] 0-rosa-tier-dht: ERROR in current lookup

[2015-10-29 07:46:44.153524] E [MSGID: 109037] [tier.c:442:tier_migrate_using_query_file] 0-rosa-tier-dht: ERROR in current lookup

[2015-10-29 07:46:44.153656] E [MSGID: 109037] [tier.c:1446:tier_start] 0-rosa-tier-dht: Demotion failed
[2015-10-29 07:48:00.161457] I [MSGID: 109038] [tier.c:1010:tier_build_migration_qfile] 0-rosa-tier-dht: Failed to remove /var/run/gluster/rosa-tier-dht/demotequeryfile-rosa-tier-dht
^C
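
To pull the split-brain messages out of logs like the ones above, a small parser is enough. This is a sketch only: the regular expression is fitted to the lines quoted in this report (timestamp, level, optional MSGID, source location, domain, message) and is not an official description of the GlusterFS log format; all names are hypothetical.

```python
import re

# Pattern fitted to the log lines quoted in this report (an assumption,
# not a format specification). The MSGID field is optional because some
# of the quoted lines do not carry one.
LOG_LINE = re.compile(
    r"^\[?(?P<ts>[^\]]+)\]\s+"
    r"(?P<level>[A-Z])\s+"
    r"(?:\[MSGID: (?P<msgid>\d+)\]\s+)?"
    r"\[(?P<src>[^\]]+)\]\s+"
    r"(?P<domain>\S+):\s+(?P<msg>.*)$"
)


def parse_line(line: str):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def split_brain_events(lines):
    """Return parsed entries whose message mentions split-brain."""
    events = []
    for line in lines:
        entry = parse_line(line)
        if entry and "split-brain" in entry["msg"].lower():
            events.append(entry)
    return events


if __name__ == "__main__":
    sample = [
        "[2015-10-29 07:46:23.328847] I [dht-rebalance.c:1103:dht_migrate_file] "
        "0-rosa-tier-dht: /heaven.3: attempting to move from rosa-hot-dht to rosa-cold-dht",
        "[2015-10-29 07:46:44.145091] E [MSGID: 108008] "
        "[afr-transaction.c:1975:afr_transaction] 0-rosa-replicate-2: Failing FSETATTR "
        "on gfid 644b0715-2673-448f-8b29-cb3e43940f13: split-brain observed. "
        "[Input/output error]",
    ]
    for event in split_brain_events(sample):
        print(event["ts"], event["domain"], event["msgid"])
```

Filtering on the message text rather than on MSGID 108008 alone keeps the "(Possible split-brain)" warnings and the "split-brain observed" errors in the same result set.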

Comment 6 Mohammed Rafi KC 2015-11-30 14:40:32 UTC
Upstream patch: http://review.gluster.org/12829

Comment 7 Vivek Agarwal 2015-12-17 09:27:44 UTC
https://code.engineering.redhat.com/gerrit/64015

Comment 11 errata-xmlrpc 2016-03-01 05:47:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0193.html

