Bug 1282388 - Data Tiering: rm -rf does not delete the linkto (hashed) files of files under migration; possible split-brain observed and possible disk wastage
Summary: Data Tiering: rm -rf does not delete the linkto (hashed) files of files under migration; possible split-brain observed and possible disk wastage
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: tiering
Version: 3.7.6
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Assignee: Mohammed Rafi KC
QA Contact: bugs@gluster.org
URL:
Whiteboard:
Depends On:
Blocks: 1276227 1282390 glusterfs-3.7.9
 
Reported: 2015-11-16 09:48 UTC by Nithya Balachandran
Modified: 2016-04-19 07:23 UTC (History)
CC: 8 users

Fixed In Version: glusterfs-3.7.9
Doc Type: Bug Fix
Doc Text:
Clone Of: 1276227
Environment:
Last Closed: 2016-04-19 07:23:50 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Nithya Balachandran 2015-11-16 09:48:45 UTC
+++ This bug was initially created as a clone of Bug #1276227 +++

Description of problem:
========================
On a tiered volume that has files under migration, issuing rm -rf deletes all the files, including those under migration, but leaves behind the link-to file (on the hashed subvol) of each file that was being migrated.
These link-to files are later converted to regular files and occupy disk space unnecessarily.

Since the original (cached) file is being deleted, there is no point in keeping the hashed file any longer; the locks held there need to be removed as well.

Version-Release number of selected component (if applicable):
============================================================
glusterfs-server-3.7.5-0.3.el7rhgs.x86_64


How reproducible:
==================
Always; very easy to reproduce.

Steps to Reproduce:
====================
1. Create, start and mount a tiered volume.
2. Create files large enough that promotion/demotion takes a while: about 20 files of at least 800 MB each.
3. Let the demote cycle start.
4. Once the demote cycle starts, the files can be seen being demoted on the cached and hashed subvols as shown below (see file is.7):

[root@zod glusterfs]# ll /rhs/brick*/rosa*/
/rhs/brick1/rosa/:
total 1876672
-rw-r--r--. 2 root root 614400000 Oct 29 12:00 is.1
-rw-r--r--. 2 root root 614400000 Oct 29 12:00 is.3
---------T. 2 root root 614400000 Oct 29 12:06 is.7
-rw-r--r--. 2 root root 614400000 Oct 28 19:32 new.14

/rhs/brick2/rosa/:
total 1800000
-rw-r--r--. 2 root root 614400000 Oct 29 12:00 is.2
-rw-r--r--. 2 root root 614400000 Oct 29 12:01 is.4
-rw-r--r--. 2 root root 614400000 Oct 29 12:01 is.6

/rhs/brick6/rosa_hot/:
total 9388992
-rw-r--r--. 2 root root 614400000 Oct 29 12:02 is.10
-rw-r--r--. 2 root root 398327808 Oct 29 12:04 is.22
-rw-r-Sr-T. 2 root root 614400000 Oct 29 12:01 is.7


5. From the fuse mount, before all files are demoted, issue rm -rf to delete all the files.
6. All files are deleted except those that were under migration.
7. If you check the backend brick immediately, you can see that the undeleted file is a link-to file.
After a few seconds this link-to file is converted to a normal read-write file, as shown below:


[root@zod glusterfs]# ll /rhs/brick*/rosa*/
/rhs/brick1/rosa/:
total 582400
---------T. 2 root root 614400000 Oct 29 12:07 is.7

======== after a few seconds ========
[root@zod glusterfs]# 
[root@zod glusterfs]# ll /rhs/brick*/rosa*/
/rhs/brick1/rosa/:
total 600000
-rw-r--r--. 2 root root 614400000 Oct 29 12:01 is.7


8. If you monitor the client fuse logs, a possible split-brain can be seen being reported:
[2015-10-29 11:41:18.567156] W [MSGID: 114031] [client-rpc-fops.c:1569:client3_3_fstat_cbk] 0-rosa-client-2: remote operation failed [No such file or directory]
[2015-10-29 11:41:18.571387] W [MSGID: 108008] [afr-read-txn.c:250:afr_read_txn] 0-rosa-replicate-1: Unreadable subvolume -1 found with event generation 2 for gfid 360ed98c-d031-4631-a1fc-0fface82400f. (Possible split-brain)
[2015-10-29 11:41:18.575262] E [MSGID: 109040] [dht-helper.c:1020:dht_migration_complete_check_task] 0-rosa-cold-dht: (null): failed to lookup the file on rosa-cold-dht [Stale file handle]
[2015-10-29 11:41:18.578245] W [MSGID: 108008] [afr-read-txn.c:250:afr_read_txn] 0-rosa-replicate-1: Unreadable subvolume -1 found with event generation 2 for gfid 360ed98c-d031-4631-a1fc-0fface82400f. (Possible split-brain)



Actual results:
==============
1) The linkto file gets converted to a regular file.
2) Disk space is wasted as a result.
3) A possible split-brain is reported.
4) Later, the replicas show different bit-rot versions (bitrot was not enabled).

[root@zod glusterfs]# getfattr -d -m . -e hex /rhs/brick*/rosa*/*
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/rosa/is.7
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x0200000000000000562f7f97000aa25b
trusted.gfid=0x6db6cae40a784af38da9af842243ffe8
trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri=0x00000000249f00000000000000000001
trusted.pgfid.00000000-0000-0000-0000-000000000001=0x00000001
trusted.tier-gfid.linkto=0x726f73612d686f742d64687400



replica:
[root@yarrow glusterfs]# getfattr -d -m . -e hex /rhs/brick*/rosa*/*
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/rosa/is.7
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x0200000000000000562f7f9a0003bc6d
trusted.gfid=0x6db6cae40a784af38da9af842243ffe8
trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri=0x00000000249f00000000000000000001
trusted.pgfid.00000000-0000-0000-0000-000000000001=0x00000001
trusted.tier-gfid.linkto=0x726f73612d686f742d64687400
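
For reference, the trusted.tier-gfid.linkto value in the dumps above is just the NUL-terminated name of a tier subvolume, stored as hex by getfattr -e hex: 0x726f73612d686f742d64687400 decodes to "rosa-hot-dht", i.e. the leftover file still claims its data lives on the hot tier. A throwaway decoder for such hex dumps (a minimal sketch; a hypothetical helper, not part of GlusterFS):

/* decode_xattr.c - print the ASCII text behind a getfattr -e hex value.
 * Hypothetical helper for reading linkto xattrs; not part of GlusterFS. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    const char *hex = (argc > 1) ? argv[1] : "0x726f73612d686f742d64687400";

    if (strncmp(hex, "0x", 2) == 0)
        hex += 2;                           /* skip the 0x prefix */

    for (size_t i = 0; i + 1 < strlen(hex); i += 2) {
        char byte[3] = { hex[i], hex[i + 1], '\0' };
        unsigned long c = strtoul(byte, NULL, 16);
        if (c == 0)
            break;                          /* linkto values are NUL-terminated */
        putchar((int)c);
    }
    putchar('\n');
    return 0;
}

Running it as ./decode_xattr 0x726f73612d686f742d64687400 prints rosa-hot-dht.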




Expected results:
===================
None of the issues should be seen

--- Additional comment from Red Hat Bugzilla Rules Engine on 2015-10-29 03:04:10 EDT ---

This bug is automatically being proposed for the current z-stream release of Red Hat Gluster Storage 3 by setting the release flag 'rhgs-3.1.z' to '?'.

If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from nchilaka on 2015-10-29 03:59:56 EDT ---

Following are the xattrs during the deletion of the files:
[root@zod glusterfs]# head -n 853 /heels.log |tail -n 100 
/rhs/brick1/rosa/:
total 0

/rhs/brick2/rosa/:
total 510080
---------T. 2 root root 614400000 Oct 29 13:16 heaven.3

/rhs/brick6/rosa_hot/:
total 0

/rhs/brick7/rosa_hot/:
total 0
# file: rhs/brick2/rosa/heaven.3
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000010000000000000000
trusted.bit-rot.version=0x0200000000000000562f7f97000d37e6
trusted.gfid=0x644b07152673448f8b29cb3e43940f13
trusted.pgfid.00000000-0000-0000-0000-000000000001=0x00000001
trusted.tier-gfid.linkto=0x726f73612d686f742d64687400

/rhs/brick1/rosa/:
total 0

/rhs/brick2/rosa/:
total 568960
---------T. 2 root root 614400000 Oct 29 13:16 heaven.3

/rhs/brick6/rosa_hot/:
total 0

/rhs/brick7/rosa_hot/:
total 0
# file: rhs/brick2/rosa/heaven.3
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000010000000000000000
trusted.bit-rot.version=0x0200000000000000562f7f97000d37e6
trusted.gfid=0x644b07152673448f8b29cb3e43940f13
trusted.pgfid.00000000-0000-0000-0000-000000000001=0x00000001
trusted.tier-gfid.linkto=0x726f73612d686f742d64687400

/rhs/brick1/rosa/:
total 0

/rhs/brick2/rosa/:
total 600000
-rw-r--r--. 2 root root 614400000 Oct 29 13:13 heaven.3

/rhs/brick6/rosa_hot/:
total 0

/rhs/brick7/rosa_hot/:
total 0
# file: rhs/brick2/rosa/heaven.3
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x0200000000000000562f7f97000d37e6
trusted.gfid=0x644b07152673448f8b29cb3e43940f13
trusted.pgfid.00000000-0000-0000-0000-000000000001=0x00000001
trusted.tier-gfid.linkto=0x726f73612d686f742d64687400

/rhs/brick1/rosa/:
total 0

/rhs/brick2/rosa/:
total 600000
-rw-r--r--. 2 root root 614400000 Oct 29 13:13 heaven.3

/rhs/brick6/rosa_hot/:
total 0

/rhs/brick7/rosa_hot/:
total 0
# file: rhs/brick2/rosa/heaven.3
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x0200000000000000562f7f97000d37e6
trusted.gfid=0x644b07152673448f8b29cb3e43940f13
trusted.pgfid.00000000-0000-0000-0000-000000000001=0x00000001
trusted.tier-gfid.linkto=0x726f73612d686f742d64687400

/rhs/brick1/rosa/:
total 0

/rhs/brick2/rosa/:
total 600000
-rw-r--r--. 2 root root 614400000 Oct 29 13:13 heaven.3

/rhs/brick6/rosa_hot/:
total 0

/rhs/brick7/rosa_hot/:
total 0
# file: rhs/brick2/rosa/heaven.3
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x0200000000000000562f7f97000d37e6
trusted.gfid=0x644b07152673448f8b29cb3e43940f13
trusted.pgfid.00000000-0000-0000-0000-000000000001=0x00000001
trusted.tier-gfid.linkto=0x726f73612d686f742d64687400

[root@zod glusterfs]#

--- Additional comment from nchilaka on 2015-10-29 04:00:21 EDT ---

tier logs:
===========
[2015-10-29 07:46:23.327109] I [MSGID: 109038] [tier.c:476:tier_migrate_using_query_file] 0-rosa-tier-dht: Tier 0 src_subvol rosa-hot-dht file heaven.3
[2015-10-29 07:46:23.328847] I [dht-rebalance.c:1103:dht_migrate_file] 0-rosa-tier-dht: /heaven.3: attempting to move from rosa-hot-dht to rosa-cold-dht
[2015-10-29 07:46:44.142458] W [dht-rebalance.c:1247:dht_migrate_file] 0-rosa-tier-dht: /heaven.3: failed to fsync on rosa-cold-dht (Structure needs cleaning)
[2015-10-29 07:46:44.144700] W [MSGID: 114031] [client-rpc-fops.c:2971:client3_3_lookup_cbk] 0-rosa-client-7: remote operation failed. Path: <gfid:644b0715-2673-448f-8b29-cb3e43940f13> (644b0715-2673-448f-8b29-cb3e43940f13) [No such file or directory]
[2015-10-29 07:46:44.144923] W [MSGID: 114031] [client-rpc-fops.c:2971:client3_3_lookup_cbk] 0-rosa-client-6: remote operation failed. Path: <gfid:644b0715-2673-448f-8b29-cb3e43940f13> (644b0715-2673-448f-8b29-cb3e43940f13) [No such file or directory]
[2015-10-29 07:46:44.145032] W [MSGID: 109023] [dht-rebalance.c:1317:dht_migrate_file] 0-rosa-tier-dht: Migrate file failed:/heaven.3: failed to get xattr from rosa-hot-dht (No such file or directory)
[2015-10-29 07:46:44.145091] E [MSGID: 108008] [afr-transaction.c:1975:afr_transaction] 0-rosa-replicate-2: Failing FSETATTR on gfid 644b0715-2673-448f-8b29-cb3e43940f13: split-brain observed. [Input/output error]
[2015-10-29 07:46:44.145470] W [MSGID: 109023] [dht-rebalance.c:1356:dht_migrate_file] 0-rosa-tier-dht: Migrate file failed:/heaven.3: failed to perform setattr on rosa-hot-dht  [Input/output error]
[2015-10-29 07:46:44.146381] E [MSGID: 109037] [tier.c:492:tier_migrate_using_query_file] 0-rosa-tier-dht: ERROR -28 in current migration heaven.3 /heaven.3

[2015-10-29 07:46:44.150682] E [MSGID: 109037] [tier.c:442:tier_migrate_using_query_file] 0-rosa-tier-dht: ERROR in current lookup

[2015-10-29 07:46:44.153524] E [MSGID: 109037] [tier.c:442:tier_migrate_using_query_file] 0-rosa-tier-dht: ERROR in current lookup

[2015-10-29 07:46:44.153656] E [MSGID: 109037] [tier.c:1446:tier_start] 0-rosa-tier-dht: Demotion failed
[2015-10-29 07:48:00.161457] I [MSGID: 109038] [tier.c:1010:tier_build_migration_qfile] 0-rosa-tier-dht: Failed to remove /var/run/gluster/rosa-tier-dht/demotequeryfile-rosa-tier-dht
^C

--- Additional comment from Red Hat Bugzilla Rules Engine on 2015-11-03 10:12:02 EST ---

Since this bug has been approved for the z-stream release of Red Hat Gluster Storage 3, through release flag 'rhgs-3.1.z+', and has been marked for RHGS 3.1 Update 2 release through the Internal Whiteboard entry of '3.1.2', the Target Release is being automatically set to 'RHGS 3.1.2'

--- Additional comment from Nithya Balachandran on 2015-11-10 06:51:14 EST ---

This is reproducible during a demotion:

Analysis:

When a file is being demoted, the hashed subvolume (hot tier) contains the data file. As the hashed_subvol == cached_subvol, DHT sends an unlink only to the cached subvol. 

When a file is being migrated, fds are opened on the source and destination files. The migration proceeds even after the unlink from the client because the src fd is still open. As the dst linkto file was never unlinked, once the data copy is complete it is converted into the new data file.
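
To make the sequence above concrete, here is a simplified, hypothetical model of the unlink decision just described (plain C, not the actual GlusterFS DHT/tier code; the subvolume names are taken from the reproducer):

/* model_unlink_bug.c - toy model of the behaviour described in this comment.
 * During demotion the hot tier is both the hashed and the cached subvolume,
 * so only one unlink is issued and the half-built destination file on the
 * cold tier keeps its linkto file. Not the actual GlusterFS code. */
#include <stdio.h>
#include <string.h>

struct file_state {
    const char *hashed_subvol;  /* where the name hashes to (hot tier here) */
    const char *cached_subvol;  /* where the data currently lives           */
    const char *migration_dst;  /* destination holding the linkto file      */
};

static void unlink_file(const struct file_state *f)
{
    printf("unlink sent to cached subvol: %s\n", f->cached_subvol);
    if (strcmp(f->hashed_subvol, f->cached_subvol) != 0)
        printf("unlink sent to hashed subvol: %s\n", f->hashed_subvol);

    /* The in-flight migration destination is never considered. */
    printf("migration destination %s keeps its linkto file\n",
           f->migration_dst);
}

int main(void)
{
    /* File being demoted: data on the hot tier, copy in progress to cold. */
    struct file_state demoting = {
        .hashed_subvol = "rosa-hot-dht",
        .cached_subvol = "rosa-hot-dht",
        .migration_dst = "rosa-cold-dht",
    };
    unlink_file(&demoting);
    return 0;
}

Running this prints a single unlink towards rosa-hot-dht, while rosa-cold-dht keeps its linkto file; that leftover is what later turns into a full data file once the open fds let the copy finish.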

Comment 1 Niels de Vos 2015-11-17 12:22:17 UTC
Nithya, could you please provide a public description of the problem? This is a community bug and we would like to give an understanding of issues, reproducers and fixes to our community members.

Thanks!

Comment 2 Nithya Balachandran 2015-12-08 05:47:16 UTC
(In reply to Niels de Vos from comment #1)
> Nithya, could you please provide a public description of the problem? This
> is a community bug and we would like to give an understanding of issues,
> reproducers and fixes to our community members.
> 
> Thanks!


Sorry, missed this needinfo somehow. Have removed the private tag for the description.

Comment 3 Vijay Bellur 2015-12-17 05:55:29 UTC
REVIEW: http://review.gluster.org/12991 (tier:unlink during migration) posted (#1) for review on release-3.7 by mohammed rafi  kc (rkavunga)

Comment 4 Vijay Bellur 2016-01-04 08:50:28 UTC
REVIEW: http://review.gluster.org/12991 (tier:unlink during migration) posted (#2) for review on release-3.7 by mohammed rafi  kc (rkavunga)

Comment 5 Vijay Bellur 2016-01-04 09:10:05 UTC
REVIEW: http://review.gluster.org/12991 (tier:unlink during migration) posted (#3) for review on release-3.7 by mohammed rafi  kc (rkavunga)

Comment 6 Vijay Bellur 2016-01-04 09:19:59 UTC
REVIEW: http://review.gluster.org/12991 (tier:unlink during migration) posted (#4) for review on release-3.7 by mohammed rafi  kc (rkavunga)

Comment 7 Vijay Bellur 2016-02-10 05:48:32 UTC
REVIEW: http://review.gluster.org/12991 (tier:unlink during migration) posted (#5) for review on release-3.7 by mohammed rafi  kc (rkavunga)

Comment 8 Vijay Bellur 2016-02-22 14:50:59 UTC
COMMIT: http://review.gluster.org/12991 committed in release-3.7 by Dan Lambright (dlambrig) 
------
commit 6a565219fb1631e9b14c676458c8c04251886494
Author: Mohammed Rafi KC <rkavunga>
Date:   Mon Nov 30 19:02:54 2015 +0530

    tier:unlink during migration
    
    Files deleted during promotion were not being deleted, because the
    files are moving from the hashed to a non-hashed subvolume.
    
    On deleting a file that is undergoing promotion,
    the unlink call is not sent to the dst file as the
    hashed subvol == cached subvol. This causes
    the file to reappear once the migration is complete.
    
    This patch also fixes a problem with stale linkfile
    deleting.
    
    Backport of>
    >Change-Id: I4b02a498218c9d8eeaa4556fa4219e91e7fa71e5
    >BUG: 1282390
    >Signed-off-by: Mohammed Rafi KC <rkavunga>
    >Reviewed-on: http://review.gluster.org/12829
    >Tested-by: NetBSD Build System <jenkins.org>
    >Tested-by: Gluster Build System <jenkins.com>
    >Reviewed-by: Dan Lambright <dlambrig>
    >Tested-by: Dan Lambright <dlambrig>
    
    (cherry picked from commit b5de382afa8c5777e455c7a376fc4f1f01d782d1)
    
    Change-Id: I951adb4d929926bcd646dd7574f7a2d41d57479d
    BUG: 1282388
    Signed-off-by: Mohammed Rafi KC <rkavunga>
    Reviewed-on: http://review.gluster.org/12991
    Smoke: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>
    Reviewed-by: Dan Lambright <dlambrig>

Comment 9 Kaushal 2016-04-19 07:23:50 UTC
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.7.9, please open a new bug report.

glusterfs-3.7.9 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-users/2016-March/025922.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

