Bug 1277088 - Data Tiering:Rename of cold file to a hot file causing split brain and showing two copies of files in mount point
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: tier
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.1.2
Assignee: Nithya Balachandran
QA Contact: RajeshReddy
Blocks: 1260783 1260923 1279376 1283480
Reported: 2015-11-02 10:03 UTC by Nag Pavan Chilakam
Modified: 2019-04-03 09:15 UTC
CC List: 7 users

Fixed In Version: glusterfs-3.7.5-7
Doc Type: Bug Fix
Clones: 1279376
Last Closed: 2016-03-01 05:50:00 UTC


Links:
Red Hat Product Errata RHBA-2016:0193 (priority: normal, status: SHIPPED_LIVE): Red Hat Gluster Storage 3.1 update 2, last updated 2016-03-01 10:20:36 UTC

Description Nag Pavan Chilakam 2015-11-02 10:03:02 UTC
Description of problem:
=====================
Renaming a file over an existing file that is in a different tier but in the same DHT hash range appears to cause split-brain.
The file gets corrupted and shows up as two copies on the FUSE mount.


Version-Release number of selected component (if applicable):
=============================================================
glusterfs-server-3.7.5-5.el7rhgs.x86_64


How reproducible:
==================
easily

Steps to Reproduce:
===================
1. Create a tiered volume and mount it over FUSE.
2. Create a large file of 1 GB, say GB.txt.
3. Create some zero-byte files, say z{1..10}.
4. Note down all the files that share the same hot-tier brick as GB.txt.
5. Leave all the files idle and wait for them to be demoted.
6. Once all files are demoted, note down the files that share the same cold-tier brick as GB.txt.
7. Identify the file that shares a brick with GB.txt in both the cold and the hot tier. Let's assume the file is z4.
8. Touch all of z{1..10} so that they are promoted to the hot tier.
9. Rename GB.txt to z4 using the "mv" command.
10. After confirming the overwrite prompt, two instances of z4 can be seen on the mount.
Also check the client mount logs.
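The symptom in step 10 (one name listed twice on the mount) can be checked mechanically rather than by eye. Below is a minimal Python sketch, not part of the original report: it assumes the FUSE mount's directory can be read with os.listdir, and the mount path /mnt/newname and the helper names are hypothetical.

```python
import os
from collections import Counter
from typing import Iterable, List


def duplicate_names(names: Iterable[str]) -> List[str]:
    """Return every name that appears more than once, sorted.

    A healthy directory never lists the same name twice, so any hit here
    matches the symptom from step 10 (two instances of z4 on the mount).
    """
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)


def check_mount_dir(path: str) -> List[str]:
    # os.listdir returns the readdir entries without de-duplicating them,
    # so a duplicated entry on the mount shows up as a repeated string.
    return duplicate_names(os.listdir(path))


if __name__ == "__main__":
    # Hypothetical mount point: substitute the directory where the tiered
    # volume is mounted over FUSE.
    mount_dir = "/mnt/newname"
    if os.path.isdir(mount_dir):
        dups = check_mount_dir(mount_dir)
        print("duplicate entries:", dups if dups else "none")

    # Offline example using names from the client "ll" output in this report.
    sample = ["ff2", "ff4", "FnF7.mkv", "k1", "k10", "k2", "k2", "k3"]
    print(duplicate_names(sample))  # ['k2']
```

Counting with Counter keeps the check linear in the number of entries and works the same whether the names come from a live mount or from a saved listing.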




Client FUSE logs:
===============
[root@mia newname]# ll
total 9116425
-rw-r--r--. 1 root root 1555868318 Nov  2  2015 ff2
-rw-r--r--. 1 root root 1555868318 Nov  2  2015 ff4
-rw-r--r--. 1 root root 1555868318 Nov  2  2015 FnF7.mkv
-rw-r--r--. 1 root root 1555868318 Nov  2 07:52 k1
-rw-r--r--. 1 root root          0 Nov  2 07:52 k10
-rw-r--r--. 1 root root 1555868318 Nov  2  2015 k2
-rw-r--r--. 1 root root 1555868318 Nov  2  2015 k2
-rw-r--r--. 1 root root          0 Nov  2 07:52 k3
-rw-r--r--. 1 root root          0 Nov  2 07:52 k4
-rw-r--r--. 1 root root          0 Nov  2 07:52 k5
-rw-r--r--. 1 root root          0 Nov  2 07:52 k6
-rw-r--r--. 1 root root          0 Nov  2 07:52 k7
-rw-r--r--. 1 root root          0 Nov  2 07:52 k8
-rw-r--r--. 1 root root          0 Nov  2 07:52 k9
-rw-r--r--. 1 root root       6358 Nov  2  2015 stat.log
[root@mia newname]# 

[2015-11-02 02:20:05.500755] I [MSGID: 109066] [dht-rename.c:1411:dht_rename] 0-newname-tier-dht: renaming /ff3 (hash=newname-hot-dht/cache=newname-cold-dht) => /k1 (hash=newname-hot-dht/cache=newname-hot-dht)
[2015-11-02 02:20:05.505746] E [MSGID: 108008] [afr-transaction.c:1981:afr_transaction] 0-newname-replicate-1: Failing SETATTR on gfid d423e54f-85cc-4725-b495-60addde165e1: split-brain observed. [Input/output error]
[2015-11-02 02:20:05.505792] E [MSGID: 109031] [dht-linkfile.c:306:dht_linkfile_setattr_cbk] 0-newname-cold-dht: Failed to set attr uid/gid on /ff3 :<gfid:00000000-0000-0000-0000-000000000000>  [Input/output error]
[2015-11-02 02:20:05.505827] I [MSGID: 109066] [dht-rename.c:1411:dht_rename] 0-newname-hot-dht: renaming /ff3 (hash=newname-replicate-2/cache=newname-replicate-2) => /k1 (hash=newname-replicate-2/cache=newname-replicate-2)
[2015-11-02 02:23:31.481425] I [MSGID: 109066] [dht-rename.c:1411:dht_rename] 0-newname-tier-dht: renaming /ff1 (hash=newname-hot-dht/cache=newname-cold-dht) => /k2 (hash=newname-hot-dht/cache=newname-hot-dht)
[2015-11-02 02:23:31.485198] I [MSGID: 109066] [dht-rename.c:1411:dht_rename] 0-newname-hot-dht: renaming /ff1 (hash=newname-replicate-3/cache=newname-replicate-3) => /k2 (hash=newname-replicate-2/cache=newname-replicate-2)
[2015-11-02 02:23:31.486837] W [MSGID: 109065] [dht-rename.c:1231:dht_rename_lock_cbk] 0-newname-hot-dht: acquiring inodelk failed rename (/ff1:d0b5d1c0-ba5d-40f9-af2a-9e7fe745bf4d:newname-replicate-3 /k2:c79c0457-6285-4fd1-8235-7a9fa655c625:newname-replicate-2), returning EBUSY [Stale file handle]
[2015-11-02 02:23:31.486879] I [MSGID: 109030] [dht-rename.c:729:dht_rename_cbk] 0-newname-tier-dht: /ff1: Rename (linkto file) on newname-hot-dht failed, (gfid = d0b5d1c0-ba5d-40f9-af2a-9e7fe745bf4d)  [Stale file handle]

[2015-11-02 02:31:31.261176] W [MSGID: 114031] [client-rpc-fops.c:2971:client3_3_lookup_cbk] 0-newname-client-5: remote operation failed. Path: /k10 (72b447d2-4434-4088-9554-2b13f2cc8dd8) [No such file or directory]
[2015-11-02 02:31:31.261306] W [MSGID: 114031] [client-rpc-fops.c:2971:client3_3_lookup_cbk] 0-newname-client-4: remote operation failed. Path: /k10 (72b447d2-4434-4088-9554-2b13f2cc8dd8) [No such file or directory]
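The dht_rename INFO lines record, for both the source and the destination, the subvolume the name hashes to and the subvolume the file is actually cached on. In the first line, at the tier-dht layer, /ff3 is cached on newname-cold-dht while /k1 is cached on newname-hot-dht, which is the cold-to-hot rename this bug is about. A rough Python sketch for pulling those fields out of a client log and flagging renames whose source and destination are cached on different subvolumes is shown here; the regular expression is written only against the line format seen in this report, and the helper names are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Written against the dht_rename lines in this report, e.g.
# "... 0-newname-tier-dht: renaming /ff3 (hash=A/cache=B) => /k1 (hash=C/cache=D)"
RENAME_RE = re.compile(
    r"\] (?P<xlator>\S+): renaming (?P<src>\S+) "
    r"\(hash=(?P<src_hash>[^/]+)/cache=(?P<src_cache>[^)]+)\) => "
    r"(?P<dst>\S+) \(hash=(?P<dst_hash>[^/]+)/cache=(?P<dst_cache>[^)]+)\)"
)


class Rename(NamedTuple):
    xlator: str      # DHT layer that logged the rename (tier-dht, hot-dht, ...)
    src: str
    src_cached: str  # subvolume the source file is cached on
    dst: str
    dst_cached: str  # subvolume the destination file is cached on

    @property
    def crosses_subvolumes(self) -> bool:
        # True when source and destination live on different subvolumes of
        # the logging layer; at the tier-dht layer that means different tiers.
        return self.src_cached != self.dst_cached


def parse_rename(line: str) -> Optional[Rename]:
    """Return the parsed rename, or None if the line is not a dht_rename entry."""
    m = RENAME_RE.search(line)
    if m is None:
        return None
    return Rename(m["xlator"], m["src"], m["src_cache"], m["dst"], m["dst_cache"])


if __name__ == "__main__":
    sample = (
        "[2015-11-02 02:20:05.500755] I [MSGID: 109066] "
        "[dht-rename.c:1411:dht_rename] 0-newname-tier-dht: renaming /ff3 "
        "(hash=newname-hot-dht/cache=newname-cold-dht) => /k1 "
        "(hash=newname-hot-dht/cache=newname-hot-dht)"
    )
    r = parse_rename(sample)
    print(r.src, "->", r.dst, "cross-tier:", r.crosses_subvolumes)
    # /ff3 -> /k1 cross-tier: True
```

Applied to the hot-dht line that follows it in the log, the same parser reports both names cached on newname-replicate-2, so crosses_subvolumes is False at that layer.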



============== Logs for ff1 and k2 (ff1 was renamed to k2) ==============
[2015-11-02 09:30:00.679944] E [MSGID: 109037] [tier.c:1498:tier_start] 0-newname-tier-dht: Promotion failed
[2015-11-02 09:31:10.788134] I [MSGID: 109028] [dht-rebalance.c:3607:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 2353.00 secs
[2015-11-02 09:31:10.788177] I [MSGID: 109028] [dht-rebalance.c:3611:gf_defrag_status_get] 0-glusterfs: Files migrated: 13, size: 0, lookups: 76, failures: 2, skipped: 0
[2015-11-02 09:31:10.826864] I [MSGID: 109028] [dht-rebalance.c:3607:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 2353.00 secs
[2015-11-02 09:31:10.826898] I [MSGID: 109028] [dht-rebalance.c:3611:gf_defrag_status_get] 0-glusterfs: Files migrated: 13, size: 0, lookups: 76, failures: 2, skipped: 0
[2015-11-02 09:32:22.542170] I [MSGID: 109028] [dht-rebalance.c:3607:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 2425.00 secs
[2015-11-02 09:32:22.542215] I [MSGID: 109028] [dht-rebalance.c:3611:gf_defrag_status_get] 0-glusterfs: Files migrated: 24, size: 0, lookups: 87, failures: 2, skipped: 0
[2015-11-02 09:32:22.566528] I [MSGID: 109028] [dht-rebalance.c:3607:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 2425.00 secs
[2015-11-02 09:32:22.566531] I [MSGID: 109028] [dht-rebalance.c:3611:gf_defrag_status_get] 0-glusterfs: Files migrated: 24, size: 0, lookups: 87, failures: 2, skipped: 0
[2015-11-02 09:36:00.922513] W [MSGID: 109023] [dht-rebalance.c:530:__dht_rebalance_create_dst_file] 0-newname-tier-dht: /k2: failed to lookup file (Stale file handle)
[2015-11-02 09:36:00.923177] E [MSGID: 109037] [tier.c:523:tier_migrate_using_query_file] 0-newname-tier-dht: ERROR -28 in current migration k2 /k2

[2015-11-02 09:36:01.102854] E [MSGID: 109037] [tier.c:1488:tier_start] 0-newname-tier-dht: Demotion failed
[2015-11-02 09:38:00.126505] W [MSGID: 109023] [dht-rebalance.c:530:__dht_rebalance_create_dst_file] 0-newname-tier-dht: /k2: failed to lookup file (Stale file handle)
[2015-11-02 09:38:00.127246] E [MSGID: 109037] [tier.c:523:tier_migrate_using_query_file] 0-newname-tier-dht: ERROR -28 in current migration k2 /k2

[2015-11-02 09:38:00.127412] E [MSGID: 109037] [tier.c:1488:tier_start] 0-newname-tier-dht: Demotion failed
[2015-11-02 09:40:00.151145] W [MSGID: 109023] [dht-rebalance.c:530:__dht_rebalance_create_dst_file] 0-newname-tier-dht: /k2: failed to lookup file (Stale file handle)
[2015-11-02 09:40:00.151954] E [MSGID: 109037] [tier.c:523:tier_migrate_using_query_file] 0-newname-tier-dht: ERROR -28 in current migration k2 /k2

[2015-11-02 09:40:00.152134] E [MSGID: 109037] [tier.c:1488:tier_start] 0-newname-tier-dht: Demotion failed
[2015-11-02 09:42:00.178071] W [MSGID: 109023] [dht-rebalance.c:530:__dht_rebalance_create_dst_file] 0-newname-tier-dht: /k2: failed to lookup file (Stale file handle)
[2015-11-02 09:42:00.178843] E [MSGID: 109037] [tier.c:523:tier_migrate_using_query_file] 0-newname-tier-dht: ERROR -28 in current migration k2 /k2

[2015-11-02 09:42:00.179016] E [MSGID: 109037] [tier.c:1488:tier_start] 0-newname-tier-dht: Demotion failed
[2015-11-02 09:44:00.202865] W [MSGID: 109023] [dht-rebalance.c:530:__dht_rebalance_create_dst_file] 0-newname-tier-dht: /k2: failed to lookup file (Stale file handle)
[2015-11-02 09:44:00.203542] E [MSGID: 109037] [tier.c:523:tier_migrate_using_query_file] 0-newname-tier-dht: ERROR -28 in current migration k2 /k2

[2015-11-02 09:44:00.203674] E [MSGID: 109037] [tier.c:1488:tier_start] 0-newname-tier-dht: Demotion failed
[2015-11-02 09:46:00.227746] W [MSGID: 109023] [dht-rebalance.c:530:__dht_rebalance_create_dst_file] 0-newname-tier-dht: /k2: failed to lookup file (Stale file handle)
[2015-11-02 09:46:00.228477] E [MSGID: 109037] [tier.c:523:tier_migrate_using_query_file] 0-newname-tier-dht: ERROR -28 in current migration k2 /k2

[2015-11-02 09:46:00.228607] E [MSGID: 109037] [tier.c:1488:tier_start] 0-newname-tier-dht: Demotion failed
[2015-11-02 09:48:00.252896] W [MSGID: 109023] [dht-rebalance.c:530:__dht_rebalance_create_dst_file] 0-newname-tier-dht: /k2: failed to lookup file (Stale file handle)
[2015-11-02 09:48:00.253641] E [MSGID: 109037] [tier.c:523:tier_migrate_using_query_file] 0-newname-tier-dht: ERROR -28 in current migration k2 /k2

[2015-11-02 09:48:00.253827] E [MSGID: 109037] [tier.c:1488:tier_start] 0-newname-tier-dht: Demotion failed
[2015-11-02 09:50:00.277610] W [MSGID: 109023] [dht-rebalance.c:530:__dht_rebalance_create_dst_file] 0-newname-tier-dht: /k2: failed to lookup file (Stale file handle)
[2015-11-02 09:50:00.278400] E [MSGID: 109037] [tier.c:523:tier_migrate_using_query_file] 0-newname-tier-dht: ERROR -28 in current migration k2 /k2

[2015-11-02 09:50:00.278553] E [MSGID: 109037] [tier.c:1488:tier_start] 0-newname-tier-dht: Demotion failed
[2015-11-02 09:52:00.305940] W [MSGID: 109023] [dht-rebalance.c:530:__dht_rebalance_create_dst_file] 0-newname-tier-dht: /k2: failed to lookup file (Stale file handle)
[2015-11-02 09:52:00.306664] E [MSGID: 109037] [tier.c:523:tier_migrate_using_query_file] 0-newname-tier-dht: ERROR -28 in current migration k2 /k2

[2015-11-02 09:52:00.306812] E [MSGID: 109037] [tier.c:1488:tier_start] 0-newname-tier-dht: Demotion failed
[2015-11-02 09:54:00.330672] W [MSGID: 109023] [dht-rebalance.c:530:__dht_rebalance_create_dst_file] 0-newname-tier-dht: /k2: failed to lookup file (Stale file handle)
[2015-11-02 09:54:00.331513] E [MSGID: 109037] [tier.c:523:tier_migrate_using_query_file] 0-newname-tier-dht: ERROR -28 in current migration k2 /k2

[2015-11-02 09:54:00.331655] E [MSGID: 109037] [tier.c:1488:tier_start] 0-newname-tier-dht: Demotion failed
[2015-11-02 09:56:00.358181] W [MSGID: 109023] [dht-rebalance.c:530:__dht_rebalance_create_dst_file] 0-newname-tier-dht: /k2: failed to lookup file (Stale file handle)
[2015-11-02 09:56:00.358968] E [MSGID: 109037] [tier.c:523:tier_migrate_using_query_file] 0-newname-tier-dht: ERROR -28 in current migration k2 /k2

[2015-11-02 09:56:00.359139] E [MSGID: 109037] [tier.c:1488:tier_start] 0-newname-tier-dht: Demotion failed
[2015-11-02 09:58:00.382393] W [MSGID: 109023] [dht-rebalance.c:530:__dht_rebalance_create_dst_file] 0-newname-tier-dht: /k2: failed to lookup file (Stale file handle)
[2015-11-02 09:58:00.383448] E [MSGID: 109037] [tier.c:523:tier_migrate_using_query_file] 0-newname-tier-dht: ERROR -28 in current migration k2 /k2

[2015-11-02 09:58:00.383565] E [MSGID: 109037] [tier.c:1488:tier_start] 0-newname-tier-dht: Demotion failed
[2015-11-02 10:00:00.405530] W [MSGID: 109023] [dht-rebalance.c:530:__dht_rebalance_create_dst_file] 0-newname-tier-dht: /k2: failed to lookup file (Stale file handle)
[2015-11-02 10:00:00.406339] E [MSGID: 109037] [tier.c:523:tier_migrate_using_query_file] 0-newname-tier-dht: ERROR -28 in current migration k2 /k2

[2015-11-02 10:00:00.406457] E [MSGID: 109037] [tier.c:1488:tier_start] 0-newname-tier-dht: Demotion failed
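The tier log above shows the same pattern repeating: roughly every two minutes the lookup of /k2 fails with a stale file handle, the migration of /k2 reports ERROR -28, and demotion fails. A small Python sketch that groups those migration errors by path and measures the retry interval can make such loops easier to spot in a long log; as with any log scraping, the pattern assumes the exact line format in this report, and the function names are hypothetical.

```python
import re
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List

# Written against the tier migration error lines in this report, e.g.
# "[2015-11-02 09:36:00.923177] E ... ERROR -28 in current migration k2 /k2"
MIGRATION_ERR_RE = re.compile(
    r"^\[(?P<ts>[\d-]+ [\d:.]+)\].*"
    r"ERROR (?P<code>-?\d+) in current migration \S+ (?P<path>\S+)"
)
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def failures_by_path(lines: Iterable[str]) -> Dict[str, List[datetime]]:
    """Group migration-failure timestamps by the path being migrated."""
    out: Dict[str, List[datetime]] = defaultdict(list)
    for line in lines:
        m = MIGRATION_ERR_RE.search(line.strip())
        if m:
            out[m["path"]].append(datetime.strptime(m["ts"], TS_FORMAT))
    return dict(out)


def retry_intervals(stamps: List[datetime]) -> List[float]:
    """Seconds between consecutive failures of the same path."""
    ordered = sorted(stamps)
    return [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]


if __name__ == "__main__":
    sample = [
        "[2015-11-02 09:36:00.923177] E [MSGID: 109037] "
        "[tier.c:523:tier_migrate_using_query_file] 0-newname-tier-dht: "
        "ERROR -28 in current migration k2 /k2",
        "[2015-11-02 09:38:00.127246] E [MSGID: 109037] "
        "[tier.c:523:tier_migrate_using_query_file] 0-newname-tier-dht: "
        "ERROR -28 in current migration k2 /k2",
        "[2015-11-02 09:38:00.127412] E [MSGID: 109037] "
        "[tier.c:1488:tier_start] 0-newname-tier-dht: Demotion failed",
    ]
    for path, stamps in failures_by_path(sample).items():
        print(path, len(stamps), retry_intervals(stamps))
```

Run over the full excerpt above, this would report /k2 failing on every demotion cycle, about two minutes apart, which is how a migration stuck on one file stands out from occasional one-off failures.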

Comment 2 Nag Pavan Chilakam 2015-11-03 05:53:46 UTC
The sosreports are available at the location below. Refer to volume "newname".
[nchilaka@rhsqe-repo bug.1277088]$ pwd
/home/repo/sosreports/nchilaka/bug.1277088

Comment 6 RajeshReddy 2015-12-03 10:14:05 UTC
Tested with build glusterfs-server-3.7.5-8: tried both renaming (moving) files in the hot tier onto files in the cold tier and renaming (moving) files in the cold tier onto files in the hot tier. After each rename operation the mount shows the files correctly, so marking this bug as verified.

Note: If the client is running an older version (glusterfs-api-3.7.5-5), the rename operation fails with a device busy error.

Comment 8 errata-xmlrpc 2016-03-01 05:50:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0193.html

