Bug 1277088 - Data Tiering:Rename of cold file to a hot file causing split brain and showing two copies of files in mount point
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: tier
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.1.2
Assigned To: Nithya Balachandran
QA Contact: RajeshReddy
Keywords: ZStream
Depends On:
Blocks: 1260783 1260923 1279376 1283480
Reported: 2015-11-02 05:03 EST by nchilaka
Modified: 2016-09-17 11:37 EDT
CC List: 7 users

Fixed In Version: glusterfs-3.7.5-7
Doc Type: Bug Fix
Clones: 1279376
Last Closed: 2016-03-01 00:50:00 EST
Type: Bug


Attachments: None
Description nchilaka 2015-11-02 05:03:02 EST
Description of problem:
=====================
Renaming a file onto another existing file that is in a different tier, but in the same DHT hash range, appears to cause a split-brain.
The file gets corrupted and shows up as two copies on the FUSE mount client.


Version-Release number of selected component (if applicable):
=============================================================
glusterfs-server-3.7.5-5.el7rhgs.x86_64


How reproducible:
==================
easily

Steps to Reproduce:
===================
1. Create a tiered volume and mount it over FUSE.
2. Create a large (1 GB) file, say GB.txt.
3. Create some zero-byte files, say z{1..10}.
4. Note down all the files that share the same hot-tier brick as GB.txt.
5. Leave all files idle and wait for them to be demoted.
6. Once all files are demoted, note down the files that share the same cold-tier brick as GB.txt.
7. Identify a file that shares a brick with GB.txt in both the cold and the hot tier. Let's assume the file is z4.
8. Touch all of z{1..10} so that they are promoted to the hot tier.
9. Rename GB.txt to z4 using the "mv" command.
10. After confirming the overwrite prompt, two instances of z4 can be seen on the mount.
Also check the client mount logs.
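The client-side parts of the steps above (2, 3, 8, 9 and 10) can be scripted. Below is a minimal Python sketch, assuming the volume is FUSE-mounted at a directory passed in as `mount`; the helper names are hypothetical, and steps 1 and 4 to 7 (volume creation, finding which files share a brick, waiting for demotion) are still done by hand between the calls. Whether touching a file actually gets it promoted depends on the tier's heat tracking; the sketch only mimics what `touch` does.

```python
import os
from pathlib import Path
from typing import Iterable

CHUNK = 1 << 20  # write the large file in 1 MiB chunks


def create_test_files(mount: Path, big_name: str = "GB.txt",
                      big_size: int = 1 << 30, small_count: int = 10) -> None:
    """Steps 2-3: one large file plus zero-byte files z1..zN on the mount."""
    buf = b"\0" * CHUNK
    remaining = big_size
    with open(mount / big_name, "wb") as f:
        while remaining > 0:
            n = min(remaining, CHUNK)
            f.write(buf[:n])  # write real data instead of a sparse truncate()
            remaining -= n
    for i in range(1, small_count + 1):
        (mount / f"z{i}").touch()


def touch_files(mount: Path, names: Iterable[str]) -> None:
    """Step 8: set atime/mtime to now, as `touch` does on an existing file."""
    for name in names:
        os.utime(mount / name)


def rename_and_count(mount: Path, src: str, dst: str) -> int:
    """Steps 9-10: rename src over dst, then count directory entries named dst."""
    # os.rename() replaces an existing dst without prompting (like `mv -f`).
    os.rename(mount / src, mount / dst)
    return sum(1 for name in os.listdir(mount) if name == dst)
```

On a healthy volume `rename_and_count(mount, "GB.txt", "z4")` returns 1; the symptom in step 10 corresponds to a return value of 2.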




Client FUSE logs:
===============
[root@mia newname]# ll
total 9116425
-rw-r--r--. 1 root root 1555868318 Nov  2  2015 ff2
-rw-r--r--. 1 root root 1555868318 Nov  2  2015 ff4
-rw-r--r--. 1 root root 1555868318 Nov  2  2015 FnF7.mkv
-rw-r--r--. 1 root root 1555868318 Nov  2 07:52 k1
-rw-r--r--. 1 root root          0 Nov  2 07:52 k10
-rw-r--r--. 1 root root 1555868318 Nov  2  2015 k2
-rw-r--r--. 1 root root 1555868318 Nov  2  2015 k2
-rw-r--r--. 1 root root          0 Nov  2 07:52 k3
-rw-r--r--. 1 root root          0 Nov  2 07:52 k4
-rw-r--r--. 1 root root          0 Nov  2 07:52 k5
-rw-r--r--. 1 root root          0 Nov  2 07:52 k6
-rw-r--r--. 1 root root          0 Nov  2 07:52 k7
-rw-r--r--. 1 root root          0 Nov  2 07:52 k8
-rw-r--r--. 1 root root          0 Nov  2 07:52 k9
-rw-r--r--. 1 root root       6358 Nov  2  2015 stat.log
[root@mia newname]# 

[2015-11-02 02:20:05.500755] I [MSGID: 109066] [dht-rename.c:1411:dht_rename] 0-newname-tier-dht: renaming /ff3 (hash=newname-hot-dht/cache=newname-cold-dht) => /k1 (hash=newname-hot-dht/cache=newname-hot-dht)
[2015-11-02 02:20:05.505746] E [MSGID: 108008] [afr-transaction.c:1981:afr_transaction] 0-newname-replicate-1: Failing SETATTR on gfid d423e54f-85cc-4725-b495-60addde165e1: split-brain observed. [Input/output error]
[2015-11-02 02:20:05.505792] E [MSGID: 109031] [dht-linkfile.c:306:dht_linkfile_setattr_cbk] 0-newname-cold-dht: Failed to set attr uid/gid on /ff3 :<gfid:00000000-0000-0000-0000-000000000000>  [Input/output error]
[2015-11-02 02:20:05.505827] I [MSGID: 109066] [dht-rename.c:1411:dht_rename] 0-newname-hot-dht: renaming /ff3 (hash=newname-replicate-2/cache=newname-replicate-2) => /k1 (hash=newname-replicate-2/cache=newname-replicate-2)
[2015-11-02 02:23:31.481425] I [MSGID: 109066] [dht-rename.c:1411:dht_rename] 0-newname-tier-dht: renaming /ff1 (hash=newname-hot-dht/cache=newname-cold-dht) => /k2 (hash=newname-hot-dht/cache=newname-hot-dht)
[2015-11-02 02:23:31.485198] I [MSGID: 109066] [dht-rename.c:1411:dht_rename] 0-newname-hot-dht: renaming /ff1 (hash=newname-replicate-3/cache=newname-replicate-3) => /k2 (hash=newname-replicate-2/cache=newname-replicate-2)
[2015-11-02 02:23:31.486837] W [MSGID: 109065] [dht-rename.c:1231:dht_rename_lock_cbk] 0-newname-hot-dht: acquiring inodelk failed rename (/ff1:d0b5d1c0-ba5d-40f9-af2a-9e7fe745bf4d:newname-replicate-3 /k2:c79c0457-6285-4fd1-8235-7a9fa655c625:newname-replicate-2), returning EBUSY [Stale file handle]
[2015-11-02 02:23:31.486879] I [MSGID: 109030] [dht-rename.c:729:dht_rename_cbk] 0-newname-tier-dht: /ff1: Rename (linkto file) on newname-hot-dht failed, (gfid = d0b5d1c0-ba5d-40f9-af2a-9e7fe745bf4d)  [Stale file handle]

[2015-11-02 02:31:31.261176] W [MSGID: 114031] [client-rpc-fops.c:2971:client3_3_lookup_cbk] 0-newname-client-5: remote operation failed. Path: /k10 (72b447d2-4434-4088-9554-2b13f2cc8dd8) [No such file or directory]
[2015-11-02 02:31:31.261306] W [MSGID: 114031] [client-rpc-fops.c:2971:client3_3_lookup_cbk] 0-newname-client-4: remote operation failed. Path: /k10 (72b447d2-4434-4088-9554-2b13f2cc8dd8) [No such file or directory]
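The dht_rename messages above record, for both names, the subvolume the name hashes to (hash=) and the subvolume that caches the data (cache=). A minimal Python sketch that pulls those fields and the AFR split-brain gfids out of a client log; the regular expressions only cover the message formats shown in this report, and the class and function names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# Patterns written against the dht_rename and afr_transaction messages in this report.
RENAME_RE = re.compile(
    r"\[dht-rename\.c:\d+:dht_rename\] (?P<xl>[^:\s]+): renaming (?P<src>\S+) "
    r"\(hash=(?P<src_hash>[^/]+)/cache=(?P<src_cache>[^)]+)\) => "
    r"(?P<dst>\S+) \(hash=(?P<dst_hash>[^/]+)/cache=(?P<dst_cache>[^)]+)\)"
)
SPLIT_BRAIN_RE = re.compile(
    r"Failing (?P<fop>\w+) on gfid (?P<gfid>[0-9a-f-]{36}): split-brain observed"
)


@dataclass
class RenameEvent:
    xlator: str     # translator that logged the rename, e.g. 0-newname-tier-dht
    src: str
    src_hash: str   # subvolume the source name hashes to
    src_cache: str  # subvolume holding the source data
    dst: str
    dst_hash: str
    dst_cache: str

    @property
    def crosses_subvolumes(self) -> bool:
        """True when source and destination data are cached on different subvolumes."""
        return self.src_cache != self.dst_cache


def parse_client_log(lines: Iterable[str]) -> Tuple[List[RenameEvent], Dict[str, str]]:
    """Return rename events and a gfid -> fop map of AFR split-brain failures."""
    renames: List[RenameEvent] = []
    split_brain: Dict[str, str] = {}
    for line in lines:
        m = RENAME_RE.search(line)
        if m:
            renames.append(RenameEvent(
                xlator=m["xl"], src=m["src"], src_hash=m["src_hash"],
                src_cache=m["src_cache"], dst=m["dst"],
                dst_hash=m["dst_hash"], dst_cache=m["dst_cache"]))
            continue
        m = SPLIT_BRAIN_RE.search(line)
        if m:
            split_brain[m["gfid"]] = m["fop"]
    return renames, split_brain
```

Fed the first log lines above, this flags the tier-level rename of /ff3 (cached on newname-cold-dht) onto /k1 (cached on newname-hot-dht) as crossing subvolumes, and reports gfid d423e54f-85cc-4725-b495-60addde165e1 as failing SETATTR with split-brain.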



============== Logs for files ff1 and k2 (ff1 was renamed to k2) ==============
[2015-11-02 09:30:00.679944] E [MSGID: 109037] [tier.c:1498:tier_start] 0-newname-tier-dht: Promotion failed
[2015-11-02 09:31:10.788134] I [MSGID: 109028] [dht-rebalance.c:3607:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 2353.00 secs
[2015-11-02 09:31:10.788177] I [MSGID: 109028] [dht-rebalance.c:3611:gf_defrag_status_get] 0-glusterfs: Files migrated: 13, size: 0, lookups: 76, failures: 2, skipped: 0
[2015-11-02 09:31:10.826864] I [MSGID: 109028] [dht-rebalance.c:3607:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 2353.00 secs
[2015-11-02 09:31:10.826898] I [MSGID: 109028] [dht-rebalance.c:3611:gf_defrag_status_get] 0-glusterfs: Files migrated: 13, size: 0, lookups: 76, failures: 2, skipped: 0
[2015-11-02 09:32:22.542170] I [MSGID: 109028] [dht-rebalance.c:3607:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 2425.00 secs
[2015-11-02 09:32:22.542215] I [MSGID: 109028] [dht-rebalance.c:3611:gf_defrag_status_get] 0-glusterfs: Files migrated: 24, size: 0, lookups: 87, failures: 2, skipped: 0
[2015-11-02 09:32:22.566528] I [MSGID: 109028] [dht-rebalance.c:3607:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 2425.00 secs
[2015-11-02 09:32:22.566531] I [MSGID: 109028] [dht-rebalance.c:3611:gf_defrag_status_get] 0-glusterfs: Files migrated: 24, size: 0, lookups: 87, failures: 2, skipped: 0
[2015-11-02 09:36:00.922513] W [MSGID: 109023] [dht-rebalance.c:530:__dht_rebalance_create_dst_file] 0-newname-tier-dht: /k2: failed to lookup file (Stale file handle)
[2015-11-02 09:36:00.923177] E [MSGID: 109037] [tier.c:523:tier_migrate_using_query_file] 0-newname-tier-dht: ERROR -28 in current migration k2 /k2

[2015-11-02 09:36:01.102854] E [MSGID: 109037] [tier.c:1488:tier_start] 0-newname-tier-dht: Demotion failed
[2015-11-02 09:38:00.126505] W [MSGID: 109023] [dht-rebalance.c:530:__dht_rebalance_create_dst_file] 0-newname-tier-dht: /k2: failed to lookup file (Stale file handle)
[2015-11-02 09:38:00.127246] E [MSGID: 109037] [tier.c:523:tier_migrate_using_query_file] 0-newname-tier-dht: ERROR -28 in current migration k2 /k2

[2015-11-02 09:38:00.127412] E [MSGID: 109037] [tier.c:1488:tier_start] 0-newname-tier-dht: Demotion failed
[2015-11-02 09:40:00.151145] W [MSGID: 109023] [dht-rebalance.c:530:__dht_rebalance_create_dst_file] 0-newname-tier-dht: /k2: failed to lookup file (Stale file handle)
[2015-11-02 09:40:00.151954] E [MSGID: 109037] [tier.c:523:tier_migrate_using_query_file] 0-newname-tier-dht: ERROR -28 in current migration k2 /k2

[2015-11-02 09:40:00.152134] E [MSGID: 109037] [tier.c:1488:tier_start] 0-newname-tier-dht: Demotion failed
[2015-11-02 09:42:00.178071] W [MSGID: 109023] [dht-rebalance.c:530:__dht_rebalance_create_dst_file] 0-newname-tier-dht: /k2: failed to lookup file (Stale file handle)
[2015-11-02 09:42:00.178843] E [MSGID: 109037] [tier.c:523:tier_migrate_using_query_file] 0-newname-tier-dht: ERROR -28 in current migration k2 /k2

[2015-11-02 09:42:00.179016] E [MSGID: 109037] [tier.c:1488:tier_start] 0-newname-tier-dht: Demotion failed
[2015-11-02 09:44:00.202865] W [MSGID: 109023] [dht-rebalance.c:530:__dht_rebalance_create_dst_file] 0-newname-tier-dht: /k2: failed to lookup file (Stale file handle)
[2015-11-02 09:44:00.203542] E [MSGID: 109037] [tier.c:523:tier_migrate_using_query_file] 0-newname-tier-dht: ERROR -28 in current migration k2 /k2

[2015-11-02 09:44:00.203674] E [MSGID: 109037] [tier.c:1488:tier_start] 0-newname-tier-dht: Demotion failed
[2015-11-02 09:46:00.227746] W [MSGID: 109023] [dht-rebalance.c:530:__dht_rebalance_create_dst_file] 0-newname-tier-dht: /k2: failed to lookup file (Stale file handle)
[2015-11-02 09:46:00.228477] E [MSGID: 109037] [tier.c:523:tier_migrate_using_query_file] 0-newname-tier-dht: ERROR -28 in current migration k2 /k2

[2015-11-02 09:46:00.228607] E [MSGID: 109037] [tier.c:1488:tier_start] 0-newname-tier-dht: Demotion failed
[2015-11-02 09:48:00.252896] W [MSGID: 109023] [dht-rebalance.c:530:__dht_rebalance_create_dst_file] 0-newname-tier-dht: /k2: failed to lookup file (Stale file handle)
[2015-11-02 09:48:00.253641] E [MSGID: 109037] [tier.c:523:tier_migrate_using_query_file] 0-newname-tier-dht: ERROR -28 in current migration k2 /k2

[2015-11-02 09:48:00.253827] E [MSGID: 109037] [tier.c:1488:tier_start] 0-newname-tier-dht: Demotion failed
[2015-11-02 09:50:00.277610] W [MSGID: 109023] [dht-rebalance.c:530:__dht_rebalance_create_dst_file] 0-newname-tier-dht: /k2: failed to lookup file (Stale file handle)
[2015-11-02 09:50:00.278400] E [MSGID: 109037] [tier.c:523:tier_migrate_using_query_file] 0-newname-tier-dht: ERROR -28 in current migration k2 /k2

[2015-11-02 09:50:00.278553] E [MSGID: 109037] [tier.c:1488:tier_start] 0-newname-tier-dht: Demotion failed
[2015-11-02 09:52:00.305940] W [MSGID: 109023] [dht-rebalance.c:530:__dht_rebalance_create_dst_file] 0-newname-tier-dht: /k2: failed to lookup file (Stale file handle)
[2015-11-02 09:52:00.306664] E [MSGID: 109037] [tier.c:523:tier_migrate_using_query_file] 0-newname-tier-dht: ERROR -28 in current migration k2 /k2

[2015-11-02 09:52:00.306812] E [MSGID: 109037] [tier.c:1488:tier_start] 0-newname-tier-dht: Demotion failed
[2015-11-02 09:54:00.330672] W [MSGID: 109023] [dht-rebalance.c:530:__dht_rebalance_create_dst_file] 0-newname-tier-dht: /k2: failed to lookup file (Stale file handle)
[2015-11-02 09:54:00.331513] E [MSGID: 109037] [tier.c:523:tier_migrate_using_query_file] 0-newname-tier-dht: ERROR -28 in current migration k2 /k2

[2015-11-02 09:54:00.331655] E [MSGID: 109037] [tier.c:1488:tier_start] 0-newname-tier-dht: Demotion failed
[2015-11-02 09:56:00.358181] W [MSGID: 109023] [dht-rebalance.c:530:__dht_rebalance_create_dst_file] 0-newname-tier-dht: /k2: failed to lookup file (Stale file handle)
[2015-11-02 09:56:00.358968] E [MSGID: 109037] [tier.c:523:tier_migrate_using_query_file] 0-newname-tier-dht: ERROR -28 in current migration k2 /k2

[2015-11-02 09:56:00.359139] E [MSGID: 109037] [tier.c:1488:tier_start] 0-newname-tier-dht: Demotion failed
[2015-11-02 09:58:00.382393] W [MSGID: 109023] [dht-rebalance.c:530:__dht_rebalance_create_dst_file] 0-newname-tier-dht: /k2: failed to lookup file (Stale file handle)
[2015-11-02 09:58:00.383448] E [MSGID: 109037] [tier.c:523:tier_migrate_using_query_file] 0-newname-tier-dht: ERROR -28 in current migration k2 /k2

[2015-11-02 09:58:00.383565] E [MSGID: 109037] [tier.c:1488:tier_start] 0-newname-tier-dht: Demotion failed
[2015-11-02 10:00:00.405530] W [MSGID: 109023] [dht-rebalance.c:530:__dht_rebalance_create_dst_file] 0-newname-tier-dht: /k2: failed to lookup file (Stale file handle)
[2015-11-02 10:00:00.406339] E [MSGID: 109037] [tier.c:523:tier_migrate_using_query_file] 0-newname-tier-dht: ERROR -28 in current migration k2 /k2

[2015-11-02 10:00:00.406457] E [MSGID: 109037] [tier.c:1488:tier_start] 0-newname-tier-dht: Demotion failed
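Reading these repeated entries by eye gets tedious. A minimal Python sketch that tallies failed promotion/demotion cycles, per-file migration errors, and the most recent rebalance counters from a tier log; the regular expressions are written only against the message text shown above, and the names are hypothetical. It reports error codes as logged and does not try to interpret them.

```python
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional

CYCLE_FAIL_RE = re.compile(r"\[tier\.c:\d+:tier_start\] \S+ (Promotion|Demotion) failed")
MIGRATION_ERR_RE = re.compile(r"ERROR (-?\d+) in current migration \S+ (\S+)")
STATUS_RE = re.compile(
    r"Files migrated: (\d+), size: (\d+), lookups: (\d+), failures: (\d+), skipped: (\d+)"
)
STATUS_KEYS = ("migrated", "size", "lookups", "failures", "skipped")


@dataclass
class TierLogSummary:
    failed_cycles: Counter = field(default_factory=Counter)     # "Promotion"/"Demotion" -> count
    migration_errors: Counter = field(default_factory=Counter)  # (path, code) -> count
    last_status: Optional[Dict[str, int]] = None                # latest rebalance counters


def summarize_tier_log(lines: Iterable[str]) -> TierLogSummary:
    summary = TierLogSummary()
    for line in lines:
        if m := CYCLE_FAIL_RE.search(line):
            summary.failed_cycles[m.group(1)] += 1
        elif m := MIGRATION_ERR_RE.search(line):
            summary.migration_errors[(m.group(2), int(m.group(1)))] += 1
        elif m := STATUS_RE.search(line):
            summary.last_status = dict(zip(STATUS_KEYS, map(int, m.groups())))
    return summary
```

On the log above, a summary like this makes it easy to see the same /k2 migration error recurring on every demotion cycle after the rename.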
Comment 2 nchilaka 2015-11-03 00:53:46 EST
sosreports are available at the location below. Refer to volume "newname".
[nchilaka@rhsqe-repo bug.1277088]$ pwd
/home/repo/sosreports/nchilaka/bug.1277088
Comment 6 RajeshReddy 2015-12-03 05:14:05 EST
Tested with build glusterfs-server-3.7.5-8. Tried both renaming (moving) files in the hot tier onto files in the cold tier and, likewise, renaming (moving) files in the cold tier onto files in the hot tier. After the rename, the mount shows the files correctly, so marking this bug as verified.

Note: if the client is running an older version (glusterfs-api-3.7.5-5), the rename operation fails with a device busy (EBUSY) error.
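The verification described in this comment (rename in both directions, then check what the mount lists) can be expressed as a small check. A minimal Python sketch, assuming the tester already knows which names sit in the hot and cold tiers; the "ebusy" outcome mirrors the note about older clients, and the function names are hypothetical.

```python
import errno
import os
from pathlib import Path
from typing import Dict, Iterable, Tuple


def verify_rename(directory: Path, src: str, dst: str) -> str:
    """Rename src over an existing dst and classify the result.

    "ok": the listing afterwards shows dst exactly once and no src.
    "inconsistent": duplicate or leftover entries are listed.
    "ebusy": the rename itself failed with EBUSY.
    """
    try:
        os.rename(directory / src, directory / dst)
    except OSError as exc:
        if exc.errno == errno.EBUSY:
            return "ebusy"
        raise  # any other failure is unexpected for this check
    names = os.listdir(directory)
    if names.count(dst) == 1 and src not in names:
        return "ok"
    return "inconsistent"


def verify_both_directions(directory: Path,
                           pairs: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], str]:
    """Run verify_rename for each (src, dst) pair, e.g. hot->cold and cold->hot."""
    return {(src, dst): verify_rename(directory, src, dst) for src, dst in pairs}
```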
Comment 8 errata-xmlrpc 2016-03-01 00:50:00 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0193.html
