Bug 1408712

Summary: With granular-entry-self-heal enabled, there is a GFID mismatch and the VM goes to paused state after migrating to another host
Product: [Community] GlusterFS
Reporter: Krutika Dhananjay <kdhananj>
Component: replicate
Assignee: Krutika Dhananjay <kdhananj>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: high
Docs Contact:
Priority: unspecified
Version: mainline
CC: amukherj, bugs, knarra, ksandha, nchilaka, rcyriac, rhinduja, rhs-bugs, sasundar, storage-qa-internal
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: glusterfs-3.10.0
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1408426
: 1408785 1408786
Environment:
Last Closed: 2017-03-06 17:40:49 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1408426
Bug Blocks: 1400057, 1408785, 1408786

Description Krutika Dhananjay 2016-12-26 14:57:40 UTC
+++ This bug was initially created as a clone of Bug #1408426 +++

Description of problem:
VM creation happens while one of the data bricks is down. Once the brick is brought back up, I see that some entries do not get healed, and when the VM is migrated to another node it goes into paused state, with the following errors logged in the mount logs:

[2016-12-23 09:14:16.481519] W [MSGID: 108008] [afr-self-heal-name.c:369:afr_selfheal_name_gfid_mismatch_check] 0-engine-replicate-0: GFID mismatch for <gfid:be318638-e8a0-4c6d-977d-7a937aa84806>/f735902d-12fa-4e4d-88c9-1b8ba06e3063.1673 6e17b733-b8a4-4563-bc3d-f659c9a46c2a on engine-client-1 and 55648f43-7e09-4e62-b7d2-16fe1ff7b23e on engine-client-0
[2016-12-23 09:14:16.482442] E [MSGID: 133010] [shard.c:1582:shard_common_lookup_shards_cbk] 0-engine-shard: Lookup on shard 1673 failed. Base file gfid = f735902d-12fa-4e4d-88c9-1b8ba06e3063 [Input/output error]
[2016-12-23 09:14:16.482474] W [fuse-bridge.c:2228:fuse_readv_cbk] 0-glusterfs-fuse: 11280842: READ => -1 gfid=f735902d-12fa-4e4d-88c9-1b8ba06e3063 fd=0x7faeda380210 (Input/output error)
[2016-12-23 10:08:41.956330] W [MSGID: 108008] [afr-self-heal-name.c:369:afr_selfheal_name_gfid_mismatch_check] 0-engine-replicate-0: GFID mismatch for <gfid:be318638-e8a0-4c6d-977d-7a937aa84806>/f735902d-12fa-4e4d-88c9-1b8ba06e3063.1673 6e17b733-b8a4-4563-bc3d-f659c9a46c2a on engine-client-1 and 55648f43-7e09-4e62-b7d2-16fe1ff7b23e on engine-client-0
[2016-12-23 10:08:41.957422] E [MSGID: 133010] [shard.c:1582:shard_common_lookup_shards_cbk] 0-engine-shard: Lookup on shard 1673 failed. Base file gfid = f735902d-12fa-4e4d-88c9-1b8ba06e3063 [Input/output error]
[2016-12-23 10:08:41.957444] W [fuse-bridge.c:2228:fuse_readv_cbk] 0-glusterfs-fuse: 11427307: READ => -1 gfid=f735902d-12fa-4e4d-88c9-1b8ba06e3063 fd=0x7faeda380328 (Input/output error)
[2016-12-23 10:45:10.609600] W [MSGID: 108008] [afr-self-heal-name.c:369:afr_selfheal_name_gfid_mismatch_check] 0-engine-replicate-0: GFID mismatch for <gfid:be318638-e8a0-4c6d-977d-7a937aa84806>/f735902d-12fa-4e4d-88c9-1b8ba06e3063.1673 6e17b733-b8a4-4563-bc3d-f659c9a46c2a on engine-client-1 and 55648f43-7e09-4e62-b7d2-16fe1ff7b23e on engine-client-0
[2016-12-23 10:45:10.610550] E [MSGID: 133010] [shard.c:1582:shard_common_lookup_shards_cbk] 0-engine-shard: Lookup on shard 1673 failed. Base file gfid = f735902d-12fa-4e4d-88c9-1b8ba06e3063 [Input/output error]
[2016-12-23 10:45:10.610574] W [fuse-bridge.c:2228:fuse_readv_cbk] 0-glusterfs-fuse: 11526955: READ => -1 gfid=f735902d-12fa-4e4d-88c9-1b8ba06e3063 fd=0x7faeda380184 (Input/output error)


Version-Release number of selected component (if applicable):
glusterfs-3.8.4-9.el7rhgs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Install HC with three nodes.
2. Create an arbiter volume and enable all the options using gdeploy.
3. Now bring down the first brick in the arbiter volume and create a VM.
4. Once the VM creation is completed, bring the brick back up and wait for self-heal to happen.
5. Now migrate the VM to another host (see the command sketch below).
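
For reference, a rough command-level sketch of steps 3-5, assuming a replica 3 arbiter 1 volume named "engine" with its first data brick at server1:/bricks/engine/brick1 (names, paths and the gdeploy-applied options will differ per setup):

    # Step 3: bring down the first data brick, then create the VM
    gluster volume status engine        # note the PID of the server1:/bricks/engine/brick1 process
    kill -15 <brick-pid>

    # Step 4: once VM creation is done, bring the brick back and wait for self-heal
    gluster volume start engine force
    gluster volume heal engine
    gluster volume heal engine info     # pending entries should eventually drain to zero

    # Step 5: migrate the VM to another host via the oVirt engine (or virsh)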

Actual results:
There are two issues I have seen:
1) There are still some entries on the node that do not get healed even after a long time.
2) Once the VM is migrated, it goes into paused state.

Expected results:
The VM should not go into paused state after migration, and there should be no pending entries in volume heal info.

Additional info:

--- Additional comment from Red Hat Bugzilla Rules Engine on 2016-12-23 05:56:11 EST ---

This bug is automatically being proposed for the current release of Red Hat Gluster Storage 3 under active development, by setting the release flag 'rhgs-3.2.0' to '?'.

If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from RamaKasturi on 2016-12-23 05:59:44 EST ---

As suggested by Pranith, I disabled granular entry self-heal on the volume and I do not see the issue anymore.
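
For completeness, the toggle used here is the granular-entry-heal heal CLI (volume name "engine" assumed; the same setting is also exposed as the cluster.granular-entry-heal volume option):

    # Disable granular entry self-heal on the volume (the workaround noted above)
    gluster volume heal engine granular-entry-heal disable

    # Re-enable it once a fixed build is in place
    gluster volume heal engine granular-entry-heal enable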


--- Additional comment from Krutika Dhananjay on 2016-12-26 05:41:04 EST ---

Resuming from https://bugzilla.redhat.com/show_bug.cgi?id=1400057#c11 to explain why there would be a gfid mismatch. So please go through https://bugzilla.redhat.com/show_bug.cgi?id=1400057#c11 first.

... the pending xattrs on .shard are erased at this point. Now when the brick that was down comes back online, another MKNOD on this shard's name, triggered by a shard readv fop, whenever it happens, will get EEXIST from the bricks that were already online; on the brick that was previously offline, the creation of this shard succeeds, although with a new gfid. This leads to the gfid mismatch.
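
One way to confirm the mismatch on disk is to compare the trusted.gfid xattr of the affected shard directly on the two data bricks (the brick path below is an assumption; the shard name comes from the log excerpt above):

    # Run on the brick backing engine-client-0, then on the one backing engine-client-1
    getfattr -n trusted.gfid -e hex \
        /bricks/engine/brick1/.shard/f735902d-12fa-4e4d-88c9-1b8ba06e3063.1673

    # Healthy replicas report the same hex gfid on both data bricks; here the log
    # shows 6e17b733-... on engine-client-1 vs 55648f43-... on engine-client-0.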

Comment 1 Worker Ant 2016-12-26 17:15:12 UTC
REVIEW: http://review.gluster.org/16286 (cluster/afr: Fix missing name indices due to EEXIST error) posted (#1) for review on master by Krutika Dhananjay (kdhananj)

Comment 2 Worker Ant 2016-12-27 06:34:58 UTC
REVIEW: http://review.gluster.org/16286 (cluster/afr: Fix missing name indices due to EEXIST error) posted (#2) for review on master by Krutika Dhananjay (kdhananj)

Comment 3 Worker Ant 2016-12-27 11:53:08 UTC
COMMIT: http://review.gluster.org/16286 committed in master by Pranith Kumar Karampuri (pkarampu) 
------
commit da5ece887c218a7c572a1c25925a178dbd08d464
Author: Krutika Dhananjay <kdhananj>
Date:   Mon Dec 26 21:08:03 2016 +0530

    cluster/afr: Fix missing name indices due to EEXIST error
    
    PROBLEM:
    Consider a volume with  granular-entry-heal and sharding enabled. When
    a replica is down and a shard is created as part of a write, the name
    index is correctly created under indices/entry-changes/<dot-shard-gfid>.
    Now when a read on the same region triggers another MKNOD, the fop
    fails on the online bricks with EEXIST. By virtue of this being a
    symmetric error, the failed_subvols[] array is reset to all zeroes.
    Because of this, before post-op, the GF_XATTROP_ENTRY_OUT_KEY will be
    set, causing the name index, which was created in the previous MKNOD
    operation, to be wrongly deleted in THIS MKNOD operation.
    
    FIX:
    The ideal fix would have been for a transaction to delete the name
    index ONLY if it knows it is the one that created the index in the first
    place. This would involve gathering information as to whether THIS xattrop
    created the index from individual bricks, aggregating their responses and
    based on the various possible combinations of responses, decide whether to
    delete the index or not. This is rather complex. Simpler fix would be
    for post-op to examine local->op_ret in the event of no failed_subvols
    to figure out whether to delete the name index or not. This can occasionally
    lead to creation of stale name indices but they won't be affecting the IO path
    or mess with pending changelogs in any way and self-heal in its crawl of
    "entry-changes" directory would take care to delete such indices.
    
    Change-Id: Ic1b5257f4dc9c20cb740a866b9598cf785a1affa
    BUG: 1408712
    Signed-off-by: Krutika Dhananjay <kdhananj>
    Reviewed-on: http://review.gluster.org/16286
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
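
As additional context for the commit above: the name indices it discusses live on each brick under .glusterfs/indices/entry-changes/<parent-gfid>/, and for shards the parent is the .shard directory (gfid be318638-e8a0-4c6d-977d-7a937aa84806 in the logs above). A rough way to observe the index being created, and wrongly deleted before the fix, during the reproduction, with the brick path assumed as before:

    # List pending granular name indices for the .shard directory on a data brick
    ls /bricks/engine/brick1/.glusterfs/indices/entry-changes/be318638-e8a0-4c6d-977d-7a937aa84806/

    # With the fix, an MKNOD that fails with EEXIST no longer deletes an index
    # created by an earlier transaction; any stale indices it leaves behind are
    # cleaned up by self-heal's crawl of entry-changes, per the commit message.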

Comment 4 Shyamsundar 2017-03-06 17:40:49 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.10.0, please open a new bug report.

glusterfs-3.10.0 has been announced on the Gluster mailing lists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-February/030119.html
[2] https://www.gluster.org/pipermail/gluster-users/