Bug 1337872 - Some VMs go to the paused state when there is concurrent I/O on the VMs
Summary: Some VMs go to the paused state when there is concurrent I/O on the VMs
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: sharding
Version: 3.7.11
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact: bugs@gluster.org
URL:
Whiteboard:
Depends On: 1331280 1337405 1337870
Blocks: 1311817
 
Reported: 2016-05-20 10:22 UTC by Pranith Kumar K
Modified: 2016-06-28 12:18 UTC
CC: 12 users

Fixed In Version: glusterfs-3.7.12
Clone Of: 1337870
Environment:
Last Closed: 2016-06-28 12:18:56 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Description Pranith Kumar K 2016-05-20 10:22:01 UTC
+++ This bug was initially created as a clone of Bug #1337870 +++

+++ This bug was initially created as a clone of Bug #1337405 +++

+++ This bug was initially created as a clone of Bug #1331280 +++

Description of problem:
I stopped all the VMs running on my Gluster volumes and then started them again. When the VMs were started back, all of them came up except one, which moved to the paused state.

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-setup-1.3.4.0-1.el7ev.noarch
libgovirt-0.3.3-1.el7_2.1.x86_64
ovirt-host-deploy-1.4.1-1.el7ev.noarch
ovirt-vmconsole-1.0.0-1.el7ev.noarch
ovirt-vmconsole-host-1.0.0-1.el7ev.noarch
ovirt-setup-lib-1.0.1-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.5.1-1.el7ev.noarch


How reproducible:
Twice

Steps to Reproduce:
1. Install the HC setup.
2. Boot-storm the Windows and Linux VMs.
3. Stop all the VMs running on the Gluster volumes.
4. Start them again.

Actual results:
One of the VMs went to the paused state.

Expected results:
VMs should not go to the paused state.

Additional info:

[2016-04-28 07:15:33.626451] W [fuse-bridge.c:2221:fuse_readv_cbk] 0-glusterfs-fuse: 129914: READ => -1 (Invalid argument)

Moving to gluster team


--- Additional comment from Krutika Dhananjay on 2016-05-17 01:20:27 EDT ---

Issue root-caused.

This is a race which can result in EINVAL under the following circumstance:

When two threads send fresh lookups on a shard in parallel, each returns a new inode (I1 and I2 respectively, created via inode_new()) in its unwind path. Consider the following scenario:

thread 1                                    thread 2
========                                    ========
afr gets the lookup rsp,
calls inode_link(I1) in
afr_lookup_sh_metadata_wrap(),
gets I1.

                                           afr gets the lookup rsp, calls
                                           inode_link(I2) in
                                           afr_lookup_sh_metadata_wrap,
                                           gets I1.

                                           Yet, afr unwinds the stack with I2.

                                           DHT initialises inode ctx for I2
                                           and unwinds the lookup to shard xl.

                                           shard calls inode_link(I2), and
                                           gets I1 in return.

                                           shard creates anon fd against I1 and sends
                                           writev/readv call on this fd.

                                           DHT fails to get the inode ctx for I1 since
                                           I1 was never the inode that was part of the unwind
                                           path of the lookup, and so it fails the fop with
                                           EINVAL.

                                           Shard as a result declares the fop a failure and
                                           propagates EINVAL up to FUSE.

                                           FUSE returns this failure to the app (qemu in this
                                           case). On encountering failure, it pauses the VM.

--- Additional comment from Vijay Bellur on 2016-05-19 03:03:29 EDT ---

REVIEW: http://review.gluster.org/14419 (features/shard: Fix write/read failure due to EINVAL) posted (#1) for review on master by Krutika Dhananjay (kdhananj)

--- Additional comment from Vijay Bellur on 2016-05-19 08:39:18 EDT ---

REVIEW: http://review.gluster.org/14422 (cluster/afr: Do not inode_link in afr) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Vijay Bellur on 2016-05-20 04:10:22 EDT ---

REVIEW: http://review.gluster.org/14419 (features/shard: Fix write/read failure due to EINVAL) posted (#2) for review on master by Krutika Dhananjay (kdhananj)

--- Additional comment from Vijay Bellur on 2016-05-20 05:55:58 EDT ---

COMMIT: http://review.gluster.org/14422 committed in master by Pranith Kumar Karampuri (pkarampu) 
------
commit 6a51464cf4704e7d7fcbce8919a5ef386a9cfd53
Author: Pranith Kumar K <pkarampu>
Date:   Thu May 19 16:24:09 2016 +0530

    cluster/afr: Do not inode_link in afr
    
    Race is explained at
    https://bugzilla.redhat.com/show_bug.cgi?id=1337405#c0
    
    This patch also handles performing of self-heal with shd-pid.
    Also performs the healing with this->itable's inode rather than
    main itable.
    
    BUG: 1337405
    Change-Id: Id657a6623b71998b027b1dff6af5bbdf8cab09c9
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/14422
    Smoke: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>
    Reviewed-by: Krutika Dhananjay <kdhananj>

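One way to read the two patches (http://review.gluster.org/14422 in afr and http://review.gluster.org/14419 in shard) is that they restore the invariant that whichever translator calls inode_link() continues with the inode it returns, so per-inode contexts are set on and read from the same canonical inode. The fragment below only illustrates that invariant, reusing the hypothetical sim_inode/itable_link helpers from the sketch above; it is not the actual patch, which touches afr_lookup_sh_metadata_wrap() and the shard translator.

    /* Illustration only: the caller always adopts the inode returned by
     * inode_link() before setting contexts or opening fds on it. */
    static int lookup_and_read(sim_inode *fresh)
    {
        sim_inode *canonical = itable_link(fresh);   /* may or may not be 'fresh' */

        if (canonical->dht_ctx == NULL)
            canonical->dht_ctx = "dht-layout";       /* ctx on the canonical inode */

        /* the anon fd and the readv/writev all reference 'canonical', so the
         * later ctx lookup succeeds and no spurious EINVAL reaches FUSE */
        return canonical->dht_ctx != NULL ? 0 : -1;
    }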
--- Additional comment from Vijay Bellur on 2016-05-20 06:15:14 EDT ---

REVIEW: http://review.gluster.org/14455 (cluster/afr: Do not inode_link in afr) posted (#1) for review on release-3.8 by Pranith Kumar Karampuri (pkarampu)

Comment 1 Vijay Bellur 2016-05-20 10:31:10 UTC
REVIEW: http://review.gluster.org/14456 (cluster/afr: Do not inode_link in afr) posted (#1) for review on release-3.7 by Pranith Kumar Karampuri (pkarampu)

Comment 2 Vijay Bellur 2016-05-26 12:54:28 UTC
COMMIT: http://review.gluster.org/14456 committed in release-3.7 by Pranith Kumar Karampuri (pkarampu) 
------
commit 606fe093b804cb133e27de2d7d21baeba4fb3944
Author: Pranith Kumar K <pkarampu>
Date:   Thu May 19 16:24:09 2016 +0530

    cluster/afr: Do not inode_link in afr
    
    Race is explained at
    https://bugzilla.redhat.com/show_bug.cgi?id=1337405#c0
    
    This patch also handles performing of self-heal with shd-pid.
    Also performs the healing with this->itable's inode rather than
    main itable.
    
     >BUG: 1337405
     >Change-Id: Id657a6623b71998b027b1dff6af5bbdf8cab09c9
     >Signed-off-by: Pranith Kumar K <pkarampu>
     >Reviewed-on: http://review.gluster.org/14422
     >Smoke: Gluster Build System <jenkins.com>
     >NetBSD-regression: NetBSD Build System <jenkins.org>
     >CentOS-regression: Gluster Build System <jenkins.com>
     >Reviewed-by: Krutika Dhananjay <kdhananj>
    
    BUG: 1337872
    Change-Id: I6d8e79a44e4cc1c5489d81f05c82510e4e90546f
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/14456
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>
    Smoke: Gluster Build System <jenkins.com>

Comment 3 Kaushal 2016-06-28 12:18:56 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.12, please open a new bug report.

glusterfs-3.7.12 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-devel/2016-June/049918.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

