+++ This bug was initially created as a clone of Bug #1337870 +++
+++ This bug was initially created as a clone of Bug #1337405 +++
+++ This bug was initially created as a clone of Bug #1331280 +++

Description of problem:
I stopped all the VMs running on my gluster volumes and started them again. When the VMs were started back, all of them came up except one, which moved to the paused state.

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-setup-1.3.4.0-1.el7ev.noarch
libgovirt-0.3.3-1.el7_2.1.x86_64
ovirt-host-deploy-1.4.1-1.el7ev.noarch
ovirt-vmconsole-1.0.0-1.el7ev.noarch
ovirt-vmconsole-host-1.0.0-1.el7ev.noarch
ovirt-setup-lib-1.0.1-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.5.1-1.el7ev.noarch

How reproducible:
Twice

Steps to Reproduce:
1. Install the HC setup.
2. Bootstorm Windows and Linux VMs.
3. Stop all the VMs running on gluster volumes.
4. Start them again.

Actual results:
One of the VMs went to the paused state.

Expected results:
VMs should not go to the paused state.

Additional info:
[2016-04-28 07:15:33.626451] W [fuse-bridge.c:2221:fuse_readv_cbk] 0-glusterfs-fuse: 129914: READ => -1 (Invalid argument)

Moving to gluster team

--- Additional comment from Krutika Dhananjay on 2016-05-17 01:20:27 EDT ---

Issue root-caused. This is a race which can result in EINVAL under the following circumstance:

When two threads send fresh lookups on a shard in parallel, each creating a new inode (I1 and I2 respectively, from a call to inode_new()) in its return path, consider the following scenario:

thread 1                              thread 2
========                              ========
AFR gets the lookup rsp, calls
inode_link(I1) in
afr_lookup_sh_metadata_wrap(),
gets I1.
                                      AFR gets the lookup rsp, calls
                                      inode_link(I2) in
                                      afr_lookup_sh_metadata_wrap(),
                                      gets I1. Yet, AFR unwinds the
                                      stack with I2.
                                      DHT initialises the inode ctx
                                      for I2 and unwinds the lookup
                                      to the shard xlator.
                                      Shard calls inode_link(I2),
                                      and gets I1 in return.
                                      Shard creates an anonymous fd
                                      against I1 and sends the
                                      writev/readv call on this fd.
DHT fails to get the inode ctx for I1, since I1 was never the inode in the unwind path of the lookup, and so it fails the fop with EINVAL. Shard as a result declares the fop a failure and propagates EINVAL up to FUSE. FUSE returns this failure to the application (qemu in this case), which pauses the VM on encountering the failure.

--- Additional comment from Vijay Bellur on 2016-05-19 03:03:29 EDT ---

REVIEW: http://review.gluster.org/14419 (features/shard: Fix write/read failure due to EINVAL) posted (#1) for review on master by Krutika Dhananjay (kdhananj)

--- Additional comment from Vijay Bellur on 2016-05-19 08:39:18 EDT ---

REVIEW: http://review.gluster.org/14422 (cluster/afr: Do not inode_link in afr) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Vijay Bellur on 2016-05-20 04:10:22 EDT ---

REVIEW: http://review.gluster.org/14419 (features/shard: Fix write/read failure due to EINVAL) posted (#2) for review on master by Krutika Dhananjay (kdhananj)

--- Additional comment from Vijay Bellur on 2016-05-20 05:55:58 EDT ---

COMMIT: http://review.gluster.org/14422 committed in master by Pranith Kumar Karampuri (pkarampu)
------
commit 6a51464cf4704e7d7fcbce8919a5ef386a9cfd53
Author: Pranith Kumar K <pkarampu>
Date:   Thu May 19 16:24:09 2016 +0530

    cluster/afr: Do not inode_link in afr

    Race is explained at
    https://bugzilla.redhat.com/show_bug.cgi?id=1337405#c0

    This patch also handles performing of self-heal with shd-pid.
    Also performs the healing with this->itable's inode rather than
    main itable.
    BUG: 1337405
    Change-Id: Id657a6623b71998b027b1dff6af5bbdf8cab09c9
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/14422
    Smoke: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>
    Reviewed-by: Krutika Dhananjay <kdhananj>

--- Additional comment from Vijay Bellur on 2016-05-20 06:15:14 EDT ---

REVIEW: http://review.gluster.org/14455 (cluster/afr: Do not inode_link in afr) posted (#1) for review on release-3.8 by Pranith Kumar Karampuri (pkarampu)
REVIEW: http://review.gluster.org/14456 (cluster/afr: Do not inode_link in afr) posted (#1) for review on release-3.7 by Pranith Kumar Karampuri (pkarampu)
COMMIT: http://review.gluster.org/14456 committed in release-3.7 by Pranith Kumar Karampuri (pkarampu)
------
commit 606fe093b804cb133e27de2d7d21baeba4fb3944
Author: Pranith Kumar K <pkarampu>
Date:   Thu May 19 16:24:09 2016 +0530

    cluster/afr: Do not inode_link in afr

    Race is explained at
    https://bugzilla.redhat.com/show_bug.cgi?id=1337405#c0

    This patch also handles performing of self-heal with shd-pid.
    Also performs the healing with this->itable's inode rather than
    main itable.

    >BUG: 1337405
    >Change-Id: Id657a6623b71998b027b1dff6af5bbdf8cab09c9
    >Signed-off-by: Pranith Kumar K <pkarampu>
    >Reviewed-on: http://review.gluster.org/14422
    >Smoke: Gluster Build System <jenkins.com>
    >NetBSD-regression: NetBSD Build System <jenkins.org>
    >CentOS-regression: Gluster Build System <jenkins.com>
    >Reviewed-by: Krutika Dhananjay <kdhananj>

    BUG: 1337872
    Change-Id: I6d8e79a44e4cc1c5489d81f05c82510e4e90546f
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/14456
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>
    Smoke: Gluster Build System <jenkins.com>
This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed with glusterfs-3.7.12, please open a new bug report.

glusterfs-3.7.12 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-devel/2016-June/049918.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user