Description of problem:
I stopped all the VMs running on my gluster volumes and started them again. When the VMs were started back, all of them came up except one, which moved to paused state.

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-setup-1.3.4.0-1.el7ev.noarch
libgovirt-0.3.3-1.el7_2.1.x86_64
ovirt-host-deploy-1.4.1-1.el7ev.noarch
ovirt-vmconsole-1.0.0-1.el7ev.noarch
ovirt-vmconsole-host-1.0.0-1.el7ev.noarch
ovirt-setup-lib-1.0.1-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.5.1-1.el7ev.noarch

How reproducible:
Twice

Steps to Reproduce:
1. Install a hyperconverged (HC) setup.
2. Bootstorm Windows and Linux VMs.
3. Stop all the VMs running on the gluster volumes.
4. Start them again.

Actual results:
One of the VMs went to paused state.

Expected results:
VMs should not go to paused state.

Additional info:
From the vdsm log:

sosreport-sulphur.lab.eng.blr.redhat.com.1331280-20160428131011/var/log/vdsm/vdsm.log.1.xz:libvirtEventLoop::DEBUG::2016-04-28 12:45:33,636::__init__::206::jsonrpc.Notification::(emit) Sending event {"params": {"notify_time": 4539584670, "f9340ab8-5339-4f1e-958d-547760ef81b9": {"status": "Paused", "ioerror": {"alias": "virtio-disk0", "name": "vda", "path": "/rhev/data-center/00000001-0001-0001-0001-000000000128/297a9b9c-4396-4b30-8bfe-976a67d49a74/images/aa3b33bd-e327-4edd-967e-19afc504a401/0dcebf1c-8cdd-4b01-a2c2-beaf89515afa"}, "pauseCode": "EOTHER"}}, "jsonrpc": "2.0", "method": "|virt|VM_status|f9340ab8-5339-4f1e-958d-547760ef81b9"}

The corresponding gluster fuse mount log shows the read failing with EINVAL:

[2016-04-28 07:15:33.626451] W [fuse-bridge.c:2221:fuse_readv_cbk] 0-glusterfs-fuse: 129914: READ => -1 (Invalid argument)

Moving to gluster team.
Is this a duplicate of Bug 1311881?
(In reply to Sahina Bose from comment #4)
> Is this a duplicate of Bug 1311881?

Not entirely sure. I don't think Kasturi saw the 'permission denied' errors on her setup when she reported this bug. In her case, it was the readvs that failed with EINVAL.

-Krutika
Posting the progress so far on this bug:

We (Kasturi and I) attached gdb to the client process and set a breakpoint at fuse_readv_cbk with the condition (op_ret < 0), so that when it is hit we can look at the backtrace and figure out which translator is pumping the error. With this in place, she tried the VM stop + start cycle again, but the issue did not reproduce; everything worked fine. For now, we need to hit that breakpoint before the issue can be debugged any further.
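For reference, a rough sketch of this kind of conditional-breakpoint setup (the pgrep pattern and volume name are placeholders, and the glusterfs debuginfo packages must be installed for the condition on op_ret to be evaluated):

# attach to the FUSE client process of the affected volume (placeholder pattern)
PID=$(pgrep -f 'glusterfs.*volfile-id=<volname>' | head -n1)
gdb -p "$PID" \
    -ex 'break fuse_readv_cbk if op_ret < 0' \
    -ex 'continue'
# once the breakpoint fires, 'bt' shows which translator is unwinding the
# error and 'print op_errno' shows the errno being propagated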
I see the pause issue in the test environment I'm using in BAGL - gprfc085/6/7. The cluster is currently running 3.7.9-4.

Whenever I run 9 concurrent fio jobs without rate limiting, I see VMs enter a paused state. Looking at the logs on the hosts that were running the affected VMs, you can see:

[2016-05-10 04:10:42.476785] W [dict.c:1452:dict_get_with_ref] (-->/usr/lib64/glusterfs/3.7.9/xlator/mgmt/glusterd.so(build_shd_graph+0x69) [0x7ff1a7172539] -->/lib64/libglusterfs.so.0(dict_get_str_boolean+0x32) [0x7ff1b25b1192] -->/lib64/libglusterfs.so.0(+0x1ed56) [0x7ff1b25aed56] ) 0-dict: dict OR key (graph-check) is NULL [Invalid argument]
[2016-05-10 04:10:42.480382] W [dict.c:1452:dict_get_with_ref] (-->/usr/lib64/glusterfs/3.7.9/xlator/mgmt/glusterd.so(build_shd_graph+0x69) [0x7ff1a7172539] -->/lib64/libglusterfs.so.0(dict_get_str_boolean+0x32) [0x7ff1b25b1192] -->/lib64/libglusterfs.so.0(+0x1ed56) [0x7ff1b25aed56] ) 0-dict: dict OR key (graph-check) is NULL [Invalid argument]
[2016-05-10 04:20:18.444527] W [dict.c:1452:dict_get_with_ref] (-->/usr/lib64/glusterfs/3.7.9/xlator/mgmt/glusterd.so(build_shd_graph+0x69) [0x7ff1a7172539] -->/lib64/libglusterfs.so.0(dict_get_str_boolean+0x32) [0x7ff1b25b1192] -->/lib64/libglusterfs.so.0(+0x1ed56) [0x7ff1b25aed56] ) 0-dict: dict OR key (graph-check) is NULL [Invalid argument]
[2016-05-10 04:20:18.447965] W [dict.c:1452:dict_get_with_ref] (-->/usr/lib64/glusterfs/3.7.9/xlator/mgmt/glusterd.so(build_shd_graph+0x69) [0x7ff1a7172539] -->/lib64/libglusterfs.so.0(dict_get_str_boolean+0x32) [0x7ff1b25b1192] -->/lib64/libglusterfs.so.0(+0x1ed56) [0x7ff1b25aed56] ) 0-dict: dict OR key (graph-check) is NULL [Invalid argument]

These servers are accessible to eng for further investigation. Will attach a log file from gprfc085.
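For context, an unthrottled random-write fio job of the kind described above could look roughly like this (all parameters, including running the 9 jobs from a single fio invocation and the target directory, are assumptions rather than the exact BAGL job file):

# 9 unthrottled 4k random-write jobs against a test directory inside the VM
fio --name=randwrite --rw=randwrite --bs=4k --size=4g \
    --ioengine=libaio --direct=1 --iodepth=32 --numjobs=9 \
    --directory=/mnt/fiotest --group_reporting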
Created attachment 1155521 [details]
log file showing errors during the VM pause events

Log file attached.
Issue root-caused. This is a race that can result in EINVAL under the following circumstance: two threads send fresh lookups on a shard in parallel, and each return path carries a new inode (I1 and I2 respectively, created through inode_new()). Consider the following sequence:

Thread 1:
1. afr gets the lookup response and calls inode_link(I1) in afr_lookup_sh_metadata_wrap(); inode_link() returns I1.

Thread 2:
2. afr gets its lookup response and calls inode_link(I2) in afr_lookup_sh_metadata_wrap(); since I1 is already linked, inode_link() returns I1. Yet afr unwinds the stack with I2.
3. DHT initialises its inode ctx for I2 and unwinds the lookup to the shard xlator.
4. shard calls inode_link(I2) and gets I1 in return.
5. shard creates an anonymous fd against I1 and sends the writev/readv call on this fd.
6. DHT fails to get its inode ctx for I1, since I1 was never the inode that was part of the unwind path of the lookup, and so it fails the fop with EINVAL.
7. shard as a result declares the fop a failure and propagates EINVAL up to FUSE.
8. FUSE returns this failure to the application (qemu in this case), which pauses the VM on encountering the error.
Upstream patch posted at http://review.gluster.org/#/c/14419/

Still need confirmation from Paul and others that it fixes the pause under heavy load. Moving the BZ to POST state.

-Krutika
(In reply to Paul Cuzner from comment #18)
> I just tested 3.7.9-6 and am still seeing vm's pause and system wide impact
> from random write workloads.

Hi Paul, Sahina,

Pranith and I checked your setup in gdb as the I/Os are still failing as we speak. SETATTRs were failing with EROFS (read-only filesystem). All bricks are up as per volume status. But when I checked the connection status between node 85's client and the bricks from the mount, in .meta/graphs/active/vm-store-client-*/private, it turns out that brick-0 (local) and brick-2 (remote) have 'connected=0' status. So effectively the mount is able to talk to only brick-1, while the internal RPC state in the process still says 'connected'. The bug is in the RPC layer of GlusterFS; we will need Raghavendra G's help in this regard.

In short, this issue is not related to the EINVAL fix that went in. I am going to raise a new bug to track this issue.
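For reference, the connectivity check described above can be done with something like the following (the mount path is a placeholder for wherever the vm-store volume is FUSE-mounted on node 85):

MNT=/rhev/data-center/mnt/glusterSD/<server>:_vm-store   # placeholder mount path
grep -H connected "$MNT"/.meta/graphs/active/vm-store-client-*/private
# 'connected=0' for a client subvolume means this mount has lost its RPC
# connection to that brick, even if 'gluster volume status' still reports
# the brick as online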
(In reply to Krutika Dhananjay from comment #19)
> In short, this issue is not related to the EINVAL fix that went in.
> I am going to raise a new bug to track this issue.

To keep track of the information: Krutika has raised a new bug [1] for this issue.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1339136
Tested with the RHGS 3.1.3 nightly build (glusterfs-3.7.9-6.el7rhgs) with the following steps:
1. Created a RHEV data domain backed by a gluster replica 3 sharded volume which is optimized for virt-store (a sketch of a comparable volume layout follows below).
2. Created around 10 VMs and ran a random 100% write workload using the fio tool.
3. There were no VM pause issues related to 'permission denied' or EINVAL.
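Sketch of a comparable volume layout, for illustration only (hostnames and brick paths are placeholders, not the actual verification environment):

gluster volume create vmstore replica 3 \
    host1:/rhgs/brick1/vmstore host2:/rhgs/brick1/vmstore host3:/rhgs/brick1/vmstore
gluster volume set vmstore group virt          # apply the virt-store option group
gluster volume set vmstore features.shard on   # ensure sharding is enabled
gluster volume start vmstore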
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1240