Created attachment 1405076 [details]
logs

Description of problem:
A basic backup/restore API scenario - starting a VM that has a snapshot disk attached, residing on a Gluster domain - fails with libvirtError: Cannot access storage file (the transient volume).

Version-Release number of selected component (if applicable):
vdsm-4.20.20-1.el7ev.x86_64
libvirt-3.9.0-13.el7.x86_64
qemu-kvm-rhev-2.10.0-21.el7.x86_64
rhvm-4.2.2.2-0.1.el7.noarch

How reproducible:
Reproduced only with Gluster. Tested also with NFS and iSCSI.

Steps to Reproduce:
1. Create a VM (source VM) with a disk attached on a Gluster domain (replica3)
2. Create a snapshot of the source VM
3. Create a second VM (backup VM)
4. Attach the snapshot disk of the source VM to the backup VM (via REST)
5. Start the backup VM

Actual results:
Start VM fails:

2018-03-07 01:05:25,486+0200 ERROR (vm/3a1ad46b) [virt.vm] (vmId='3a1ad46b-3854-4602-bd85-330c799eba96') The vm start process failed (vm:940)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 869, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2832, in _run
    dom.createWithFlags(flags)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 130, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1099, in createWithFlags
    if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
libvirtError: Cannot access storage file '/var/lib/vdsm/transient/d8930944-2305-4859-b616-03e135968ad9-534d83aa-7125-43ce-a440-58abd08a4ebf.CXRJka' (as uid:107, gid:107): No such file or directory

2018-03-07 01:05:25,488+0200 INFO (vm/3a1ad46b) [virt.vm] (vmId='3a1ad46b-3854-4602-bd85-330c799eba96') Changed state to Down: Cannot access storage file '/var/lib/vdsm/transient/d8930944-2305-4859-b616-03e135968ad9-534d83aa-7125-43ce-a440-58abd08a4ebf.CXRJka' (as uid:107, gid:107): No such file or directory (code=1) (vm:1677)

libvirtd.log:

2018-03-06 23:05:25.409+0000: 1498: error : virStorageFileReportBrokenChain:416 : Cannot access storage file '/var/lib/vdsm/transient/d8930944-2305-4859-b616-03e135968ad9-534d83aa-7125-43ce-a440-58abd08a4ebf.CXRJka' (as uid:107, gid:107): No such file or directory

Expected results:
VM start should succeed

Additional info:
logs
It seems like the problem is that vdsm couldn't access this file because it couldn't create it in the first place:

2018-03-07 01:05:23,423+0200 DEBUG (vm/3a1ad46b) [storage.operation] /usr/bin/taskset --cpu-list 0-0 /usr/bin/nice -n 19 /usr/bin/ionice -c 3 /usr/bin/qemu-img create -f qcow2 -o compat=1.1 -b storage_local_ge6_volume_0/d8930944-2305-4859-b616-03e135968ad9/images/a9dc2e8a-d6ad-4c8a-aa3f-0132bab5b705/534d83aa-7125-43ce-a440-58abd08a4ebf -F qcow2 /var/lib/vdsm/transient/d8930944-2305-4859-b616-03e135968ad9-534d83aa-7125-43ce-a440-58abd08a4ebf.CXRJka (cwd /var/lib/vdsm/transient) (operation:150)

2018-03-07 01:05:23,706+0200 DEBUG (vm/3a1ad46b) [storage.operation] FAILED: <err> = "qemu-img: /var/lib/vdsm/transient/d8930944-2305-4859-b616-03e135968ad9-534d83aa-7125-43ce-a440-58abd08a4ebf.CXRJka: Could not open '/var/lib/vdsm/transient/storage_local_ge6_volume_0/d8930944-2305-4859-b616-03e135968ad9/images/a9dc2e8a-d6ad-4c8a-aa3f-0132bab5b705/534d83aa-7125-43ce-a440-58abd08a4ebf': No such file or directory\nCould not open backing image to determine size.\n"; <rc> = 1 (operation:169)

2018-03-07 01:05:23,707+0200 INFO (vm/3a1ad46b) [virt.vm] (vmId='3a1ad46b-3854-4602-bd85-330c799eba96') Unlinking transient disk volume '/var/lib/vdsm/transient/d8930944-2305-4859-b616-03e135968ad9-534d83aa-7125-43ce-a440-58abd08a4ebf.CXRJka' (vm:3758)

2018-03-07 01:05:23,708+0200 ERROR (vm/3a1ad46b) [virt.vm] (vmId='3a1ad46b-3854-4602-bd85-330c799eba96') Failed to create the transient disk for volume 534d83aa-7125-43ce-a440-58abd08a4ebf (vm:3761)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 3755, in _createTransientDisk
    operation.run()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/operation.py", line 81, in run
    self._finalize(out, err)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/operation.py", line 178, in _finalize
    raise cmdutils.Error(self._cmd, rc, out, err)
Error: Command ['/usr/bin/taskset', '--cpu-list', '0-0', '/usr/bin/nice', '-n', '19', '/usr/bin/ionice', '-c', '3', '/usr/bin/qemu-img', 'create', '-f', 'qcow2', '-o', 'compat=1.1', '-b', u'storage_local_ge6_volume_0/d8930944-2305-4859-b616-03e135968ad9/images/a9dc2e8a-d6ad-4c8a-aa3f-0132bab5b705/534d83aa-7125-43ce-a440-58abd08a4ebf', '-F', 'qcow2', '/var/lib/vdsm/transient/d8930944-2305-4859-b616-03e135968ad9-534d83aa-7125-43ce-a440-58abd08a4ebf.CXRJka'] failed with rc=1 out='' err="qemu-img: /var/lib/vdsm/transient/d8930944-2305-4859-b616-03e135968ad9-534d83aa-7125-43ce-a440-58abd08a4ebf.CXRJka: Could not open '/var/lib/vdsm/transient/storage_local_ge6_volume_0/d8930944-2305-4859-b616-03e135968ad9/images/a9dc2e8a-d6ad-4c8a-aa3f-0132bab5b705/534d83aa-7125-43ce-a440-58abd08a4ebf': No such file or directory\nCould not open backing image to determine size.\n"

This may be related to the new qemu version in RHEL 7.5.
Are you using RHEL 7.5?
Also, when you tested NFS and iSCSI, did you use the same host (same qemu version)?
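A note on the failure above: qemu-img resolves a relative backing path against the directory of the image being created, which is why the backing file is looked up under /var/lib/vdsm/transient rather than on the Gluster mount. A minimal illustration, with hypothetical paths:

# hypothetical paths, for illustration only
qemu-img create -f qcow2 -b images/base.qcow2 -F qcow2 /var/lib/vdsm/transient/overlay.qcow2
# qemu-img tries to open /var/lib/vdsm/transient/images/base.qcow2 -
# a relative backing path only works if the backing chain is
# reachable from the overlay's directory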
Created attachment 1405356 [details]
vdsm.log-iscsi

(In reply to Idan Shaby from comment #1)
> This may be related to the new qemu version in RHEL 7.5.
> Are you using RHEL 7.5?
Yes

> Also, when you tested NFS and iSCSI, did you use the same host (same qemu
> version)?
This was found in an automation run in which the test is executed on iSCSI, NFS and Gluster in sequence, using the same environment. All hosts in the environment use the same qemu version - qemu-kvm-rhev-2.10.0-21.el7.x86_64

In the attached logs, here is the same case executed on iSCSI (attach snapshot disk to backup VM):

2018-03-07 00:57:25,404+02 INFO [org.ovirt.engine.core.bll.storage.disk.AttachDiskToVmCommand] (default task-7) [diskattachments_create_f0ff61b3-1394] Running command: AttachDiskToVmCommand internal: false. Entities affected : ID: 763f7609-1eac-413a-8aca-bf24eb004707 Type: VMAction group CONFIGURE_VM_STORAGE with role type USER, ID: 69cc6dea-d017-4a21-97d7-d92763983812 Type: DiskAction group ATTACH_DISK with role type USER

vdsm.log (from another host, vdsm.log is attached):

2018-03-07 00:57:28,324+0200 INFO (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC call VM.create succeeded in 0.05 seconds (__init__:573)

Gluster volume details:

Volume Name: storage_local_ge6_volume_0
Type: Replicate
Volume ID: 4a33039e-f131-4315-a74b-0d7ac7664f03
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: gluster01.scl.lab.tlv.redhat.com:/gluster_volumes/storage_local_ge6_volume_0
Brick2: gluster02.scl.lab.tlv.redhat.com:/gluster_volumes/storage_local_ge6_volume_0
Brick3: gluster03.scl.lab.tlv.redhat.com:/gluster_volumes/storage_local_ge6_volume_0
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
storage.owner-gid: 36
storage.owner-uid: 36

Packages on hosts:

vdsm-yajsonrpc-4.20.20-1.el7ev.noarch
vdsm-http-4.20.20-1.el7ev.noarch
vdsm-network-4.20.20-1.el7ev.x86_64
vdsm-4.20.20-1.el7ev.x86_64
vdsm-hook-vfio-mdev-4.20.20-1.el7ev.noarch
vdsm-python-4.20.20-1.el7ev.noarch
vdsm-client-4.20.20-1.el7ev.noarch
vdsm-api-4.20.20-1.el7ev.noarch
vdsm-hook-ethtool-options-4.20.20-1.el7ev.noarch
vdsm-hook-openstacknet-4.20.20-1.el7ev.noarch
vdsm-common-4.20.20-1.el7ev.noarch
vdsm-hook-vmfex-dev-4.20.20-1.el7ev.noarch
vdsm-hook-fcoe-4.20.20-1.el7ev.noarch
vdsm-hook-vhostmd-4.20.20-1.el7ev.noarch
vdsm-jsonrpc-4.20.20-1.el7ev.noarch
ovirt-setup-lib-1.1.4-1.el7ev.noarch
cockpit-ovirt-dashboard-0.11.14-0.1.el7ev.noarch
ovirt-imageio-common-1.2.1-0.el7ev.noarch
ovirt-host-dependencies-4.2.2-1.el7ev.x86_64
ovirt-provider-ovn-driver-1.2.8-1.el7ev.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
ovirt-host-deploy-1.7.2-1.el7ev.noarch
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.12-1.el7ev.noarch
ovirt-imageio-daemon-1.2.1-0.el7ev.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
python-ovirt-engine-sdk4-4.2.4-1.el7ev.x86_64
ovirt-host-4.2.2-1.el7ev.x86_64
ovirt-hosted-engine-ha-2.2.6-1.el7ev.noarch
libvirt-libs-3.9.0-13.el7.x86_64
libvirt-daemon-driver-secret-3.9.0-13.el7.x86_64
libvirt-daemon-driver-qemu-3.9.0-13.el7.x86_64
libvirt-daemon-driver-storage-3.9.0-13.el7.x86_64
libvirt-client-3.9.0-13.el7.x86_64
libvirt-lock-sanlock-3.9.0-13.el7.x86_64
libvirt-daemon-driver-storage-logical-3.9.0-13.el7.x86_64
libvirt-daemon-kvm-3.9.0-13.el7.x86_64
libvirt-daemon-driver-network-3.9.0-13.el7.x86_64
libvirt-daemon-config-nwfilter-3.9.0-13.el7.x86_64
libvirt-daemon-driver-interface-3.9.0-13.el7.x86_64
libvirt-daemon-driver-lxc-3.9.0-13.el7.x86_64
libvirt-daemon-driver-storage-disk-3.9.0-13.el7.x86_64
libvirt-daemon-driver-storage-mpath-3.9.0-13.el7.x86_64
libvirt-python-3.9.0-1.el7.x86_64
libvirt-daemon-3.9.0-13.el7.x86_64
libvirt-daemon-driver-nodedev-3.9.0-13.el7.x86_64
libvirt-daemon-driver-storage-iscsi-3.9.0-13.el7.x86_64
libvirt-daemon-driver-storage-scsi-3.9.0-13.el7.x86_64
libvirt-daemon-driver-nwfilter-3.9.0-13.el7.x86_64
libvirt-daemon-driver-storage-rbd-3.9.0-13.el7.x86_64
libvirt-daemon-config-network-3.9.0-13.el7.x86_64
libvirt-daemon-driver-storage-core-3.9.0-13.el7.x86_64
libvirt-daemon-driver-storage-gluster-3.9.0-13.el7.x86_64
libvirt-3.9.0-13.el7.x86_64
libvirt-daemon-driver-qemu-3.9.0-13.el7.x86_64
qemu-kvm-rhev-2.10.0-21.el7.x86_64
qemu-guest-agent-2.8.0-2.el7.x86_64
qemu-img-rhev-2.10.0-21.el7.x86_64
ipxe-roms-qemu-20170123-1.git4e85b27.el7_4.1.noarch
qemu-kvm-common-rhev-2.10.0-21.el7.x86_64
glusterfs-libs-3.8.4-54.el7.x86_64
glusterfs-cli-3.8.4-54.el7.x86_64
glusterfs-client-xlators-3.8.4-54.el7.x86_64
glusterfs-fuse-3.8.4-54.el7.x86_64
glusterfs-3.8.4-54.el7.x86_64
glusterfs-api-3.8.4-54.el7.x86_64
libvirt-daemon-driver-storage-gluster-3.9.0-13.el7.x86_64
glusterfs-rdma-3.8.4-54.el7.x86_64
sanlock-python-3.6.0-1.el7.x86_64
libvirt-lock-sanlock-3.9.0-13.el7.x86_64
sanlock-3.6.0-1.el7.x86_64
sanlock-lib-3.6.0-1.el7.x86_64
libselinux-utils-2.5-12.el7.x86_64
selinux-policy-3.13.1-192.el7.noarch
selinux-policy-targeted-3.13.1-192.el7.noarch
libselinux-python-2.5-12.el7.x86_64
libselinux-2.5-12.el7.x86_64

kernel 3.10.0-858.el7.x86_64
Red Hat Enterprise Linux Server 7.5 (Maipo)
Thanks for the details, Elad. I saw that you're using libgfapi. Can you please try to run the same test without it and see if it works?
Created attachment 1405766 [details]
logs w/o libgfapi

Doesn't reproduce when libgfapi is disabled:

2018-03-08 10:48:55,271+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-21) [diskattachments_create_66e05680-82c4] EVENT_ID: USER_ATTACH_DISK_TO_VM(2,016), Disk source_vm_TestCase6183_0810475513_Disk_0 was successfully attached to VM backup_vm_TestCase6183_0810483573 by admin@internal-authz.
2018-03-08 10:48:59,308+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-27) [vms_syncAction_ad7fb3ed-2ed0-4764] EVENT_ID: USER_STARTED_VM(153), VM backup_vm_TestCase6183_0810483573 was started by admin@internal-authz (Host: host_mixed_1).
2018-03-08 10:50:10,816+02 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-10) [] VM 'ace84102-718f-43d0-88be-ab3e6751e0be'(backup_vm_TestCase6183_0810483573) moved from 'PoweringUp' --> 'Up'

2018-03-08 10:48:59,214+0200 INFO (jsonrpc/6) [api.virt] START create
2018-03-08 10:49:02,490+0200 INFO (vm/ace84102) [virt.vm] (vmId='ace84102-718f-43d0-88be-ab3e6751e0be') <?xml version='1.0' encoding='utf-8'?>

RHEL 7.5
rhvm-4.2.2.2-0.1.el7.noarch
vdsm-4.20.20-1.el7ev.x86_64
libvirt-3.9.0-13.el7.x86_64
qemu-kvm-rhev-2.10.0-21.el7.x86_64

[root@storage-ge-04 ~]# engine-config -a | grep -i libgfapi
LibgfApiSupported: false version: 3.6
LibgfApiSupported: false version: 4.0
LibgfApiSupported: false version: 4.1
LibgfApiSupported: false version: 4.2
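For reference, libgfapi support is controlled per compatibility version by the LibgfApiSupported key queried above. A sketch of how it would be toggled back on for this test (standard engine-config usage; cluster level 4.2 assumed):

engine-config -s LibgfApiSupported=true --cver=4.2
systemctl restart ovirt-engine   # engine-config changes take effect after a restart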
As per comment 4, pushing out to 4.3, as we currently don't use libgfapi.
For later inspection - what we should basically do is create the transient disk (with qemu-img create) using the -u option (unsafe) to avoid checking if the backing file exists when using libgfapi.
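A minimal sketch of the suggested command (placeholders, not the actual vdsm patch): since -u makes qemu-img skip opening the backing file, the virtual size can no longer be derived from it and would have to be passed explicitly:

# <backing-path>, <sd_id>, <vol_id> and <virtual-size> are placeholders
qemu-img create -f qcow2 -o compat=1.1 -u \
    -b <backing-path> -F qcow2 \
    /var/lib/vdsm/transient/<sd_id>-<vol_id>.XXXXXX <virtual-size>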
(In reply to Idan Shaby from comment #6)
> For later inspection - what we should basically do is create the transient
> disk (with qemu-img create) using the -u option (unsafe) to avoid checking
> if the backing file exists when using libgfapi.

This sounds necessary, but probably not sufficient - I wonder if the snapshot is usable by the VM once you pass the qemu-img create blocker.
We currently don't use libgfapi in 4.3; pushing out to 4.4.
This does not seem like a regression to me, given that the support was never introduced. Removing the Regression flag to unblock 4.4 Beta. Feel free to re-add it if you believe this is indeed a regression.
This bug/RFE is more than 2 years old, hasn't received enough attention so far, and is now flagged as pending close. Please review whether it is still relevant, and provide additional details/justification/patches if you believe it should get more attention for the next oVirt release.
It would be nice to have libgfapi in oVirt. This bug was one of the blockers, if I remember correctly. Users on the users mailing list have reported seeing a huge I/O performance increase with libgfapi over FUSE.
Unfortunately, libgfapi is not planned for the foreseeable future. This bug is 3.5 years old and stale, so I'm closing it. If you feel libgfapi should still be added, please bring it up as a feature, with a plan to contribute the code and testing. It won't happen otherwise, as the oVirt development team does not plan to add it.