Description of problem:

The VM cannot be started after a snapshot of a disk on a Gluster volume (libgfapi enabled) was created.

Version-Release number of selected component (if applicable):

oVirt 4.3.10.4-1.el7

# rpm -qa gluster*
gluster-ansible-cluster-1.0.0-1.el7.noarch
glusterfs-events-6.9-1.el7.x86_64
gluster-ansible-maintenance-1.0.1-2.el7.noarch
glusterfs-client-xlators-6.9-1.el7.x86_64
glusterfs-6.9-1.el7.x86_64
glusterfs-cli-6.9-1.el7.x86_64
glusterfs-api-6.9-1.el7.x86_64
glusterfs-fuse-6.9-1.el7.x86_64
glusterfs-server-6.9-1.el7.x86_64
gluster-ansible-repositories-1.0.1-2.el7.noarch
gluster-ansible-features-1.0.5-6.el7.noarch
glusterfs-geo-replication-6.9-1.el7.x86_64
glusterfs-libs-6.9-1.el7.x86_64
gluster-ansible-infra-1.0.4-8.el7.noarch
glusterfs-rdma-6.9-1.el7.x86_64
gluster-ansible-roles-1.0.5-10.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. Create a VM on a libgfapi-enabled Gluster volume
2. Stop the VM
3. Create a disk snapshot
4. Try to start the VM

Actual results:
The VM fails to start.

Expected results:
The VM starts normally.

Additional info:

vdsm.log:

2020-07-09 15:02:08,971+0200 ERROR (vm/fdca3bae) [virt.vm] (vmId='fdca3bae-28c4-40da-b73b-eabf3cacd1fc') The vm start process failed (vm:934)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 868, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2895, in _run
    dom.createWithFlags(flags)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 94, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1110, in createWithFlags
    if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
libvirtError: internal error: qemu unexpectedly closed the monitor: [2020-07-09 13:02:06.221697] E [MSGID: 108006] [afr-common.c:5325:__afr_handle_child_down_event] 0-vmstore-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
[2020-07-09 13:02:06.222023] E [MSGID: 108006] [afr-common.c:5325:__afr_handle_child_down_event] 0-vmstore-replicate-1: All subvolumes are down. Going offline until at least one of them comes back up.
[2020-07-09 13:02:06.222316] I [io-stats.c:4027:fini] 0-vmstore: io-stats translator unloaded
2020-07-09T13:02:08.000577Z qemu-kvm: -drive file=gluster://192.168.0.21:24007/vmstore/d47ccd1a-9750-4012-938e-3d93e81ad9e4/images/c267bd3d-ecc8-40d1-9bbb-31597c253815/86de26db-3d15-425e-ab9f-aae397b01464,file.debug=4,format=qcow2,if=none,id=drive-ua-c267bd3d-ecc8-40d1-9bbb-31597c253815,serial=c267bd3d-ecc8-40d1-9bbb-31597c253815,werror=stop,rerror=stop,cache=none,aio=native: Could not read L1 table: Input/output error
2020-07-09 15:02:08,972+0200 INFO (vm/fdca3bae) [virt.vm] (vmId='fdca3bae-28c4-40da-b73b-eabf3cacd1fc') Changed state to Down: internal error: qemu unexpectedly closed the monitor: [2020-07-09 13:02:06.221697] E [MSGID: 108006] [afr-common.c:5325:__afr_handle_child_down_event] 0-vmstore-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
[2020-07-09 13:02:06.222023] E [MSGID: 108006] [afr-common.c:5325:__afr_handle_child_down_event] 0-vmstore-replicate-1: All subvolumes are down. Going offline until at least one of them comes back up.
[2020-07-09 13:02:06.222316] I [io-stats.c:4027:fini] 0-vmstore: io-stats translator unloaded
2020-07-09T13:02:08.000577Z qemu-kvm: -drive file=gluster://192.168.0.21:24007/vmstore/d47ccd1a-9750-4012-938e-3d93e81ad9e4/images/c267bd3d-ecc8-40d1-9bbb-31597c253815/86de26db-3d15-425e-ab9f-aae397b01464,file.debug=4,format=qcow2,if=none,id=drive-ua-c267bd3d-ecc8-40d1-9bbb-31597c253815,serial=c267bd3d-ecc8-40d1-9bbb-31597c253815,werror=stop,rerror=stop,cache=none,aio=native: Could not read L1 table: Input/output error (code=1) (vm:1702)
2020-07-09 15:02:08,978+0200 INFO (vm/fdca3bae) [virt.vm] (vmId='fdca3bae-28c4-40da-b73b-eabf3cacd1fc') Stopping connection (guestagent:455)
2020-07-09 15:02:08,984+0200 INFO (jsonrpc/7) [api.virt] START destroy(gracefulAttempts=1) from=::ffff:10.10.0.20,60536, vmId=fdca3bae-28c4-40da-b73b-eabf3cacd1fc (api:48)
2020-07-09 15:02:08,984+0200 INFO (jsonrpc/7) [virt.vm] (vmId='fdca3bae-28c4-40da-b73b-eabf3cacd1fc') Release VM resources (vm:5186)
2020-07-09 15:02:08,984+0200 WARN (jsonrpc/7) [virt.vm] (vmId='fdca3bae-28c4-40da-b73b-eabf3cacd1fc') trying to set state to Powering down when already Down (vm:626)
2020-07-09 15:02:08,984+0200 INFO (jsonrpc/7) [virt.vm] (vmId='fdca3bae-28c4-40da-b73b-eabf3cacd1fc') Stopping connection (guestagent:455)
2020-07-09 15:02:08,985+0200 INFO (jsonrpc/7) [vdsm.api] START teardownImage(sdUUID='d47ccd1a-9750-4012-938e-3d93e81ad9e4', spUUID='cbd35eda-9ab9-11ea-a226-00163e399c8f', imgUUID='c267bd3d-ecc8-40d1-9bbb-31597c253815', volUUID=None) from=::ffff:10.10.0.20,60536, task_id=5913b3e6-b5b7-44c6-b34c-b5744eb133cb (api:48)
2020-07-09 15:02:08,986+0200 INFO (jsonrpc/7) [storage.StorageDomain] Removing image rundir link u'/var/run/vdsm/storage/d47ccd1a-9750-4012-938e-3d93e81ad9e4/c267bd3d-ecc8-40d1-9bbb-31597c253815' (fileSD:592)
2020-07-09 15:02:08,986+0200 INFO (jsonrpc/7) [vdsm.api] FINISH teardownImage return=None from=::ffff:10.10.0.20,60536, task_id=5913b3e6-b5b7-44c6-b34c-b5744eb133cb (api:54)
2020-07-09 15:02:08,986+0200 INFO (jsonrpc/7) [virt.vm] (vmId='fdca3bae-28c4-40da-b73b-eabf3cacd1fc') Stopping connection (guestagent:455)
2020-07-09 15:02:08,986+0200 WARN (jsonrpc/7) [root] File: /var/lib/libvirt/qemu/channels/fdca3bae-28c4-40da-b73b-eabf3cacd1fc.ovirt-guest-agent.0 already removed (fileutils:54)
2020-07-09 15:02:08,987+0200 WARN (jsonrpc/7) [root] File: /var/lib/libvirt/qemu/channels/fdca3bae-28c4-40da-b73b-eabf3cacd1fc.org.qemu.guest_agent.0 already removed (fileutils:54)
2020-07-09 15:02:08,987+0200 INFO (jsonrpc/7) [vdsm.api] START inappropriateDevices(thiefId='fdca3bae-28c4-40da-b73b-eabf3cacd1fc') from=::ffff:10.10.0.20,60536, task_id=fb813d21-6307-419e-9b1e-40784ccfc4af (api:48)
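The "Could not read L1 table" error points at the qcow2 overlay created by the snapshot, so the image chain is worth inspecting directly. A minimal check over the volume's FUSE mount (the mount path below follows the usual oVirt /rhev/data-center/mnt/glusterSD layout and is an assumption; adjust it to the actual environment):

> cd /rhev/data-center/mnt/glusterSD/192.168.0.21:_vmstore/d47ccd1a-9750-4012-938e-3d93e81ad9e4/images/c267bd3d-ecc8-40d1-9bbb-31597c253815
> qemu-img info --backing-chain 86de26db-3d15-425e-ab9f-aae397b01464

The --backing-chain option makes qemu-img walk every image in the chain, so a backing file it cannot resolve shows up immediately.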
This seems to be the same problem: https://www.mail-archive.com/users@ovirt.org/msg59460.html
Same problem on a 4.3 installation. The culprit seems to be that the new snapshot image records its backing file as a bare filename rather than the complete gluster:// URL.

This is the image created by oVirt via the web UI (renamed):

> qemu-img info oldsnap
image: oldsnap
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 193K
cluster_size: 65536
backing file: d06c3e2b-5895-414d-902f-cba7fe44252d
backing file format: raw
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

This one was created by hand as a workaround; with it, the snapshotted VM can be started:

> qemu-img info 41c1120b-aaf1-4eec-aa85-e800f3480241
image: 41c1120b-aaf1-4eec-aa85-e800f3480241
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 124M
cluster_size: 65536
backing file: gluster://192.168.1.11:24007/ssd-01/d97c648c-d26a-4874-8715-d1ba7b4f5c74/images/44831548-c3c6-4391-b3f3-db22c83b05d1/d06c3e2b-5895-414d-902f-cba7fe44252d
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
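For reference, the by-hand fix above boils down to rewriting the backing-file reference stored in the overlay's qcow2 header. A sketch using qemu-img rebase in unsafe mode, assuming the URL and filenames from the output above (this invocation is an illustration, not a verified procedure):

> qemu-img rebase -u \
    -b gluster://192.168.1.11:24007/ssd-01/d97c648c-d26a-4874-8715-d1ba7b4f5c74/images/44831548-c3c6-4391-b3f3-db22c83b05d1/d06c3e2b-5895-414d-902f-cba7fe44252d \
    41c1120b-aaf1-4eec-aa85-e800f3480241

Unsafe mode (-u) only rewrites the header without copying any data, which is appropriate here because the backing data itself is unchanged; the command merely repoints the overlay at the same backing image via its full gluster:// URL.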
The documentation text flag should only be set after the 'Doc Text' field is provided. Please provide the documentation text and set the flag to '?' again.
libgfapi doesn't provide significant value over non-libgfapi access and has some limitations. Could you please try with glusterfs-fuse instead? I am closing this for now; please reopen if you still hit the issue.
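Switching back to FUSE access is an engine-side toggle. A sketch assuming the LibgfApiSupported engine-config key applies to this setup (verify the key and the cluster compatibility version on the actual engine):

# engine-config -s LibgfApiSupported=false --cver=4.3
# systemctl restart ovirt-engine

VMs pick up the change the next time they are started, since the disk path is generated when the domain XML is built.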