Created attachment 1406703 [details]
logs

Description of problem:
With a qcow2 image on RHEL 7.5, live storage migration of a VM disk that is based on a template fails with CopyImageError while another running VM based on the same template is up. This seems to be caused by the new qemu image locking, which also impacts snapshot merge operations, as described in BZ 1552059.

How reproducible:
Always

Steps to Reproduce:
- Created a 4.0 DC and cluster with a RHEL 7.5 host
- Created an NFS domain (created as v3)
- Created a template
- Created 2 VMs from the template as thin copies (VM image is based on the template) - created as qcow2
- Created a second storage domain (iSCSI)
- Copied the template disk to the second storage domain
- Started both VMs
- Tried to move one of the VMs' disks to the second domain

Tested this scenario also on a 4.2 DC (v4 domain), also with RHEL 7.5 and the same qemu, vdsm and libvirt packages; LSM works fine there.

Actual results:
Disk move fails:

2018-03-10 21:33:11,567+0200 DEBUG (jsonrpc/0) [storage.TaskManager.Task] (Task='a4b1c616-164e-43b9-af6c-77e8f44d5809') moving from state init -> state preparing (task:602)
2018-03-10 21:33:11,569+0200 INFO (jsonrpc/0) [vdsm.api] START syncImageData(spUUID='79d2a575-75aa-40e2-8caf-1f768de486e3', sdUUID='34c01407-3634-41e7-96be-bd5cff15e9b9', imgUUID='5f618e7c-e724-475d-b8d5-c60439d04b68', dstSdUUID='8820dfa3-0dff-4874-b431-c70266d10cdb', syncType='INTERNAL') from=::ffff:10.35.161.118,49528, flow_id=13000ef2-ea5e-4bb6-a36e-820a193bbd89, task_id=a4b1c616-164e-43b9-af6c-77e8f44d5809 (api:46)
2018-03-10 21:33:39,986+0200 DEBUG (tasks/6) [storage.operation] FAILED: <err> = bytearray(b'qemu-img: error while writing sector 11534336: No space left on device\n'); <rc> = 1 (operation:169)
2018-03-10 21:33:39,988+0200 ERROR (tasks/6) [storage.Image] Copy image error: image=5f618e7c-e724-475d-b8d5-c60439d04b68, src domain=34c01407-3634-41e7-96be-bd5cff15e9b9, dst domain=8820dfa3-0dff-4874-b431-c70266d10cdb (image:494)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/image.py", line 485, in _interImagesCopy
    self._run_qemuimg_operation(operation)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/image.py", line 113, in _run_qemuimg_operation
    operation.run()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/qemuimg.py", line 276, in run
    for data in self._operation.watch():
  File "/usr/lib/python2.7/site-packages/vdsm/storage/operation.py", line 104, in watch
    self._finalize(b"", err)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/operation.py", line 178, in _finalize
    raise cmdutils.Error(self._cmd, rc, out, err)
Error: Command ['/usr/bin/taskset', '--cpu-list', '0-0', '/usr/bin/nice', '-n', '19', '/usr/bin/ionice', '-c', '3', '/usr/bin/qemu-img', 'convert', '-p', '-t', 'none', '-T', 'none', '-f', 'qcow2', u'/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_storage__local__ge4__nfs__3/34c01407-3634-41e7-96be-bd5cff15e9b9/images/5f618e7c-e724-475d-b8d5-c60439d04b68/9ed4dd18-42e9-461a-a18f-38c913ffac9b', '-O', 'qcow2', '-o', 'compat=0.10,backing_file=c6e70cde-f1b3-46e5-8867-da0427cb5c19,backing_fmt=qcow2', '/rhev/data-center/mnt/blockSD/8820dfa3-0dff-4874-b431-c70266d10cdb/images/5f618e7c-e724-475d-b8d5-c60439d04b68/9ed4dd18-42e9-461a-a18f-38c913ffac9b'] failed with rc=1 out='' err=bytearray(b'qemu-img: error while writing sector 11534336: No space left on device\n')

2018-03-10 21:33:41,830+0200 ERROR (tasks/6) [storage.TaskManager.Task] (Task='a4b1c616-164e-43b9-af6c-77e8f44d5809') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 336, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 1766, in syncImageData
    img.syncData(sdUUID, imgUUID, dstSdUUID, syncType)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/image.py", line 673, in syncData
    {'srcChain': srcChain, 'dstChain': dstChain})
  File "/usr/lib/python2.7/site-packages/vdsm/storage/image.py", line 495, in _interImagesCopy
    raise se.CopyImageError()
CopyImageError: low level Image copy failed: ()

Expected results:
Live storage migration should succeed

Version-Release number of selected component (if applicable):
vdsm-hook-openstacknet-4.20.20-1.el7ev.noarch libvirt-daemon-driver-nwfilter-3.9.0-13.el7.x86_64 ovirt-hosted-engine-ha-2.2.6-1.el7ev.noarch sanlock-python-3.6.0-1.el7.x86_64 libvirt-daemon-driver-storage-logical-3.9.0-13.el7.x86_64 libselinux-utils-2.5-12.el7.x86_64 vdsm-yajsonrpc-4.20.20-1.el7ev.noarch qemu-kvm-rhev-2.10.0-21.el7.x86_64 vdsm-jsonrpc-4.20.20-1.el7ev.noarch libvirt-daemon-config-network-3.9.0-13.el7.x86_64 vdsm-hook-vmfex-dev-4.20.20-1.el7ev.noarch libvirt-lock-sanlock-3.9.0-13.el7.x86_64 ovirt-hosted-engine-setup-2.2.12-1.el7ev.noarch libvirt-daemon-driver-storage-mpath-3.9.0-13.el7.x86_64 ovirt-imageio-common-1.2.1-0.el7ev.noarch qemu-img-rhev-2.10.0-21.el7.x86_64 vdsm-python-4.20.20-1.el7ev.noarch selinux-policy-3.13.1-192.el7.noarch sanlock-3.6.0-1.el7.x86_64 vdsm-4.20.20-1.el7ev.x86_64 vdsm-hook-fcoe-4.20.20-1.el7ev.noarch ovirt-host-4.2.2-1.el7ev.x86_64 libnfsidmap-0.25-19.el7.x86_64 ipxe-roms-qemu-20170123-1.git4e85b27.el7_4.1.noarch ovirt-vmconsole-1.0.4-1.el7ev.noarch libselinux-python-2.5-12.el7.x86_64 vdsm-common-4.20.20-1.el7ev.noarch libvirt-daemon-driver-network-3.9.0-13.el7.x86_64 libvirt-daemon-config-nwfilter-3.9.0-13.el7.x86_64 libvirt-daemon-driver-interface-3.9.0-13.el7.x86_64 libvirt-daemon-driver-lxc-3.9.0-13.el7.x86_64 libvirt-daemon-driver-storage-iscsi-3.9.0-13.el7.x86_64 libvirt-daemon-driver-storage-scsi-3.9.0-13.el7.x86_64 libvirt-daemon-kvm-3.9.0-13.el7.x86_64
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch python-ovirt-engine-sdk4-4.2.4-1.el7ev.x86_64 vdsm-client-4.20.20-1.el7ev.noarch selinux-policy-targeted-3.13.1-192.el7.noarch vdsm-hook-vhostmd-4.20.20-1.el7ev.noarch sanlock-lib-3.6.0-1.el7.x86_64 ovirt-provider-ovn-driver-1.2.8-1.el7ev.noarch vdsm-hook-ethtool-options-4.20.20-1.el7ev.noarch libvirt-python-3.9.0-1.el7.x86_64 qemu-guest-agent-2.8.0-2.el7.x86_64 ovirt-imageio-daemon-1.2.1-0.el7ev.noarch libvirt-daemon-3.9.0-13.el7.x86_64 libvirt-daemon-driver-nodedev-3.9.0-13.el7.x86_64 libvirt-daemon-driver-qemu-3.9.0-13.el7.x86_64 libvirt-daemon-driver-storage-rbd-3.9.0-13.el7.x86_64 ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch libvirt-daemon-driver-storage-3.9.0-13.el7.x86_64 qemu-kvm-common-rhev-2.10.0-21.el7.x86_64 vdsm-http-4.20.20-1.el7ev.noarch libvirt-libs-3.9.0-13.el7.x86_64 vdsm-hook-vfio-mdev-4.20.20-1.el7ev.noarch libvirt-daemon-driver-secret-3.9.0-13.el7.x86_64 libselinux-2.5-12.el7.x86_64 cockpit-ovirt-dashboard-0.11.14-0.1.el7ev.noarch ovirt-setup-lib-1.1.4-1.el7ev.noarch libvirt-daemon-driver-storage-core-3.9.0-13.el7.x86_64 libvirt-daemon-driver-storage-gluster-3.9.0-13.el7.x86_64 libvirt-3.9.0-13.el7.x86_64 ovirt-host-deploy-1.7.2-1.el7ev.noarch nfs-utils-1.3.0-0.54.el7.x86_64 vdsm-network-4.20.20-1.el7ev.x86_64 libvirt-client-3.9.0-13.el7.x86_64 ovirt-host-dependencies-4.2.2-1.el7ev.x86_64 libvirt-daemon-driver-storage-disk-3.9.0-13.el7.x86_64 vdsm-api-4.20.20-1.el7ev.noarch
kernel 3.10.0-851.el7.x86_64
Red Hat Enterprise Linux Server 7.5 (Maipo)

Additional info:
# qemu-img info /rhev/data-center/79d2a575-75aa-40e2-8caf-1f768de486e3/34c01407-3634-41e7-96be-bd5cff15e9b9/images/5f618e7c-e724-475d-b8d5-c60439d04b68/096d936c-11f4-4d74-bcf4-73fc97422ced
image: /rhev/data-center/79d2a575-75aa-40e2-8caf-1f768de486e3/34c01407-3634-41e7-96be-bd5cff15e9b9/images/5f618e7c-e724-475d-b8d5-c60439d04b68/096d936c-11f4-4d74-bcf4-73fc97422ced
file format: qcow2
virtual size: 7.0G (7516192768 bytes)
disk size: 196K
cluster_size: 65536
backing file: 9ed4dd18-42e9-461a-a18f-38c913ffac9b (actual path: /rhev/data-center/79d2a575-75aa-40e2-8caf-1f768de486e3/34c01407-3634-41e7-96be-bd5cff15e9b9/images/5f618e7c-e724-475d-b8d5-c60439d04b68/9ed4dd18-42e9-461a-a18f-38c913ffac9b)
backing file format: qcow2
Format specific information:
    compat: 0.10
    refcount bits: 16

engine.log:
2018-03-10 21:33:56,496+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-22) [1f845844] EVENT_ID: USER_MOVED_DISK_FINISHED_FAILURE(2,011), User admin@internal-authz have failed to move disk test_Disk1 to domain iscsi_3.
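For reference, the failing copy is an ordinary `qemu-img convert` run by vdsm. A minimal Python sketch of how such a command line is assembled, matching the options seen in the traceback above; this is an illustration only, not vdsm's actual code, and the paths and backing-file name are placeholders:

```python
# Illustrative sketch (not vdsm code): assemble a qemu-img convert command
# line for a qcow2 volume that keeps its backing file, as in the failing log.
# All paths below are hypothetical placeholders.

def build_convert_cmd(src, dst, backing_file, compat="0.10"):
    """Build a qemu-img convert argument list for a qcow2 volume
    rebased on top of an existing backing file."""
    return [
        "/usr/bin/qemu-img", "convert",
        "-p",              # progress output, parsed by the caller
        "-t", "none",      # destination cache mode: none (direct I/O)
        "-T", "none",      # source cache mode: none
        "-f", "qcow2",
        src,
        "-O", "qcow2",
        "-o", "compat=%s,backing_file=%s,backing_fmt=qcow2" % (compat, backing_file),
        dst,
    ]

cmd = build_convert_cmd("/path/to/src.qcow2", "/path/to/dst.qcow2",
                        "c6e70cde-f1b3-46e5-8867-da0427cb5c19")
print(" ".join(cmd))
```

Note that the destination is created with `compat=0.10`, which matters for the analysis below.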
This is definitely a bug, but offhand it does not look related to qemu's new locking mechanism. Actually, it looks similar to bug 1523614, and probably has to do with some subtle difference between qcow2's compat levels. Elad, let's try to isolate the problem: can you run the same scenario on RHEL 7.4.z (with qemu 2.9.something) and double-check whether it reproduces? I'm tentatively targeting this to 4.2.2 under the assumption that it really is a regression. If the requested analysis proves otherwise, we can rethink the targeting.
This bug report has Keywords: Regression or TestBlocker. Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.
Created attachment 1406852 [details]
rhel-7.4

The bug doesn't reproduce on RHEL 7.4.

2018-03-11 13:09:57,665+0200 INFO (jsonrpc/1) [vdsm.api] FINISH syncImageData return=None from=::ffff:10.35.161.181,36978, flow_id=78d95342-cee0-4114-af41-d2b429270026, task_id=1b966efd-bb1d-4b2f-a784-817a19ad3160 (api:52)

[root@storage-ge1-vdsm1 ~]# rpm -qa | egrep 'vdsm|libvirt|qemu'
qemu-kvm-tools-rhev-2.10.0-21.el7.x86_64 qemu-guest-agent-2.8.0-2.el7.x86_64 qemu-kvm-rhev-2.10.0-21.el7.x86_64 libvirt-daemon-driver-interface-3.2.0-14.el7_4.9.x86_64 libvirt-daemon-driver-storage-iscsi-3.2.0-14.el7_4.9.x86_64 vdsm-yajsonrpc-4.19.48-1.el7ev.noarch vdsm-hook-vmfex-dev-4.19.48-1.el7ev.noarch libvirt-libs-3.2.0-14.el7_4.9.x86_64 vdsm-xmlrpc-4.19.48-1.el7ev.noarch libvirt-daemon-driver-nwfilter-3.2.0-14.el7_4.9.x86_64 libvirt-daemon-driver-storage-disk-3.2.0-14.el7_4.9.x86_64 libvirt-daemon-kvm-3.2.0-14.el7_4.9.x86_64 vdsm-cli-4.19.48-1.el7ev.noarch libvirt-daemon-3.2.0-14.el7_4.9.x86_64 libvirt-daemon-driver-nodedev-3.2.0-14.el7_4.9.x86_64 libvirt-daemon-driver-storage-logical-3.2.0-14.el7_4.9.x86_64 vdsm-hook-localdisk-4.19.48-1.el7ev.noarch qemu-img-rhev-2.10.0-21.el7.x86_64 vdsm-api-4.19.48-1.el7ev.noarch qemu-kvm-common-rhev-2.10.0-21.el7.x86_64 libvirt-daemon-driver-storage-core-3.2.0-14.el7_4.9.x86_64 libvirt-daemon-driver-qemu-3.2.0-14.el7_4.9.x86_64 libvirt-daemon-driver-lxc-3.2.0-14.el7_4.9.x86_64 libvirt-daemon-driver-storage-rbd-3.2.0-14.el7_4.9.x86_64 libvirt-daemon-driver-storage-scsi-3.2.0-14.el7_4.9.x86_64 vdsm-hook-ethtool-options-4.19.48-1.el7ev.noarch libvirt-3.2.0-14.el7_4.9.x86_64 vdsm-python-4.19.48-1.el7ev.noarch libvirt-daemon-driver-network-3.2.0-14.el7_4.9.x86_64 libvirt-daemon-config-network-3.2.0-14.el7_4.9.x86_64 libvirt-daemon-driver-storage-3.2.0-14.el7_4.9.x86_64 libvirt-python-3.2.0-3.el7_4.1.x86_64 libvirt-client-3.2.0-14.el7_4.9.x86_64 libvirt-daemon-driver-secret-3.2.0-14.el7_4.9.x86_64 libvirt-daemon-driver-storage-gluster-3.2.0-14.el7_4.9.x86_64 vdsm-jsonrpc-4.19.48-1.el7ev.noarch vdsm-4.19.48-1.el7ev.x86_64 ipxe-roms-qemu-20170123-1.git4e85b27.el7_4.1.noarch libvirt-daemon-config-nwfilter-3.2.0-14.el7_4.9.x86_64 libvirt-daemon-driver-storage-mpath-3.2.0-14.el7_4.9.x86_64 libvirt-lock-sanlock-3.2.0-14.el7_4.9.x86_64

[root@storage-ge1-vdsm1 ~]# cat /etc/os-release
PRETTY_NAME="Red Hat Enterprise Linux Server 7.4 (Maipo)"

[root@storage-ge1-vdsm1 ~]# uname -a
Linux storage-ge1-vdsm1.scl.lab.tlv.redhat.com 3.10.0-693.21.1.el7.x86_64 #1 SMP Fri Feb 23 18:54:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
It appears to be a bug similar to https://bugzilla.redhat.com/show_bug.cgi?id=1523614. The case there is that we have a snapshot and then extend the disk, which results in a child snapshot bigger than its parent. The case here is that we extend the VM disk, which results in a child image bigger than its parent (the template's image).
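The suspect layout described above can be sketched as a small check. The dicts below mimic the shape of `qemu-img info --output=json` output, but the sizes are invented for illustration and the helper is not vdsm code:

```python
# Illustration of the suspected trigger: a thin-provisioned child volume
# that has been extended beyond its backing file (the template image).
# Sizes are hypothetical; the backing-filename value echoes the one in
# the failing command line above.

template = {"virtual-size": 7 * 1024**3, "format": "qcow2"}  # 7G template
child = {"virtual-size": 10 * 1024**3, "format": "qcow2",    # disk extended to 10G
         "backing-filename": "c6e70cde-f1b3-46e5-8867-da0427cb5c19"}

def child_exceeds_parent(child, parent):
    """True when a qcow2 volume is larger than its backing file --
    the layout that made the copy fail in this report."""
    return child["virtual-size"] > parent["virtual-size"]

print(child_exceeds_parent(child, template))  # -> True
```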
Target release should be set once a package build is known to fix an issue. Since this bug is not in MODIFIED status, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.
Benny, can you please add some doc text explaining the situation and what a user can do to overcome it?
Verified with the following code:
----------------------------------------
ovirt-engine-4.2.2.6-0.1.el7.noarch
vdsm-4.20.23-1.el7ev.x86_64

Verified with the following scenario:
-------------------------------------------
Steps to Reproduce:
- Created a 4.0 DC and cluster with a RHEL 7.5 host
- Created an NFS domain (created as v3)
- Created a template
- Created 2 VMs from the template as thin copies (VM image is based on the template) - created as qcow2
- Created a second storage domain (iSCSI)
- Copied the template disk to the second storage domain
- Started both VMs
- Tried to move one of the VMs' disks to the second domain
>>>>> The disk move operation was successful

Moving to VERIFIED
Verified with the following code:
----------------------------------------
ovirt-engine-4.2.2.6-0.1.el7.noarch
vdsm-4.20.23-1.el7ev.x86_64

CORRECTION TO SCENARIO

Verified with the following scenario:
-------------------------------------------
Steps to Reproduce:
- Created a 4.0 DC and cluster with a RHEL 7.5 host
- Created an NFS domain (created as v3)
- Created a template
- Created 2 VMs from the template as thin copies (VM image is based on the template) - created as qcow2
- Created a second storage domain (iSCSI)
- Copied the template disk to the second storage domain
- Started both VMs
- Extended the disk of one of the VMs (added this step)
- Tried to move the extended disk to the second domain
>>>>> This operation fails as expected

Moving to VERIFIED
This bugzilla is included in the oVirt 4.2.2 release, published on March 28th, 2018. Since the problem described in this bug report should be resolved in the oVirt 4.2.2 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.