+++ This bug is a downstream clone. The original bug is: +++
+++ bug 1589612 +++
======================================================================
+++ This bug was initially created as a clone of Bug #1579909 +++

Description of problem:

oVirt environment: 7 hosts (4.1). After upgrading the engine and one of the
hosts, I am unable to start a VM on the newly upgraded host. Starting the VM
on one of the older 4.1 hosts works fine. The error received in vdsm is as
follows:

2018-05-17 14:24:45,561+0200 ERROR (vm/0d53dd5d) [virt.vm] (vmId='0d53dd5d-ef16-4763-bbdc-2dc173087bf5') The vm start process failed (vm:943)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 872, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2882, in _run
    self._domDependentInit()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2458, in _domDependentInit
    self._vmDependentInit()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2495, in _vmDependentInit
    self._sync_metadata()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 5158, in _sync_metadata
    self._md_desc.dump(self._dom)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 509, in dump
    md_xml = self._build_xml()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 721, in _build_xml
    md_elem = self._build_tree(namespace, namespace_uri)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 711, in _build_tree
    dev_elem = _dump_device(metadata_obj, data)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 800, in _dump_device
    elems.append(_dump_device_spec_params(md_obj, value))
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 866, in _dump_device_spec_params
    spec_params_elem = md_obj.dump(_SPEC_PARAMS, **value)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 229, in dump
    _keyvalue_to_elem(self._add_ns(key), value, elem)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 916, in _keyvalue_to_elem
    raise UnsupportedType(key, value)
UnsupportedType: Unsupported {u'write_bytes_sec': 0, u'total_iops_sec': 0, u'read_iops_sec': 100, u'read_bytes_sec': 0, u'write_iops_sec': 100, u'total_bytes_sec': 0} for ioTune

The logged VM KVM XML contains this:

    <disk device="disk" snapshot="no" type="block">
        <address bus="0x00" domain="0x0000" function="0x0" slot="0x06" type="pci"/>
        <source dev="/rhev/data-center/mnt/blockSD/252fe066-1a76-4c18-8ea3-29d3a07cdd4c/images/f617d674-e283-4e70-abb2-2ca2b8cb2bce/301f06aa-053c-48c8-8e25-f259d68b3395"/>
        <target bus="virtio" dev="vda"/>
        <serial>f617d674-e283-4e70-abb2-2ca2b8cb2bce</serial>
        <boot order="1"/>
        <driver cache="none" error_policy="stop" io="native" name="qemu" type="raw"/>
        <iotune>
            <read_bytes_sec>0</read_bytes_sec>
            <read_iops_sec>100</read_iops_sec>
            <total_bytes_sec>0</total_bytes_sec>
            <total_iops_sec>0</total_iops_sec>
            <write_bytes_sec>0</write_bytes_sec>
            <write_iops_sec>100</write_iops_sec>
        </iotune>
    </disk>
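The failure mode is visible in the last frames of the traceback above: the
metadata serializer turns scalar values into XML text, and the ioTune settings
arrive as a nested dict, which the type dispatch rejects. Below is a minimal,
self-contained sketch of that behavior; the function and exception names
mirror the traceback, but the implementation is an assumption for
illustration, not vdsm's actual code.

import xml.etree.ElementTree as ET


class UnsupportedType(Exception):
    def __init__(self, key, value):
        super(UnsupportedType, self).__init__(
            'Unsupported %r for %s' % (value, key))


def _keyvalue_to_elem(key, value, parent):
    # Hypothetical re-creation of the dispatch seen in the traceback:
    # scalars serialize fine; anything else (e.g. the nested ioTune
    # dict) is rejected, which aborts the VM start.
    elem = ET.SubElement(parent, key)
    if isinstance(value, (str, int, float)):
        elem.text = str(value)
    else:
        raise UnsupportedType(key, value)
    return elem


root = ET.Element('device')
try:
    _keyvalue_to_elem('ioTune', {'read_iops_sec': 100,
                                 'write_iops_sec': 100}, root)
except UnsupportedType as exc:
    print(exc)  # Unsupported {...} for ioTune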
Version-Release number of selected component (if applicable):
- vdsm 4.20.27.1
- CentOS 7.5
- qemu-kvm-ev 2.10.0, release 21.el7_5.2.1

How reproducible:

Steps to Reproduce:
1. Have a 4.1 cluster with QoS IOPS limits
2. Upgrade the engine and a host to 4.2
3. Try to run a VM

Actual results:
VM does not run.

Expected results:
VM should start.

Additional info:

--- Additional comment from Michal Skrivanek on 2018-05-19 02:03:57 EDT ---

From the original email thread:

Also, please make sure to report the steps you did to get this error. Are you
just starting a new VM using a 4.2 engine and a 4.2 host? Or are you migrating
an old VM created with engine 4.1?

Just for the sake of completeness (not sure it applies here), this flow is
NOT supported:
1. Have a VM happily running on a 4.1 host
2. Upgrade Vdsm on that host from 4.1 to 4.2 while the VM is running
3. Restart Vdsm

--- Additional comment from Michal Skrivanek on 2018-05-19 02:04:52 EDT ---

Please also attach engine.log

--- Additional comment from on 2018-05-21 03:52 EDT ---

--- Additional comment from on 2018-05-21 03:52 EDT ---

--- Additional comment from on 2018-05-21 03:53 EDT ---

--- Additional comment from on 2018-05-21 03:55:15 EDT ---

I just created a new VM on my 4.2 cluster. It starts OK when the disk profile
has no IO limits. Once I select another disk profile with limits, the VM does
not start. I'm attaching the engine log from the failed start and also two
XMLs from the log: one with an IO-limited disk profile, which does not start,
and the other without IO limits, which starts OK.

The diff is plain enough:

diff --git a/bug_bad.xml b/bug_ok.xml
index 40d2689..3aad896 100644
--- a/bug_bad.xml
+++ b/bug_ok.xml
@@ -98,7 +98,6 @@
             <alias name="ua-e94f8c97-bb7a-4dbb-a1a1-1470d70250aa"/>
             <address bus="0" controller="0" target="0" type="drive" unit="0"/>
             <serial>e94f8c97-bb7a-4dbb-a1a1-1470d70250aa</serial>
-            <iotune read_bytes_sec="0" read_iops_sec="300" total_bytes_sec="0" total_iops_sec="0" write_bytes_sec="0" write_iops_sec="300"/>
         </disk>
     </devices>
     <pm>

PS: on a 4.1 host both VMs start.

--- Additional comment from Francesco Romani on 2018-05-21 08:25:19 EDT ---

Vdsm bug, unwanted side effect of patch https://gerrit.ovirt.org/#/c/90435/

--- Additional comment from Francesco Romani on 2018-05-21 08:27:24 EDT ---

(In reply to ernest.beinrohr from comment #6)
> I just created a new VM on my 4.2 cluster. [...]
> PS: on a 4.1 host both VMs start.

Please note that the XML generated from Engine looks wrong. It should look
like https://libvirt.org/formatdomain.html#elementsDisks

Vdsm has code to deal with the format as per the libvirt docs. So, with the
incoming patch, Vdsm will let the VM start, but until we also get an Engine
fix, the IO tune settings will be silently discarded.
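For context, the two <iotune> shapes at play can be normalized mechanically:
libvirt documents child elements, while the 4.2 engine emitted XML attributes.
The following is an illustrative sketch of such a normalization only; it is
not the actual engine or vdsm fix.

import xml.etree.ElementTree as ET

# The attribute-style form the 4.2 engine emitted (copied from the diff above).
BAD_FORM = ('<iotune read_bytes_sec="0" read_iops_sec="300" '
            'total_bytes_sec="0" total_iops_sec="0" '
            'write_bytes_sec="0" write_iops_sec="300"/>')


def normalize_iotune(elem):
    # Move each attribute into a child element, the form documented at
    # https://libvirt.org/formatdomain.html#elementsDisks
    for name, value in sorted(elem.attrib.items()):
        ET.SubElement(elem, name).text = value
    elem.attrib.clear()
    return elem


iotune = normalize_iotune(ET.fromstring(BAD_FORM))
print(ET.tostring(iotune).decode())
# <iotune><read_bytes_sec>0</read_bytes_sec>...<write_iops_sec>300</write_iops_sec></iotune>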
--- Additional comment from Arik on 2018-05-21 08:42:37 EDT ---

(In reply to Francesco Romani from comment #8)
> Please note that the XML generated from Engine looks wrong. It should look
> like https://libvirt.org/formatdomain.html#elementsDisks

Ack, posted a fix.

--- Additional comment from on 2018-05-21 08:46:43 EDT ---

A new machine created on the 4.2 engine has this iotune format:

    <iotune read_bytes_sec="0" read_iops_sec="300" total_bytes_sec="0" total_iops_sec="0" write_bytes_sec="0" write_iops_sec="300"/>
    </disk>

whereas the 4.1 engine generated (but also not working on a 4.2 host) this
format:

    <iotune>
        <read_bytes_sec>0</read_bytes_sec>
        <read_iops_sec>100</read_iops_sec>
        <total_bytes_sec>0</total_bytes_sec>
        <total_iops_sec>0</total_iops_sec>
        <write_bytes_sec>0</write_bytes_sec>
        <write_iops_sec>100</write_iops_sec>
    </iotune>
    </disk>

Both formats work on a 4.1 host.

--- Additional comment from Francesco Romani on 2018-05-21 09:47:48 EDT ---

(In reply to ernest.beinrohr from comment #10)
> A new machine created on the 4.2 engine has this iotune format: [...]
> Both formats work on a 4.1 host.

Hi, thanks for the additional information, it matches our findings. I believe
that patch https://gerrit.ovirt.org/#/c/91432/ should solve the issue and, if
so, it will be included in the next oVirt 4.2 release.

--- Additional comment from Francesco Romani on 2018-05-21 10:57:35 EDT ---

Note about verification: to properly fix this bug we need both the Vdsm patch
and the Engine patch.

Vdsm patched, Engine unpatched:
1. Run a VM with iotune settings; it should start and work as usual.
2. Using "virsh -r dumpxml" or vdsm-client dumpxmls, inspect the domain XML;
   it should NOT have iotune settings configured.

Vdsm and Engine both patched:
1. Run a VM with iotune settings; it should start and work as usual.
2. Using "virsh -r dumpxml" or vdsm-client dumpxmls, inspect the domain XML;
   it should have iotune settings configured.

--- Additional comment from RHV Bugzilla Automation and Verification Bot on 2018-05-24 19:53:42 EDT ---

INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following
reason: [Open patch attached]

For more info please contact: infra

--- Additional comment from Sandro Bonazzola on 2018-05-25 03:31:41 EDT ---

Moving back to POST as per comment #13.

--- Additional comment from Francesco Romani on 2018-05-25 04:15:05 EDT ---

The unmerged patches are not needed to solve this issue. We may keep this bug
open for a different reason, however: without the Engine patch, a VM with QoS
settings will start, but the aforementioned QoS settings will be ignored.

--- Additional comment from Francesco Romani on 2018-05-25 04:16:29 EDT ---

Just noticed that the Engine patch is merged, so moving back to MODIFIED:

https://gerrit.ovirt.org/#/c/91492/

--- Additional comment from Arik on 2018-05-27 04:51:02 EDT ---

(In reply to Francesco Romani from comment #16)
> Just noticed that the Engine patch is merged, so moving back to MODIFIED

Right, and since all fixes were merged before the 4.2.4 tagging, moving to
ON_QA.
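The domain-XML inspection from the verification note above can be scripted.
Below is a small helper sketch: it assumes virsh is on PATH on the host, uses
a read-only connection, and reuses a VM name that appears later in this
thread as a placeholder. It is not part of any official test tooling.

import subprocess
import xml.etree.ElementTree as ET


def iotune_settings(vm_name):
    # -r gives a read-only libvirt connection, safe against a live host.
    xml = subprocess.check_output(['virsh', '-r', 'dumpxml', vm_name])
    dom = ET.fromstring(xml)
    settings = {}
    for disk in dom.findall('./devices/disk'):
        target = disk.find('target')
        iotune = disk.find('iotune')
        if target is not None and iotune is not None:
            settings[target.get('dev')] = {
                child.tag: int(child.text) for child in iotune}
    return settings


# Patched Vdsm + unpatched Engine: expect {} (limits silently dropped).
# Both patched: expect e.g. {'vda': {'read_iops_sec': 100, ...}}.
print(iotune_settings('golden_env_mixed_virtio_0'))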
--- Additional comment from Liran Rotenberg on 2018-06-06 07:11:34 EDT ---

Verified on:
ovirt-engine-4.2.4.1-0.1.el7.noarch
vdsm-4.20.29-1.el7ev.x86_64

Steps of verification:
1. Deploy RHEV 4.1: ovirt-engine-4.1.11.2-0.1.el7.noarch, vdsm-4.19.51-1.el7ev.x86_64
2. Create a storage QoS profile in the datacenter.
3. Set IOPS read/write to 100 and 100.
   Note: I created two QoS profiles, one with IOPS and one with throughput
   (golden_env_mixed_virtio_0 is with IOPS, golden_env_mixed_virtio_1 is with
   throughput).
4. Create a VM, set the QoS profile on the VM disk and start the VM.
5. From the host run:

# virsh -r dumpxml golden_env_mixed_virtio_0 | grep -a3 iotune
      <iotune>
        <read_iops_sec>100</read_iops_sec>
        <write_iops_sec>100</write_iops_sec>
      </iotune>

For the throughput profile:
      <iotune>
        <read_bytes_sec>104857600</read_bytes_sec>
        <write_bytes_sec>104857600</write_bytes_sec>
      </iotune>

6. Shut down the VM.
7. Upgrade the engine and host to 4.2: ovirt-engine-4.2.4.1-0.1.el7.noarch, vdsm-4.20.29-1.el7ev.x86_64
8. Start the VM (done on cluster version 4.1).
9. From the host run:

# virsh -r dumpxml golden_env_mixed_virtio_0 | grep -a3 iotune
      <iotune>
        <read_iops_sec>100</read_iops_sec>
        <write_iops_sec>100</write_iops_sec>
      </iotune>

# vdsm-client VM getIoTune vmID='2a66838a-6a2f-4f7d-9486-561ca4ddc6bf'
[
    {
        "ioTune": {
            "write_bytes_sec": 0,
            "total_iops_sec": 0,
            "read_iops_sec": 100,
            "read_bytes_sec": 0,
            "write_iops_sec": 100,
            "total_bytes_sec": 0
        },

# virsh -r dumpxml golden_env_mixed_virtio_1 | grep -a3 iotune
      <iotune>
        <read_bytes_sec>104857600</read_bytes_sec>
        <write_bytes_sec>104857600</write_bytes_sec>
      </iotune>

# vdsm-client VM getIoTune vmID='78488f39-064f-4440-8586-a91ba2fa0f55'
[
    {
        "ioTune": {
            "write_bytes_sec": 104857600,
            "total_iops_sec": 0,
            "read_iops_sec": 0,
            "read_bytes_sec": 104857600,
            "write_iops_sec": 0,
            "total_bytes_sec": 0
        },

10. Repeat steps 8-9 on a cluster with Compatibility Version 4.2.
11. From the engine log, domain XML (cluster version 4.2):

IOPS:
      <iotune>
        <read_bytes_sec>0</read_bytes_sec>
        <read_iops_sec>100</read_iops_sec>
        <total_bytes_sec>0</total_bytes_sec>
        <total_iops_sec>0</total_iops_sec>
        <write_bytes_sec>0</write_bytes_sec>
        <write_iops_sec>100</write_iops_sec>
      </iotune>

Throughput:
      <iotune>
        <read_bytes_sec>104857600</read_bytes_sec>
        <read_iops_sec>0</read_iops_sec>
        <total_bytes_sec>0</total_bytes_sec>
        <total_iops_sec>0</total_iops_sec>
        <write_bytes_sec>104857600</write_bytes_sec>
        <write_iops_sec>0</write_iops_sec>
      </iotune>

Results:
The VM starts with iotune settings applied, after upgrading the engine from
4.1 to 4.2, with cluster compatibility versions 4.1 and 4.2.

(Originally by Germano Veit Michel)
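The getIoTune check in step 9 lends itself to a quick scripted assertion. A
sketch, reusing the vmID and IOPS values shown above, and assuming vdsm-client
is installed on the host and prints JSON as in the output quoted in step 9:

import json
import subprocess

# Expected limits from the IOPS QoS profile created in step 3.
EXPECTED = {'read_iops_sec': 100, 'write_iops_sec': 100}


def get_iotune(vm_id):
    # Same call as in step 9; the result is a JSON list with one entry
    # per disk, each carrying an "ioTune" dict.
    out = subprocess.check_output(
        ['vdsm-client', 'VM', 'getIoTune', 'vmID=%s' % vm_id])
    return json.loads(out)[0]['ioTune']


iotune = get_iotune('2a66838a-6a2f-4f7d-9486-561ca4ddc6bf')
assert all(iotune[key] == value for key, value in EXPECTED.items()), iotune
print('QoS limits survived the 4.1 -> 4.2 upgrade:', iotune)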
Since the problem described in this bug report should be resolved in a recent
advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow
the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2072