Bug 1589612 - Cannot start VM with QoS IOPS after host&engine upgrade from 4.1 to 4.2
Summary: Cannot start VM with QoS IOPS after host&engine upgrade from 4.1 to 4.2
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 4.2.3
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.3.0
Target Release: 4.3.0
Assignee: Michal Skrivanek
QA Contact: Raz Tamir
URL:
Whiteboard:
Depends On: 1579909
Blocks: 1589664
 
Reported: 2018-06-11 00:54 UTC by Germano Veit Michel
Modified: 2021-09-09 15:03 UTC (History)
CC List: 11 users

Fixed In Version: v4.30.3, ovirt-engine-4.3.0_rc
Doc Type: Bug Fix
Doc Text:
When a virtual machine starts, VDSM uses the domain metadata section to store data which is required to configure a virtual machine but which is not adequately represented by the standard libvirt domain. Previously, VDSM stored drive IO tune settings in this metadata that were redundant because they already had proper representation in the libvirt domain. Furthermore, if IO tune settings were enabled, a bug in storing the IO tune settings prevented the virtual machine from starting. The current release removes the redundant information from the domain metadata and fixes the bug that prevented virtual machines from starting.
Clone Of: 1579909
Clones: 1589664
Environment:
Last Closed: 2019-05-08 12:36:02 UTC
oVirt Team: Virt
Target Upstream Version:
Embargoed:
lsvaty: testing_plan_complete-


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHV-43527 0 None None None 2021-09-09 14:39:40 UTC
Red Hat Knowledge Base (Solution) 3482481 0 None None None 2018-06-11 01:03:05 UTC
Red Hat Product Errata RHBA-2019:1077 0 None None None 2019-05-08 12:36:24 UTC
oVirt gerrit 91427 0 None MERGED core: fix writing of iotune 2020-05-02 13:14:37 UTC
oVirt gerrit 91429 0 None MERGED virt: metadata: ignore iotune in drive specParams 2020-05-02 13:14:37 UTC
oVirt gerrit 91432 0 None MERGED virt: metadata: ignore iotune in drive specParams 2020-05-02 13:14:37 UTC
oVirt gerrit 91469 0 None ABANDONED virt: test: additional test for iotune 2020-05-02 13:14:37 UTC
oVirt gerrit 91492 0 None MERGED core: fix writing of iotune 2020-05-02 13:14:37 UTC

Description Germano Veit Michel 2018-06-11 00:54:24 UTC
+++ This bug was initially created as a clone of Bug #1579909 +++

Description of problem:
oVirt environment: 7 hosts (4.1). After upgrading the engine and one of the hosts, I am unable to start a VM on the upgraded host. Starting the VM on one of the older 4.1 hosts works fine. The error in the vdsm log is as follows:

2018-05-17 14:24:45,561+0200 ERROR (vm/0d53dd5d) [virt.vm] (vmId='0d53dd5d-ef16-4763-bbdc-2dc173087bf5') The vm start process failed (vm:943)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 872, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2882, in _run
    self._domDependentInit()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2458, in _domDependentInit
    self._vmDependentInit()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2495, in _vmDependentInit
    self._sync_metadata()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 5158, in _sync_metadata
    self._md_desc.dump(self._dom)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 509, in dump
    md_xml = self._build_xml()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 721, in _build_xml
    md_elem = self._build_tree(namespace, namespace_uri)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 711, in _build_tree
    dev_elem = _dump_device(metadata_obj, data)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 800, in _dump_device
    elems.append(_dump_device_spec_params(md_obj, value))
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 866, in _dump_device_spec_params
    spec_params_elem = md_obj.dump(_SPEC_PARAMS, **value)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 229, in dump
    _keyvalue_to_elem(self._add_ns(key), value, elem)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 916, in _keyvalue_to_elem
    raise UnsupportedType(key, value)
UnsupportedType: Unsupported {u'write_bytes_sec': 0, u'total_iops_sec': 0, u'read_iops_sec': 100, u'read_bytes_sec': 0, u'write_iops_sec': 100, u'total_bytes_sec': 0} for ioTune
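
For context, the failure can be reproduced with a minimal stand-in for vdsm's metadata key/value serializer (a hedged sketch, not the actual vdsm code; the names mirror the traceback above): the serializer only maps scalar values to XML elements, so the nested ioTune mapping stored in the drive spec params raises UnsupportedType.

import xml.etree.ElementTree as ET

class UnsupportedType(Exception):
    pass

def keyvalue_to_elem(key, value, parent):
    # Simplified stand-in for vdsm's _keyvalue_to_elem: scalar values
    # become element text; anything else (e.g. a dict) is unsupported.
    elem = ET.SubElement(parent, key)
    if isinstance(value, (str, int, float)):
        elem.text = str(value)
    else:
        raise UnsupportedType('%s=%r' % (key, value))
    return elem

spec_params = ET.Element('specParams')
try:
    keyvalue_to_elem('ioTune',
                     {'read_iops_sec': 100, 'write_iops_sec': 100},
                     spec_params)
except UnsupportedType as exc:
    print('UnsupportedType:', exc)  # the dict has no scalar representation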

The logged VM KVM XML contains this:
        <disk device="disk" snapshot="no" type="block">
            <address bus="0x00" domain="0x0000" function="0x0" slot="0x06" type="pci"/>
            <source dev="/rhev/data-center/mnt/blockSD/252fe066-1a76-4c18-8ea3-29d3a07cdd4c/images/f617d674-e283-4e70-abb2-2ca2b8cb2bce/301f06aa-053c-48c8-8e25-f259d68b3395"/>
            <target bus="virtio" dev="vda"/>
            <serial>f617d674-e283-4e70-abb2-2ca2b8cb2bce</serial>
            <boot order="1"/>
            <driver cache="none" error_policy="stop" io="native" name="qemu" type="raw"/>
            <iotune>
                <read_bytes_sec>0</read_bytes_sec>
                <read_iops_sec>100</read_iops_sec>
                <total_bytes_sec>0</total_bytes_sec>
                <total_iops_sec>0</total_iops_sec>
                <write_bytes_sec>0</write_bytes_sec>
                <write_iops_sec>100</write_iops_sec>
            </iotune>
        </disk>
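
For reference, the element-form iotune above maps directly onto the flat dict that vdsm reports; a minimal sketch using the standard library (illustrative only, not vdsm's actual parser):

import xml.etree.ElementTree as ET

disk = ET.fromstring('''
<disk device="disk" snapshot="no" type="block">
    <iotune>
        <read_iops_sec>100</read_iops_sec>
        <write_iops_sec>100</write_iops_sec>
    </iotune>
</disk>
''')

# Each <iotune> child element becomes one key/value pair.
iotune = {child.tag: int(child.text) for child in disk.find('iotune')}
print(iotune)  # {'read_iops_sec': 100, 'write_iops_sec': 100}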


Version-Release number of selected component (if applicable):
 - vdsm 4.20.27.1
 - CentOS 7.5
 - qemu-kvm-ev 2.10.0-21.el7_5.2.1

How reproducible:


Steps to Reproduce:
1. Have a 4.1 cluster with QoS IOPS limits.
2. Upgrade the engine and a host to 4.2.
3. Try to run a VM on the upgraded host.

Actual results:
VM does not run

Expected results:
VM should start

Additional info:

--- Additional comment from Michal Skrivanek on 2018-05-19 02:03:57 EDT ---

From the original email thread:

Also, please make sure to report the steps you took to get this error.
Are you just starting a new VM using 4.2 Engine and 4.2 host? Or are you
migrating an old VM created with Engine 4.1?

Just for the sake of completeness (not sure it applies here), this flow
is NOT supported:

1. have a VM happily run on 4.1 host
2. upgrade Vdsm on that host from 4.1 to 4.2 while the VM is running
3. restart Vdsm

--- Additional comment from Michal Skrivanek on 2018-05-19 02:04:52 EDT ---

Please also attach engine.log

--- Three additional comments on 2018-05-21 03:52-03:53 EDT (comment bodies not preserved; presumably the attachments referenced in the next comment) ---

--- Additional comment from ernest.beinrohr on 2018-05-21 03:55:15 EDT ---

I just created a new VM on my 4.2 cluster. It's starting OK when the disk profile has no IO limits. Once I select another disk profile with limits, the VM does not start. I'm attaching the engine log from the failed start and also two XMLs from the log, one with an IO-limit disk profile which does not start and the other without IO limits, which starts OK.

The diff is plain enough:
diff --git a/bug_bad.xml b/bug_ok.xml
index 40d2689..3aad896 100644
--- a/bug_bad.xml
+++ b/bug_ok.xml
@@ -98,7 +98,6 @@
             <alias name="ua-e94f8c97-bb7a-4dbb-a1a1-1470d70250aa"/>
             <address bus="0" controller="0" target="0" type="drive" unit="0"/>
             <serial>e94f8c97-bb7a-4dbb-a1a1-1470d70250aa</serial>
-            <iotune read_bytes_sec="0" read_iops_sec="300" total_bytes_sec="0" total_iops_sec="0" write_bytes_sec="0" write_iops_sec="300"/>
         </disk>
     </devices>
     <pm>


PS: on a 4.1 host both VMs start.

--- Additional comment from Francesco Romani on 2018-05-21 08:25:19 EDT ---

Vdsm bug, unwanted side effect of patch https://gerrit.ovirt.org/#/c/90435/

--- Additional comment from Francesco Romani on 2018-05-21 08:27:24 EDT ---

(In reply to ernest.beinrohr from comment #6)
> I just created a new VM on my 4.2 cluster. It's starting OK when the disk
> profile has no IO limits. Once I select another disk profile with limits,
> the VM does not start. I'm attaching the engine log from the failed start
> and also two XMLs from the log, one with an IO-limit disk profile which does
> not start and the other without IO limits, which starts OK.
> 
> The diff is plain enough:
> diff --git a/bug_bad.xml b/bug_ok.xml
> index 40d2689..3aad896 100644
> --- a/bug_bad.xml
> +++ b/bug_ok.xml
> @@ -98,7 +98,6 @@
>              <alias name="ua-e94f8c97-bb7a-4dbb-a1a1-1470d70250aa"/>
>              <address bus="0" controller="0" target="0" type="drive"
> unit="0"/>
>              <serial>e94f8c97-bb7a-4dbb-a1a1-1470d70250aa</serial>
> -            <iotune read_bytes_sec="0" read_iops_sec="300"
> total_bytes_sec="0" total_iops_sec="0" write_bytes_sec="0"
> write_iops_sec="300"/>
>          </disk>
>      </devices>
>      <pm>
> 
> 
> PS: on a 4.1 host both VMs start.

Please note that the XML generated from Engine looks wrong. It should look like
https://libvirt.org/formatdomain.html#elementsDisks

Vdsm has code to deal with the format described in the libvirt docs.
So, with the incoming patch, Vdsm will let the VM start, but until we also get an Engine fix, the IO tune settings will be silently discarded.

--- Additional comment from Arik on 2018-05-21 08:42:37 EDT ---

(In reply to Francesco Romani from comment #8)
> Please note that the XML generated from Engine looks wrong. It should look
> like
> https://libvirt.org/formatdomain.html#elementsDisks

Ack, posted a fix.

--- Additional comment from ernest.beinrohr on 2018-05-21 08:46:43 EDT ---

A new machine created on a 4.2 engine has this iotune format:
            <iotune read_bytes_sec="0" read_iops_sec="300"
 total_bytes_sec="0" total_iops_sec="0" write_bytes_sec="0"
 write_iops_sec="300"/>
          </disk>

whereas the XML generated by the 4.1 engine (which also does not work on a 4.2 host) has this format:
            <iotune>
                <read_bytes_sec>0</read_bytes_sec>
                <read_iops_sec>100</read_iops_sec>
                <total_bytes_sec>0</total_bytes_sec>
                <total_iops_sec>0</total_iops_sec>
                <write_bytes_sec>0</write_bytes_sec>
                <write_iops_sec>100</write_iops_sec>
            </iotune>
        </disk>

Both formats work on a 4.1 host.
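
A tolerant parser copes with either shape; here is a hedged sketch of a normalizer that accepts both the attribute form (4.2 engine) and the element form (4.1 engine), illustrating why a 4.1 host handles both (not vdsm's actual code):

import xml.etree.ElementTree as ET

def iotune_to_dict(iotune):
    # Element form: <iotune><read_iops_sec>100</read_iops_sec>...</iotune>
    settings = {child.tag: int(child.text) for child in iotune}
    # Attribute form: <iotune read_iops_sec="300" .../>
    settings.update((key, int(val)) for key, val in iotune.attrib.items())
    return settings

elem_form = ET.fromstring('<iotune><read_iops_sec>100</read_iops_sec></iotune>')
attr_form = ET.fromstring('<iotune read_iops_sec="300" write_iops_sec="300"/>')
print(iotune_to_dict(elem_form))  # {'read_iops_sec': 100}
print(iotune_to_dict(attr_form))  # {'read_iops_sec': 300, 'write_iops_sec': 300}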

--- Additional comment from Francesco Romani on 2018-05-21 09:47:48 EDT ---

(In reply to ernest.beinrohr from comment #10)
> A new machine created on a 4.2 engine has this iotune format:
>             <iotune read_bytes_sec="0" read_iops_sec="300"
>  total_bytes_sec="0" total_iops_sec="0" write_bytes_sec="0"
>  write_iops_sec="300"/>
>           </disk>
> 
> whereas the XML generated by the 4.1 engine (which also does not work on a
> 4.2 host) has this format:
>             <iotune>
>                 <read_bytes_sec>0</read_bytes_sec>
>                 <read_iops_sec>100</read_iops_sec>
>                 <total_bytes_sec>0</total_bytes_sec>
>                 <total_iops_sec>0</total_iops_sec>
>                 <write_bytes_sec>0</write_bytes_sec>
>                 <write_iops_sec>100</write_iops_sec>
>             </iotune>
>         </disk>
> 
> Both formats work on a 4.1 host

Hi,

Thanks for the additional information, it matches our findings.

I believe that patch https://gerrit.ovirt.org/#/c/91432/ should solve the issue; if so, it will be included in the next oVirt 4.2 release.

--- Additional comment from Francesco Romani on 2018-05-21 10:57:35 EDT ---

Note about verification:

To properly fix this bug we need both the Vdsm patch and the Engine patch.

Vdsm patched, Engine unpatched:
1. Run a VM with iotune settings; it should start and work as usual.
2. Using "virsh -r dumpxml" or vdsm-client dumpxmls, inspect the domain XML; it should NOT have iotune settings configured.

Vdsm and Engine both patched:
1. Run a VM with iotune settings; it should start and work as usual.
2. Using "virsh -r dumpxml" or vdsm-client dumpxmls, inspect the domain XML; it should have iotune settings configured.
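
As a hedged illustration of these checks, a small helper that inspects the live domain XML for iotune settings (a sketch only; 'my-vm' is a placeholder name, and the expected outcomes follow the two scenarios above):

import subprocess
import xml.etree.ElementTree as ET

def domain_has_iotune(vm_name):
    # 'virsh -r dumpxml' prints the live libvirt domain XML read-only.
    xml_text = subprocess.check_output(['virsh', '-r', 'dumpxml', vm_name])
    root = ET.fromstring(xml_text)
    # In the libvirt schema, iotune sits under <devices>/<disk>/<iotune>.
    return len(root.findall('./devices/disk/iotune')) > 0

# Vdsm patched, Engine unpatched: expect False.
# Vdsm and Engine both patched: expect True.
print(domain_has_iotune('my-vm'))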

--- Additional comment from RHV Bugzilla Automation and Verification Bot on 2018-05-24 19:53:42 EDT ---

INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Open patch attached]

For more info please contact: infra

--- Additional comment from Sandro Bonazzola on 2018-05-25 03:31:41 EDT ---

Moving back to POST as per comment #13.

--- Additional comment from Francesco Romani on 2018-05-25 04:15:05 EDT ---

Unmerged patches are not needed to solve this issue.

We may keep this bug open for a different reason, however: without the Engine patch, a VM with QoS settings will start, but the aforementioned QoS settings will be ignored.

--- Additional comment from Francesco Romani on 2018-05-25 04:16:29 EDT ---

Just noticed that Engine patch is merged, so moving back to MODIFIED

https://gerrit.ovirt.org/#/c/91492/

--- Additional comment from Arik on 2018-05-27 04:51:02 EDT ---

(In reply to Francesco Romani from comment #16)
> Just noticed that Engine patch is merged, so moving back to MODIFIED
> 
> https://gerrit.ovirt.org/#/c/91492/

Right, and since all fixes were merged before the 4.2.4 tagging, moving to ON_QA.

--- Additional comment from Liran Rotenberg on 2018-06-06 07:11:34 EDT ---

Verified on:
ovirt-engine-4.2.4.1-0.1.el7.noarch
vdsm-4.20.29-1.el7ev.x86_64

Steps of verification:
1. Deploy RHEV 4.1: ovirt-engine-4.1.11.2-0.1.el7.noarch, vdsm-4.19.51-1.el7ev.x86_64.
2. Create a storage QoS profile in the data center.
3. Set IOPS read/write to 100 and 100.
Note: I created two QoS profiles, one with IOPS and one with throughput (golden_env_mixed_virtio_0 is with IOPS, golden_env_mixed_virtio_1 is with throughput).
4. Create a VM, set the QoS profile on the VM's disk, and start the VM.
5. From the host run: 
# virsh -r dumpxml golden_env_mixed_virtio_0 | grep -a3 iotune
<iotune>
    <read_iops_sec>100</read_iops_sec>
    <write_iops_sec>100</write_iops_sec>
</iotune>

For throughput profile:
<iotune>
    <read_bytes_sec>104857600</read_bytes_sec>
    <write_bytes_sec>104857600</write_bytes_sec>
</iotune>

6. Shut down the VM.
7. Upgrade the engine and host to 4.2: ovirt-engine-4.2.4.1-0.1.el7.noarch, vdsm-4.20.29-1.el7ev.x86_64.
8. Start the VM (done on cluster version 4.1).
9. From the host run: 
# virsh -r dumpxml golden_env_mixed_virtio_0 | grep -a3 iotune
<iotune>
    <read_iops_sec>100</read_iops_sec>
    <write_iops_sec>100</write_iops_sec>
</iotune>

# vdsm-client VM getIoTune vmID='2a66838a-6a2f-4f7d-9486-561ca4ddc6bf'
[
    {
        "ioTune": {
            "write_bytes_sec": 0, 
            "total_iops_sec": 0, 
            "read_iops_sec": 100, 
            "read_bytes_sec": 0, 
            "write_iops_sec": 100, 
            "total_bytes_sec": 0
        }, 



# virsh -r dumpxml golden_env_mixed_virtio_1 | grep -a3 iotune

<iotune>
    <read_bytes_sec>104857600</read_bytes_sec>
    <write_bytes_sec>104857600</write_bytes_sec>
</iotune>

# vdsm-client VM getIoTune vmID='78488f39-064f-4440-8586-a91ba2fa0f55'
[
    {
        "ioTune": {
            "write_bytes_sec": 104857600, 
            "total_iops_sec": 0, 
            "read_iops_sec": 0, 
            "read_bytes_sec": 104857600, 
            "write_iops_sec": 0, 
            "total_bytes_sec": 0
        }, 
10. Repeat steps 8-9 on a cluster with compatibility version 4.2.
11. Domain XML from the engine log (cluster version 4.2):
iops:
<iotune>
    <read_bytes_sec>0</read_bytes_sec>
    <read_iops_sec>100</read_iops_sec>
    <total_bytes_sec>0</total_bytes_sec>
    <total_iops_sec>0</total_iops_sec>
    <write_bytes_sec>0</write_bytes_sec>
    <write_iops_sec>100</write_iops_sec>
</iotune>

throughput:
<iotune>
    <read_bytes_sec>104857600</read_bytes_sec>
    <read_iops_sec>0</read_iops_sec>
    <total_bytes_sec>0</total_bytes_sec>
    <total_iops_sec>0</total_iops_sec>
    <write_bytes_sec>104857600</write_bytes_sec>
    <write_iops_sec>0</write_iops_sec>
</iotune>

Results:
The VM starts with iotune settings after upgrading the engine from 4.1 to 4.2, with cluster compatibility versions 4.1 and 4.2.

Comment 3 RHV bug bot 2018-12-10 15:13:38 UTC
INFO: Bug status (VERIFIED) wasn't changed but the following should be fixed:

[Project 'ovirt-engine'/Component 'vdsm' mismatch]

For more info please contact: rhv-devops

Comment 4 RHV bug bot 2019-01-15 23:36:10 UTC
INFO: Bug status (VERIFIED) wasn't changed but the following should be fixed:

[Project 'ovirt-engine'/Component 'vdsm' mismatch]

For more info please contact: rhv-devops

Comment 5 Steve Goodman 2019-01-27 16:14:53 UTC
The doc_text describes a bug but doesn't state if the bug is fixed. Is it?

Comment 6 Michal Skrivanek 2019-01-28 08:10:12 UTC
Yes, it is fixed. You can just rephrase, or add a sentence saying that it's all fine now.

Comment 7 Steve Goodman 2019-01-28 10:57:11 UTC
OK. I changed the second-to-last sentence to: "This bug appeared only if IO tune settings were enabled," since the bug was fixed.

Comment 9 errata-xmlrpc 2019-05-08 12:36:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1077

Comment 10 Daniel Gur 2019-08-28 13:13:19 UTC
sync2jira

Comment 11 Daniel Gur 2019-08-28 13:17:32 UTC
sync2jira

