Bug 1579909

Summary: Cannot start VM with QoS IOPS after host&engine upgrade from 4.1 to 4.2
Product: [oVirt] vdsm
Reporter: ernest.beinrohr
Component: Core
Assignee: Francesco Romani <fromani>
Status: CLOSED CURRENTRELEASE
QA Contact: Liran Rotenberg <lrotenbe>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.20.23
CC: ahadas, bugs, ernest.beinrohr, fromani, gveitmic, michal.skrivanek, msivak
Target Milestone: ovirt-4.2.4
Flags: rule-engine: ovirt-4.2+
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: vdsm-4.20.28-1
Doc Type: If docs needed, set a value
Doc Text:
Vdsm uses the domain metadata section to store extra data that is required to configure a VM but is not properly represented in the standard libvirt domain. This always happens when a VM starts. Vdsm tried to store the drive IO tune settings in the metadata, which was redundant because the IO tune settings already have a proper representation. Furthermore, the store operation for the IO tune settings had an implementation bug which made it impossible to start the VM successfully. This bug appears only if IO tune settings are enabled.
Story Points: ---
Clone Of:
Clones: 1589612 (view as bug list)
Environment:
Last Closed: 2018-06-26 08:38:48 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1589612
Attachments:
- engine log from a failed start
- VM XML - no IO limits in this disks profile
- VM XML - IO limits in this disks profile - not starting with this one on 4.2 host

Description ernest.beinrohr 2018-05-18 16:04:14 UTC
Description of problem:
oVirt environment: 7 hosts (4.1). After upgrading the engine and one of the hosts, I am unable to start a VM on this new host. Starting the VM on one of the older 4.1 hosts works fine. The error received in vdsm is as follows:

2018-05-17 14:24:45,561+0200 ERROR (vm/0d53dd5d) [virt.vm] (vmId='0d53dd5d-ef16-4763-bbdc-2dc173087bf5') The vm start process failed (vm:943)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 872, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2882, in _run
    self._domDependentInit()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2458, in _domDependentInit
    self._vmDependentInit()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2495, in _vmDependentInit
    self._sync_metadata()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 5158, in _sync_metadata
    self._md_desc.dump(self._dom)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 509, in dump
    md_xml = self._build_xml()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 721, in _build_xml
    md_elem = self._build_tree(namespace, namespace_uri)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 711, in _build_tree
    dev_elem = _dump_device(metadata_obj, data)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 800, in _dump_device
    elems.append(_dump_device_spec_params(md_obj, value))
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 866, in _dump_device_spec_params
    spec_params_elem = md_obj.dump(_SPEC_PARAMS, **value)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 229, in dump
    _keyvalue_to_elem(self._add_ns(key), value, elem)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/metadata.py", line 916, in _keyvalue_to_elem
    raise UnsupportedType(key, value)
UnsupportedType: Unsupported {u'write_bytes_sec': 0, u'total_iops_sec': 0, u'read_iops_sec': 100, u'read_bytes_sec': 0, u'write_iops_sec': 100, u'total_bytes_sec': 0} for ioTune
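
For context: per the traceback, the failure happens in _keyvalue_to_elem, which serializes metadata key/value pairs into XML leaf elements and evidently only handles scalar values. A minimal sketch of the failure mode (a hypothetical simplification, not vdsm's actual code):

import xml.etree.ElementTree as ET

class UnsupportedType(Exception):
    pass

def keyvalue_to_elem(key, value, parent):
    # Scalars become XML leaf elements; anything else is rejected.
    elem = ET.SubElement(parent, key)
    if isinstance(value, (str, int, float)):
        elem.text = str(value)
        return elem
    raise UnsupportedType(key, value)  # the nested ioTune dict ends up here

root = ET.Element('device')
io_tune = {'write_bytes_sec': 0, 'total_iops_sec': 0, 'read_iops_sec': 100,
           'read_bytes_sec': 0, 'write_iops_sec': 100, 'total_bytes_sec': 0}
keyvalue_to_elem('ioTune', io_tune, root)  # raises UnsupportedType, as in the log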

The logged VM KVM XML contains this:
        <disk device="disk" snapshot="no" type="block">
            <address bus="0x00" domain="0x0000" function="0x0" slot="0x06" type="pci"/>
            <source dev="/rhev/data-center/mnt/blockSD/252fe066-1a76-4c18-8ea3-29d3a07cdd4c/images/f617d674-e283-4e70-abb2-2ca2b8cb2bce/301f06aa-053c-48c8-8e25-f259d68b3395"/>
            <target bus="virtio" dev="vda"/>
            <serial>f617d674-e283-4e70-abb2-2ca2b8cb2bce</serial>
            <boot order="1"/>
            <driver cache="none" error_policy="stop" io="native" name="qemu" type="raw"/>
            <iotune>
                <read_bytes_sec>0</read_bytes_sec>
                <read_iops_sec>100</read_iops_sec>
                <total_bytes_sec>0</total_bytes_sec>
                <total_iops_sec>0</total_iops_sec>
                <write_bytes_sec>0</write_bytes_sec>
                <write_iops_sec>100</write_iops_sec>
            </iotune>
        </disk>
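
Note that the ioTune mapping in the UnsupportedType error above is exactly this <iotune> element's children flattened into a dict; a sketch of that flattening (hypothetical helper, not vdsm's code):

import xml.etree.ElementTree as ET

IOTUNE_XML = """
<iotune>
    <read_bytes_sec>0</read_bytes_sec>
    <read_iops_sec>100</read_iops_sec>
    <total_bytes_sec>0</total_bytes_sec>
    <total_iops_sec>0</total_iops_sec>
    <write_bytes_sec>0</write_bytes_sec>
    <write_iops_sec>100</write_iops_sec>
</iotune>
"""

# One key per child element, values parsed as integers.
io_tune = {child.tag: int(child.text) for child in ET.fromstring(IOTUNE_XML)}
print(io_tune)  # matches the dict in the traceback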


Version-Release number of selected component (if applicable):
 - vdsm 4.20.27.1
 - CentOS 7.5
 - qemu-kvm-ev 2.10.0-21.el7_5.2.1

How reproducible:


Steps to Reproduce:
1. Have a 4.1 cluster with QoS IOPS limits.
2. Upgrade the engine and a host to 4.2.
3. Try to run a VM.

Actual results:
The VM does not start.

Expected results:
The VM should start.

Additional info:

Comment 1 Michal Skrivanek 2018-05-19 06:03:57 UTC
From the original email thread:

Also, please make sure to report the steps you did to get this error.
Are you just starting a new VM using a 4.2 Engine and a 4.2 host? Or are you
migrating an old VM created with Engine 4.1?

Just for the sake of completeness (not sure it applies here), this flow
is NOT supported:

1. have a VM happily running on a 4.1 host
2. upgrade Vdsm on that host from 4.1 to 4.2 while the VM is running
3. restart Vdsm

Comment 2 Michal Skrivanek 2018-05-19 06:04:52 UTC
Please also attach engine.log

Comment 3 ernest.beinrohr 2018-05-21 07:52:09 UTC
Created attachment 1439439 [details]
engine log from a failed start

Comment 4 ernest.beinrohr 2018-05-21 07:52:47 UTC
Created attachment 1439440 [details]
VM XML - no IO limits in this disks profile

Comment 5 ernest.beinrohr 2018-05-21 07:53:18 UTC
Created attachment 1439441 [details]
VM XML - IO limits in this disks profile - not starting with this one on 4.2 host

Comment 6 ernest.beinrohr 2018-05-21 07:55:15 UTC
I just created a new VM on my 4.2 cluster. It starts OK when the disk profile has no IO limits. Once I select another disk profile with limits, the VM does not start. I'm attaching the engine log from the failed start and also two XMLs from the log: one with an IO-limit disk profile, which does not start, and one without IO limits, which starts OK.

The diff is plain enough:
diff --git a/bug_bad.xml b/bug_ok.xml
index 40d2689..3aad896 100644
--- a/bug_bad.xml
+++ b/bug_ok.xml
@@ -98,7 +98,6 @@
             <alias name="ua-e94f8c97-bb7a-4dbb-a1a1-1470d70250aa"/>
             <address bus="0" controller="0" target="0" type="drive" unit="0"/>
             <serial>e94f8c97-bb7a-4dbb-a1a1-1470d70250aa</serial>
-            <iotune read_bytes_sec="0" read_iops_sec="300" total_bytes_sec="0" total_iops_sec="0" write_bytes_sec="0" write_iops_sec="300"/>
         </disk>
     </devices>
     <pm>


PS: on a 4.1 host both VMs start.

Comment 7 Francesco Romani 2018-05-21 12:25:19 UTC
Vdsm bug, unwanted side effect of patch https://gerrit.ovirt.org/#/c/90435/

Comment 8 Francesco Romani 2018-05-21 12:27:24 UTC
(In reply to ernest.beinrohr from comment #6)
> I just created a new VM on my 4.2 cluster. It starts OK when the disk
> profile has no IO limits. Once I select another disk profile with limits,
> the VM does not start. [...]
> 
> PS: on a 4.1 host both VMs start.

Please note that the XML generated by Engine looks wrong. It should look like
https://libvirt.org/formatdomain.html#elementsDisks

Vdsm has code to deal with the format described in the libvirt docs.
So, with the incoming patch, Vdsm will let the VM start, but until we also get an Engine fix, the IO tune settings will be silently discarded.

Comment 9 Arik 2018-05-21 12:42:37 UTC
(In reply to Francesco Romani from comment #8)
> Please note that the XML generated from Engine looks wrong. It should look
> like
> https://libvirt.org/formatdomain.html#elementsDisks

Ack, posted a fix.

Comment 10 ernest.beinrohr 2018-05-21 12:46:43 UTC
A new machine created on the 4.2 engine has this iotune format:
            <iotune read_bytes_sec="0" read_iops_sec="300" total_bytes_sec="0" total_iops_sec="0" write_bytes_sec="0" write_iops_sec="300"/>
        </disk>

whereas the XML generated by the 4.1 engine (which also does not work on a 4.2 host) has this format:
            <iotune>
                <read_bytes_sec>0</read_bytes_sec>
                <read_iops_sec>100</read_iops_sec>
                <total_bytes_sec>0</total_bytes_sec>
                <total_iops_sec>0</total_iops_sec>
                <write_bytes_sec>0</write_bytes_sec>
                <write_iops_sec>100</write_iops_sec>
            </iotune>
        </disk>

Both formats work on a 4.1 host.
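
To make the difference concrete: a parser following the element form documented by libvirt (https://libvirt.org/formatdomain.html#elementsDisks) reads nothing at all from the attribute form, which would explain the settings being silently dropped. A sketch with a hypothetical parser, not vdsm's actual code:

import xml.etree.ElementTree as ET

ELEMENT_FORM = """<iotune>
    <read_iops_sec>100</read_iops_sec>
    <write_iops_sec>100</write_iops_sec>
</iotune>"""

ATTRIBUTE_FORM = ('<iotune read_bytes_sec="0" read_iops_sec="300" '
                  'total_bytes_sec="0" total_iops_sec="0" '
                  'write_bytes_sec="0" write_iops_sec="300"/>')

def read_iotune(xml_str):
    # Standard form: one child element per setting.
    node = ET.fromstring(xml_str)
    return {child.tag: int(child.text) for child in node}

print(read_iotune(ELEMENT_FORM))    # {'read_iops_sec': 100, 'write_iops_sec': 100}
print(read_iotune(ATTRIBUTE_FORM))  # {} -- the attribute form silently yields nothing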

Comment 11 Francesco Romani 2018-05-21 13:47:48 UTC
(In reply to ernest.beinrohr from comment #10)
> A new machine created on the 4.2 engine has this iotune format:
> [...]
> Both formats work on a 4.1 host.

Hi,

Thanks for the additional information; it matches our findings.

I believe that patch https://gerrit.ovirt.org/#/c/91432/ should solve the issue and, if so, it will be included in the next oVirt 4.2 release.

Comment 12 Francesco Romani 2018-05-21 14:57:35 UTC
Note about verification:

To properly fix this bug we need both the Vdsm patch and the Engine patch.

Vdsm patched, Engine unpatched:
1. Run a VM with iotune settings; it should start and work as usual.
2. Using "virsh -r dumpxml" or vdsm-client dumpxmls, inspect the domain XML; it should NOT have iotune settings configured.

Vdsm and Engine both patched:
1. Run a VM with iotune settings; it should start and work as usual.
2. Using "virsh -r dumpxml" or vdsm-client dumpxmls, inspect the domain XML; it should have iotune settings configured.
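
A scriptable version of check 2, as a sketch (assumes the libvirt-python binding is installed; the domain name is a placeholder):

import xml.etree.ElementTree as ET
import libvirt

conn = libvirt.openReadOnly('qemu:///system')  # read-only, like "virsh -r"
dom = conn.lookupByName('my-vm')               # placeholder domain name
root = ET.fromstring(dom.XMLDesc())
iotunes = root.findall('.//iotune')
# Vdsm patched, Engine unpatched: expect no <iotune> elements.
# Vdsm and Engine both patched: expect the configured settings.
print([{c.tag: c.text for c in t} for t in iotunes])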

Comment 13 RHV bug bot 2018-05-24 23:53:42 UTC
INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Open patch attached]

For more info please contact: infra

Comment 14 Sandro Bonazzola 2018-05-25 07:31:41 UTC
Moving back to POST as per comment #13.

Comment 15 Francesco Romani 2018-05-25 08:15:05 UTC
Unmerged patches are not needed to solve this issue.

We may keep this bug open for a different reason, however: without the Engine patch, a VM with QoS settings will start, but the aforementioned QoS settings will be ignored.

Comment 16 Francesco Romani 2018-05-25 08:16:29 UTC
Just noticed that Engine patch is merged, so moving back to MODIFIED

https://gerrit.ovirt.org/#/c/91492/

Comment 17 Arik 2018-05-27 08:51:02 UTC
(In reply to Francesco Romani from comment #16)
> Just noticed that Engine patch is merged, so moving back to MODIFIED
> 
> https://gerrit.ovirt.org/#/c/91492/

Right, and since all fixes were merged before the 4.2.4 tagging, moving to ON_QA.

Comment 18 Liran Rotenberg 2018-06-06 11:11:34 UTC
Verified on:
ovirt-engine-4.2.4.1-0.1.el7.noarch
vdsm-4.20.29-1.el7ev.x86_64

Steps of verification:
1. Deploy RHEV 4.1: ovirt-engine-4.1.11.2-0.1.el7.noarch, vdsm-4.19.51-1.el7ev.x86_64.
2. Create a storage QoS profile in the datacenter.
3. Set IOPS read/write to 100 and 100.
Note: I created two QoS profiles, one with IOPS and one with throughput (golden_env_mixed_virtio_0 is with IOPS, golden_env_mixed_virtio_1 is with throughput).
4. Create a VM, set the QoS profile on the VM's disk, and start the VM.
5. From the host run: 
# virsh -r dumpxml golden_env_mixed_virtio_0 | grep -a3 iotune
<iotune>
    <read_iops_sec>100</read_iops_sec>
    <write_iops_sec>100</write_iops_sec>
</iotune>

For the throughput profile:
<iotune>
    <read_bytes_sec>104857600</read_bytes_sec>
    <write_bytes_sec>104857600</write_bytes_sec>
</iotune>

6. Shut down the VM.
7. Upgrade the engine and host to 4.2: ovirt-engine-4.2.4.1-0.1.el7.noarch, vdsm-4.20.29-1.el7ev.x86_64.
8. Start the VM (done on cluster version 4.1).
9. From the host run: 
# virsh -r dumpxml golden_env_mixed_virtio_0 | grep -a3 iotune
<iotune>
    <read_iops_sec>100</read_iops_sec>
    <write_iops_sec>100</write_iops_sec>
</iotune>

# vdsm-client VM getIoTune vmID='2a66838a-6a2f-4f7d-9486-561ca4ddc6bf'
[
    {
        "ioTune": {
            "write_bytes_sec": 0, 
            "total_iops_sec": 0, 
            "read_iops_sec": 100, 
            "read_bytes_sec": 0, 
            "write_iops_sec": 100, 
            "total_bytes_sec": 0
        }, 



# virsh -r dumpxml golden_env_mixed_virtio_1 | grep -a3 iotune

<iotune>
    <read_bytes_sec>104857600</read_bytes_sec>
    <write_bytes_sec>104857600</write_bytes_sec>
</iotune>

# vdsm-client VM getIoTune vmID='78488f39-064f-4440-8586-a91ba2fa0f55'
[
    {
        "ioTune": {
            "write_bytes_sec": 104857600, 
            "total_iops_sec": 0, 
            "read_iops_sec": 0, 
            "read_bytes_sec": 104857600, 
            "write_iops_sec": 0, 
            "total_bytes_sec": 0
        }, 
10. Repeat steps 8-9 on a cluster with compatibility version 4.2.
11. Domain XML from the engine log (cluster version 4.2):
iops:
<iotune>
    <read_bytes_sec>0</read_bytes_sec>
    <read_iops_sec>100</read_iops_sec>
    <total_bytes_sec>0</total_bytes_sec>
    <total_iops_sec>0</total_iops_sec>
    <write_bytes_sec>0</write_bytes_sec>
    <write_iops_sec>100</write_iops_sec>
</iotune>

throughput:
<iotune>
    <read_bytes_sec>104857600</read_bytes_sec>
    <read_iops_sec>0</read_iops_sec>
    <total_bytes_sec>0</total_bytes_sec>
    <total_iops_sec>0</total_iops_sec>
    <write_bytes_sec>104857600</write_bytes_sec>
    <write_iops_sec>0</write_iops_sec>
</iotune>

Results:
The VM starts with iotune settings after upgrading the engine from 4.1 to 4.2, with cluster compatibility versions 4.1 and 4.2.
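
The getIoTune check from step 9 can also be scripted for regression testing; a sketch that shells out to the same vdsm-client call (vmID taken from this run):

import json
import subprocess

out = subprocess.check_output(
    ['vdsm-client', 'VM', 'getIoTune',
     'vmID=2a66838a-6a2f-4f7d-9486-561ca4ddc6bf'])
io_tune = json.loads(out)[0]['ioTune']
# Expect the IOPS profile values to survive the 4.1 -> 4.2 upgrade.
assert io_tune['read_iops_sec'] == 100
assert io_tune['write_iops_sec'] == 100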

Comment 19 Sandro Bonazzola 2018-06-26 08:38:48 UTC
This bugzilla is included in the oVirt 4.2.4 release, published on June 26th 2018.

Since the problem described in this bug report should be resolved in the oVirt 4.2.4 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.