Bug 1322842 - VM with a payload doesn't start with libvirt >= 1.3.2
Summary: VM with a payload doesn't start with libvirt >= 1.3.2
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: 4.17.23
Hardware: Unspecified
OS: Unspecified
urgent
high
Target Milestone: ovirt-3.6.6
: 4.17.27
Assignee: Francesco Romani
QA Contact: sefi litmanovich
URL:
Whiteboard: Virt
: 1328383 (view as bug list)
Depends On:
Blocks: 1289202 1369086
TreeView+ depends on / blocked
 
Reported: 2016-03-31 12:30 UTC by Pavel Gashev
Modified: 2016-08-22 13:07 UTC (History)
6 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-05-30 10:55:42 UTC
oVirt Team: Virt
Embargoed:
rule-engine: ovirt-3.6.z+
mgoldboi: planning_ack+
rule-engine: devel_ack+
rule-engine: testing_ack+


Attachments (Terms of Use)
libvirt debug log demonstrating migrations with empty disk serial (337.88 KB, application/octet-stream)
2016-04-18 11:29 UTC, Francesco Romani
no flags Details


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 56014 0 master MERGED virt: storage: avoid empty serials to domain XML 2016-04-15 20:35:22 UTC
oVirt gerrit 56040 0 None None None 2016-04-18 07:39:58 UTC
oVirt gerrit 56274 0 ovirt-3.6 MERGED virt: storage: avoid empty serials to domain XML 2016-04-18 11:33:08 UTC

Description Pavel Gashev 2016-03-31 12:30:45 UTC
Description of problem:
VM doesn't start with the following error in vdsm.log:
libvirtError: unsupported configuration: Disks 'hdc' and 'hdd' have identical serial

How reproducible:
Start a VM with enabled Initial Run.

Cause:
There are two cdrom devices with identical (empty) serials in domxml:

<disk device="cdrom" snapshot="no" type="file">
       <source file="/var/run/vdsm/payload/2eaf9c8e-2123-4b48-9b62-f96168ac7a36.41f22cb1676858ad4e22e8440519032d.img" startupPolicy="optional"/>
       <target bus="ide" dev="hdd"/>
       <readonly/>
       <serial/>
</disk>
<disk device="cdrom" snapshot="no" type="file">
       <source file="" startupPolicy="optional"/>
       <target bus="ide" dev="hdc"/>
       <readonly/>
       <serial/>
</disk>

The bug is related to https://bugzilla.redhat.com/show_bug.cgi?id=1245013

The following before_vm_start hook resolves the issue:

---------- cut here ----------
#!/usr/bin/python

import hooking
import uuid

domxml = hooking.read_domxml()

for disk in domxml.getElementsByTagName('disk'):
  if disk.getAttribute('device') == 'cdrom':
    for source in disk.getElementsByTagName('source'):
      if source.getAttribute('file').find('/payload/') > 0:
        for serial in disk.getElementsByTagName('serial'):
          if not serial.hasChildNodes():
            serial.appendChild(domxml.createTextNode(str(uuid.uuid4())))
            hooking.write_domxml(domxml)

---------- cut here ----------

Comment 1 Francesco Romani 2016-03-31 13:34:33 UTC
(In reply to Pavel Gashev from comment #0)
> Description of problem:
> VM doesn't start with the following error in vdsm.log:
> libvirtError: unsupported configuration: Disks 'hdc' and 'hdd' have
> identical serial
> 
> How reproducible:
> Start a VM with enabled Initial Run.
> 
> Cause:
> There are two cdrom devices with identical (empty) serials in domxml:
> 
> <disk device="cdrom" snapshot="no" type="file">
>        <source
> file="/var/run/vdsm/payload/2eaf9c8e-2123-4b48-9b62-f96168ac7a36.
> 41f22cb1676858ad4e22e8440519032d.img" startupPolicy="optional"/>
>        <target bus="ide" dev="hdd"/>
>        <readonly/>
>        <serial/>
> </disk>
> <disk device="cdrom" snapshot="no" type="file">
>        <source file="" startupPolicy="optional"/>
>        <target bus="ide" dev="hdc"/>
>        <readonly/>
>        <serial/>
> </disk>
> 
> The bug is related to https://bugzilla.redhat.com/show_bug.cgi?id=1245013

Indeed we send duplicate *empty* serial for both cdroms.
Peter, is this covered by rhbz#1245013 or is a different case?
In general, what does libvirt >= 1.3.2 do in the scenario above?

Comment 2 Francesco Romani 2016-04-11 10:12:15 UTC
If this is oVirt's fault, could be fixed in few ways:
- engine could just send fake imageId for the payload device, and everything should just work (tm)
- Vdsm could inject a fake imageId, maybe just a new UUID each time. Payload devices are throwaway devices, so this should be OK.

Comment 3 Peter Krempa 2016-04-11 11:19:59 UTC
I think the problem is in the approach libvirt uses to parse the <serial> element. If it's empty we apparently allocate an empty string rather than not storing it at all. Two disks with an empty string as serial are of course rejected for having the same serial rather than none.

I'll look at the code whether we can reasonably fix the parser to avoid this weird semantics.

Comment 4 Peter Krempa 2016-04-11 14:27:01 UTC
Okay, so while our code should not parse that as an empty string, using an empty <serial/> element does not conform to libvirt's XML schema as the serial is defined by the following regex "[A-Za-z0-9_\.\+\- ]+".

I'm afraid that by enforcing that your existing configs would stop to work, so I suggest fixing that on oVirts side. I'll look into tweaking the code though.

Comment 5 Francesco Romani 2016-04-12 10:22:39 UTC
(In reply to Peter Krempa from comment #4)
> Okay, so while our code should not parse that as an empty string, using an
> empty <serial/> element does not conform to libvirt's XML schema as the
> serial is defined by the following regex "[A-Za-z0-9_\.\+\- ]+".
> 
> I'm afraid that by enforcing that your existing configs would stop to work,
> so I suggest fixing that on oVirts side. I'll look into tweaking the code
> though.

That (and Dan's remark) make me realize that we should handle also the migrations from 1.2.x to 1.3.x, fixing our XML in the process.

Comment 6 Francesco Romani 2016-04-18 11:27:48 UTC
(In reply to Francesco Romani from comment #5)
> (In reply to Peter Krempa from comment #4)
> > Okay, so while our code should not parse that as an empty string, using an
> > empty <serial/> element does not conform to libvirt's XML schema as the
> > serial is defined by the following regex "[A-Za-z0-9_\.\+\- ]+".
> > 
> > I'm afraid that by enforcing that your existing configs would stop to work,
> > so I suggest fixing that on oVirts side. I'll look into tweaking the code
> > though.
> 
> That (and Dan's remark) make me realize that we should handle also the
> migrations from 1.2.x to 1.3.x, fixing our XML in the process.

While doing additional testing, it seems to me that libvirt 1.3.1 (rpm rebuilt from git sources) accepts inbound migration with multiple empty serial attributes (please check the attached libvirt logs).
Peter, is this intentional, and should we rely on that?

Comment 7 Francesco Romani 2016-04-18 11:29:33 UTC
Created attachment 1148177 [details]
libvirt debug log demonstrating migrations with empty disk serial

Comment 8 Peter Krempa 2016-04-18 11:38:38 UTC
(In reply to Francesco Romani from comment #6)
> While doing additional testing, it seems to me that libvirt 1.3.1 (rpm
> rebuilt from git sources) accepts inbound migration with multiple empty
> serial attributes (please check the attached libvirt logs).
> Peter, is this intentional, and should we rely on that?

Yes. The code rejecting identical <serial> elements for multiple disks is executed only on fresh starts so that existing VMs still can be migrated even if they are configured in a way that would prevent to start them normally.

Additionally, removing the disk serial at migration stage would make the migration fail since that would be a guest visible change and those are forbidden.

Comment 9 Francesco Romani 2016-04-18 11:52:59 UTC
(In reply to Peter Krempa from comment #8)
> (In reply to Francesco Romani from comment #6)
> > While doing additional testing, it seems to me that libvirt 1.3.1 (rpm
> > rebuilt from git sources) accepts inbound migration with multiple empty
> > serial attributes (please check the attached libvirt logs).
> > Peter, is this intentional, and should we rely on that?
> 
> Yes. The code rejecting identical <serial> elements for multiple disks is
> executed only on fresh starts so that existing VMs still can be migrated
> even if they are configured in a way that would prevent to start them
> normally.
> 
> Additionally, removing the disk serial at migration stage would make the
> migration fail since that would be a guest visible change and those are
> forbidden.

Thanks Peter, very good to know.
Also tested just in case hibernation/dehibernation, works as expected.

Moving to MODIFIED

Comment 10 Pavel Gashev 2016-04-19 09:29:09 UTC
I'm not sure it's already fixed here, but Direct LUNs have no serials as well. See https://bugzilla.redhat.com/1328383

Comment 11 Francesco Romani 2016-04-19 11:42:59 UTC
*** Bug 1328383 has been marked as a duplicate of this bug. ***

Comment 12 sefi litmanovich 2016-05-15 12:08:06 UTC
Verified using rhel 7.2 host with vdsm-4.17.28-0.el7ev.noarch running on top of libvirt-1.3.1-1.el7.x86_64:

[root@monique-vds01 x86_64]# rpm -qa | grep libvirt
libvirt-daemon-driver-secret-1.3.1-1.el7.x86_64
libvirt-daemon-kvm-1.3.1-1.el7.x86_64
libvirt-daemon-driver-interface-1.3.1-1.el7.x86_64
libvirt-docs-1.3.1-1.el7.x86_64
libvirt-python-1.2.17-2.el7.x86_64
libvirt-daemon-1.3.1-1.el7.x86_64
libvirt-daemon-driver-nodedev-1.3.1-1.el7.x86_64
libvirt-daemon-driver-qemu-1.3.1-1.el7.x86_64
libvirt-devel-1.3.1-1.el7.x86_64
libvirt-lock-sanlock-1.3.1-1.el7.x86_64
libvirt-daemon-driver-network-1.3.1-1.el7.x86_64
libvirt-daemon-driver-storage-1.3.1-1.el7.x86_64
libvirt-daemon-config-nwfilter-1.3.1-1.el7.x86_64
libvirt-1.3.1-1.el7.x86_64
libvirt-login-shell-1.3.1-1.el7.x86_64
libvirt-daemon-driver-nwfilter-1.3.1-1.el7.x86_64
libvirt-daemon-config-network-1.3.1-1.el7.x86_64
libvirt-debuginfo-1.3.1-1.el7.x86_64
libvirt-client-1.3.1-1.el7.x86_64
libvirt-daemon-driver-lxc-1.3.1-1.el7.x86_64
libvirt-daemon-lxc-1.3.1-1.el7.x86_64

1. After installation of new libvirt and vdsm ran vdsm-tool configure, verified that virtlogd service is not running.
2. Installed host on 3.6.6 engine, started a new vm with cloud-init payload. 
serial tag was not included in disk attribute of domain's xml, both on cdrom device and the payload:

    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source startupPolicy='optional'/>
      <backingStore/>
      <target dev='hdc' bus='ide'/>
      <readonly/>
      <alias name='ide0-1-0'/>
      <address type='drive' controller='0' bus='1' target='0' unit='0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/var/run/vdsm/payload/5b1d76a5-d320-4825-8f98-47e0c1ab1586.c5a2f8bd8fe24053f78fa1b2458b6731.img' startupPolicy='optional'/>
      <backingStore/>
      <target dev='hdd' bus='ide'/>
      <readonly/>
      <alias name='ide0-1-1'/>
      <address type='drive' controller='0' bus='1' target='0' unit='1'/>
    </disk>


Note You need to log in before you can comment on or make changes to this bug.