Bug 994364

Summary: VIR_DOMAIN_XML_MIGRATABLE generates unmigratable XML
Product: [Community] Virtualization Tools
Reporter: Tiziano Müller <tm>
Component: libvirt
Assignee: Michal Privoznik <mprivozn>
Status: CLOSED UPSTREAM
Severity: unspecified
Priority: unspecified
Version: unspecified
CC: acathrow, chaochin, crobinso, dallan, hannsj_uhl, jdenemar, mprivozn, sross, tm
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Linux
Doc Type: Bug Fix
Clone Of:
: 1076503 (view as bug list) Environment:
Last Closed: 2013-10-11 15:43:50 UTC
Type: Bug
Bug Blocks: 1076503, 1141838    
Attachments:
XML of Test-VM, generated with "virsh dumpxml" without flags (thus "default") [Flags: none]
XML of Test-VM, generated with "virsh dumpxml" with --migratable flag [Flags: none]
libvirtd.log obtained as described on the DebugLogs page [Flags: none]
libvirtd.log obtained as described on the DebugLogs page [Flags: none]

Description Tiziano Müller 2013-08-07 06:42:20 UTC
Description of problem:

When using `virsh dumpxml e35422fa-18e7-41eb-8478-d09daff1b43a --migratable > dump-migratable.xml` to generate XML for use in a migration, the 'pci-root' controller is missing compared to the output of a plain `virsh dumpxml` or a `virsh dumpxml ... --security-info`:

       <alias name='virtio-serial0'/>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
     </controller>
-    <controller type='pci' index='0' model='pci-root'>
-      <alias name='pci0'/>
-    </controller>
     <interface type='bridge'>
       <mac address='52:54:00:8c:20:df'/>
       <source bridge='vmbr0'/>

This results in the following error when trying to use that XML for migration:

~ # virsh migrate --live --p2p --tunnelled --persistent --undefinesource --change-protection --verbose --compressed e35422fa-18e7-41eb-8478-d09daff1b43a 'qemu+tcp://10.1.130.14/system' --xml /tmp/new.xml 
error: unsupported configuration: Target controller type ide does not match source pci

Version-Release number of selected component (if applicable): 1.1.1

Comment 1 Tiziano Müller 2013-10-03 14:35:43 UTC
Small update: if the CPU-model is indirectly specified by using "<cpu mode='host-model' match='exact'>", the error changes to the following:

error: unsupported configuration: Target CPU model SandyBridge does not match source (null)

If the error message is correct, it seems that the source definition against which the migration XML is compared is not the same as the one generated by "dumpxml --migratable" (or the corresponding API call).

Comment 2 Tiziano Müller 2013-10-03 15:11:51 UTC
I was initially able to work around this bug by using --security-info instead of --migratable, but this workaround stopped working as soon as I added a second virtio-serial device for the qemu-guest-agent. In that case I got the following when using an XML generated using --security-info:

Oct  3 17:07:18 foss-cloud-node-01 libvirtd: 6242: error : virDomainDeviceInfoCheckABIStability:12718 : unsupported configuration: Target device address type none does not match source pci

Comment 3 Tiziano Müller 2013-10-03 15:14:14 UTC
This was also reproduced with version 1.1.2

Comment 4 Dave Allan 2013-10-03 18:41:51 UTC
Tiziano, is there a particular need you have that requires you to dump the XML and then specify it rather than just letting the migration pass the XML unchanged?

Comment 5 Dave Allan 2013-10-03 18:42:14 UTC
Michal, this isn't functionality I use, so I can't say how it's intended to work, but I think it should be added to the CI tests.

Comment 6 Tiziano Müller 2013-10-04 04:40:19 UTC
(In reply to Dave Allan from comment #4)
> Tiziano, is there a particular need you have that requires you to dump the
> XML and then specify it rather than just letting the migration pass the XML
> unchanged?

Yes, the spice port and the IP to bind spice to must be changed on migration. And it's our management interface which assigns the ports rather than libvirt auto-selecting them.
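
The kind of edit described here (pinning the SPICE port and listen address in the XML that is handed to the destination) can be sketched as below. This is only a minimal illustration, assuming the domain XML has a `<graphics type='spice'>` element under `<devices>`; the function name `set_spice_endpoint`, the sample XML and the sample port are hypothetical, and the code only rewrites the XML text without talking to libvirt.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed domain XML used only for illustration.
SAMPLE_XML = """
<domain type='kvm'>
  <devices>
    <graphics type='spice' port='5900' autoport='no' listen='0.0.0.0'>
      <listen type='address' address='0.0.0.0'/>
    </graphics>
  </devices>
</domain>
"""


def set_spice_endpoint(domain_xml: str, port: int, address: str) -> str:
    """Return domain XML with the SPICE port and listen address replaced."""
    root = ET.fromstring(domain_xml)
    graphics = root.find("./devices/graphics[@type='spice']")
    if graphics is None:
        raise ValueError("domain XML has no SPICE graphics device")
    # Fix the port explicitly instead of letting it be auto-selected.
    graphics.set("port", str(port))
    graphics.set("autoport", "no")
    # Keep the legacy 'listen' attribute and the nested <listen> element in sync.
    graphics.set("listen", address)
    for listen in graphics.findall("listen[@type='address']"):
        listen.set("address", address)
    return ET.tostring(root, encoding="unicode")


# Sample values: port chosen by the management layer, address of the target host.
print(set_spice_endpoint(SAMPLE_XML, 5901, "10.1.130.14"))
```

The result would then be written to a file and passed to `virsh migrate ... --xml`, as in the original report.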

Comment 7 Michal Privoznik 2013-10-04 05:44:17 UTC
Tiziano,

can you please attach the XML of the domain you're trying to migrate? I mean without --migratable switch.

Dave,

Right. I'll update our test suite once I figure out where the problem is.

Comment 8 Tiziano Müller 2013-10-04 07:11:14 UTC
Created attachment 807462 [details]
XML of Test-VM, generated with "virsh dumpxml" without flags (thus "default")

This is the corresponding error I get when migrating:

Oct  4 09:06:35 foss-cloud-node-02 libvirtd: 31425: error : virDomainDeviceInfoCheckABIStability:12718 : unsupported configuration: Target device address type none does not match source pci

Comment 9 Tiziano Müller 2013-10-04 07:12:31 UTC
Created attachment 807463 [details]
XML of Test-VM, generated with "virsh dumpxml" with --migratable flag

Attaching the migratable XML as well for comparison

Comment 10 Jiri Denemark 2013-10-04 07:18:25 UTC
Could you also turn on debug logs for destination libvirtd and try migrating without changing the XML? The thing is, the XML generated with virsh dumpxml --migratable is supposed to be exactly the same as the XML sent by libvirtd during migration. So by seeing that XML in the debug logs, we can check where the two XMLs differ.
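
Checking where two XML dumps differ can be done with any diff tool; a small sketch using Python's standard `difflib` is shown below. The helper name `diff_domain_xml`, the file label `dump-default.xml` and the trimmed sample documents are assumptions made for illustration only.

```python
import difflib

# Trimmed, hypothetical excerpts of the two dumps.
DEFAULT_XML = """\
<devices>
    <controller type='virtio-serial' index='0'/>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci0'/>
    </controller>
</devices>
"""

MIGRATABLE_XML = """\
<devices>
    <controller type='virtio-serial' index='0'/>
</devices>
"""


def diff_domain_xml(default_xml: str, migratable_xml: str) -> list:
    """Return a unified diff (list of lines) between two XML dumps."""
    return list(
        difflib.unified_diff(
            default_xml.splitlines(),
            migratable_xml.splitlines(),
            fromfile="dump-default.xml",
            tofile="dump-migratable.xml",
            lineterm="",
        )
    )


for line in diff_domain_xml(DEFAULT_XML, MIGRATABLE_XML):
    print(line)
```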

Comment 11 Michal Privoznik 2013-10-04 07:25:47 UTC
Just a small hint on how to turn on and gather debug logs:

http://wiki.libvirt.org/page/DebugLogs

Comment 12 Jiri Denemark 2013-10-04 07:27:19 UTC
Oops, I guess I know what it is. While the --migratable XML is supposed to be the same as what we send normally during migration, the xmlin definition is not checked against the migratable XML. It's checked against normal XML and thus the ABI check fails.
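
The mismatch described in this comment can be modelled with a toy comparison. This is not libvirt's code (the real check is written in C); it is a hedged sketch in which only the controller list is compared and only the index-0 'pci-root' controller is treated as dropped from migratable XML. The names `controllers`, `naive_check` and `migratable_check` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Full ("normal") definition of the running domain, trimmed for illustration.
SOURCE_XML = """
<domain type='kvm'>
  <devices>
    <controller type='virtio-serial' index='0'/>
    <controller type='pci' index='0' model='pci-root'/>
  </devices>
</domain>
"""

# User-supplied xmlin, as a --migratable dump would look: no pci-root.
XMLIN = """
<domain type='kvm'>
  <devices>
    <controller type='virtio-serial' index='0'/>
  </devices>
</domain>
"""


def controllers(domain_xml, migratable=False):
    """List (type, index, model) per controller, optionally in migratable form."""
    root = ET.fromstring(domain_xml)
    found = []
    for ctrl in root.findall("./devices/controller"):
        key = (ctrl.get("type"), ctrl.get("index"), ctrl.get("model"))
        # Assumption: the migratable form omits only the implicit pci-root.
        if migratable and key == ("pci", "0", "pci-root"):
            continue
        found.append(key)
    return found


def naive_check(source_xml, xmlin):
    """Compare xmlin against the normal source XML (the failing behaviour)."""
    return controllers(source_xml) == controllers(xmlin)


def migratable_check(source_xml, xmlin):
    """Compare both sides in their migratable form."""
    return controllers(source_xml, migratable=True) == controllers(xmlin, migratable=True)


print(naive_check(SOURCE_XML, XMLIN))       # False: pci-root exists only in the source
print(migratable_check(SOURCE_XML, XMLIN))  # True: both sides reduced the same way
```

Reducing both definitions to the same form before comparing is the design point; which devices a real driver strips is driver-specific and not captured by this sketch.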

Comment 13 Tiziano Müller 2013-10-04 07:34:54 UTC
Created attachment 807477 [details]
libvirtd.log obtained as described on the DebugLogs page

Concerning the migration without providing an XML:

* I had to change the XML definition a bit: the IP for the spice server has to be changed to 0.0.0.0 for the migration to work; apart from that, the definition given before was used
* The command used is: virsh migrate --live --p2p --tunnelled --persistent --undefinesource --compressed 59a2135b-d134-4cf1-b188-3cd72bc503dd qemu+tcp://10.1.130.13/system
* and the migration works perfectly

Comment 14 Tiziano Müller 2013-10-04 07:38:02 UTC
(In reply to Jiri Denemark from comment #12)
> Oops, I guess I know what it is. While the --migratable XML is supposed to
> be the same as what we send normally during migration, the xmlin definition
> is not checked against the migratable XML. It's checked against normal XML
> and thus the ABI check fails.

This also explains why the _source_ CPU model is (null) for some VMs; see my comment 1.

Comment 15 Tiziano Müller 2013-10-04 07:48:24 UTC
Created attachment 807479 [details]
libvirtd.log obtained as described on the DebugLogs page

Sorry, I made a mistake when generating the first log: our code was still using the VIR_DOMAIN_XML_SECURE flag (which I used as a workaround).
Attaching the log for the migration with an XML generated using VIR_DOMAIN_XML_MIGRATABLE which shows the earlier failure (when checking the CPU type).

Just to make it clear what our cases are/were:

* Initially we had a VM config as attached but without virtio-rng, without the second virtio-serial qemu-ga channel and without CPU host-model. Migration then failed with the error given in the initial report when using an XML generated with VIR_DOMAIN_XML_MIGRATABLE, but worked with VIR_DOMAIN_XML_SECURE
* Then we added the virtio-rng, virtio-serial qemu-ga channel and CPU host-model definitions. After that, the same error we saw earlier with VIR_DOMAIN_XML_MIGRATABLE also occurred with VIR_DOMAIN_XML_SECURE (which we used as a workaround)
* When doing a migration now using an XML generated with VIR_DOMAIN_XML_MIGRATABLE we get the error about the non-matching CPU model

Comment 16 Michal Privoznik 2013-10-08 09:44:09 UTC
Patch proposed upstream:

https://www.redhat.com/archives/libvir-list/2013-October/msg00322.html

Comment 17 Michal Privoznik 2013-10-10 09:25:56 UTC
Even though the previous patch got ACKed, after some thinking it seems we can do better:

https://www.redhat.com/archives/libvir-list/2013-October/msg00477.html

Comment 18 Michal Privoznik 2013-10-11 08:49:31 UTC
So I've just pushed the patch upstream:

commit 7d704812b9c50cd3804dd1e7f9e2ea3e75fdc847
Author:     Michal Privoznik <mprivozn>
AuthorDate: Thu Oct 10 10:53:56 2013 +0200
Commit:     Michal Privoznik <mprivozn>
CommitDate: Fri Oct 11 10:31:35 2013 +0200

    qemu: Introduce qemuDomainDefCheckABIStability
    
    https://bugzilla.redhat.com/show_bug.cgi?id=994364
    
    Whenever we check for ABI stability, we have new xml (e.g. provided by
    user, or obtained from snapshot, whatever) which we compare to old xml
    and see if ABI won't break. However, if the new xml was produced via
    virDomainGetXMLDesc(..., VIR_DOMAIN_XML_MIGRATABLE) it lacks some
    devices, e.g. 'pci-root' controller. Hence, the ABI stability check
    fails even though it is stable. Moreover, we can't simply fix
    virDomainDefCheckABIStability because removing the correct devices is
    task for the driver. For instance, qemu driver wants to remove the usb
    controller too, while LXC driver doesn't. That's why we need special
    qemu wrapper over virDomainDefCheckABIStability which removes the
    correct devices from domain XML, produces MIGRATABLE xml and calls the
    check ABI stability function.
    
    Signed-off-by: Michal Privoznik <mprivozn>

Cole, do you think this is worth backporting onto maint branches and into Fedora? If not, then this bug can be moved to CLOSED NEXTRELEASE, right?

Comment 19 Cole Robinson 2013-10-11 15:43:50 UTC
If it's a trivial and safe backport to maint, we might as well do it (git checkout v1.1.3-maint && git cherry-pick -x <commit> && git push origin v1.1.3-maint), but if not, I say we wait for someone in Fedora to actually complain. Either way this bug is CLOSED->UPSTREAM since it's in the upstream bug tracker.

Comment 20 Jiri Denemark 2014-03-13 18:58:57 UTC
*** Bug 1075174 has been marked as a duplicate of this bug. ***

Comment 21 Qin Zhao 2014-09-11 07:21:20 UTC
The bug may break OpenStack instance live migration on RHEL6.5/7.0. See https://bugs.launchpad.net/nova/+bug/1362929