Bug 1441662 - Cross migration failed between rhel7.4 and rhel7.3 with qemu added cpu features: 'hypervisor' 'x2apic'
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.4
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: rc
Assignee: Jiri Denemark
QA Contact: yanqzhan@redhat.com
Duplicates: 1444850
Blocks: 1449577 libvirtCPUconfig 1399515
Reported: 2017-04-12 12:22 UTC by yanqzhan@redhat.com
Modified: 2019-04-28 13:15 UTC (History)

Fixed In Version: libvirt-3.2.0-14.el7
Last Closed: 2017-08-02 00:05:54 UTC


Links:
System | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHEA-2017:1846 | normal | SHIPPED_LIVE | libvirt bug fix and enhancement update | 2017-08-01 18:02:50 UTC

Description yanqzhan@redhat.com 2017-04-12 12:22:52 UTC
Description of problem:
Cross migration fails between rhel7.4 and rhel7.3 because of the CPU features that QEMU adds by itself: 'hypervisor' and 'x2apic'.
libvirt in 7.4 automatically adds the features that QEMU enables by itself to the domain XML, but libvirt on 7.3 cannot see that QEMU will add them.

Version-Release number of selected component (if applicable):
Rhel7.4:
libvirt-3.2.0-2.el7.x86_64
qemu-kvm-rhev-2.8.0-6.el7.x86_64
Rhel7.3:
libvirt-2.0.0-10.el7_3.6.x86_64
qemu-kvm-rhev-2.6.0-28.el7_3.9.x86_64

How reproducible:
100%

Steps to Reproduce:
1. prepare the migration environment
hostA: rhel7.3
hostB: rhel7.4
# virsh cpu-baseline base2hostB.xml
<cpu mode='custom' match='exact'>
  <model fallback='allow'>Penryn</model>
  <vendor>Intel</vendor>
  <feature policy='require' name='vme'/>
  <feature policy='require' name='ds'/>
  <feature policy='require' name='acpi'/>
  <feature policy='require' name='ss'/>
  <feature policy='require' name='ht'/>
  <feature policy='require' name='tm'/>
  <feature policy='require' name='pbe'/>
  <feature policy='require' name='dtes64'/>
  <feature policy='require' name='monitor'/>
  <feature policy='require' name='ds_cpl'/>
  <feature policy='require' name='vmx'/>
  <feature policy='require' name='smx'/>
  <feature policy='require' name='est'/>
  <feature policy='require' name='tm2'/>
  <feature policy='require' name='xtpr'/>
  <feature policy='require' name='pdcm'/>
  <feature policy='require' name='xsave'/>
  <feature policy='require' name='osxsave'/>
</cpu>

SCENARIO1:
2. start a rhel7.3 domain on hostB with the above cpu element
[root@hostB ~]# virsh start rhel7.3
Domain rhel7.3 started

[root@hostB ~]# virsh dumpxml rhel7.3|grep "cpu mode" -A25
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>Penryn</model>
    <vendor>Intel</vendor>
    <feature policy='require' name='vme'/>
    <feature policy='disable' name='ds'/>
    <feature policy='disable' name='acpi'/>
    <feature policy='require' name='ss'/>
    <feature policy='disable' name='ht'/>
    <feature policy='disable' name='tm'/>
    <feature policy='disable' name='pbe'/>
    <feature policy='disable' name='dtes64'/>
    <feature policy='disable' name='monitor'/>
    <feature policy='disable' name='ds_cpl'/>
    <feature policy='disable' name='vmx'/>
    <feature policy='disable' name='smx'/>
    <feature policy='disable' name='est'/>
    <feature policy='disable' name='tm2'/>
    <feature policy='disable' name='xtpr'/>
    <feature policy='disable' name='pdcm'/>
    <feature policy='require' name='xsave'/>
    <feature policy='disable' name='osxsave'/>
    <feature policy='require' name='x2apic'/>      <== automatically added at start
    <feature policy='require' name='hypervisor'/>   <== automatically added at start
  </cpu>

2. try to migrate to hostA
[root@hostB ~]# virsh migrate rhel7.3 --live qemu+ssh://hostA/system --unsafe  --verbose
error: unsupported configuration: guest and host CPU are not compatible: Host CPU does not provide required features: hypervisor

3. add <feature policy='disable' name='hypervisor'/> to its XML and restart the domain
[root@hostB ~]# virsh destroy rhel7.3
Domain rhel7.3 destroyed

[root@hostB ~]# virsh edit rhel7.3
Domain rhel7.3 XML configuration edited.

[root@hostB ~]# virsh start rhel7.3
Domain rhel7.3 started

[root@hostB ~]# virsh dumpxml rhel7.3|grep "cpu mode" -A25
  <cpu mode='custom' match='exact' check='full'>
...
    <feature policy='disable' name='hypervisor'/>
    <feature policy='require' name='x2apic'/>
  </cpu>

4. retry the migration to hostA
[root@hostB ~]# virsh migrate rhel7.3 --live qemu+ssh://hostA/system --unsafe  --verbose
Migration: [100 %]

5. on hostA, try to migrate back to hostB
[root@hostA ~]# virsh dumpxml V7.3-full|grep hypervisor
[root@hostA ~]# virsh dumpxml V7.3-full|grep x2apic
[root@hostA ~]# virsh migrate rhel7.3 --live qemu+ssh://hostB/system --unsafe  --verbose
error: the CPU is incompatible with host CPU: Host CPU does not provide required features: x2apic

SCENARIO2:
6. define and start the domain with the same inactive XML on hostA, and try to migrate to hostB
[root@hostA ~]# virsh start rhel7.3
Domain rhel7.3 started

[root@hostA ~]# virsh dumpxml V7.3-full|grep hypervisor
[root@hostA ~]# virsh dumpxml V7.3-full|grep x2apic

[root@hostA ~]# virsh migrate rhel7.3 --live qemu+ssh://hostB/system --unsafe  --verbose
Migration: [100 %]

7. try to migrate back to hostA from hostB
[root@hostB ~]# virsh dumpxml rhel7.3|grep "cpu mode" -A25
  <cpu mode='custom' match='exact' check='full'>
...
    <feature policy='require' name='x2apic'/>
    <feature policy='require' name='hypervisor'/>
…

[root@hostB ~]# virsh migrate rhel7.3 --live qemu+ssh://hostA/system --unsafe  --verbose
error: unsupported configuration: guest and host CPU are not compatible: Host CPU does not provide required features: hypervisor


SCENARIO3:
8. on hostB, manually add 'x2apic' and 'hypervisor' to the domain XML, then try to start the domain
[root@hostB ~]# virsh list --all|grep rhel7.3
 -     rhel7.3                        shut off

[root@hostB ~]# virsh dumpxml rhel7.3|grep "cpu mode" -A25
  <cpu mode='custom' match='exact' check='partial'>
...
    <feature policy='require' name='x2apic'/>
    <feature policy='require' name='hypervisor'/>
…

[root@hostB ~]# virsh start rhel7.3
error: Failed to start domain rhel7.3
error: the CPU is incompatible with host CPU: Host CPU does not provide required features: x2apic, hypervisor


Actual results:
As in steps 3, 4, and 7, cross migration failed between rhel7.4 and rhel7.3 because of the QEMU-added CPU features 'hypervisor' and 'x2apic'.
libvirt in 7.4 automatically adds the features that QEMU enables by itself, but libvirt on 7.3 cannot see that QEMU will add them.
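
The failure mode can be modelled in a few lines of Python. This is only a sketch under stated assumptions, not libvirt's actual check: `missing_features` and `check_compatible` are hypothetical helpers, and the hostA feature set is made up for illustration. It shows how a `policy='require'` entry for a feature that QEMU adds on its own is rejected when the guest CPU is compared against the host CPU's own feature list.

```python
# Features with policy='require' in the live XML dumped on hostB (scenario 1).
GUEST_REQUIRED = ["vme", "ss", "xsave", "x2apic", "hypervisor"]

# Hypothetical feature list for hostA; the real list is not in this report.
HOST_A_FEATURES = {"vme", "ss", "xsave", "x2apic"}


def missing_features(required, host_features):
    """Return the required guest features absent from the host list, sorted."""
    return sorted(set(required) - set(host_features))


def check_compatible(required, host_features):
    """Raise ValueError, worded like the reported error, if anything is missing."""
    missing = missing_features(required, host_features)
    if missing:
        raise ValueError(
            "Host CPU does not provide required features: " + ", ".join(missing)
        )


if __name__ == "__main__":
    try:
        check_compatible(GUEST_REQUIRED, HOST_A_FEATURES)
    except ValueError as err:
        print("error:", err)
```

In this model the check passes once 'hypervisor' is no longer in the required list, which matches the workaround in step 3 (setting its policy to 'disable').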

Expected results:
cross migration should succeed.

Additional info:

Comment 2 Jiri Denemark 2017-05-10 11:34:05 UTC
BTW, the same bug will prevent migration even between two 7.4 hosts with QEMU older than 2.9.0.

Comment 3 Jiri Denemark 2017-05-10 12:58:23 UTC
*** Bug 1444850 has been marked as a duplicate of this bug. ***

Comment 7 Vasiliy G Tolstov 2017-05-17 17:03:24 UTC
I have two nodes, A and B, with qemu 2.6.0.
Node A has libvirt 3.3.0.
Node B has libvirt 2.1.0.

I'm trying to migrate a domain from A to B. The XML contains:
<cpu mode='custom' match='exact'>
  <model fallback='allow'>kvm64</model>
</cpu>

As I see in dumpxml on node A, the CPU definition is transformed to
<cpu mode='custom' match='exact' check='full'>
  <model fallback='forbid'>kvm64</model>
  <feature policy='require' name='hypervisor'/>
</cpu>
and the migration failed.

What additional info is needed to resolve this bug?
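
To see exactly which features were added at start, the configured and live CPU definitions can be compared. The following is a minimal sketch using only the Python standard library; `feature_policies` and `added_features` are hypothetical helper names, and the two XML snippets are the ones quoted in this comment.

```python
import xml.etree.ElementTree as ET

# CPU element as configured in the domain XML.
CONFIGURED = """
<cpu mode='custom' match='exact'>
  <model fallback='allow'>kvm64</model>
</cpu>
"""

# CPU element as shown by dumpxml on the running domain.
LIVE = """
<cpu mode='custom' match='exact' check='full'>
  <model fallback='forbid'>kvm64</model>
  <feature policy='require' name='hypervisor'/>
</cpu>
"""


def feature_policies(cpu_xml):
    """Map each <feature> name in a <cpu> element to its policy."""
    root = ET.fromstring(cpu_xml)
    return {f.get("name"): f.get("policy") for f in root.findall("feature")}


def added_features(configured_xml, live_xml):
    """Return features present in the live CPU but not in the configured one."""
    before = feature_policies(configured_xml)
    after = feature_policies(live_xml)
    return {name: policy for name, policy in after.items() if name not in before}


if __name__ == "__main__":
    print(added_features(CONFIGURED, LIVE))
```

The same comparison works on the output of `virsh dumpxml` once the `<cpu>` element has been extracted from it.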

Comment 8 Jiri Denemark 2017-05-18 08:30:05 UTC
See https://www.redhat.com/archives/libvir-list/2017-May/msg00628.html for a discussion about the best way to fix this issue.

Comment 9 Jiri Denemark 2017-06-06 08:04:41 UTC
Patch sent upstream for review: https://www.redhat.com/archives/libvir-list/2017-June/msg00199.html

Comment 10 Jiri Denemark 2017-06-07 11:45:31 UTC
Fixed upstream in a series ending with

commit 8e34f478137c2a6b5e57e382729bd2776b042301
Refs: v3.4.0-58-g8e34f4781
Author:     Jiri Denemark <jdenemar@redhat.com>
AuthorDate: Wed May 31 12:34:10 2017 +0200
Commit:     Jiri Denemark <jdenemar@redhat.com>
CommitDate: Wed Jun 7 13:36:02 2017 +0200

    qemu: Use updated CPU when starting QEMU if possible

    If QEMU is new enough and we have the live updated CPU definition in
    either save or migration cookie, we can use it to enforce ABI. The
    original guest CPU from domain XML will be stored in private data.

    Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
    Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
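
The idea in the commit message can be illustrated with a toy model. This is a sketch only and does not reflect libvirt's real data structures: `DomainState`, `prepare_cpu`, and the dictionary representation of a CPU are all hypothetical, and "new enough" is passed in as a plain boolean because the commit message does not state the threshold.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DomainState:
    """Hypothetical stand-in for a domain definition plus its private data."""
    cpu: dict                        # CPU used to start QEMU and enforce the ABI
    orig_cpu: Optional[dict] = None  # CPU as written in the domain XML


def prepare_cpu(xml_cpu, cookie_cpu, qemu_new_enough):
    """Pick the CPU to start QEMU with on incoming migration or restore."""
    if qemu_new_enough and cookie_cpu is not None:
        # Enforce the ABI with the live-updated CPU and remember the original.
        return DomainState(cpu=cookie_cpu, orig_cpu=xml_cpu)
    # No updated CPU available, or QEMU too old: use the CPU from the XML.
    return DomainState(cpu=xml_cpu)


if __name__ == "__main__":
    xml_cpu = {"model": "IvyBridge", "features": []}
    cookie_cpu = {"model": "IvyBridge",
                  "features": ["hypervisor", "arat", "xsaveopt"]}
    print(prepare_cpu(xml_cpu, cookie_cpu, qemu_new_enough=True))
```

Keeping the original CPU alongside the updated one is what lets the domain XML stay unchanged while the updated definition is used for the ABI check.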

Comment 13 yanqzhan@redhat.com 2017-06-16 09:26:13 UTC
Verified this bug with:
Rhel7.4:
*libvirt-3.2.0-10.el7.x86_64*
qemu-kvm-rhev-2.9.0-10.el7.x86_64
Rhel7.3:
libvirt-2.0.0-10.el7_3.9.x86_64
qemu-kvm-rhev-2.6.0-28.el7_3.9.x86_64
 
Steps:
Scenario1: cpu mode=custom  rhel7.4 -> rhel7.3

1. On the rhel7.4 host, prepare a rhel7.3 guest with the following XML:
  <cpu mode='custom' match='exact' check='partial'>
    <model fallback='forbid'>IvyBridge</model>
  </cpu>
 
2.Start the guest and check the xml again:
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>IvyBridge</model>
    <feature policy='require' name='hypervisor'/>
    <feature policy='require' name='arat'/>
    <feature policy='require' name='xsaveopt'/>
  </cpu>
 
3.Migrate to rhel7.3 host, and check the xml after migration:
#  virsh migrate V --live qemu+ssh://{tar_7.3}/system --verbose --unsafe
Migration: [100 %]
#  virsh dumpxml V
  <cpu mode='custom' match='exact'>
    <model fallback='forbid'>IvyBridge</model>
  </cpu>
Logged in to the guest; the OS works well.

4. Migrate back to the rhel7.4 host, check the XML and guest OS:
# virsh migrate V --live qemu+ssh://{src_7.4}/system --verbose --unsafe
Migration: [100 %]
# virsh dumpxml V
Same as Scenario1-step2.
Logged in to the guest; the OS works well.
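
When repeating these scenarios, the migration command from steps 3 and 4 can be assembled programmatically. This is a small sketch: `migrate_command` is a hypothetical helper, the host name `tar_7.3` only stands in for the `{tar_7.3}` placeholder above, and the options are the ones used in this comment.

```python
import shlex


def migrate_command(domain, dest_host):
    """Build the argument list for a live migration over qemu+ssh."""
    return [
        "virsh", "migrate", domain, "--live",
        f"qemu+ssh://{dest_host}/system",
        "--verbose", "--unsafe",
    ]


if __name__ == "__main__":
    # Print the command for inspection instead of running it.
    print(shlex.join(migrate_command("V", "tar_7.3")))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` on a host where `virsh` is installed.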

Comment 14 yanqzhan@redhat.com 2017-06-16 09:27:19 UTC
Scenario2: cpu mode=custom  rhel7.3 -> rhel7.4

1.On rhel7.3 host, prepare a rhel7.3 guest with following xml:
  <cpu mode='custom' match='exact'>
    <model fallback='forbid'>IvyBridge</model>
  </cpu>
 
2.Start the guest, check the xml again:
Same as Scenario2-step1
 
3.Migrate to rhel7.4 host, and check the xml after migration:
#  virsh migrate V --live qemu+ssh://{tar_7.4}/system --verbose --unsafe
Migration: [100 %]
#  virsh dumpxml V
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>IvyBridge</model>
    <feature policy='require' name='hypervisor'/>
    <feature policy='require' name='arat'/>
    <feature policy='require' name='xsaveopt'/>
  </cpu>
 
4.Migrate back to rhel7.3 host, check xml and guest os:
# virsh migrate V --live qemu+ssh://{src_7.3}/system --verbose --unsafe
Migration: [100 %]
# virsh dumpxml V 
Same as Scenario2-step1

Comment 15 yanqzhan@redhat.com 2017-06-16 09:28:48 UTC
Scenario3: cpu mode=host-model  rhel7.4 -> rhel7.3

1. On the rhel7.4 host, prepare a rhel7.3 guest with the following XML, disabling some features that the target host does not support:
  <cpu mode='host-model' check='partial'>
    <model fallback='allow'>IvyBridge</model>
    <feature policy='disable' name='hypervisor'/>
    <feature policy='disable' name='tsc_adjust'/>
    <feature policy='disable' name='pdpe1gb'/>
  </cpu>
 
2.Start the guest,check the xml again:
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>IvyBridge</model>
    <vendor>Intel</vendor>
    <feature policy='require' name='ss'/>
    <feature policy='require' name='pcid'/>
    <feature policy='disable' name='hypervisor'/>
    <feature policy='require' name='arat'/>
    <feature policy='disable' name='tsc_adjust'/>
    <feature policy='require' name='xsaveopt'/>
    <feature policy='disable' name='pdpe1gb'/>
  </cpu>
 
3. Migrate to the rhel7.3 host, and check the XML after migration:
#  virsh migrate V --live qemu+ssh://{tar_7.3}/system --verbose --unsafe
Migration: [100 %]
#  virsh dumpxml V
Same as Scenario3-step2.
Logged in to the guest; the OS works well.

4. Migrate back to the rhel7.4 host, check the XML and guest OS:
# virsh migrate V --live qemu+ssh://{src_7.4}/system --verbose --unsafe
Migration: [100 %]
# virsh dumpxml V
Same as Scenario3-step2.
Logged in to the guest; the OS works well.

Comment 16 yanqzhan@redhat.com 2017-06-16 09:29:50 UTC
Scenario4: cpu mode=host-model  rhel7.3-> rhel7.4

1. On the rhel7.3 host, prepare a rhel7.3 guest with the following XML:
  <cpu mode='host-model'>
    <model fallback='forbid'>IvyBridge</model>
  </cpu>
 
2.Start the guest,check the xml again:
Same as Scenario4-step1.
 
3. Migrate to the rhel7.4 host, and check the XML after migration:
#  virsh migrate V --live qemu+ssh://{tar_7.4}/system --verbose --unsafe
Migration: [100 %]
#  virsh dumpxml V
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>IvyBridge</model>
    <vendor>Intel</vendor>
    <feature policy='disable' name='ds'/>
    <feature policy='disable' name='acpi'/>
    <feature policy='require' name='ss'/>
    <feature policy='disable' name='ht'/>
    <feature policy='disable' name='tm'/>
    <feature policy='disable' name='pbe'/>
    <feature policy='disable' name='dtes64'/>
    <feature policy='disable' name='monitor'/>
    <feature policy='disable' name='ds_cpl'/>
    <feature policy='disable' name='vmx'/>
    <feature policy='disable' name='smx'/>
    <feature policy='disable' name='est'/>
    <feature policy='disable' name='tm2'/>
    <feature policy='disable' name='xtpr'/>
    <feature policy='disable' name='pdcm'/>
    <feature policy='require' name='pcid'/>
    <feature policy='disable' name='osxsave'/>
    <feature policy='require' name='arat'/>
    <feature policy='require' name='xsaveopt'/>
    <feature policy='require' name='hypervisor'/>
  </cpu>
Logged in to the guest; the OS works well.

4. Migrate back to the rhel7.3 host, check the XML and guest OS:
# virsh migrate V --live qemu+ssh://{src_7.3}/system --verbose --unsafe
Migration: [100 %]
# virsh dumpxml V 
  <cpu mode='custom' match='exact'>
    <model fallback='forbid'>IvyBridge</model>
    <vendor>Intel</vendor>
    <feature policy='require' name='ds'/>
    <feature policy='require' name='acpi'/>
    <feature policy='require' name='ss'/>
    <feature policy='require' name='ht'/>
    <feature policy='require' name='tm'/>
    <feature policy='require' name='pbe'/>
    <feature policy='require' name='dtes64'/>
    <feature policy='require' name='monitor'/>
    <feature policy='require' name='ds_cpl'/>
    <feature policy='require' name='vmx'/>
    <feature policy='require' name='smx'/>
    <feature policy='require' name='est'/>
    <feature policy='require' name='tm2'/>
    <feature policy='require' name='xtpr'/>
    <feature policy='require' name='pdcm'/>
    <feature policy='require' name='pcid'/>
    <feature policy='require' name='osxsave'/>
    <feature policy='require' name='arat'/>
    <feature policy='require' name='xsaveopt'/>
  </cpu>
Logged in to the guest; the OS works well.

Comment 17 yanqzhan@redhat.com 2017-06-20 07:50:40 UTC
Mark as verified per comment 13, comment 14, comment 15, comment 16.

Comment 18 Jiri Denemark 2017-06-21 13:43:35 UTC
Oops, as revealed in https://bugzilla.redhat.com/show_bug.cgi?id=1181899#c21 the commit mentioned in comment #10 includes a tiny but nasty bug which causes libvirt to skip the CPU check if the CPUs in domain XML and migratable XML differ.

Comment 19 Jiri Denemark 2017-06-21 14:02:17 UTC
The additional patch was sent upstream for review: https://www.redhat.com/archives/libvir-list/2017-June/msg00874.html

Comment 20 Jiri Denemark 2017-06-21 14:22:49 UTC
The bug described in comment #18 is now fixed upstream by

commit eabb0002ca0bba3c5a94d16fb385783de7b144a5
Refs: v3.4.0-157-geabb0002c
Author:     Jiri Denemark <jdenemar@redhat.com>
AuthorDate: Wed Jun 21 15:31:38 2017 +0200
Commit:     Jiri Denemark <jdenemar@redhat.com>
CommitDate: Wed Jun 21 16:20:53 2017 +0200

    qemu: Do not skip virCPUUpdateLive if priv->origCPU is set

    Even though we got both the original CPU (used for starting a domain)
    and the updated version (the CPU really provided by QEMU) during
    incoming migration, restore, or snapshot revert, we still need to update
    the CPU according to the data we got from the freshly started QEMU.
    Otherwise we don't know whether the CPU we got from QEMU matches the one
    before migration. We just need to keep the original CPU in
    priv->origCPU.

    Messed up by me in v3.4.0-58-g8e34f4781.

    Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
    Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
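
The difference between the regression and the fix can be shown with a toy model. It is a sketch under loud assumptions and not the libvirt code: the dictionary `state`, the feature sets, and both functions are hypothetical and only mirror the wording of the commit message.

```python
def update_live_buggy(state, qemu_cpu):
    """Model of the regression: skip the update when orig_cpu is already set."""
    if state["orig_cpu"] is not None:
        return state  # the CPU reported by QEMU is never looked at
    return {"cpu": qemu_cpu, "orig_cpu": state["cpu"]}


def update_live_fixed(state, qemu_cpu):
    """Model of the fix: always update, but keep an existing orig_cpu."""
    orig = state["orig_cpu"] if state["orig_cpu"] is not None else state["cpu"]
    return {"cpu": qemu_cpu, "orig_cpu": orig}


if __name__ == "__main__":
    # Incoming migration: orig_cpu is already set, cpu is the expected one.
    state = {"cpu": {"hypervisor"}, "orig_cpu": set()}
    from_qemu = {"hypervisor", "arat"}  # hypothetical data from a fresh QEMU
    print(update_live_buggy(state, from_qemu)["cpu"])
    print(update_live_fixed(state, from_qemu)["cpu"])
```

In the buggy model the CPU data from the freshly started QEMU is ignored, so a mismatch with the CPU from before migration could not be noticed. In the fixed model the data is always applied and the original CPU is still preserved.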

Comment 22 yanqzhan@redhat.com 2017-06-23 03:41:08 UTC
Retested scenarios 1 to 4 with:
libvirt-3.2.0-14.el7.x86_64
qemu-kvm-rhev-2.9.0-12.el7.x86_64

The results are the same as in comments 13 to 16.

Mark this bug as verified.

Comment 23 errata-xmlrpc 2017-08-02 00:05:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1846


