Bug 1441662

Summary: Cross migration failed between rhel7.4 and rhel7.3 with qemu added cpu features: 'hypervisor' 'x2apic'
Product: Red Hat Enterprise Linux 7
Reporter: Yanqiu Zhang <yanqzhan>
Component: libvirt
Assignee: Jiri Denemark <jdenemar>
Status: CLOSED ERRATA
QA Contact: Yanqiu Zhang <yanqzhan>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 7.4
CC: bmcclain, dyuan, fjin, jdenemar, jneedle, lhuang, lizhu, lleistne, lmiksik, mark, mburman, michal.skrivanek, mzhan, rbalakri, snagar, v.tolstov, xuzhang, yafu, yanqzhan, ycui, zpeng
Target Milestone: rc
Keywords: Regression
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: libvirt-3.2.0-14.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-08-02 00:05:54 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1199452, 1399515, 1449577

Description Yanqiu Zhang 2017-04-12 12:22:52 UTC
Description of problem:
Cross migration between rhel7.4 and rhel7.3 fails when qemu adds the cpu features 'hypervisor' and 'x2apic'.
libvirt on 7.4 automatically adds the features that qemu enables by itself to the live domain XML, but libvirt on 7.3 cannot tell that qemu will add them.

Version-Release number of selected component (if applicable):
Rhel7.4:
libvirt-3.2.0-2.el7.x86_64
qemu-kvm-rhev-2.8.0-6.el7.x86_64
Rhel7.3
libvirt-2.0.0-10.el7_3.6.x86_64
qemu-kvm-rhev-2.6.0-28.el7_3.9.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Prepare the migration environment:
hostA: rhel7.3
hostB: rhel7.4
# virsh cpu-baseline base2hostB.xml
<cpu mode='custom' match='exact'>
  <model fallback='allow'>Penryn</model>
  <vendor>Intel</vendor>
  <feature policy='require' name='vme'/>
  <feature policy='require' name='ds'/>
  <feature policy='require' name='acpi'/>
  <feature policy='require' name='ss'/>
  <feature policy='require' name='ht'/>
  <feature policy='require' name='tm'/>
  <feature policy='require' name='pbe'/>
  <feature policy='require' name='dtes64'/>
  <feature policy='require' name='monitor'/>
  <feature policy='require' name='ds_cpl'/>
  <feature policy='require' name='vmx'/>
  <feature policy='require' name='smx'/>
  <feature policy='require' name='est'/>
  <feature policy='require' name='tm2'/>
  <feature policy='require' name='xtpr'/>
  <feature policy='require' name='pdcm'/>
  <feature policy='require' name='xsave'/>
  <feature policy='require' name='osxsave'/>
</cpu>

SCENARIO1:
2. Start a rhel7.3 domain on hostB with the above cpu element
[root@hostB ~]# virsh start rhel7.3
Domain rhel7.3 started

[root@hostB ~]# virsh dumpxml rhel7.3|grep "cpu mode" -A25
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>Penryn</model>
    <vendor>Intel</vendor>
    <feature policy='require' name='vme'/>
    <feature policy='disable' name='ds'/>
    <feature policy='disable' name='acpi'/>
    <feature policy='require' name='ss'/>
    <feature policy='disable' name='ht'/>
    <feature policy='disable' name='tm'/>
    <feature policy='disable' name='pbe'/>
    <feature policy='disable' name='dtes64'/>
    <feature policy='disable' name='monitor'/>
    <feature policy='disable' name='ds_cpl'/>
    <feature policy='disable' name='vmx'/>
    <feature policy='disable' name='smx'/>
    <feature policy='disable' name='est'/>
    <feature policy='disable' name='tm2'/>
    <feature policy='disable' name='xtpr'/>
    <feature policy='disable' name='pdcm'/>
    <feature policy='require' name='xsave'/>
    <feature policy='disable' name='osxsave'/>
    <feature policy='require' name='x2apic'/>      <== automatically added at start
    <feature policy='require' name='hypervisor'/>   <== automatically added at start
  </cpu>

2. try to migrate to hostA
[root@hostB ~]# virsh migrate rhel7.3 --live qemu+ssh://hostA/system --unsafe  --verbose
error: unsupported configuration: guest and host CPU are not compatible: Host CPU does not provide required features: hypervisor

3. Add <feature policy='disable' name='hypervisor'/> to the domain XML and restart the domain
[root@hostB ~]# virsh destroy rhel7.3
Domain rhel7.3 destroyed

[root@hostB ~]# virsh edit rhel7.3
Domain rhel7.3 XML configuration edited.

[root@hostB ~]# virsh start rhel7.3
Domain rhel7.3 started

[root@hostB ~]# virsh dumpxml rhel7.3|grep "cpu mode" -A25
  <cpu mode='custom' match='exact' check='full'>
...
    <feature policy='disable' name='hypervisor'/>
    <feature policy='require' name='x2apic'/>
  </cpu>

4. Retry the migration to hostA
[root@hostB ~]# virsh migrate rhel7.3 --live qemu+ssh://hostA/system --unsafe  --verbose
Migration: [100 %]

5. On hostA, try to migrate back to hostB
[root@hostA ~]# virsh dumpxml V7.3-full|grep hypervisor
[root@hostA ~]# virsh dumpxml V7.3-full|grep x2apic
[root@hostA ~]# virsh migrate rhel7.3 --live qemu+ssh://hostB/system --unsafe  --verbose
error: the CPU is incompatible with host CPU: Host CPU does not provide required features: x2apic

SCENARIO2:
6. Define and start the domain with the same inactive XML on hostA, and try to migrate to hostB
[root@hostA ~]# virsh start rhel7.3
Domain rhel7.3 started

[root@hostA ~]# virsh dumpxml V7.3-full|grep hypervisor
[root@hostA ~]# virsh dumpxml V7.3-full|grep x2apic

[root@hostA ~]# virsh migrate rhel7.3 --live qemu+ssh://hostB/system --unsafe  --verbose
Migration: [100 %]

7. Try to migrate back to hostA from hostB
[root@hostB ~]# virsh dumpxml rhel7.3|grep "cpu mode" -A25
  <cpu mode='custom' match='exact' check='full'>
...
    <feature policy='require' name='x2apic'/>
    <feature policy='require' name='hypervisor'/>
…

[root@hostB ~]# virsh migrate rhel7.3 --live qemu+ssh://hostA/system --unsafe  --verbose
error: unsupported configuration: guest and host CPU are not compatible: Host CPU does not provide required features: hypervisor


SCENARIO3:
8. On hostB, manually add 'x2apic' and 'hypervisor' to the domain XML and try to start the domain
[root@hostB ~]# virsh list --all|grep rhel7.3
 -     rhel7.3                        shut off

[root@hostB ~]# virsh dumpxml rhel7.3|grep "cpu mode" -A25
  <cpu mode='custom' match='exact' check='partial'>
...
    <feature policy='require' name='x2apic'/>
    <feature policy='require' name='hypervisor'/>
…

[root@hostB ~]# virsh start rhel7.3
error: Failed to start domain rhel7.3
error: the CPU is incompatible with host CPU: Host CPU does not provide required features: x2apic, hypervisor


Actual results:
As in steps 3, 4, and 7, cross migration between rhel7.4 and rhel7.3 fails when qemu adds the cpu features 'hypervisor' and 'x2apic'.
libvirt on 7.4 automatically adds the features that qemu enables by itself to the live domain XML, but libvirt on 7.3 cannot tell that qemu will add them.
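
The features that libvirt on 7.4 adds at startup can be listed by comparing the configured <cpu> element with the one reported for the running guest. The following is a minimal Python sketch, not part of libvirt. The helper names cpu_features and added_at_start are made up for illustration, and the two XML snippets are shortened copies of the dumps in the steps above.

```python
import xml.etree.ElementTree as ET

# Shortened copies of the <cpu> elements from the report (illustrative subset).
CONFIGURED = """
<cpu mode='custom' match='exact'>
  <model fallback='allow'>Penryn</model>
  <feature policy='require' name='vme'/>
  <feature policy='require' name='ss'/>
  <feature policy='require' name='xsave'/>
</cpu>
"""

LIVE = """
<cpu mode='custom' match='exact' check='full'>
  <model fallback='forbid'>Penryn</model>
  <feature policy='require' name='vme'/>
  <feature policy='require' name='ss'/>
  <feature policy='require' name='xsave'/>
  <feature policy='require' name='x2apic'/>
  <feature policy='require' name='hypervisor'/>
</cpu>
"""


def cpu_features(cpu_xml):
    """Map feature name -> policy for one <cpu> element (hypothetical helper)."""
    root = ET.fromstring(cpu_xml.strip())
    return {f.get("name"): f.get("policy") for f in root.findall("feature")}


def added_at_start(configured_xml, live_xml):
    """Names of features present only in the live definition, sorted."""
    configured = cpu_features(configured_xml)
    live = cpu_features(live_xml)
    return sorted(name for name in live if name not in configured)


if __name__ == "__main__":
    # Lists the features that were not in the configured XML.
    print(added_at_start(CONFIGURED, LIVE))
```

For the snippets above the difference is exactly the two features named in the error messages, 'hypervisor' and 'x2apic'.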

Expected results:
Cross migration should succeed.

Additional info:

Comment 2 Jiri Denemark 2017-05-10 11:34:05 UTC
BTW, the same bug will prevent migration even between two 7.4 hosts with QEMU older than 2.9.0.

Comment 3 Jiri Denemark 2017-05-10 12:58:23 UTC
*** Bug 1444850 has been marked as a duplicate of this bug. ***

Comment 7 Vasiliy G Tolstov 2017-05-17 17:03:24 UTC
I have two nodes, A and B, both with qemu 2.6.0.
Node A has libvirt 3.3.0.
Node B has libvirt 2.1.0.

I'm trying to migrate a domain from A to B. The XML contains:
<cpu mode='custom' match='exact'>
  <model fallback='allow'>kvm64</model>
</cpu>

As I see in dumpxml on node A, the cpu definition is transformed to
<cpu mode='custom' match='exact' check='full'>
  <model fallback='forbid'>kvm64</model>
  <feature policy='require' name='hypervisor'/>
</cpu>
and the migration fails.

What additional info is needed to resolve this bug?

Comment 8 Jiri Denemark 2017-05-18 08:30:05 UTC
See https://www.redhat.com/archives/libvir-list/2017-May/msg00628.html for a discussion about the best way to fix this issue.

Comment 9 Jiri Denemark 2017-06-06 08:04:41 UTC
Patch sent upstream for review: https://www.redhat.com/archives/libvir-list/2017-June/msg00199.html

Comment 10 Jiri Denemark 2017-06-07 11:45:31 UTC
Fixed upstream in a series ending with

commit 8e34f478137c2a6b5e57e382729bd2776b042301
Refs: v3.4.0-58-g8e34f4781
Author:     Jiri Denemark <jdenemar>
AuthorDate: Wed May 31 12:34:10 2017 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Wed Jun 7 13:36:02 2017 +0200

    qemu: Use updated CPU when starting QEMU if possible

    If QEMU is new enough and we have the live updated CPU definition in
    either save or migration cookie, we can use it to enforce ABI. The
    original guest CPU from domain XML will be stored in private data.

    Signed-off-by: Jiri Denemark <jdenemar>
    Reviewed-by: Pavel Hrdina <phrdina>
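
The idea in the commit message can be pictured with a small toy model. This is a sketch under stated assumptions, not libvirt code: the classes CPUDef and MigrationData and the functions outgoing and incoming_cpu are hypothetical, and the assumption that an older destination simply ignores the cookie and sees only the CPU from the domain XML is an interpretation that matches the verification results later in this report.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class CPUDef:
    """Toy CPU definition: a model name plus the required feature names."""
    model: str
    features: FrozenSet[str]


@dataclass(frozen=True)
class MigrationData:
    """Toy stand-in for what travels to the destination (hypothetical)."""
    xml_cpu: CPUDef               # original guest CPU from the domain XML
    cookie_cpu: Optional[CPUDef]  # live updated CPU, carried in the cookie


def outgoing(orig: CPUDef, live: CPUDef) -> MigrationData:
    """Source side: keep the original CPU in the XML, put the live one in the cookie."""
    return MigrationData(xml_cpu=orig, cookie_cpu=live)


def incoming_cpu(data: MigrationData, understands_cookie: bool) -> CPUDef:
    """Destination side: use the live CPU only if the cookie is understood."""
    if understands_cookie and data.cookie_cpu is not None:
        return data.cookie_cpu
    return data.xml_cpu


if __name__ == "__main__":
    orig = CPUDef("IvyBridge", frozenset())
    live = CPUDef("IvyBridge", frozenset({"hypervisor", "arat", "xsaveopt"}))
    data = outgoing(orig, live)
    print(incoming_cpu(data, understands_cookie=False))  # older destination
    print(incoming_cpu(data, understands_cookie=True))   # newer destination
```

In this model the older destination never sees 'hypervisor' as a required feature, while a newer destination can still enforce the exact CPU the guest was running with.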

Comment 13 Yanqiu Zhang 2017-06-16 09:26:13 UTC
Verified this bug with:
RHEL 7.4:
*libvirt-3.2.0-10.el7.x86_64*
qemu-kvm-rhev-2.9.0-10.el7.x86_64
RHEL 7.3:
libvirt-2.0.0-10.el7_3.9.x86_64
qemu-kvm-rhev-2.6.0-28.el7_3.9.x86_64
 
Steps:
Scenario1: cpu mode=custom  rhel7.4 -> rhel7.3

1. On the rhel7.4 host, prepare a rhel7.3 guest with the following XML:
  <cpu mode='custom' match='exact' check='partial'>
    <model fallback='forbid'>IvyBridge</model>
  </cpu>
 
2. Start the guest and check the XML again:
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>IvyBridge</model>
    <feature policy='require' name='hypervisor'/>
    <feature policy='require' name='arat'/>
    <feature policy='require' name='xsaveopt'/>
  </cpu>
 
3. Migrate to the rhel7.3 host and check the XML after migration:
#  virsh migrate V --live qemu+ssh://{tar_7.3}/system --verbose --unsafe
Migration: [100 %]
#  virsh dumpxml V
  <cpu mode='custom' match='exact'>
    <model fallback='forbid'>IvyBridge</model>
  </cpu>
Log in to the guest; the OS works well.
 
4. Migrate back to the rhel7.4 host, then check the XML and the guest OS:
# virsh migrate V --live qemu+ssh://{src_7.4}/system --verbose --unsafe
Migration: [100 %]
# virsh dumpxml V 
Same as Scenario1-step2.
Log in to the guest; the OS works well.
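
Checks such as "Same as Scenario1-step2" can be scripted by reducing each dump to a comparable form. The following is a minimal Python sketch; normalize_cpu is a hypothetical helper, not a libvirt or virsh API, and it assumes the input is either a bare <cpu> element or a domain XML with <cpu> as a direct child of the root element.

```python
import xml.etree.ElementTree as ET


def normalize_cpu(xml_text):
    """Reduce a <cpu> element to a dict that ignores feature order and whitespace."""
    root = ET.fromstring(xml_text.strip())
    cpu = root if root.tag == "cpu" else root.find("cpu")
    if cpu is None:
        raise ValueError("no <cpu> element found")
    model = cpu.find("model")
    return {
        "mode": cpu.get("mode"),
        "match": cpu.get("match"),
        "check": cpu.get("check"),
        "model": model.text if model is not None else None,
        # A frozenset makes the comparison independent of element order.
        "features": frozenset(
            (f.get("name"), f.get("policy")) for f in cpu.findall("feature")
        ),
    }


if __name__ == "__main__":
    step2 = """
    <cpu mode='custom' match='exact' check='full'>
      <model fallback='forbid'>IvyBridge</model>
      <feature policy='require' name='hypervisor'/>
      <feature policy='require' name='arat'/>
      <feature policy='require' name='xsaveopt'/>
    </cpu>
    """
    step4 = """
    <cpu mode='custom' match='exact' check='full'>
      <model fallback='forbid'>IvyBridge</model>
      <feature policy='require' name='xsaveopt'/>
      <feature policy='require' name='arat'/>
      <feature policy='require' name='hypervisor'/>
    </cpu>
    """
    print(normalize_cpu(step2) == normalize_cpu(step4))
```

The saved output of virsh dumpxml from two steps can be passed to normalize_cpu and compared with ==.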

Comment 14 Yanqiu Zhang 2017-06-16 09:27:19 UTC
Scenario2: cpu mode=custom  rhel7.3 -> rhel7.4

1. On the rhel7.3 host, prepare a rhel7.3 guest with the following XML:
  <cpu mode='custom' match='exact'>
    <model fallback='forbid'>IvyBridge</model>
  </cpu>
 
2. Start the guest and check the XML again:
Same as Scenario2-step1
 
3. Migrate to the rhel7.4 host and check the XML after migration:
#  virsh migrate V --live qemu+ssh://{tar_7.4}/system --verbose --unsafe
Migration: [100 %]
#  virsh dumpxml V
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>IvyBridge</model>
    <feature policy='require' name='hypervisor'/>
    <feature policy='require' name='arat'/>
    <feature policy='require' name='xsaveopt'/>
  </cpu>
 
4. Migrate back to the rhel7.3 host, then check the XML and the guest OS:
# virsh migrate V --live qemu+ssh://{src_7.3}/system --verbose --unsafe
Migration: [100 %]
# virsh dumpxml V 
Same as Scenario2-step1

Comment 15 Yanqiu Zhang 2017-06-16 09:28:48 UTC
Scenario3: cpu mode=host-model  rhel7.4 -> rhel7.3

1. On the rhel7.4 host, prepare a rhel7.3 guest with the following XML, disabling some features that the target host does not support:
  <cpu mode='host-model' check='partial'>
    <model fallback='allow'>IvyBridge</model>
    <feature policy='disable' name='hypervisor'/>
    <feature policy='disable' name='tsc_adjust'/>
    <feature policy='disable' name='pdpe1gb'/>
  </cpu>
 
2. Start the guest and check the XML again:
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>IvyBridge</model>
    <vendor>Intel</vendor>
    <feature policy='require' name='ss'/>
    <feature policy='require' name='pcid'/>
    <feature policy='disable' name='hypervisor'/>
    <feature policy='require' name='arat'/>
    <feature policy='disable' name='tsc_adjust'/>
    <feature policy='require' name='xsaveopt'/>
    <feature policy='disable' name='pdpe1gb'/>
  </cpu>
 
3. Migrate to the rhel7.3 host and check the XML after migration:
#  virsh migrate V --live qemu+ssh://{tar_7.3}/system --verbose --unsafe
Migration: [100 %]
#  virsh dumpxml V
Same as Scenario3-step2.
Log in to the guest; the OS works well.
 
4. Migrate back to the rhel7.4 host, then check the XML and the guest OS:
# virsh migrate V --live qemu+ssh://{src_7.4}/system --verbose --unsafe
Migration: [100 %]
# virsh dumpxml V 
Same as Scenario3-step2.
Log in to the guest; the OS works well.

Comment 16 Yanqiu Zhang 2017-06-16 09:29:50 UTC
Scenario4: cpu mode=host-model  rhel7.3-> rhel7.4

1. On the rhel7.3 host, prepare a rhel7.3 guest with the following XML:
 <cpu mode='host-model'>
    <model fallback='forbid'>IvyBridge</model>
  </cpu>
 
2. Start the guest and check the XML again:
Same as Scenario4-step1.
 
3. Migrate to the rhel7.4 host and check the XML after migration:
#  virsh migrate V --live qemu+ssh://{tar_7.4}/system --verbose --unsafe
Migration: [100 %]
#  virsh dumpxml V
 <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>IvyBridge</model>
    <vendor>Intel</vendor>
    <feature policy='disable' name='ds'/>
    <feature policy='disable' name='acpi'/>
    <feature policy='require' name='ss'/>
    <feature policy='disable' name='ht'/>
    <feature policy='disable' name='tm'/>
    <feature policy='disable' name='pbe'/>
    <feature policy='disable' name='dtes64'/>
    <feature policy='disable' name='monitor'/>
    <feature policy='disable' name='ds_cpl'/>
    <feature policy='disable' name='vmx'/>
    <feature policy='disable' name='smx'/>
    <feature policy='disable' name='est'/>
    <feature policy='disable' name='tm2'/>
    <feature policy='disable' name='xtpr'/>
    <feature policy='disable' name='pdcm'/>
    <feature policy='require' name='pcid'/>
    <feature policy='disable' name='osxsave'/>
    <feature policy='require' name='arat'/>
    <feature policy='require' name='xsaveopt'/>
    <feature policy='require' name='hypervisor'/>
  </cpu>
Log in to the guest; the OS works well.
 
4. Migrate back to the rhel7.3 host, then check the XML and the guest OS:
# virsh migrate V --live qemu+ssh://{src_7.3}/system --verbose --unsafe
Migration: [100 %]
# virsh dumpxml V 
  <cpu mode='custom' match='exact'>
    <model fallback='forbid'>IvyBridge</model>
    <vendor>Intel</vendor>
    <feature policy='require' name='ds'/>
    <feature policy='require' name='acpi'/>
    <feature policy='require' name='ss'/>
    <feature policy='require' name='ht'/>
    <feature policy='require' name='tm'/>
    <feature policy='require' name='pbe'/>
    <feature policy='require' name='dtes64'/>
    <feature policy='require' name='monitor'/>
    <feature policy='require' name='ds_cpl'/>
    <feature policy='require' name='vmx'/>
    <feature policy='require' name='smx'/>
    <feature policy='require' name='est'/>
    <feature policy='require' name='tm2'/>
    <feature policy='require' name='xtpr'/>
    <feature policy='require' name='pdcm'/>
    <feature policy='require' name='pcid'/>
    <feature policy='require' name='osxsave'/>
    <feature policy='require' name='arat'/>
    <feature policy='require' name='xsaveopt'/>
  </cpu>
Log in to the guest; the OS works well.

Comment 17 Yanqiu Zhang 2017-06-20 07:50:40 UTC
Mark as verified per comment 13, comment 14, comment 15, comment 16.

Comment 18 Jiri Denemark 2017-06-21 13:43:35 UTC
Oops, as revealed in https://bugzilla.redhat.com/show_bug.cgi?id=1181899#c21 the commit mentioned in comment #10 includes a tiny but nasty bug which causes libvirt to skip the CPU check if the CPUs in domain XML and migratable XML differ.

Comment 19 Jiri Denemark 2017-06-21 14:02:17 UTC
The additional patch was sent upstream for review: https://www.redhat.com/archives/libvir-list/2017-June/msg00874.html

Comment 20 Jiri Denemark 2017-06-21 14:22:49 UTC
The bug described in comment #18 is now fixed upstream by

commit eabb0002ca0bba3c5a94d16fb385783de7b144a5
Refs: v3.4.0-157-geabb0002c
Author:     Jiri Denemark <jdenemar>
AuthorDate: Wed Jun 21 15:31:38 2017 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Wed Jun 21 16:20:53 2017 +0200

    qemu: Do not skip virCPUUpdateLive if priv->origCPU is set

    Even though we got both the original CPU (used for starting a domain)
    and the updated version (the CPU really provided by QEMU) during
    incoming migration, restore, or snapshot revert, we still need to update
    the CPU according to the data we got from the freshly started QEMU.
    Otherwise we don't know whether the CPU we got from QEMU matches the one
    before migration. We just need to keep the original CPU in
    priv->origCPU.

    Messed up by me in v3.4.0-58-g8e34f4781.

    Signed-off-by: Jiri Denemark <jdenemar>
    Reviewed-by: Pavel Hrdina <phrdina>
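
The reasoning in this commit message, that the CPU expected from before the migration still has to be compared with what the freshly started QEMU actually provides, can be illustrated with a toy check. This is only a sketch under stated assumptions: compare_cpu_features and check_after_incoming are hypothetical names, CPUs are reduced to plain sets of feature names, and treating any difference as fatal is a simplification, since this report does not say which differences libvirt rejects.

```python
def compare_cpu_features(expected, reported):
    """Return (missing, unexpected) feature names, each as a sorted list."""
    expected, reported = set(expected), set(reported)
    return sorted(expected - reported), sorted(reported - expected)


def check_after_incoming(expected, reported):
    """Raise if the CPU reported by the new QEMU differs from the expected one."""
    missing, unexpected = compare_cpu_features(expected, reported)
    if missing or unexpected:
        raise RuntimeError(
            "guest CPU changed: missing=%s unexpected=%s" % (missing, unexpected)
        )


if __name__ == "__main__":
    before = ["hypervisor", "arat", "xsaveopt"]  # CPU known from before migration
    after = ["arat", "hypervisor"]               # CPU reported by the new QEMU
    print(compare_cpu_features(before, after))
```

Skipping such a comparison, as the earlier commit accidentally did, would mean a destination could not tell whether the guest-visible CPU stayed the same across the migration.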

Comment 22 Yanqiu Zhang 2017-06-23 03:41:08 UTC
Retested scenarios 1 to 4 with:
libvirt-3.2.0-14.el7.x86_64
qemu-kvm-rhev-2.9.0-12.el7.x86_64

The results are the same as in comment 13 to comment 16.

Mark this bug as verified.

Comment 23 errata-xmlrpc 2017-08-02 00:05:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1846
