Bug 1441662
| Summary: | Cross migration failed between rhel7.4 and rhel7.3 with qemu added cpu features: 'hypervisor' 'x2apic' | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Yanqiu Zhang <yanqzhan> |
| Component: | libvirt | Assignee: | Jiri Denemark <jdenemar> |
| Status: | CLOSED ERRATA | QA Contact: | Yanqiu Zhang <yanqzhan> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 7.4 | CC: | bmcclain, dyuan, fjin, jdenemar, jneedle, lhuang, lizhu, lleistne, lmiksik, mark, mburman, michal.skrivanek, mzhan, rbalakri, snagar, v.tolstov, xuzhang, yafu, yanqzhan, ycui, zpeng |
| Target Milestone: | rc | Keywords: | Regression |
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | libvirt-3.2.0-14.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2017-08-02 00:05:54 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1199452, 1399515, 1449577 | ||
BTW, the same bug will prevent migration even between two 7.4 hosts with QEMU older than 2.9.0.

*** Bug 1444850 has been marked as a duplicate of this bug. ***

I have two nodes, A and B, both with qemu 2.6.0. Node A has libvirt 3.3.0 and node B has libvirt 2.1.0. I'm trying to migrate a domain from A to B. The XML contains:
<cpu mode='custom' match='exact'>
<model fallback='allow'>kvm64</model>
</cpu>
As I see in dumpxml on node A, the CPU definition is transformed to:
<cpu mode='custom' match='exact' check='full'>
<model fallback='forbid'>kvm64</model>
<feature policy='require' name='hypervisor'/>
</cpu>
and the migration fails. What additional info is needed to resolve this bug?

See https://www.redhat.com/archives/libvir-list/2017-May/msg00628.html for a discussion about the best way to fix this issue.

Patch sent upstream for review: https://www.redhat.com/archives/libvir-list/2017-June/msg00199.html

Fixed upstream in a series ending with
commit 8e34f478137c2a6b5e57e382729bd2776b042301
Refs: v3.4.0-58-g8e34f4781
Author: Jiri Denemark <jdenemar>
AuthorDate: Wed May 31 12:34:10 2017 +0200
Commit: Jiri Denemark <jdenemar>
CommitDate: Wed Jun 7 13:36:02 2017 +0200
qemu: Use updated CPU when starting QEMU if possible
If QEMU is new enough and we have the live updated CPU definition in
either save or migration cookie, we can use it to enforce ABI. The
original guest CPU from domain XML will be stored in private data.
Signed-off-by: Jiri Denemark <jdenemar>
Reviewed-by: Pavel Hrdina <phrdina>
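The decision described in the commit message above can be pictured with a toy model. This is only a sketch and not libvirt's code: the names `StartedDomain`, `pick_start_cpu`, `cookie_cpu`, and `qemu_new_enough` are hypothetical, and CPU definitions are reduced to plain strings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StartedDomain:
    # CPU definition used to start QEMU and enforce ABI (hypothetical field).
    cpu: str
    # Original guest CPU from the domain XML, kept aside (hypothetical field).
    orig_cpu: Optional[str] = None


def pick_start_cpu(xml_cpu: str,
                   cookie_cpu: Optional[str],
                   qemu_new_enough: bool) -> StartedDomain:
    """Toy version of the rule in the commit message.

    If QEMU is new enough and a live-updated CPU arrived in the save or
    migration cookie, start with the updated CPU and remember the original.
    Otherwise fall back to the CPU from the domain XML.
    """
    if qemu_new_enough and cookie_cpu is not None:
        return StartedDomain(cpu=cookie_cpu, orig_cpu=xml_cpu)
    return StartedDomain(cpu=xml_cpu)


if __name__ == "__main__":
    print(pick_start_cpu("IvyBridge", "IvyBridge+hypervisor", True))
    print(pick_start_cpu("IvyBridge", "IvyBridge+hypervisor", False))
```

The second call models an older QEMU: the cookie is ignored and only the CPU from the domain XML is used.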
Verified this bug with:
Rhel7.4:
*libvirt-3.2.0-10.el7.x86_64*
qemu-kvm-rhev-2.9.0-10.el7.x86_64
Rhel7.3:
libvirt-2.0.0-10.el7_3.9.x86_64
qemu-kvm-rhev-2.6.0-28.el7_3.9.x86_64
Steps:
Scenario1: cpu mode=custom rhel7.4 -> rhel7.3
1. On the rhel7.4 host, prepare a rhel7.3 guest with the following XML:
<cpu mode='custom' match='exact' check='partial'>
<model fallback='forbid'>IvyBridge</model>
</cpu>
2. Start the guest and check the XML again:
<cpu mode='custom' match='exact' check='full'>
<model fallback='forbid'>IvyBridge</model>
<feature policy='require' name='hypervisor'/>
<feature policy='require' name='arat'/>
<feature policy='require' name='xsaveopt'/>
</cpu>
3. Migrate to the rhel7.3 host and check the XML after migration:
# virsh migrate V --live qemu+ssh://{tar_7.3}/system --verbose --unsafe
Migration: [100 %]
# virsh dumpxml V
<cpu mode='custom' match='exact'>
<model fallback='forbid'>IvyBridge</model>
</cpu>
Log in to the guest; the OS works well.
4. Migrate back to the rhel7.4 host, then check the XML and the guest OS:
# virsh migrate V --live qemu+ssh://{src_7.4}/system --verbose --unsafe
Migration: [100 %]
# virsh dumpxml V
Same as Scenario1-step2.
Log in to the guest; the OS works well.
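To see which features were added when the guest started, the configured and live `<cpu>` elements can be compared. The following is a minimal sketch using the XML from Scenario1 steps 1 and 2; the helper names `cpu_features` and `added_on_start` are made up for illustration and are not part of virsh or libvirt.

```python
import xml.etree.ElementTree as ET


def cpu_features(cpu_xml: str) -> dict:
    """Map feature name -> policy for a libvirt <cpu> element given as text."""
    cpu = ET.fromstring(cpu_xml)
    return {f.get("name"): f.get("policy") for f in cpu.findall("feature")}


def added_on_start(configured_xml: str, live_xml: str) -> list:
    """Names of features present in the live XML but not in the configured one."""
    before = cpu_features(configured_xml)
    after = cpu_features(live_xml)
    return sorted(name for name in after if name not in before)


# <cpu> element as configured (Scenario1, step 1).
configured = """
<cpu mode='custom' match='exact' check='partial'>
  <model fallback='forbid'>IvyBridge</model>
</cpu>
"""

# <cpu> element reported by dumpxml after start (Scenario1, step 2).
live = """
<cpu mode='custom' match='exact' check='full'>
  <model fallback='forbid'>IvyBridge</model>
  <feature policy='require' name='hypervisor'/>
  <feature policy='require' name='arat'/>
  <feature policy='require' name='xsaveopt'/>
</cpu>
"""

if __name__ == "__main__":
    print(added_on_start(configured, live))  # ['arat', 'hypervisor', 'xsaveopt']
```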
Scenario2: cpu mode=custom rhel7.3 -> rhel7.4
1. On the rhel7.3 host, prepare a rhel7.3 guest with the following XML:
<cpu mode='custom' match='exact'>
<model fallback='forbid'>IvyBridge</model>
</cpu>
2. Start the guest and check the XML again:
Same as Scenario2-step1.
3. Migrate to the rhel7.4 host and check the XML after migration:
# virsh migrate V --live qemu+ssh://{tar_7.4}/system --verbose --unsafe
Migration: [100 %]
# virsh dumpxml V
<cpu mode='custom' match='exact' check='full'>
<model fallback='forbid'>IvyBridge</model>
<feature policy='require' name='hypervisor'/>
<feature policy='require' name='arat'/>
<feature policy='require' name='xsaveopt'/>
</cpu>
4. Migrate back to the rhel7.3 host, then check the XML and the guest OS:
# virsh migrate V --live qemu+ssh://{src_7.3}/system --verbose --unsafe
Migration: [100 %]
# virsh dumpxml V
Same as Scenario2-step1
Scenario3: cpu mode=host-model rhel7.4 -> rhel7.3
1. On the rhel7.4 host, prepare a rhel7.3 guest with the following XML, disabling some features that the target host does not support:
<cpu mode='host-model' check='partial'>
<model fallback='allow'>IvyBridge</model>
<feature policy='disable' name='hypervisor'/>
<feature policy='disable' name='tsc_adjust'/>
<feature policy='disable' name='pdpe1gb'/>
</cpu>
2. Start the guest and check the XML again:
<cpu mode='custom' match='exact' check='full'>
<model fallback='forbid'>IvyBridge</model>
<vendor>Intel</vendor>
<feature policy='require' name='ss'/>
<feature policy='require' name='pcid'/>
<feature policy='disable' name='hypervisor'/>
<feature policy='require' name='arat'/>
<feature policy='disable' name='tsc_adjust'/>
<feature policy='require' name='xsaveopt'/>
<feature policy='disable' name='pdpe1gb'/>
</cpu>
3. Migrate to the rhel7.3 host and check the XML after migration:
# virsh migrate V --live qemu+ssh://{tar_7.3}/system --verbose --unsafe
Migration: [100 %]
# virsh dumpxml V
Same as Scenario3-step2.
Log in to the guest; the OS works well.
4. Migrate back to the rhel7.4 host, then check the XML and the guest OS:
# virsh migrate V --live qemu+ssh://{src_7.4}/system --verbose --unsafe
Migration: [100 %]
# virsh dumpxml V
Same as Scenario3-step2.
Log in to the guest; the OS works well.
Scenario4: cpu mode=host-model rhel7.3 -> rhel7.4
1. On the rhel7.3 host, prepare a rhel7.3 guest with the following XML:
<cpu mode='host-model'>
<model fallback='forbid'>IvyBridge</model>
</cpu>
2. Start the guest and check the XML again:
Same as Scenario4-step1.
3. Migrate to the rhel7.4 host and check the XML after migration:
# virsh migrate V --live qemu+ssh://{tar_7.4}/system --verbose --unsafe
Migration: [100 %]
# virsh dumpxml V
<cpu mode='custom' match='exact' check='full'>
<model fallback='forbid'>IvyBridge</model>
<vendor>Intel</vendor>
<feature policy='disable' name='ds'/>
<feature policy='disable' name='acpi'/>
<feature policy='require' name='ss'/>
<feature policy='disable' name='ht'/>
<feature policy='disable' name='tm'/>
<feature policy='disable' name='pbe'/>
<feature policy='disable' name='dtes64'/>
<feature policy='disable' name='monitor'/>
<feature policy='disable' name='ds_cpl'/>
<feature policy='disable' name='vmx'/>
<feature policy='disable' name='smx'/>
<feature policy='disable' name='est'/>
<feature policy='disable' name='tm2'/>
<feature policy='disable' name='xtpr'/>
<feature policy='disable' name='pdcm'/>
<feature policy='require' name='pcid'/>
<feature policy='disable' name='osxsave'/>
<feature policy='require' name='arat'/>
<feature policy='require' name='xsaveopt'/>
<feature policy='require' name='hypervisor'/>
</cpu>
Log in to the guest; the OS works well.
4. Migrate back to the rhel7.3 host, then check the XML and the guest OS:
# virsh migrate V --live qemu+ssh://{src_7.3}/system --verbose --unsafe
Migration: [100 %]
# virsh dumpxml V
<cpu mode='custom' match='exact'>
<model fallback='forbid'>IvyBridge</model>
<vendor>Intel</vendor>
<feature policy='require' name='ds'/>
<feature policy='require' name='acpi'/>
<feature policy='require' name='ss'/>
<feature policy='require' name='ht'/>
<feature policy='require' name='tm'/>
<feature policy='require' name='pbe'/>
<feature policy='require' name='dtes64'/>
<feature policy='require' name='monitor'/>
<feature policy='require' name='ds_cpl'/>
<feature policy='require' name='vmx'/>
<feature policy='require' name='smx'/>
<feature policy='require' name='est'/>
<feature policy='require' name='tm2'/>
<feature policy='require' name='xtpr'/>
<feature policy='require' name='pdcm'/>
<feature policy='require' name='pcid'/>
<feature policy='require' name='osxsave'/>
<feature policy='require' name='arat'/>
<feature policy='require' name='xsaveopt'/>
</cpu>
Log in to the guest; the OS works well.
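In Scenario4 the feature policies reported on the two hosts differ (for example, 'ds' is 'disable' in step 3 and 'require' in step 4). A small sketch for listing such differences between two dumps follows; the helper names are hypothetical, and the sample XML is only a shortened subset of the Scenario4 output, not the full feature list.

```python
import xml.etree.ElementTree as ET


def cpu_features(cpu_xml: str) -> dict:
    """Map feature name -> policy for a libvirt <cpu> element given as text."""
    cpu = ET.fromstring(cpu_xml)
    return {f.get("name"): f.get("policy") for f in cpu.findall("feature")}


def policy_changes(xml_a: str, xml_b: str) -> dict:
    """Return {name: (policy_in_a, policy_in_b)} for every feature that differs.

    A policy of None means the feature is not listed in that dump.
    """
    a, b = cpu_features(xml_a), cpu_features(xml_b)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


# Shortened subset of the dump on the rhel7.4 host (Scenario4, step 3).
on_74 = """
<cpu mode='custom' match='exact' check='full'>
  <model fallback='forbid'>IvyBridge</model>
  <feature policy='disable' name='ds'/>
  <feature policy='require' name='ss'/>
  <feature policy='disable' name='vmx'/>
  <feature policy='require' name='hypervisor'/>
</cpu>
"""

# Shortened subset of the dump on the rhel7.3 host (Scenario4, step 4).
on_73 = """
<cpu mode='custom' match='exact'>
  <model fallback='forbid'>IvyBridge</model>
  <feature policy='require' name='ds'/>
  <feature policy='require' name='ss'/>
  <feature policy='require' name='vmx'/>
</cpu>
"""

if __name__ == "__main__":
    for name, (p74, p73) in policy_changes(on_74, on_73).items():
        print(f"{name}: rhel7.4={p74} rhel7.3={p73}")
```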
Mark as verified per comment 13, comment 14, comment 15, comment 16.

Oops, as revealed in https://bugzilla.redhat.com/show_bug.cgi?id=1181899#c21 the commit mentioned in comment #10 includes a tiny but nasty bug which causes libvirt to skip the CPU check if the CPUs in domain XML and migratable XML differ. The additional patch was sent upstream for review: https://www.redhat.com/archives/libvir-list/2017-June/msg00874.html

The bug described in comment #18 is now fixed upstream by
commit eabb0002ca0bba3c5a94d16fb385783de7b144a5
Refs: v3.4.0-157-geabb0002c
Author: Jiri Denemark <jdenemar>
AuthorDate: Wed Jun 21 15:31:38 2017 +0200
Commit: Jiri Denemark <jdenemar>
CommitDate: Wed Jun 21 16:20:53 2017 +0200
qemu: Do not skip virCPUUpdateLive if priv->origCPU is set
Even though we got both the original CPU (used for starting a domain)
and the updated version (the CPU really provided by QEMU) during
incoming migration, restore, or snapshot revert, we still need to update
the CPU according to the data we got from the freshly started QEMU.
Otherwise we don't know whether the CPU we got from QEMU matches the one
before migration. We just need to keep the original CPU in
priv->origCPU.
Messed up by me in v3.4.0-58-g8e34f4781.
Signed-off-by: Jiri Denemark <jdenemar>
Reviewed-by: Pavel Hrdina <phrdina>

Retest scenarios1~4 with:
libvirt-3.2.0-14.el7.x86_64
qemu-kvm-rhev-2.9.0-12.el7.x86_64
The results are same as comment13~16. Mark this bug as verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:1846
Description of problem:
Cross migration failed between rhel7.4 and rhel7.3 with qemu added cpu features: 'hypervisor' 'x2apic'
libvirt in 7.4 automatically adds features which qemu enabled by itself, but libvirt on 7.3 is not able to see that qemu will add them

Version-Release number of selected component (if applicable):
Rhel7.4:
libvirt-3.2.0-2.el7.x86_64
qemu-kvm-rhev-2.8.0-6.el7.x86_64
Rhel7.3
libvirt-2.0.0-10.el7_3.6.x86_64
qemu-kvm-rhev-2.6.0-28.el7_3.9.x86_64

How reproducible:
100%

Steps to Reproduce:
1.prepare migration env
hostA: rhel7.3 hostB:rhel7.4
# virsh cpu-baseline base2hostB.xml
<cpu mode='custom' match='exact'>
<model fallback='allow'>Penryn</model>
<vendor>Intel</vendor>
<feature policy='require' name='vme'/>
<feature policy='require' name='ds'/>
<feature policy='require' name='acpi'/>
<feature policy='require' name='ss'/>
<feature policy='require' name='ht'/>
<feature policy='require' name='tm'/>
<feature policy='require' name='pbe'/>
<feature policy='require' name='dtes64'/>
<feature policy='require' name='monitor'/>
<feature policy='require' name='ds_cpl'/>
<feature policy='require' name='vmx'/>
<feature policy='require' name='smx'/>
<feature policy='require' name='est'/>
<feature policy='require' name='tm2'/>
<feature policy='require' name='xtpr'/>
<feature policy='require' name='pdcm'/>
<feature policy='require' name='xsave'/>
<feature policy='require' name='osxsave'/>
</cpu>

SCENARIO1:
2.start a rhel7.3 domain on hostB with above cpu element
[root@hostB ~]# virsh start rhel7.3
Domain rhel7.3 started
[root@hostB ~]# virsh dumpxml rhel7.3|grep "cpu mode" -A25
<cpu mode='custom' match='exact' check='full'>
<model fallback='forbid'>Penryn</model>
<vendor>Intel</vendor>
<feature policy='require' name='vme'/>
<feature policy='disable' name='ds'/>
<feature policy='disable' name='acpi'/>
<feature policy='require' name='ss'/>
<feature policy='disable' name='ht'/>
<feature policy='disable' name='tm'/>
<feature policy='disable' name='pbe'/>
<feature policy='disable' name='dtes64'/>
<feature policy='disable' name='monitor'/>
<feature policy='disable' name='ds_cpl'/>
<feature policy='disable' name='vmx'/>
<feature policy='disable' name='smx'/>
<feature policy='disable' name='est'/>
<feature policy='disable' name='tm2'/>
<feature policy='disable' name='xtpr'/>
<feature policy='disable' name='pdcm'/>
<feature policy='require' name='xsave'/>
<feature policy='disable' name='osxsave'/>
<feature policy='require' name='x2apic'/> <== automatically added when start
<feature policy='require' name='hypervisor'/> <== automatically added when start
</cpu>
2. try to migrate to hostA
[root@hostB ~]# virsh migrate rhel7.3 --live qemu+ssh://hostA/system --unsafe --verbose
error: unsupported configuration: guest and host CPU are not compatible: Host CPU does not provide required features: hypervisor
3.add <feature policy='disable' name='hypervisor'/> to its xml and restart domain
[root@hostB ~]# virsh destroy rhel7.3
Domain rhel7.3 destroyed
[root@hostB ~]# virsh edit rhel7.3
Domain rhel7.3 XML configuration edited.
[root@hostB ~]# virsh start rhel7.3
Domain rhel7.3 started
[root@hostB ~]# virsh dumpxml rhel7.3|grep "cpu mode" -A25
<cpu mode='custom' match='exact' check='full'>
...
<feature policy='disable' name='hypervisor'/>
<feature policy='require' name='x2apic'/>
</cpu>
4. retry to migrate to hostA
[root@hostB ~]# virsh migrate rhel7.3 --live qemu+ssh://hostA/system --unsafe --verbose
Migration: [100 %]
5.on hostA, try to migrate back to hostB
[root@hostA ~]# virsh dumpxml V7.3-full|grep hypervisor
[root@hostA ~]# virsh dumpxml V7.3-full|grep x2apic
[root@hostA ~]# virsh migrate rhel7.3 --live qemu+ssh://hostB/system --unsafe --verbose
error: the CPU is incompatible with host CPU: Host CPU does not provide required features: x2apic

SCENARIO2:
6.define and start the domain with same inactive xml on hostA, and try to migrate to hostB
[root@hostA ~]# virsh start rhel7.3
Domain rhel7.3 started
[root@hostA ~]# virsh dumpxml V7.3-full|grep hypervisor
[root@hostA ~]# virsh dumpxml V7.3-full|grep x2apic
[root@hostA ~]# virsh migrate rhel7.3 --live qemu+ssh://hostB/system --unsafe --verbose
Migration: [100 %]
7. try to migrate back to host A from hostB
[root@hostB ~]# virsh dumpxml rhel7.3|grep "cpu mode" -A25
<cpu mode='custom' match='exact' check='full'>
...
<feature policy='require' name='x2apic'/>
<feature policy='require' name='hypervisor'/>
…
[root@hostB ~]# virsh migrate rhel7.3 --live qemu+ssh://hostA/system --unsafe --verbose
error: unsupported configuration: guest and host CPU are not compatible: Host CPU does not provide required features: hypervisor

SCENARIO3:
8.on hostB, manually add 'x2apic' and 'hypervisor' to domain xml, try to start the domain
[root@hostB ~]# virsh list --all|grep rhel7.3
- rhel7.3 shut off
[root@hostB ~]# virsh dumpxml rhel7.3|grep "cpu mode" -A25
<cpu mode='custom' match='exact' check='partial'>
...
<feature policy='require' name='x2apic'/>
<feature policy='require' name='hypervisor'/>
…
[root@hostB ~]# virsh start rhel7.3
error: Failed to start domain rhel7.3
error: the CPU is incompatible with host CPU: Host CPU does not provide required features: x2apic, hypervisor

Actual results:
As in step 3, 4, 7, cross migration failed between rhel7.4 and rhel7.3 with qemu added cpu features: 'hypervisor' 'x2apic'.
libvirt in 7.4 automatically adds features which qemu enabled by itself, but libvirt on 7.3 is not able to see that qemu will add them

Expected results:
cross migration should succeed.

Additional info:
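The manual edit from step 3 of the reproducer (adding `<feature policy='disable' name='hypervisor'/>` to the `<cpu>` element) could be scripted roughly as follows. This is only a sketch: `disable_feature` is a hypothetical helper that operates on the `<cpu>` element as text, it does not call virsh or libvirt, and the sample XML is shortened.

```python
import xml.etree.ElementTree as ET


def disable_feature(cpu_xml: str, name: str) -> str:
    """Return the <cpu> XML with the named feature set to policy='disable'.

    An existing <feature> entry is updated in place; otherwise a new entry
    is appended to the <cpu> element.
    """
    cpu = ET.fromstring(cpu_xml)
    for feature in cpu.findall("feature"):
        if feature.get("name") == name:
            feature.set("policy", "disable")
            break
    else:
        ET.SubElement(cpu, "feature", {"policy": "disable", "name": name})
    return ET.tostring(cpu, encoding="unicode")


# Shortened <cpu> element based on the cpu-baseline output in step 1.
baseline = """
<cpu mode='custom' match='exact'>
  <model fallback='allow'>Penryn</model>
  <vendor>Intel</vendor>
  <feature policy='require' name='vme'/>
  <feature policy='require' name='xsave'/>
</cpu>
"""

if __name__ == "__main__":
    print(disable_feature(baseline, "hypervisor"))
```

As steps 4 and 5 show, this edit only allowed the migration from hostB to hostA; migrating back still failed on 'x2apic', so it is not a general fix.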