Bug 1441662
Summary: | Cross migration failed between rhel7.4 and rhel7.3 with qemu added cpu features: 'hypervisor' 'x2apic' | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Yanqiu Zhang <yanqzhan> |
Component: | libvirt | Assignee: | Jiri Denemark <jdenemar> |
Status: | CLOSED ERRATA | QA Contact: | Yanqiu Zhang <yanqzhan> |
Severity: | urgent | Docs Contact: | |
Priority: | unspecified | | |
Version: | 7.4 | CC: | bmcclain, dyuan, fjin, jdenemar, jneedle, lhuang, lizhu, lleistne, lmiksik, mark, mburman, michal.skrivanek, mzhan, rbalakri, snagar, v.tolstov, xuzhang, yafu, yanqzhan, ycui, zpeng |
Target Milestone: | rc | Keywords: | Regression |
Target Release: | --- | | |
Hardware: | x86_64 | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | libvirt-3.2.0-14.el7 | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2017-08-02 00:05:54 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | | | |
Bug Blocks: | 1199452, 1399515, 1449577 | | |
Description
Yanqiu Zhang
2017-04-12 12:22:52 UTC
BTW, the same bug will prevent migration even between two 7.4 hosts with QEMU older than 2.9.0.

*** Bug 1444850 has been marked as a duplicate of this bug. ***

I have two nodes, A and B, both with qemu 2.6.0. Node A has libvirt 3.3.0 and node B has libvirt 2.1.0. I'm trying to migrate a domain from A to B. The XML contains:

<cpu mode='custom' match='exact'>
  <model fallback='allow'>kvm64</model>
</cpu>

As I see in dumpxml on node A, the cpu definition is transformed to

<cpu mode='custom' match='exact' check='full'>
  <model fallback='forbid'>kvm64</model>
  <feature policy='require' name='hypervisor'/>
</cpu>

and the migration fails. What additional info is needed to resolve this bug?

See https://www.redhat.com/archives/libvir-list/2017-May/msg00628.html for a discussion about the best way to fix this issue.

Patch sent upstream for review: https://www.redhat.com/archives/libvir-list/2017-June/msg00199.html

Fixed upstream in a series ending with

commit 8e34f478137c2a6b5e57e382729bd2776b042301
Refs: v3.4.0-58-g8e34f4781
Author: Jiri Denemark <jdenemar>
AuthorDate: Wed May 31 12:34:10 2017 +0200
Commit: Jiri Denemark <jdenemar>
CommitDate: Wed Jun 7 13:36:02 2017 +0200

qemu: Use updated CPU when starting QEMU if possible

If QEMU is new enough and we have the live updated CPU definition in either save or migration cookie, we can use it to enforce ABI. The original guest CPU from domain XML will be stored in private data.
Signed-off-by: Jiri Denemark <jdenemar>
Reviewed-by: Pavel Hrdina <phrdina>

Verified this bug with:

Rhel7.4:
*libvirt-3.2.0-10.el7.x86_64*
qemu-kvm-rhev-2.9.0-10.el7.x86_64

Rhel7.3:
libvirt-2.0.0-10.el7_3.9.x86_64
qemu-kvm-rhev-2.6.0-28.el7_3.9.x86_64

Steps:

Scenario1: cpu mode=custom rhel7.4 -> rhel7.3

1. On the rhel7.4 host, prepare a rhel7.3 guest with the following xml:

<cpu mode='custom' match='exact' check='partial'>
  <model fallback='forbid'>IvyBridge</model>
</cpu>

2. Start the guest and check the xml again:

<cpu mode='custom' match='exact' check='full'>
  <model fallback='forbid'>IvyBridge</model>
  <feature policy='require' name='hypervisor'/>
  <feature policy='require' name='arat'/>
  <feature policy='require' name='xsaveopt'/>
</cpu>

3. Migrate to the rhel7.3 host and check the xml after migration:

# virsh migrate V --live qemu+ssh://{tar_7.3}/system --verbose --unsafe
Migration: [100 %]
# virsh dumpxml V
<cpu mode='custom' match='exact'>
  <model fallback='forbid'>IvyBridge</model>
</cpu>

Log in to the guest; the OS works well.

4. Migrate back to the rhel7.4 host, then check the xml and the guest OS:

# virsh migrate V --live qemu+ssh://{src_7.4}/system --verbose --unsafe
Migration: [100 %]
# virsh dumpxml V
Same as Scenario1-step2.

Log in to the guest; the OS works well.
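The difference between the CPU definition written in step 1 and the one reported after start in step 2 of Scenario1 is what the older libvirt on the rhel7.3 side has to cope with. The following is a minimal sketch, not part of libvirt or of the original verification, that lists the features present in the live definition but absent from the defined one. The helper names `cpu_features` and `added_features` are hypothetical; the two XML snippets are copied from Scenario1.

```python
import xml.etree.ElementTree as ET

# <cpu> element as defined before start (Scenario1, step 1).
DEFINED = """
<cpu mode='custom' match='exact' check='partial'>
  <model fallback='forbid'>IvyBridge</model>
</cpu>
"""

# <cpu> element reported by dumpxml after start (Scenario1, step 2).
LIVE = """
<cpu mode='custom' match='exact' check='full'>
  <model fallback='forbid'>IvyBridge</model>
  <feature policy='require' name='hypervisor'/>
  <feature policy='require' name='arat'/>
  <feature policy='require' name='xsaveopt'/>
</cpu>
"""


def cpu_features(cpu_xml):
    """Return {feature name: policy} for a <cpu> element given as a string."""
    root = ET.fromstring(cpu_xml.strip())
    return {f.get("name"): f.get("policy") for f in root.findall("feature")}


def added_features(defined_xml, live_xml):
    """Return features whose policy in the live CPU differs from the defined CPU."""
    defined = cpu_features(defined_xml)
    live = cpu_features(live_xml)
    return {name: policy for name, policy in live.items()
            if defined.get(name) != policy}


if __name__ == "__main__":
    for name, policy in sorted(added_features(DEFINED, LIVE).items()):
        print(f"{policy}: {name}")
```

To compare a running guest instead, the `<cpu>` element from `virsh dumpxml V` output could be passed in as the second argument.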
Scenario2: cpu mode=custom rhel7.3 -> rhel7.4

1. On the rhel7.3 host, prepare a rhel7.3 guest with the following xml:

<cpu mode='custom' match='exact'>
  <model fallback='forbid'>IvyBridge</model>
</cpu>

2. Start the guest and check the xml again:

Same as Scenario2-step1.

3. Migrate to the rhel7.4 host and check the xml after migration:

# virsh migrate V --live qemu+ssh://{tar_7.4}/system --verbose --unsafe
Migration: [100 %]
# virsh dumpxml V
<cpu mode='custom' match='exact' check='full'>
  <model fallback='forbid'>IvyBridge</model>
  <feature policy='require' name='hypervisor'/>
  <feature policy='require' name='arat'/>
  <feature policy='require' name='xsaveopt'/>
</cpu>

4. Migrate back to the rhel7.3 host, then check the xml and the guest OS:

# virsh migrate V --live qemu+ssh://{src_7.3}/system --verbose --unsafe
Migration: [100 %]
# virsh dumpxml V
Same as Scenario2-step1.

Scenario3: cpu mode=host-model rhel7.4 -> rhel7.3

1. On the rhel7.4 host, prepare a rhel7.3 guest with the following xml, disabling some features that the target host does not support:

<cpu mode='host-model' check='partial'>
  <model fallback='allow'>IvyBridge</model>
  <feature policy='disable' name='hypervisor'/>
  <feature policy='disable' name='tsc_adjust'/>
  <feature policy='disable' name='pdpe1gb'/>
</cpu>

2. Start the guest and check the xml again:

<cpu mode='custom' match='exact' check='full'>
  <model fallback='forbid'>IvyBridge</model>
  <vendor>Intel</vendor>
  <feature policy='require' name='ss'/>
  <feature policy='require' name='pcid'/>
  <feature policy='disable' name='hypervisor'/>
  <feature policy='require' name='arat'/>
  <feature policy='disable' name='tsc_adjust'/>
  <feature policy='require' name='xsaveopt'/>
  <feature policy='disable' name='pdpe1gb'/>
</cpu>

3. Migrate to the rhel7.3 host and check the xml after migration:

# virsh migrate V --live qemu+ssh://{tar_7.3}/system --verbose --unsafe
Migration: [100 %]
# virsh dumpxml V
Same as Scenario3-step2.

Log in to the guest; the OS works well.
4. Migrate back to the rhel7.4 host, then check the xml and the guest OS:

# virsh migrate V --live qemu+ssh://{src_7.4}/system --verbose --unsafe
Migration: [100 %]
# virsh dumpxml V
Same as Scenario3-step2.

Log in to the guest; the OS works well.

Scenario4: cpu mode=host-model rhel7.3 -> rhel7.4

1. On the rhel7.3 host, prepare a rhel7.3 guest with the following xml:

<cpu mode='host-model'>
  <model fallback='forbid'>IvyBridge</model>
</cpu>

2. Start the guest and check the xml again:

Same as Scenario4-step1.

3. Migrate to the rhel7.4 host and check the xml after migration:

# virsh migrate V --live qemu+ssh://{tar_7.4}/system --verbose --unsafe
Migration: [100 %]
# virsh dumpxml V
<cpu mode='custom' match='exact' check='full'>
  <model fallback='forbid'>IvyBridge</model>
  <vendor>Intel</vendor>
  <feature policy='disable' name='ds'/>
  <feature policy='disable' name='acpi'/>
  <feature policy='require' name='ss'/>
  <feature policy='disable' name='ht'/>
  <feature policy='disable' name='tm'/>
  <feature policy='disable' name='pbe'/>
  <feature policy='disable' name='dtes64'/>
  <feature policy='disable' name='monitor'/>
  <feature policy='disable' name='ds_cpl'/>
  <feature policy='disable' name='vmx'/>
  <feature policy='disable' name='smx'/>
  <feature policy='disable' name='est'/>
  <feature policy='disable' name='tm2'/>
  <feature policy='disable' name='xtpr'/>
  <feature policy='disable' name='pdcm'/>
  <feature policy='require' name='pcid'/>
  <feature policy='disable' name='osxsave'/>
  <feature policy='require' name='arat'/>
  <feature policy='require' name='xsaveopt'/>
  <feature policy='require' name='hypervisor'/>
</cpu>

Log in to the guest; the OS works well.
4. Migrate back to the rhel7.3 host, then check the xml and the guest OS:

# virsh migrate V --live qemu+ssh://{src_7.3}/system --verbose --unsafe
Migration: [100 %]
# virsh dumpxml V
<cpu mode='custom' match='exact'>
  <model fallback='forbid'>IvyBridge</model>
  <vendor>Intel</vendor>
  <feature policy='require' name='ds'/>
  <feature policy='require' name='acpi'/>
  <feature policy='require' name='ss'/>
  <feature policy='require' name='ht'/>
  <feature policy='require' name='tm'/>
  <feature policy='require' name='pbe'/>
  <feature policy='require' name='dtes64'/>
  <feature policy='require' name='monitor'/>
  <feature policy='require' name='ds_cpl'/>
  <feature policy='require' name='vmx'/>
  <feature policy='require' name='smx'/>
  <feature policy='require' name='est'/>
  <feature policy='require' name='tm2'/>
  <feature policy='require' name='xtpr'/>
  <feature policy='require' name='pdcm'/>
  <feature policy='require' name='pcid'/>
  <feature policy='require' name='osxsave'/>
  <feature policy='require' name='arat'/>
  <feature policy='require' name='xsaveopt'/>
</cpu>

Log in to the guest; the OS works well.

Marking as verified per comment 13, comment 14, comment 15, comment 16.

Oops, as revealed in https://bugzilla.redhat.com/show_bug.cgi?id=1181899#c21, the commit mentioned in comment #10 includes a tiny but nasty bug which causes libvirt to skip the CPU check if the CPUs in domain XML and migratable XML differ.
The additional patch was sent upstream for review: https://www.redhat.com/archives/libvir-list/2017-June/msg00874.html

The bug described in comment #18 is now fixed upstream by

commit eabb0002ca0bba3c5a94d16fb385783de7b144a5
Refs: v3.4.0-157-geabb0002c
Author: Jiri Denemark <jdenemar>
AuthorDate: Wed Jun 21 15:31:38 2017 +0200
Commit: Jiri Denemark <jdenemar>
CommitDate: Wed Jun 21 16:20:53 2017 +0200

qemu: Do not skip virCPUUpdateLive if priv->origCPU is set

Even though we got both the original CPU (used for starting a domain) and the updated version (the CPU really provided by QEMU) during incoming migration, restore, or snapshot revert, we still need to update the CPU according to the data we got from the freshly started QEMU. Otherwise we don't know whether the CPU we got from QEMU matches the one before migration. We just need to keep the original CPU in priv->origCPU.

Messed up by me in v3.4.0-58-g8e34f4781.

Signed-off-by: Jiri Denemark <jdenemar>
Reviewed-by: Pavel Hrdina <phrdina>

Retested scenarios 1~4 with:

libvirt-3.2.0-14.el7.x86_64
qemu-kvm-rhev-2.9.0-12.el7.x86_64

The results are the same as in comments 13~16. Marking this bug as verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1846
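The regression fixed by commit eabb0002c can be pictured with a toy model. This is only an illustrative sketch built from the wording of the two commit messages in this report; it is not libvirt code, and the function `start_incoming`, the exception `CpuMismatch`, and the flag `skip_check_when_orig_set` are all hypothetical names. CPUs are reduced to sets of required feature names.

```python
from typing import FrozenSet, Iterable, Optional, Tuple


class CpuMismatch(Exception):
    """Raised when the freshly started QEMU provides a different CPU."""


def start_incoming(
    orig_cpu: Optional[Iterable[str]],
    updated_cpu: Iterable[str],
    qemu_cpu: Iterable[str],
    skip_check_when_orig_set: bool,
) -> Tuple[FrozenSet[str], Optional[FrozenSet[str]]]:
    """Toy incoming migration: return (active CPU, saved original CPU).

    orig_cpu    -- CPU from the domain XML (stand-in for priv->origCPU)
    updated_cpu -- live updated CPU taken from the migration cookie
    qemu_cpu    -- CPU actually provided by the freshly started QEMU
    """
    saved_orig = None if orig_cpu is None else frozenset(orig_cpu)
    expected = frozenset(updated_cpu)

    # Behaviour described as the regression: once the original CPU is
    # saved, the comparison against the running QEMU is skipped.
    if skip_check_when_orig_set and saved_orig is not None:
        return expected, saved_orig

    # Behaviour described as the fix: always compare with what QEMU
    # really provides, and merely keep the original CPU on the side.
    actual = frozenset(qemu_cpu)
    if actual != expected:
        diff = sorted(actual ^ expected)
        raise CpuMismatch(f"guest CPU differs after migration: {diff}")
    return actual, saved_orig


if __name__ == "__main__":
    cookie = {"hypervisor", "arat", "xsaveopt"}
    provided = {"hypervisor", "arat"}  # destination lacks xsaveopt
    print(start_incoming(set(), cookie, provided, True))  # mismatch goes unnoticed
    try:
        start_incoming(set(), cookie, provided, False)
    except CpuMismatch as err:
        print(err)
```

With the flag set, the mismatch passes silently, which corresponds to "skip the CPU check" in comment #18; with the flag cleared, the same input is rejected.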