Bug 1857967
| Summary: | Broken migration with a host-passthrough CPU | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux Advanced Virtualization | Reporter: | Jiri Denemark <jdenemar> |
| Component: | libvirt | Assignee: | Jiri Denemark <jdenemar> |
| Status: | CLOSED ERRATA | QA Contact: | Luyao Huang <lhuang> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 8.3 | CC: | drjones, dyuan, fjin, jdenemar, lmen, virt-maint, xuzhang, yalzhang |
| Target Milestone: | rc | Keywords: | Regression, Triaged |
| Target Release: | 8.3 | Flags: | pm-rhel: mirror+ |
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | libvirt-6.6.0-1.el8 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2020-11-17 17:50:17 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Patches sent upstream for review: https://www.redhat.com/archives/libvir-list/2020-July/msg01236.html

This bug is now fixed upstream by:
commit c7afaa69cdd712d74d98e3cb37afd1b46aef7e42
Refs: v6.5.0-274-gc7afaa69cd
Author: Jiri Denemark <jdenemar>
AuthorDate: Wed Jul 15 22:33:07 2020 +0200
Commit: Jiri Denemark <jdenemar>
CommitDate: Tue Jul 21 15:40:01 2020 +0200
qemu_monitor: Add API for checking CPU migratable property
Signed-off-by: Jiri Denemark <jdenemar>
Reviewed-by: Daniel Henrique Barboza <danielhb413>
commit 4872ad27aae6b24a441e7bd59bd7ae234ef33b5b
Refs: v6.5.0-275-g4872ad27aa
Author: Jiri Denemark <jdenemar>
AuthorDate: Wed Jul 15 11:33:05 2020 +0200
Commit: Jiri Denemark <jdenemar>
CommitDate: Tue Jul 21 15:40:01 2020 +0200
qemu: Do not set //cpu/@migratable for running domains in post-parse
Commit v6.4.0-61-g201bd5db63 started to fill the default value for
//cpu/@migratable attribute according to QEMU support. However, active
domains either have the migratable attribute already set or the
capabilities we use for checking the QEMU support were created by older
libvirt which didn't probe for this specific capability. Thus we should
leave active domains alone when parsing their XMLs.
https://bugzilla.redhat.com/show_bug.cgi?id=1857967
Reported-by: Mark Mielke <mark.mielke>
Signed-off-by: Jiri Denemark <jdenemar>
Reviewed-by: Daniel Henrique Barboza <danielhb413>
commit 1031db36003c34d0291f3573f7d39cfae25e2cd7
Refs: v6.5.0-276-g1031db3600
Author: Jiri Denemark <jdenemar>
AuthorDate: Wed Jul 15 17:54:07 2020 +0200
Commit: Jiri Denemark <jdenemar>
CommitDate: Tue Jul 21 15:40:01 2020 +0200
qemu: Properly set //cpu/@migratable default value for running domains
Since active domains which do not have the attribute already set were
not started by libvirt that probed for CPU migratable property, we need
to check this property on reconnect and update the domain definition
accordingly.
https://bugzilla.redhat.com/show_bug.cgi?id=1857967
Reported-by: Mark Mielke <mark.mielke>
Signed-off-by: Jiri Denemark <jdenemar>
Reviewed-by: Daniel Henrique Barboza <danielhb413>
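
The combined effect of the last two commits can be modelled roughly as follows. This is a minimal Python sketch of the decision logic described in the commit messages, not libvirt's actual C implementation. The `Domain` class and the functions `post_parse` and `reconnect` are hypothetical names, and `qemu_reports_migratable` stands in for whatever the new monitor API reads from the running QEMU process.

```python
from typing import Optional


class Domain:
    """Hypothetical stand-in for a domain with a host-passthrough CPU."""

    def __init__(self, active: bool, migratable: Optional[str] = None):
        self.active = active
        # 'on', 'off', or None when //cpu/@migratable is not set in the XML
        self.migratable = migratable


def post_parse(dom: Domain, qemu_supports_migratable: bool) -> None:
    """Fill the default for //cpu/@migratable, but only for inactive domains."""
    if dom.active:
        # A running domain either has the attribute already, or its cached
        # capabilities never probed for the property; defer to reconnect().
        return
    if dom.migratable is None:
        # Assumption: without the property, "-cpu host" behaves like 'off'.
        dom.migratable = "on" if qemu_supports_migratable else "off"


def reconnect(dom: Domain, qemu_reports_migratable: bool) -> None:
    """On reconnect, record what the running QEMU process actually uses."""
    if dom.active and dom.migratable is None:
        dom.migratable = "on" if qemu_reports_migratable else "off"
```

In this model a domain started by an older libvirt keeps its attribute unset through `post_parse` and only receives a value in `reconnect`, which is the ordering the two commit messages describe.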
Verify this bug with libvirt-daemon-6.6.0-7.module+el8.3.0+8424+5ea525c5.x86_64:
1. prepare a host with old libvirt (<6.5.0):
# rpm -q libvirt-daemon
libvirt-daemon-6.0.0-17.3.module+el8.2.0+6907+6abdb1b6.x86_64
2. start a host-passthrough cpu mode guest:
# virsh dumpxml vm1
...
<cpu mode='host-passthrough' check='partial'>
<numa>
<cell id='0' cpus='0-1' memory='512000' unit='KiB'/>
<cell id='1' cpus='2-3' memory='512000' unit='KiB'/>
</numa>
</cpu>
...
# virsh start vm1
Domain vm1 started
3. update the host to the latest 8.3 virt module
4. check the guest's active and inactive XML:
# virsh dumpxml vm1
<cpu mode='host-passthrough' check='partial' migratable='on'>
<numa>
<cell id='0' cpus='0-1' memory='512000' unit='KiB'/>
<cell id='1' cpus='2-3' memory='512000' unit='KiB'/>
</numa>
</cpu>
# virsh dumpxml vm1 --inactive
<cpu mode='host-passthrough' check='partial' migratable='on'>
<numa>
<cell id='0' cpus='0-1' memory='512000' unit='KiB'/>
<cell id='1' cpus='2-3' memory='512000' unit='KiB'/>
</numa>
</cpu>
5. migrate the guest to another host which has the same test environment:
# virsh migrate vm1 qemu+ssh://host1/system --live
6. check guest xml and qemu command line on target host:
# virsh dumpxml vm1
...
<cpu mode='host-passthrough' check='partial' migratable='on'>
<numa>
<cell id='0' cpus='0-1' memory='512000' unit='KiB'/>
<cell id='1' cpus='2-3' memory='512000' unit='KiB'/>
</numa>
</cpu>
...
# virsh dumpxml vm1 --inactive
...
<cpu mode='host-passthrough' check='partial' migratable='on'>
<numa>
<cell id='0' cpus='0-1' memory='512000' unit='KiB'/>
<cell id='1' cpus='2-3' memory='512000' unit='KiB'/>
</numa>
</cpu>
...
# ps aux|grep qemu
...-cpu host,migratable=on...
7. migrate back to source host:
# virsh migrate vm1 qemu+ssh://host0/system --live
8. check guest xml and qemu command line on source host:
# virsh dumpxml vm1
...
<cpu mode='host-passthrough' check='partial' migratable='on'>
<numa>
<cell id='0' cpus='0-1' memory='512000' unit='KiB'/>
<cell id='1' cpus='2-3' memory='512000' unit='KiB'/>
</numa>
</cpu>
...
# virsh dumpxml vm1 --inactive
<cpu mode='host-passthrough' check='partial' migratable='on'>
<numa>
<cell id='0' cpus='0-1' memory='512000' unit='KiB'/>
<cell id='1' cpus='2-3' memory='512000' unit='KiB'/>
</numa>
</cpu>
# ps uax|grep qemu
...-cpu host,migratable=on...
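
The XML comparison done by eye in the steps above could be scripted. The following is a minimal sketch, assuming the complete output of `virsh dumpxml vm1` and `virsh dumpxml vm1 --inactive` has been captured into strings; the helper names `cpu_migratable` and `xml_consistent` are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def cpu_migratable(domain_xml: str) -> Optional[str]:
    """Return the migratable attribute of the <cpu> element, or None if absent."""
    root = ET.fromstring(domain_xml)  # root is the <domain> element
    cpu = root.find("cpu")            # <cpu> is a direct child of <domain>
    return None if cpu is None else cpu.get("migratable")


def xml_consistent(active_xml: str, inactive_xml: str) -> bool:
    """True when both the active and inactive XML carry migratable='on'."""
    return (cpu_migratable(active_xml) == "on"
            and cpu_migratable(inactive_xml) == "on")
```

With the buggy build, the active XML would report `off` while the inactive XML reports `on`, so `xml_consistent` would return `False` after the upgrade in step 3.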
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (virt:8.3 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2020:5137
Description of problem:
Second migration of a domain originally started by libvirt older than 6.5.0 may fail with an error similar to

    unable to execute QEMU command 'migrate': State blocked by non-migratable CPU device (invtsc flag)

Version-Release number of selected component (if applicable):
libvirt-6.5.0-1.el8

How reproducible:
always

Steps to Reproduce:
1. start libvirtd older than 6.5.0
2. start a domain with host-passthrough CPU
3. upgrade libvirtd to 6.5.0
4. migrate the domain to a host running libvirt 6.5.0 (or newer)
5. migrate the domain back to the original host

Both hosts should have identical HW and SW, specifically microcode version, kernel version and its command line options, and kvm{,_intel,_amd} module options. Otherwise migration with host-passthrough CPU may be impossible.

Actual results:
The bug may be observed in any step starting with step 3:
- after step 3 "virsh dumpxml" and "virsh dumpxml --inactive" show different values for the migratable attribute of the <cpu> element:
    virsh dumpxml:
        <cpu mode='host-passthrough' check='none' migratable='off'/>
    ... --inactive:
        <cpu mode='host-passthrough' check='none' migratable='on'/>
- after step 4 the domain XML shows migratable='off' and the domain log or ps can show the QEMU process was started with -cpu host,migratable=off
- the domain either fails to migrate in step 5 or it is again started with migratable='off' (depending on the host capabilities)

Expected results:
Both "virsh dumpxml" and "virsh dumpxml --inactive" should contain

    <cpu mode='host-passthrough' check='none' migratable='on'/>

after step 3. In step 4 the domain should be started with -cpu host,migratable=on and the domain XML should be similar to the one in step 3, i.e., with migratable='on'. In step 5 the domain should be successfully migrated and started with migratable=on.
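
The command-line check mentioned in the results above (whether QEMU was started with `-cpu host,migratable=on`) can also be scripted. This is a minimal sketch, assuming the QEMU command line has been obtained as a single string, for example from `ps` output; `cpu_options` and `started_migratable` are hypothetical helper names, and the parsing covers only the simple `model,key=value` form shown in this report.

```python
import shlex
from typing import Dict, Optional


def cpu_options(qemu_cmdline: str) -> Optional[Dict[str, str]]:
    """Parse the value following -cpu into {'model': ..., key: value, ...}."""
    args = shlex.split(qemu_cmdline)
    if "-cpu" not in args:
        return None
    idx = args.index("-cpu")
    if idx + 1 >= len(args):
        return None  # -cpu was the last argument and has no value
    model, *props = args[idx + 1].split(",")
    opts = {"model": model}
    for prop in props:
        key, _, value = prop.partition("=")
        opts[key] = value
    return opts


def started_migratable(qemu_cmdline: str) -> bool:
    """True when the process was started with -cpu host,migratable=on."""
    opts = cpu_options(qemu_cmdline)
    return (opts is not None
            and opts.get("model") == "host"
            and opts.get("migratable") == "on")
```

A process affected by the bug would carry `-cpu host,migratable=off` after step 4, for which `started_migratable` returns `False`.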
Additional info:
This regression is caused by the following upstream commit:
commit 201bd5db639c063862b0c1b1abfab9a9a7c92591
Refs: v6.4.0-61-g201bd5db63
Author: Jiri Denemark <jdenemar>
AuthorDate: Tue Jun 2 15:34:07 2020 +0200
Commit: Jiri Denemark <jdenemar>
CommitDate: Tue Jun 9 20:32:50 2020 +0200
qemu: Fill default value in //cpu/@migratable attribute
Before QEMU introduced migratable CPU property, "-cpu host" included all
features that could be enabled on the host, even those which would block
migration. In other words, the default was equivalent to migratable=off.
When the migratable property was introduced, the default changed to
migratable=on. Let's record the default in domain XML.
Signed-off-by: Jiri Denemark <jdenemar>
Reviewed-by: Michal Privoznik <mprivozn>