Description of problem:
Second migration of a domain originally started by libvirt older than 6.5.0 may fail with an error similar to

  unable to execute QEMU command 'migrate': State blocked by non-migratable CPU device (invtsc flag)

Version-Release number of selected component (if applicable):
libvirt-6.5.0-1.el8

How reproducible:
always

Steps to Reproduce:
1. start libvirtd older than 6.5.0
2. start a domain with host-passthrough CPU
3. upgrade libvirtd to 6.5.0
4. migrate the domain to a host running libvirt 6.5.0 (or newer)
5. migrate the domain back to the original host

Both hosts should have identical HW and SW, specifically microcode version, kernel version and its command line options, and kvm{,_intel,_amd} module options. Otherwise migration with host-passthrough CPU may be impossible.

Actual results:
The bug may be observed in any step starting with step 3:
- after step 3 "virsh dumpxml" and "virsh dumpxml --inactive" show different values for the migratable attribute of the <cpu> element:
    virsh dumpxml:
      <cpu mode='host-passthrough' check='none' migratable='off'/>
    virsh dumpxml --inactive:
      <cpu mode='host-passthrough' check='none' migratable='on'/>
- after step 4 the domain XML shows migratable='off' and the domain log or ps can show the QEMU process was started with -cpu host,migratable=off
- the domain either fails to migrate in step 5 or it is again started with migratable='off' (depending on the host capabilities)

Expected results:
Both "virsh dumpxml" and "virsh dumpxml --inactive" should contain

  <cpu mode='host-passthrough' check='none' migratable='on'/>

after step 3. In step 4 the domain should be started with -cpu host,migratable=on and the domain XML should be similar to the one in step 3, i.e., with migratable='on'. In step 5 the domain should be successfully migrated and started with migratable=on.
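The step 3 symptom (live and inactive XML disagreeing on the migratable attribute) can be checked mechanically. The following is a minimal Python sketch, assuming the XML is obtained with "virsh dumpxml" and "virsh dumpxml --inactive" as in the report; the helper names cpu_migratable, dumpxml and migratable_mismatch are hypothetical and not part of libvirt.

```python
import subprocess
import xml.etree.ElementTree as ET
from typing import Optional


def cpu_migratable(domain_xml: str) -> Optional[str]:
    """Return the value of the <cpu> migratable attribute, or None if absent."""
    root = ET.fromstring(domain_xml)
    # Accept either a full <domain> document or a bare <cpu> snippet.
    cpu = root if root.tag == "cpu" else root.find("cpu")
    if cpu is None:
        return None
    return cpu.get("migratable")


def dumpxml(domain: str, inactive: bool = False) -> str:
    """Run virsh dumpxml (requires virsh and access to libvirtd)."""
    cmd = ["virsh", "dumpxml", domain]
    if inactive:
        cmd.append("--inactive")
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


def migratable_mismatch(live_xml: str, inactive_xml: str) -> bool:
    """True when live and inactive definitions disagree (the step 3 symptom)."""
    return cpu_migratable(live_xml) != cpu_migratable(inactive_xml)


# Snippets copied from the "Actual results" section of this report.
live = "<cpu mode='host-passthrough' check='none' migratable='off'/>"
inactive = "<cpu mode='host-passthrough' check='none' migratable='on'/>"
print(migratable_mismatch(live, inactive))  # prints True
```

On a real host one would call migratable_mismatch(dumpxml("vm1"), dumpxml("vm1", inactive=True)); on an affected host it would return True after step 3, while on a fixed host both definitions should report migratable='on'.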
Additional info:
This regression is caused by the following upstream commit:

commit 201bd5db639c063862b0c1b1abfab9a9a7c92591
Refs: v6.4.0-61-g201bd5db63
Author:     Jiri Denemark <jdenemar>
AuthorDate: Tue Jun 2 15:34:07 2020 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Tue Jun 9 20:32:50 2020 +0200

    qemu: Fill default value in //cpu/@migratable attribute

    Before QEMU introduced migratable CPU property, "-cpu host" included
    all features that could be enabled on the host, even those which
    would block migration. In other words, the default was equivalent to
    migratable=off. When the migratable property was introduced, the
    default changed to migratable=on. Let's record the default in domain
    XML.

    Signed-off-by: Jiri Denemark <jdenemar>
    Reviewed-by: Michal Privoznik <mprivozn>
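The effect of this commit on an already running domain can be modelled in a few lines. This is a simplified Python sketch, not libvirt's C implementation: caps_has_migratable stands for whether the QEMU capabilities used during parsing record support for the CPU migratable property, and the function name fill_default_migratable is hypothetical.

```python
from typing import Optional


def fill_default_migratable(mode: str, migratable: Optional[str],
                            caps_has_migratable: bool) -> Optional[str]:
    """Simplified model of filling the //cpu/@migratable default (not libvirt code)."""
    if mode != "host-passthrough" or migratable is not None:
        # Other CPU modes, or an explicit value: leave the definition alone.
        return migratable
    # With the QEMU migratable property the default is on; without it,
    # "-cpu host" behaved like migratable=off.
    return "on" if caps_has_migratable else "off"


# Live XML of a domain started by libvirt < 6.5.0: the cached capabilities
# never probed the property, so the filled default becomes "off".
print(fill_default_migratable("host-passthrough", None, False))  # prints off
# Inactive XML parsed with freshly probed capabilities.
print(fill_default_migratable("host-passthrough", None, True))   # prints on
```

Under these assumptions the model reproduces the mismatch seen after step 3: the same definition yields "off" when parsed with stale capabilities and "on" when parsed with fresh ones.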
Patches sent upstream for review: https://www.redhat.com/archives/libvir-list/2020-July/msg01236.html
This bug is now fixed upstream by

commit c7afaa69cdd712d74d98e3cb37afd1b46aef7e42
Refs: v6.5.0-274-gc7afaa69cd
Author:     Jiri Denemark <jdenemar>
AuthorDate: Wed Jul 15 22:33:07 2020 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Tue Jul 21 15:40:01 2020 +0200

    qemu_monitor: Add API for checking CPU migratable property

    Signed-off-by: Jiri Denemark <jdenemar>
    Reviewed-by: Daniel Henrique Barboza <danielhb413>

commit 4872ad27aae6b24a441e7bd59bd7ae234ef33b5b
Refs: v6.5.0-275-g4872ad27aa
Author:     Jiri Denemark <jdenemar>
AuthorDate: Wed Jul 15 11:33:05 2020 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Tue Jul 21 15:40:01 2020 +0200

    qemu: Do not set //cpu/@migratable for running domains in post-parse

    Commit v6.4.0-61-g201bd5db63 started to fill the default value for
    //cpu/@migratable attribute according to QEMU support. However,
    active domains either have the migratable attribute already set or
    the capabilities we use for checking the QEMU support were created
    by older libvirt which didn't probe for this specific capability.
    Thus we should leave active domains alone when parsing their XMLs.

    https://bugzilla.redhat.com/show_bug.cgi?id=1857967

    Reported-by: Mark Mielke <mark.mielke>
    Signed-off-by: Jiri Denemark <jdenemar>
    Reviewed-by: Daniel Henrique Barboza <danielhb413>

commit 1031db36003c34d0291f3573f7d39cfae25e2cd7
Refs: v6.5.0-276-g1031db3600
Author:     Jiri Denemark <jdenemar>
AuthorDate: Wed Jul 15 17:54:07 2020 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Tue Jul 21 15:40:01 2020 +0200

    qemu: Properly set //cpu/@migratable default value for running domains

    Since active domains which do not have the attribute already set
    were not started by libvirt that probed for CPU migratable property,
    we need to check this property on reconnect and update the domain
    definition accordingly.

    https://bugzilla.redhat.com/show_bug.cgi?id=1857967

    Reported-by: Mark Mielke <mark.mielke>
    Signed-off-by: Jiri Denemark <jdenemar>
    Reviewed-by: Daniel Henrique Barboza <danielhb413>
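The two behavioural changes described in these commits (leave active domains alone in post-parse, ask the running QEMU on reconnect) can be sketched as a small model. This is a hypothetical Python illustration, not libvirt code: CpuDef, post_parse and on_reconnect are invented names, query_cpu_migratable stands in for the new monitor API from c7afaa69cd, and treating a QEMU without the property as migratable=off is an assumption based on the commit message of 201bd5db63.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CpuDef:
    mode: str
    migratable: Optional[str] = None  # None models an absent attribute


def post_parse(cpu: CpuDef, active: bool, caps_has_migratable: bool) -> None:
    """Fill the default only for inactive definitions (model of 4872ad27aa)."""
    if active or cpu.mode != "host-passthrough" or cpu.migratable is not None:
        return
    cpu.migratable = "on" if caps_has_migratable else "off"


def on_reconnect(cpu: CpuDef,
                 query_cpu_migratable: Callable[[], Optional[bool]]) -> None:
    """Ask the running QEMU instead of stale capabilities (model of 1031db3600)."""
    if cpu.mode != "host-passthrough" or cpu.migratable is not None:
        return
    value = query_cpu_migratable()
    # Assumption: None means QEMU has no such property, i.e. old "-cpu host"
    # semantics, which were equivalent to migratable=off.
    cpu.migratable = "on" if value else "off"


# Running domain started by libvirt < 6.5.0, parsed with stale capabilities.
cpu = CpuDef("host-passthrough")
post_parse(cpu, active=True, caps_has_migratable=False)
print(cpu.migratable)  # prints None: post-parse no longer guesses
on_reconnect(cpu, query_cpu_migratable=lambda: True)
print(cpu.migratable)  # prints on
```

The design point the sketch tries to capture is that the authoritative source for a running domain is the QEMU process itself, not capabilities cached by an older libvirtd.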
Verify this bug with libvirt-daemon-6.6.0-7.module+el8.3.0+8424+5ea525c5.x86_64:

1. prepare a host with old libvirt (<6.5.0):
# rpm -q libvirt-daemon
libvirt-daemon-6.0.0-17.3.module+el8.2.0+6907+6abdb1b6.x86_64

2. start a host-passthrough cpu mode guest:
# virsh dumpxml vm1
...
  <cpu mode='host-passthrough' check='partial'>
    <numa>
      <cell id='0' cpus='0-1' memory='512000' unit='KiB'/>
      <cell id='1' cpus='2-3' memory='512000' unit='KiB'/>
    </numa>
  </cpu>
...
# virsh start vm1
Domain vm1 started

3. update host to latest 8.3 virt module

4. check guest's active xml and inactive xml:
# virsh dumpxml vm1
  <cpu mode='host-passthrough' check='partial' migratable='on'>
    <numa>
      <cell id='0' cpus='0-1' memory='512000' unit='KiB'/>
      <cell id='1' cpus='2-3' memory='512000' unit='KiB'/>
    </numa>
  </cpu>
# virsh dumpxml vm1 --inactive
  <cpu mode='host-passthrough' check='partial' migratable='on'>
    <numa>
      <cell id='0' cpus='0-1' memory='512000' unit='KiB'/>
      <cell id='1' cpus='2-3' memory='512000' unit='KiB'/>
    </numa>
  </cpu>

5. migrate guest to another host which has the same test environment:
# virsh migrate vm1 qemu+ssh://host1/system --live

6. check guest xml and qemu command line on target host:
# virsh dumpxml vm1
...
  <cpu mode='host-passthrough' check='partial' migratable='on'>
    <numa>
      <cell id='0' cpus='0-1' memory='512000' unit='KiB'/>
      <cell id='1' cpus='2-3' memory='512000' unit='KiB'/>
    </numa>
  </cpu>
...
# virsh dumpxml vm1 --inactive
...
  <cpu mode='host-passthrough' check='partial' migratable='on'>
    <numa>
      <cell id='0' cpus='0-1' memory='512000' unit='KiB'/>
      <cell id='1' cpus='2-3' memory='512000' unit='KiB'/>
    </numa>
  </cpu>
...
# ps aux|grep qemu
...-cpu host,migratable=on...

7. migrate back to source host:
# virsh migrate vm1 qemu+ssh://host0/system --live

8. check guest xml and qemu command line on source host:
# virsh dumpxml vm1
...
  <cpu mode='host-passthrough' check='partial' migratable='on'>
    <numa>
      <cell id='0' cpus='0-1' memory='512000' unit='KiB'/>
      <cell id='1' cpus='2-3' memory='512000' unit='KiB'/>
    </numa>
  </cpu>
...
# virsh dumpxml vm1 --inactive
  <cpu mode='host-passthrough' check='partial' migratable='on'>
    <numa>
      <cell id='0' cpus='0-1' memory='512000' unit='KiB'/>
      <cell id='1' cpus='2-3' memory='512000' unit='KiB'/>
    </numa>
  </cpu>
# ps uax|grep qemu
...-cpu host,migratable=on...
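The command line check in steps 6 and 8 can be automated by extracting the migratable value from QEMU's -cpu option. A minimal Python sketch, assuming the full QEMU command line is available as a single string (for example one line of ps output); qemu_cpu_migratable is a hypothetical helper, and the sample command line below is illustrative rather than copied from the test hosts.

```python
import shlex
from typing import Optional


def qemu_cpu_migratable(cmdline: str) -> Optional[str]:
    """Return the migratable=... value of QEMU's -cpu option, or None."""
    args = shlex.split(cmdline)
    # Skip the last token so args[i + 1] always exists.
    for i, arg in enumerate(args[:-1]):
        if arg == "-cpu":
            # -cpu takes a model name followed by comma-separated properties.
            for prop in args[i + 1].split(",")[1:]:
                key, _, value = prop.partition("=")
                if key == "migratable":
                    return value
    return None


sample = "/usr/libexec/qemu-kvm -name guest=vm1 -cpu host,migratable=on -m 1000"
print(qemu_cpu_migratable(sample))  # prints on
```

On both hosts in the verification above the expected value is "on"; a result of "off" would indicate the regression described in this bug.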
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (virt:8.3 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:5137