Description of problem:
Migration of a Windows VM that has a WSL2 guest fails.

Version-Release number of selected component (if applicable):
OpenShift version: 4.11.0-fc.0
CNV version: 4.11.0
HCO image: brew.registry.redhat.io/rh-osbs/iib:251904

How reproducible:
100%

Steps to Reproduce:
1. Create a Windows 10 VM with:
     cpu:
       features:
       - name: vmx
         policy: require
2. Start the VM (it should start OK, and the WSL2 Linux guest should also start OK).
3. Migrate the VM.

Actual results:
Migration fails.

Expected results:
Migration succeeds.

Additional info:
In the source virt-handler logs, two "types" of error messages were found for the failed migration:

1)
{"component":"virt-launcher","level":"error","msg":"internal error: qemu unexpectedly closed the monitor: 2022-06-21T14:03:47.472258Z qemu-kvm: Guest enabled re-enlightenment notifications, 'tsc-frequency=' has to be specified","pos":"virNetClientProgramDispatchError:172","subcomponent":"libvirt","thread":"30","timestamp":"2022-06-21T14:03:47.763000Z"}
{"component":"virt-launcher","level":"info","msg":"2022-06-21T14:03:47.472291Z qemu-kvm: error while loading state for instance 0x0 of device 'cpu'","subcomponent":"libvirt","timestamp":"2022-06-21T14:03:47.763469Z"}
{"component":"virt-launcher","kind":"","level":"error","msg":"Live migration failed.","name":"windows-wsl2-1655820075-1479514","namespace":"virt-general-test-wls2","pos":"live-migration-source.go:1052","reason":"error encountered during MigrateToURI3 libvirt api call: virError(Code=1, Domain=10, Message='internal error: qemu unexpectedly closed the monitor: 2022-06-21T14:03:47.472258Z qemu-kvm: Guest enabled re-enlightenment notifications, 'tsc-frequency=' has to be specified\n2022-06-21T14:03:47.472291Z qemu-kvm: error while loading state for instance 0x0 of device 'cpu'\n2022-06-21T14:03:47.474064Z qemu-kvm: load of migration failed: Invalid argument')","timestamp":"2022-06-21T14:03:47.830105Z","uid":"407b7386-4285-40cd-8fb3-a37253f11f7f"}
{"component":"virt-launcher","kind":"","level":"error","msg":"Failed to migrate vmi","name":"windows-wsl2-1655820075-1479514","namespace":"virt-general-test-wls2","pos":"server.go:111","reason":"migration job 7d8cf65e-0ca3-4b53-a1c5-cfb912fb0f65 already executed, finished at 2022-06-21 14:03:47.725005494 +0000 UTC, completed: true, failed: true, abortStatus: ","timestamp":"2022-06-21T14:03:47.914858Z","uid":"407b7386-4285-40cd-8fb3-a37253f11f7f"}

2)
{"component":"virt-launcher","level":"error","msg":"operation failed: migration out job: unexpectedly failed","pos":"qemuMigrationJobCheckStatus:1749","subcomponent":"libvirt","thread":"33","timestamp":"2022-06-21T19:40:49.306000Z"}
{"component":"virt-launcher","kind":"","level":"error","msg":"Failed to migrate vmi","name":"windows-wsl2-1655828106-1022391","namespace":"virt-general-test-wls2","pos":"server.go:111","reason":"migration job 89a5647e-04bb-4389-a0e2-975e3d570034 already executed, finished at 2022-06-21 19:40:49.646566613 +0000 UTC, completed: true, failed: true, abortStatus: ","timestamp":"2022-06-21T19:40:49.769054Z","uid":"099cb027-11f1-45d7-a99a-b5f51b724a33"}
{"component":"virt-launcher","level":"error","msg":"internal error: qemu unexpectedly closed the monitor: 2022-06-21T19:40:49.239598Z qemu-kvm: Guest enabled re-enlightenment notifications, 'tsc-frequency=' has to be specified","pos":"virNetClientProgramDispatchError:172","subcomponent":"libvirt","thread":"33","timestamp":"2022-06-21T19:40:50.010000Z"}
{"component":"virt-launcher","level":"info","msg":"2022-06-21T19:40:49.239642Z qemu-kvm: error while loading state for instance 0x0 of device 'cpu'","subcomponent":"libvirt","timestamp":"2022-06-21T19:40:50.011006Z"}
{"component":"virt-launcher","level":"error","msg":"End of file while reading data: Input/output error","pos":"virNetSocketReadWire:1792","subcomponent":"libvirt","thread":"29","timestamp":"2022-06-21T19:40:50.011000Z"}
{"component":"virt-launcher","level":"error","msg":"internal error: client socket is closed","pos":"virNetClientSendInternal:2159","subcomponent":"libvirt","thread":"33","timestamp":"2022-06-21T19:40:50.028000Z"}
{"component":"virt-launcher","kind":"","level":"error","msg":"Live migration failed.","name":"windows-wsl2-1655828106-1022391","namespace":"virt-general-test-wls2","pos":"live-migration-source.go:1052","reason":"error encountered during MigrateToURI3 libvirt api call: virError(Code=1, Domain=7, Message='internal error: client socket is closed')","timestamp":"2022-06-21T19:40:50.039747Z","uid":"099cb027-11f1-45d7-a99a-b5f51b724a33"}

Removing
  cpu:
    features:
    - name: vmx
      policy: require
from the VM spec fixes the migration issue, but then the WSL2 guest won't work.
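Each log entry above is a JSON object, so the failure reason can be extracted mechanically when triaging a similar report. The following is a minimal sketch, assuming one JSON object per line as virt-launcher emits them; the function names are illustrative and not part of any KubeVirt tooling, and the sample lines are copied from the logs above.

```python
import json

# Three entries copied from the logs in this report (one JSON object per line).
SAMPLE_LOG = r'''
{"component":"virt-launcher","level":"error","msg":"internal error: qemu unexpectedly closed the monitor: 2022-06-21T19:40:49.239598Z qemu-kvm: Guest enabled re-enlightenment notifications, 'tsc-frequency=' has to be specified","pos":"virNetClientProgramDispatchError:172","subcomponent":"libvirt","thread":"33","timestamp":"2022-06-21T19:40:50.010000Z"}
{"component":"virt-launcher","level":"info","msg":"2022-06-21T19:40:49.239642Z qemu-kvm: error while loading state for instance 0x0 of device 'cpu'","subcomponent":"libvirt","timestamp":"2022-06-21T19:40:50.011006Z"}
{"component":"virt-launcher","level":"error","msg":"operation failed: migration out job: unexpectedly failed","pos":"qemuMigrationJobCheckStatus:1749","subcomponent":"libvirt","thread":"33","timestamp":"2022-06-21T19:40:49.306000Z"}
'''


def error_reasons(log_lines):
    """Return the 'reason' (or, if absent, the 'msg') of every error-level entry."""
    reasons = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore lines that are not JSON
        if not isinstance(entry, dict) or entry.get("level") != "error":
            continue
        reasons.append(entry.get("reason") or entry.get("msg", ""))
    return reasons


def mentions_tsc_frequency(log_lines):
    """True if any error-level entry complains about the missing tsc-frequency."""
    return any("tsc-frequency" in reason for reason in error_reasons(log_lines))


if __name__ == "__main__":
    lines = SAMPLE_LOG.splitlines()
    for reason in error_reasons(lines):
        print(reason)
    print("tsc-frequency related:", mentions_tsc_frequency(lines))
```

Note that the second type of failure only surfaces "client socket is closed" in the final "Live migration failed." entry, so checking every error-level entry, not just the last one, is what reveals the shared tsc-frequency cause.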
First update:

I found this QEMU patch: https://lists.gnu.org/archive/html/qemu-devel/2021-03/msg07116.html

There, it says: "Require 'tsc-frequency=' command line option to be specified for successful migration when re-enlightenment was enabled by the guest."

This means the behavior is intentional on QEMU's side, which probably makes this a KubeVirt issue: perhaps we don't provide that argument.

I'm still looking for a workaround / solution. I will update as soon as I have more information.
Second update:

*** WORKAROUND FOUND ***

*Bug root cause:*
In converter.go [1], we add a TSC frequency parameter if vmi.Status.TopologyHints != nil && vmi.Status.TopologyHints.TSCFrequency != nil.

In virt-controller [2], we check whether TopologyHints need to be added to the VMI by calling TopologyHintsRequiredForVMI(). There, we check whether the "invtsc" feature exists in the VMI's spec (vmi.Spec.Domain.CPU.Features) with a "require" or "force" policy. Only then do we add the TopologyHints to the VMI, which leads (in [1]) to this parameter being added and passed on to libvirt / QEMU.

*Workaround:*
Simply add the invtsc feature to the VM's spec:

  cpu:
    features:
    - name: vmx
      policy: require
    - name: invtsc
      policy: require

After doing so, the VM is able to migrate successfully.

[1] https://github.com/kubevirt/kubevirt/blob/v0.54.0/pkg/virt-launcher/virtwrap/converter/converter.go#L1691
[2] https://github.com/kubevirt/kubevirt/blob/v0.54.0/pkg/virt-controller/watch/vmi.go#L438
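The two checks described in this comment can be modeled as follows. This is a simplified sketch in Python, not KubeVirt's Go code: the VMI is represented as a plain dict, the key names (topologyHints, tscFrequency) are assumed to mirror the serialized form of the fields named above, and both function names are hypothetical.

```python
def topology_hints_required(vmi):
    """Model of the virt-controller check: invtsc present with policy require/force."""
    cpu = (vmi.get("spec") or {}).get("domain", {}).get("cpu") or {}
    for feature in cpu.get("features") or []:
        if feature.get("name") == "invtsc" and feature.get("policy") in ("require", "force"):
            return True
    return False


def tsc_frequency_passed_to_qemu(vmi):
    """Model of the converter check: TopologyHints and its TSCFrequency are both set."""
    hints = (vmi.get("status") or {}).get("topologyHints")
    return hints is not None and hints.get("tscFrequency") is not None


# VM from the reproduction steps: only vmx is required, so no hints are added.
wsl2_vmi = {
    "spec": {"domain": {"cpu": {"features": [{"name": "vmx", "policy": "require"}]}}},
    "status": {},
}

# VM with invtsc added as well, which makes the controller add the hints.
invtsc_vmi = {
    "spec": {"domain": {"cpu": {"features": [
        {"name": "vmx", "policy": "require"},
        {"name": "invtsc", "policy": "require"},
    ]}}},
    "status": {},
}

if __name__ == "__main__":
    print(topology_hints_required(wsl2_vmi))    # vmx only
    print(topology_hints_required(invtsc_vmi))  # vmx + invtsc
```

The sketch makes the gap visible: a VM with only vmx never gets TopologyHints, so nothing in the second check can cause a TSC frequency to reach QEMU.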
A few questions:

1. What changed from 4.9 and 4.10 to 4.11 such that this flag is now required but was not before?
2. From the HANA template we know that the TSC frequency is also sometimes required. Why is the frequency not required here?
The HANA template: https://github.com/RHsyseng/cnv-supplemental-templates/blob/main/templates/saphana/rhel8.saphana.yaml
(In reply to Fabian Deutsch from comment #3)
> A few questions
>
> 1. What changed from 4.9 and 4.10 to 4.11 that this flag is now required but
> not before?
> 2. From the HANA template we know that the tsc frequency is also sometimes
> required. Why is the frequency not required here?

Hey Fabian,

1. As written in comment #1, the change came from a QEMU patch from March 2021: https://lists.gnu.org/archive/html/qemu-devel/2021-03/msg07116.html
2. I see that this template has "invtsc" enabled without "vmx" or Hyper-V enabled. I'm not sure why it's needed for the template, but I think it's not related to the WSL2 case.
Hey Itamar.

1.1 Do we know in what RHEL-AV/RHEL version this has landed?
1.2 Why do you think that the qemu patch is relevant?
(In reply to Fabian Deutsch from comment #6)
> Hey Itamar.
>
> 1.1 Do we know in what RHEL-AV/RHEL version this has landed?
> 1.2 Why do you think that the qemu patch is relevant?

1. This landed in QEMU 6.0.0. In CNV 4.9/4.10 we use QEMU 5.2.0. In CNV 4.11 we use QEMU 6.2.0.
2. Because it breaks backward compatibility: VMs with Hyper-V re-enlightenment enabled are now forced to provide the 'tsc-frequency=' option.

The description of the following PR has been updated and includes all the information; I recommend looking there for more details: https://github.com/kubevirt/kubevirt/pull/7986
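The version reasoning in this comment can be written down as a small check. The CNV-to-QEMU mapping and the 6.0.0 threshold are taken from the comment itself; the names below are illustrative only.

```python
# QEMU versions per CNV release, as stated in this comment.
QEMU_BY_CNV = {
    "4.9": (5, 2, 0),
    "4.10": (5, 2, 0),
    "4.11": (6, 2, 0),
}

# QEMU release in which the 'tsc-frequency=' requirement landed, per this comment.
TSC_REQUIREMENT_SINCE = (6, 0, 0)


def enforces_tsc_frequency(cnv_version):
    """True if the QEMU shipped with this CNV release enforces the requirement."""
    return QEMU_BY_CNV[cnv_version] >= TSC_REQUIREMENT_SINCE


if __name__ == "__main__":
    for release in QEMU_BY_CNV:
        print(release, enforces_tsc_frequency(release))
```

Comparing versions as integer tuples avoids the string-comparison pitfall where "4.10" would sort before "4.9".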
Resetting the "Target Release" version to 4.11.1, as suggested by the "Nested Virt Support" update to the KCS article [1].

[1] https://access.redhat.com/solutions/6692341
As a clarification to Comment #12: there was some confusion in the verification procedure. While the VMs did come up according to KubeVirt, Windows did not boot. Windows cannot use the invtsc flag.
Clarification & update:

The fix:
This bug is being fixed in this PR: https://github.com/kubevirt/kubevirt/pull/7986. Essentially, we have a mechanism that finds the minimum TSC frequency on the cluster and provides it to QEMU. The same mechanism is now being used for Windows VMs with re-enlightenment enabled. You can read the PR description for more info. This fix, however, will land only in 4.11.1.

The workaround:
Unfortunately, the workaround does not work, because it turns out that KVM does not support Windows + invtsc. No alternative working workaround has been found so far. I will edit the KBase article to include this information.
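The idea behind the fix can be illustrated with a short sketch. This is only a model of the behavior summarized above, not the code from the PR: the node names and frequencies are made up, and both function names are hypothetical.

```python
def minimum_tsc_frequency(node_frequencies):
    """Return the lowest known TSC frequency (in Hz) across nodes, or None if unknown."""
    known = [freq for freq in node_frequencies.values() if freq is not None]
    return min(known) if known else None


def needs_tsc_frequency(has_invtsc, has_reenlightenment):
    """After the fix, re-enlightenment triggers the same mechanism that invtsc did."""
    return has_invtsc or has_reenlightenment


# Hypothetical cluster: node names and frequencies are invented for illustration.
nodes = {
    "node-a": 2_400_000_000,
    "node-b": 2_100_000_000,
    "node-c": None,  # frequency not reported for this node
}

if __name__ == "__main__":
    # A Windows VM with re-enlightenment but without invtsc.
    if needs_tsc_frequency(has_invtsc=False, has_reenlightenment=True):
        print("tsc-frequency to provide:", minimum_tsc_frequency(nodes))
```

With this model, the Windows + WSL2 VM from this report qualifies through re-enlightenment alone, which is why the invtsc flag is no longer needed once the fix lands.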
Added known issue to the 4.11 release notes. @zpeng Please review: https://github.com/openshift/openshift-docs/pull/48328. Thank you.
checked related contents in PR, LGTM
Verified on:
OpenShift version: 4.11.0-rc.6
CNV version: 4.11.0
HCO image: brew.registry.redhat.io/rh-osbs/iib:286788

Windows VM with WSL2 guest starts and migrates successfully.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Virtualization 4.11.0 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:6526