Description of problem:

Summary: Migration with a nested VM fails in CNV 4.11 unless the invtsc CPU feature is added to the VM's spec:

  cpu:
    features:
    - name: vmx
      policy: require
    - name: invtsc
      policy: require

As per the bug comment, https://bugzilla.redhat.com/show_bug.cgi?id=2100054#c2

Issue encountered: Migrating a Windows VM that has a WSL2 guest fails.

Version-Release number of selected component (if applicable): 4.11

How reproducible: Always

Steps to Reproduce:
1. Update the nested support KBASE article, https://access.redhat.com/solutions/6692341
2.
3.

Actual results:
The current article gives the "Virtual Machine Configuration" as:

  cpu:
    features:
    - name: vmx
      policy: require

Expected results:
Update the "Virtual Machine Configuration" section as below:

  cpu:
    features:
    - name: vmx
      policy: require
    - name: invtsc
      policy: require

Additional info:
After doing so, the Windows VM with WSL2 is able to migrate successfully.
Made the changes. The article is still "Unpublished"; I think someone needs to approve the changes.
Please hold off publishing it.

We need to understand:
1. Why does this change occur?
2. Is the fix really complete by only setting the flag? Why is the frequency not required?
(In reply to Fabian Deutsch from comment #2)
> Please hold off publishing it.
>
> We need to understand:
> 1. Why does this change occur?
> 2. Is the fix really complete by only setting the flag, why is the frequency
> not required?

1. It's because of QEMU's commit [1], which breaks backward compatibility in a way that forces Hyper-V Reenlightenment VMs to provide the --tsc-frequency parameter. This landed in QEMU 6.0.0. In CNV 4.9/4.10 we use QEMU 5.2.0. In CNV 4.11 we use QEMU 6.2.0.

2. Yes, it's fixed by setting the flag. The user is not required to set the frequency, since KubeVirt has a mechanism for finding the minimum cluster frequency and passing it to QEMU via --tsc-frequency. Behind the scenes, every node sets this info in a label (through node-labeller); we then take the minimum frequency and add it to the VMI's TopologyHints if needed. Then, when creating the virt-launcher pod, we add the --tsc-frequency parameter with the value found in TopologyHints and pass it to QEMU.

[1] https://gitlab.com/qemu-project/qemu/-/commit/561dbb41b1d752098249128d8462aaadc56fd15d

This PR's description is now updated and includes all the information; I recommend looking there for more info: https://github.com/kubevirt/kubevirt/pull/7986

-------

@fdeutsch I think it's best to update the KBase with these two points:
1. The bug will be fixed ASAP. Once it is fixed, the user does not need to do anything for the fix to take effect.
2. Until the bug is fixed, we recommend using the workaround (adding the invtsc feature).

WDYT?
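For illustration only, the mechanism described in the comment above (each node publishes its TSC frequency in a label, the minimum is taken, and the value is passed to QEMU) can be sketched roughly as follows. This is not KubeVirt's actual code: the label key, the function names, and the data layout are all hypothetical, and the argument form simply mirrors the "--tsc-frequency" wording used in this thread.

```python
from typing import Dict, List, Optional

# Hypothetical label key, chosen for this sketch only; the real key used by
# node-labeller is not stated in this thread.
TSC_FREQUENCY_LABEL = "example.invalid/tsc-frequency"


def minimum_tsc_frequency(nodes: Dict[str, Dict[str, str]]) -> Optional[int]:
    """Return the lowest TSC frequency (Hz) published by any node.

    `nodes` maps a node name to its labels. Nodes without the label, or with
    a value that is not an integer, are skipped. Returns None if no node
    publishes a usable frequency.
    """
    frequencies = []
    for labels in nodes.values():
        raw = labels.get(TSC_FREQUENCY_LABEL)
        if raw is None:
            continue
        try:
            frequencies.append(int(raw))
        except ValueError:
            continue
    return min(frequencies) if frequencies else None


def tsc_frequency_args(frequency: Optional[int]) -> List[str]:
    """Build the extra arguments for the chosen frequency.

    The "--tsc-frequency" spelling follows this thread's wording and is an
    assumption, not a verified QEMU command line.
    """
    if frequency is None:
        return []
    return ["--tsc-frequency", str(frequency)]


if __name__ == "__main__":
    cluster = {
        "node-a": {TSC_FREQUENCY_LABEL: "2600000000"},
        "node-b": {TSC_FREQUENCY_LABEL: "2400000000"},
        "node-c": {},  # node that does not publish a frequency
    }
    print(tsc_frequency_args(minimum_tsc_frequency(cluster)))
```

The sketch picks the minimum across nodes because that is the selection rule the comment describes.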
WRT 1 - Really good that we understand the root cause.

WRT 2 - So with the same VM spec, if it failed before (due to the reenlightenment flag), in future it will automatically work?

The PR description looks better; I just have one comment for enhancement.

Now, to this KCS update:
Today we probably need an update to mention the workaround.
In future (once the bug is fixed), the workaround should be removed again.

I saw that in the previous change, the change was only made for Intel CPUs, but I suspect this affects both AMD and Intel? If so, then it sounds like we want the change for both vendors.
(In reply to Fabian Deutsch from comment #4)
> WRT 1 - Really good that we understand the root cause.
>
> WRT 2 - So with the same VM spec, if it failed before (due to the
> reenlightenment flag), in future it will automatically work?
>
> The PR description looks better; I just have one comment for enhancement.
>
> Now, to this KCS update:
> Today we probably need an update to mention the workaround.
> In future (once the bug is fixed), the workaround should be removed again.
>
> I saw that in the previous change, the change was only made for Intel CPUs,
> but I suspect this affects both AMD and Intel? If so, then it sounds like we
> want the change for both vendors.

Regarding 2: Yes, this is correct.

Thanks for your suggestion on the PR; the description is updated.

I will update the KCS right now.
The article is updated with the workaround & link to the bug's main bugzilla page.
To verify, inspect https://access.redhat.com/solutions/6692341 and ensure it matches the "Expected results" from the description.
Update: @vsibirsk brought to my attention that the workaround works only if applied to the VM after boot. Therefore, I've updated the KBase again to include this note. An investigation is still needed to understand why it doesn't work when applied before boot.
As an investigation is still needed to understand why the workaround doesn't work when applied before boot, moving this bug to the ASSIGNED state.
Clarification & update:

The fix:
This bug is being fixed in this PR: https://github.com/kubevirt/kubevirt/pull/7986. Essentially, we have a mechanism to find the minimum TSC frequency on the cluster and provide it to QEMU. The same mechanism is now being used for Windows VMs with Re-enlightenment enabled. You can read the PR description for more info. This fix, however, will only land in 4.11.1.

The workaround:
Unfortunately, the workaround does not work. It turns out that KVM does not support Windows + invtsc. No alternative working workaround has been found.

I will edit the KBase article to include this information.
This bug has been fixed in 4.11.1 and is tracked by https://bugzilla.redhat.com/show_bug.cgi?id=2115371, so closing this as a duplicate, as there's no need to update the KBase article for 4.11.1.

*** This bug has been marked as a duplicate of bug 2115371 ***