Bug 2100629

Summary: Update nested support KBASE article
Product: Container Native Virtualization (CNV) Reporter: Kedar Bidarkar <kbidarka>
Component: VirtualizationAssignee: Itamar Holder <iholder>
Status: CLOSED DUPLICATE QA Contact: Kedar Bidarkar <kbidarka>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.11.1CC: acardace, cnv-qe-bugs, dcadzow, fdeutsch, iholder, sgott, ycui
Target Milestone: ---Flags: iholder: needinfo+
Target Release: 4.11.1   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-10-13 15:07:46 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Kedar Bidarkar 2022-06-23 20:26:04 UTC
Description of problem:
Summary: Migration with Nested VM fails without,
"invtsc cpu feature added to the VM's spec" in CNV 4.11, 

cpu:
  features:
  - name: vmx
    policy: require
  - name: invtsc
    policy: require

As per the bug comment, https://bugzilla.redhat.com/show_bug.cgi?id=2100054#c2

Issue encountered: While trying to migrate windows VM that has WSL2 guest fails.

Version-Release number of selected component (if applicable):
4.11

How reproducible:
Always

Steps to Reproduce:
1. Update nested support KBASE article, https://access.redhat.com/solutions/6692341
2.
3.

Actual results:
Current article, mentions "Virtual Machine Configuration" as,

cpu:
  features:
  - name: vmx
    policy: require

Expected results:
Update "Virtual Machine Configuration" section, as below

cpu:
  features:
  - name: vmx
    policy: require
  - name: invtsc
    policy: require

Additional info:
After doing so the Windows VM with WSL2 is able to migrate successfully.

Comment 1 Itamar Holder 2022-06-28 07:34:32 UTC
Made the changes.
It's still "Unpublished", I think someone needs to approve the changes.

Comment 2 Fabian Deutsch 2022-06-28 07:51:14 UTC
Please hold off publishing it.

We need to understand:
1. Why does this change occur?
2. Is the fix really complete by only setting the flag, why is the frquency not required?

Comment 3 Itamar Holder 2022-07-03 14:45:22 UTC
(In reply to Fabian Deutsch from comment #2)
> Please hold off publishing it.
> 
> We need to understand:
> 1. Why does this change occur?
> 2. Is the fix really complete by only setting the flag, why is the frquency
> not required?

1. It's because of QEMU's commit [1] that brakes backward compatibility in a way that forces HyperV Reenlightenment VMs to provide the --tsc-frequency parameter. This landed on QEMU 6.0.0. In CNV 4.9/4.10 we use QEMU 5.2.0. On CNV 4.11 we use QEMU 6.2.0.

2. Yes, it's fixed by setting the flag. Setting frequency is not required from the user, since in Kubevirt we have a mechanism for finding the minimum cluster frequency then passing it to QEMU via --tsc-frequency. Behind the scenes, every node sets this info in a label (through node-labeller) then we get the minimum frequency and add it to VMI's TopologyHints if needed. Then, when we're creating the virt-launcher pod, we add --tsc-frequency parameter with the value found in TopologyHints and pass it to QEMU.

[1] https://gitlab.com/qemu-project/qemu/-/commit/561dbb41b1d752098249128d8462aaadc56fd15d

This PR's description is now updated and includes all information, I recommend looking there for more info: https://github.com/kubevirt/kubevirt/pull/7986

-------

@fdeutsch I think it's best to update the K-base with these two points:
1. The bug would be fixed asap. When it's fixed, the user should not do anything in order for the fix to kick in.
2. Until the bug it fixed - recommend using the workaround (adding invtsc feature).

WDYT?

Comment 4 Fabian Deutsch 2022-07-04 10:29:41 UTC
WRT 1 - Really good that we understand the root cause.

WRT 2 - Thus with the same VM spec - if it failed before (due to the reenlihtment flag), in future it will automatically work?

The PR description looks better, I just have one comment for enhcamenet.

Now, to this KCS update:
Today we probably need an update to mention the workaround.
In future (once the bug is fixed), the workaround should be removed again.

I saw that in the prev change, the change was only done for intel CPU, but I suspect this affects both AMD and Intel? if so, then it sounds like we want the change for both vendors.

Comment 5 Itamar Holder 2022-07-04 14:16:58 UTC
(In reply to Fabian Deutsch from comment #4)
> WRT 1 - Really good that we understand the root cause.
> 
> WRT 2 - Thus with the same VM spec - if it failed before (due to the
> reenlihtment flag), in future it will automatically work?
> 
> The PR description looks better, I just have one comment for enhcamenet.
> 
> Now, to this KCS update:
> Today we probably need an update to mention the workaround.
> In future (once the bug is fixed), the workaround should be removed again.
> 
> I saw that in the prev change, the change was only done for intel CPU, but I
> suspect this affects both AMD and Intel? if so, then it sounds like we want
> the change for both vendors.

Regarding 2: Yes, this is correct.

Thanks for your suggestion on the PR, the description is updated.

I will update the KCS right now.

Comment 6 Itamar Holder 2022-07-04 14:44:02 UTC
The article is updated with the workaround & link to the bug's main bugzilla page.

Comment 7 sgott 2022-07-05 15:46:56 UTC
To Verify, inspect https://access.redhat.com/solutions/6692341 and ensure it matches the "expected results" from the description.

Comment 8 Itamar Holder 2022-07-06 10:55:18 UTC
Update:

@vsibirsk raised my attenuation to the fact that the workaround works only if applied to the VM after boot.
Therefore, I've updated the KBase again to include this note.

An investigation is still needed to understand why it doesn't work when applying before boot.

Comment 9 Kedar Bidarkar 2022-07-06 14:07:28 UTC
As, an investigation is still needed to understand why it doesn't work when applying before boot, 

moving this bug to ASSIGNED state.

Comment 10 Itamar Holder 2022-07-21 14:22:35 UTC
Clarification & update:

The fix:
This bug is being fixed in this PR: https://github.com/kubevirt/kubevirt/pull/7986.
Essentially, we have a mechanism to find the minimum TSC frequency on the cluster and provide to QEMU. The same mechanism is now being used for Windows VMs with Re-enlightenment enabled. You can read the PR description for more info.

This fix, however, will only land only in 4.11.1.

The workaround:
Unfortunately, the workaround does not work. This is because it turns out that KVM does not support Windows + invtsc.
Unfortunately a different working workaround is not found. I will edit the KBase article to include this information.

Comment 12 Antonio Cardace 2022-10-13 15:07:46 UTC
This bug has been fixed in 4.11.1 and is tracked by https://bugzilla.redhat.com/show_bug.cgi?id=2115371 so closing this as a duplicate as there's no need to update the kbase article for 4.11.1.

*** This bug has been marked as a duplicate of bug 2115371 ***