Bug 2100629 - Update nested support KBASE article
Summary: Update nested support KBASE article
Keywords:
Status: CLOSED DUPLICATE of bug 2115371
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 4.11.1
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ---
: 4.11.1
Assignee: Itamar Holder
QA Contact: Kedar Bidarkar
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2022-06-23 20:26 UTC by Kedar Bidarkar
Modified: 2022-10-13 15:27 UTC (History)
7 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-10-13 15:07:46 UTC
Target Upstream Version:
Embargoed:
iholder: needinfo+


Attachments (Terms of Use)

Description Kedar Bidarkar 2022-06-23 20:26:04 UTC
Description of problem:
Summary: Migration with Nested VM fails without,
"invtsc cpu feature added to the VM's spec" in CNV 4.11, 

cpu:
  features:
  - name: vmx
    policy: require
  - name: invtsc
    policy: require

As per the bug comment, https://bugzilla.redhat.com/show_bug.cgi?id=2100054#c2

Issue encountered: While trying to migrate windows VM that has WSL2 guest fails.

Version-Release number of selected component (if applicable):
4.11

How reproducible:
Always

Steps to Reproduce:
1. Update nested support KBASE article, https://access.redhat.com/solutions/6692341
2.
3.

Actual results:
Current article, mentions "Virtual Machine Configuration" as,

cpu:
  features:
  - name: vmx
    policy: require

Expected results:
Update "Virtual Machine Configuration" section, as below

cpu:
  features:
  - name: vmx
    policy: require
  - name: invtsc
    policy: require

Additional info:
After doing so the Windows VM with WSL2 is able to migrate successfully.

Comment 1 Itamar Holder 2022-06-28 07:34:32 UTC
Made the changes.
It's still "Unpublished", I think someone needs to approve the changes.

Comment 2 Fabian Deutsch 2022-06-28 07:51:14 UTC
Please hold off publishing it.

We need to understand:
1. Why does this change occur?
2. Is the fix really complete by only setting the flag, why is the frquency not required?

Comment 3 Itamar Holder 2022-07-03 14:45:22 UTC
(In reply to Fabian Deutsch from comment #2)
> Please hold off publishing it.
> 
> We need to understand:
> 1. Why does this change occur?
> 2. Is the fix really complete by only setting the flag, why is the frquency
> not required?

1. It's because of QEMU's commit [1] that brakes backward compatibility in a way that forces HyperV Reenlightenment VMs to provide the --tsc-frequency parameter. This landed on QEMU 6.0.0. In CNV 4.9/4.10 we use QEMU 5.2.0. On CNV 4.11 we use QEMU 6.2.0.

2. Yes, it's fixed by setting the flag. Setting frequency is not required from the user, since in Kubevirt we have a mechanism for finding the minimum cluster frequency then passing it to QEMU via --tsc-frequency. Behind the scenes, every node sets this info in a label (through node-labeller) then we get the minimum frequency and add it to VMI's TopologyHints if needed. Then, when we're creating the virt-launcher pod, we add --tsc-frequency parameter with the value found in TopologyHints and pass it to QEMU.

[1] https://gitlab.com/qemu-project/qemu/-/commit/561dbb41b1d752098249128d8462aaadc56fd15d

This PR's description is now updated and includes all information, I recommend looking there for more info: https://github.com/kubevirt/kubevirt/pull/7986

-------

@fdeutsch I think it's best to update the K-base with these two points:
1. The bug would be fixed asap. When it's fixed, the user should not do anything in order for the fix to kick in.
2. Until the bug it fixed - recommend using the workaround (adding invtsc feature).

WDYT?

Comment 4 Fabian Deutsch 2022-07-04 10:29:41 UTC
WRT 1 - Really good that we understand the root cause.

WRT 2 - Thus with the same VM spec - if it failed before (due to the reenlihtment flag), in future it will automatically work?

The PR description looks better, I just have one comment for enhcamenet.

Now, to this KCS update:
Today we probably need an update to mention the workaround.
In future (once the bug is fixed), the workaround should be removed again.

I saw that in the prev change, the change was only done for intel CPU, but I suspect this affects both AMD and Intel? if so, then it sounds like we want the change for both vendors.

Comment 5 Itamar Holder 2022-07-04 14:16:58 UTC
(In reply to Fabian Deutsch from comment #4)
> WRT 1 - Really good that we understand the root cause.
> 
> WRT 2 - Thus with the same VM spec - if it failed before (due to the
> reenlihtment flag), in future it will automatically work?
> 
> The PR description looks better, I just have one comment for enhcamenet.
> 
> Now, to this KCS update:
> Today we probably need an update to mention the workaround.
> In future (once the bug is fixed), the workaround should be removed again.
> 
> I saw that in the prev change, the change was only done for intel CPU, but I
> suspect this affects both AMD and Intel? if so, then it sounds like we want
> the change for both vendors.

Regarding 2: Yes, this is correct.

Thanks for your suggestion on the PR, the description is updated.

I will update the KCS right now.

Comment 6 Itamar Holder 2022-07-04 14:44:02 UTC
The article is updated with the workaround & link to the bug's main bugzilla page.

Comment 7 sgott 2022-07-05 15:46:56 UTC
To Verify, inspect https://access.redhat.com/solutions/6692341 and ensure it matches the "expected results" from the description.

Comment 8 Itamar Holder 2022-07-06 10:55:18 UTC
Update:

@vsibirsk raised my attenuation to the fact that the workaround works only if applied to the VM after boot.
Therefore, I've updated the KBase again to include this note.

An investigation is still needed to understand why it doesn't work when applying before boot.

Comment 9 Kedar Bidarkar 2022-07-06 14:07:28 UTC
As, an investigation is still needed to understand why it doesn't work when applying before boot, 

moving this bug to ASSIGNED state.

Comment 10 Itamar Holder 2022-07-21 14:22:35 UTC
Clarification & update:

The fix:
This bug is being fixed in this PR: https://github.com/kubevirt/kubevirt/pull/7986.
Essentially, we have a mechanism to find the minimum TSC frequency on the cluster and provide to QEMU. The same mechanism is now being used for Windows VMs with Re-enlightenment enabled. You can read the PR description for more info.

This fix, however, will only land only in 4.11.1.

The workaround:
Unfortunately, the workaround does not work. This is because it turns out that KVM does not support Windows + invtsc.
Unfortunately a different working workaround is not found. I will edit the KBase article to include this information.

Comment 12 Antonio Cardace 2022-10-13 15:07:46 UTC
This bug has been fixed in 4.11.1 and is tracked by https://bugzilla.redhat.com/show_bug.cgi?id=2115371 so closing this as a duplicate as there's no need to update the kbase article for 4.11.1.

*** This bug has been marked as a duplicate of bug 2115371 ***


Note You need to log in before you can comment on or make changes to this bug.