Bug 2151169 - Requested TSC frequency outside tolerance range & TSC scaling not supported
Summary: Requested TSC frequency outside tolerance range & TSC scaling not supported
Keywords:
Status: VERIFIED
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 4.11.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.11.3
Assignee: ffossemo
QA Contact: Denys Shchedrivyi
URL:
Whiteboard:
Depends On: 2139896
Blocks: 2174229
Reported: 2022-12-06 09:15 UTC by Kedar Bidarkar
Modified: 2023-08-08 10:47 UTC (History)

Fixed In Version: hco-bundle-registry-container-v4.11.3-2
Doc Type: Known Issue
Doc Text:
Cause: The cluster is heterogeneous, with nodes of different models. Consequence: HyperV Reenlightenment VMs cannot be scheduled on nodes that do not support scalable TSC and whose TSC frequency is higher than the lowest frequency in the cluster. Workaround (if any): Result: On such nodes, labeled tsc-scalable=false, the VM never tries to run. (Note: This applies only to VMs with Reenlightenment flags in the VM spec.)
Clone Of: 2139896
Clones: 2174229
Environment:
Last Closed:
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | kubevirt/kubevirt pull 8996 | 0 | None | Merged | [release-0.53]: Fix: Align Reenlightenment flows between converter.go and template.go | 2023-01-13 10:16:22 UTC
Red Hat Issue Tracker | CNV-23149 | 0 | None | None | None | 2022-12-06 09:23:15 UTC

Comment 2 Antonio Cardace 2023-01-12 14:34:40 UTC
@ffossemo will take care of the backport for 4.11.3 as the automatic cherry-pick failed.

Comment 3 ffossemo 2023-01-13 10:16:22 UTC
The original PR is already backported https://github.com/kubevirt/kubevirt/pull/8996

Comment 5 Denys Shchedrivyi 2023-01-24 15:47:18 UTC
I verified on CNV v4.11.3-8

 A VM with the reenlightenment flag tries to run only on nodes with the appropriate tsc-frequency or on nodes with the tsc-scalable=true label.

 My only concern: on a heterogeneous cluster, a VM with the reenlightenment flag may never run on specific nodes, even if I set nodeSelector explicitly.

For example, we have a cluster with these nodes:


>  name: node01
>    cpu-timer.node.kubevirt.io/tsc-frequency: '2099998000'
>    cpu-timer.node.kubevirt.io/tsc-scalable: 'false'

>  name: node03
>    cpu-timer.node.kubevirt.io/tsc-frequency: '1699998000'
>    cpu-timer.node.kubevirt.io/tsc-scalable: 'false'

>  name: node04
>    cpu-timer.node.kubevirt.io/tsc-frequency: '2095078000'
>    cpu-timer.node.kubevirt.io/tsc-scalable: 'true'


The virt-controller finds the lowest frequency and adds it to VMs; in my case it is `tsc-frequency: '1699998000'`. But since node01 is tsc-scalable=false, the VM will never try to run there.
When I select this node with a nodeSelector, the pod is stuck in the Pending state with the message:

>        0/10 nodes are available: 10 node(s) didn't match Pod's node
>        affinity/selector. preemption: 0/10 nodes are available: 10 Preemption
>        is not helpful for scheduling.
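
To make the behavior above concrete, here is a minimal sketch of the selection as described in this comment, not the actual virt-controller code. The names `Node`, `lowest_frequency` and `eligible_nodes` are hypothetical, and the rule (a node is eligible if it is tsc-scalable or its frequency equals the lowest one in the cluster) is an assumption taken from the observations in this thread.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for a node and its two cpu-timer labels."""
    name: str
    tsc_frequency: int   # cpu-timer.node.kubevirt.io/tsc-frequency
    tsc_scalable: bool   # cpu-timer.node.kubevirt.io/tsc-scalable


def lowest_frequency(nodes):
    """Frequency the VM is pinned to: the lowest one in the cluster."""
    return min(n.tsc_frequency for n in nodes)


def eligible_nodes(nodes):
    """Assumed rule: scalable nodes, or nodes already at the lowest frequency."""
    required = lowest_frequency(nodes)
    return [n.name for n in nodes
            if n.tsc_scalable or n.tsc_frequency == required]


# The three nodes listed in this comment.
cluster = [
    Node("node01", 2099998000, False),
    Node("node03", 1699998000, False),
    Node("node04", 2095078000, True),
]

print(lowest_frequency(cluster))
print(eligible_nodes(cluster))
```

Under these assumptions node01 is excluded, so a nodeSelector pointing at node01 leaves no node that satisfies both constraints, which would match the Pending state shown above.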

@iholder I suppose this is expected behavior: if TSC is not scalable on the node, skip this node.
But what if I have a cluster where all 3 nodes are non-scalable and have different tsc-frequency values? Will a VM with reenlightenment (or with invtsc) run only on the one node with the lowest frequency?

Comment 8 Denys Shchedrivyi 2023-02-06 17:43:51 UTC
Moving this BZ to Verified. As discussed, we should document that this is a known limitation of mixed clusters with non-scalable nodes.

Comment 9 Kedar Bidarkar 2023-02-09 13:40:15 UTC
In a mixed cluster, HyperV Reenlightenment VMs cannot be scheduled on nodes that do not support scalable TSC and whose TSC frequency is higher than the lowest frequency in the cluster.

Comment 10 Germano Veit Michel 2023-04-06 02:03:36 UTC
This BZ/Limitation can be hit easily when combined with BZ2184860.
Essentially, a homogeneous cluster with identical CPUs can be seen as heterogeneous by this logic due to small TSC calibration variances on tsc-scalable=false nodes.
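
A small sketch of why calibration variance matters, assuming frequencies are compared by exact label value. The 250 ppm tolerance is an assumption here (it is KVM's default `tsc_tolerance_ppm`, not a value stated in this bug), the function name `within_tolerance` is hypothetical, and the second frequency is a made-up value a few kHz away from node01's label.

```python
ASSUMED_TOLERANCE_PPM = 250  # assumption: KVM's default tsc_tolerance_ppm


def within_tolerance(requested_hz, host_hz, tolerance_ppm=ASSUMED_TOLERANCE_PPM):
    """True if the requested TSC frequency is within tolerance of the host's."""
    # Integer arithmetic avoids floating-point rounding on large Hz values.
    return abs(requested_hz - host_hz) * 1_000_000 <= tolerance_ppm * host_hz


node01_hz = 2099998000      # label value from this bug
calibrated_hz = 2100000000  # hypothetical: same CPU model, slightly different calibration

# An exact comparison sees two different frequencies...
print(node01_hz == calibrated_hz)
# ...although the difference is far below the assumed tolerance.
print(within_tolerance(node01_hz, calibrated_hz))
# A genuinely different frequency (node03 vs node01) is outside it.
print(within_tolerance(1699998000, node01_hz))
```

With these numbers the two near-identical nodes differ by roughly 1 ppm, so a tolerance-based comparison would treat them as the same frequency while an exact match would not.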

The system should do better at attempting to start VMs; confining VMs to only the lowest-frequency CPUs should not happen.
It may prevent users from starting important VMs if the VM cannot be scheduled on those nodes for whatever reason,
or prevent the workload from being spread properly, for example.

It is more important to start the VM when the user requests it. If it cannot migrate later due to a different frequency, that is a lower-priority issue.

On a fresh start, any frequency is fine. Running the VM should be the number one priority when a user wants to start it.

