@ffossemo will take care of the backport for 4.11.3 as the automatic cherry-pick failed.
The original PR is already backported: https://github.com/kubevirt/kubevirt/pull/8996
I verified on CNV v4.11.3-8 that a VM with the reenlightenment flag only tries to run on nodes with a matching tsc-frequency or on nodes with the tsc-scalable=true label.

My only concern: on a heterogeneous cluster, a VM with the reenlightenment flag may never run on specific nodes, even if I set nodeSelector explicitly. For example, we have a cluster with these nodes:

> name: node01
> cpu-timer.node.kubevirt.io/tsc-frequency: '2099998000'
> cpu-timer.node.kubevirt.io/tsc-scalable: 'false'

> name: node03
> cpu-timer.node.kubevirt.io/tsc-frequency: '1699998000'
> cpu-timer.node.kubevirt.io/tsc-scalable: 'false'

> name: node04
> cpu-timer.node.kubevirt.io/tsc-frequency: '2095078000'
> cpu-timer.node.kubevirt.io/tsc-scalable: 'true'

The virt-controller finds the lowest frequency and adds it to the VMs; in my case it is `tsc-frequency: '1699998000'`. Since node01 is tsc-scalable=false and has a different frequency, the VM will never try to run there. When I select this node with a nodeSelector, the pod is stuck in the Pending state with this message:

> 0/10 nodes are available: 10 node(s) didn't match Pod's node
> affinity/selector. preemption: 0/10 nodes are available: 10 Preemption
> is not helpful for scheduling.

@iholder I suppose this is expected behavior: if TSC is not scalable on the node, skip this node. But what if I have a cluster where all 3 nodes are non-scalable and have different tsc-frequency values? Will a VM with reenlightenment (or with invtsc) run only on the one node with the lowest frequency?
Moving this BZ to Verified. As discussed, we should document this as a known limitation of mixed clusters with non-scalable nodes.
In a mixed cluster, HyperV Reenlightenment VMs cannot be scheduled on nodes that do not support TSC scaling (tsc-scalable=false) and whose TSC frequency is higher than the lowest frequency in the cluster.
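To illustrate the limitation with the three nodes from the verification comment, here is a minimal Python sketch. It models only the behavior described in this BZ (a node is eligible if it is tsc-scalable or if its frequency equals the lowest frequency in the cluster). The `Node` class and the function names are hypothetical, and this is not the actual virt-controller code.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Node:
    # Hypothetical stand-in for a node and its cpu-timer.node.kubevirt.io labels.
    name: str
    tsc_frequency: int
    tsc_scalable: bool


def required_frequency(nodes: List[Node]) -> int:
    # As described in this BZ: the lowest TSC frequency in the cluster is used.
    return min(n.tsc_frequency for n in nodes)


def eligible_nodes(nodes: List[Node]) -> List[str]:
    # Simplified model: a node qualifies if it can scale its TSC
    # or if its frequency exactly matches the required one.
    required = required_frequency(nodes)
    return [n.name for n in nodes if n.tsc_scalable or n.tsc_frequency == required]


# Nodes taken from the verification comment.
MIXED = [
    Node("node01", 2099998000, False),
    Node("node03", 1699998000, False),
    Node("node04", 2095078000, True),
]

# Same frequencies, but assuming no node supports TSC scaling.
ALL_NON_SCALABLE = [Node(n.name, n.tsc_frequency, False) for n in MIXED]

print(eligible_nodes(MIXED))             # ['node03', 'node04']
print(eligible_nodes(ALL_NON_SCALABLE))  # ['node03']
```

Under this model node01 is never eligible, and with three non-scalable nodes only the lowest-frequency node remains.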
This BZ/limitation can be hit easily when combined with BZ2184860. Essentially, a homogeneous cluster with identical CPUs can be seen as heterogeneous by this logic because of small TSC calibration variances on tsc-scalable=false nodes.

The system should do better at attempting to start the VMs; confining VMs to only the lowest-frequency CPUs should not happen. It may prevent users from starting important VMs if the VM cannot be scheduled on those nodes for whatever reason, or it may keep the workload from being spread properly.

It is more important to start the VM when the user requests it. If it cannot migrate later because of a different frequency, that is a lower-priority issue. On a fresh start, any frequency is fine. Running the VM should be the number one priority when a user wants to start it.
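A small Python sketch of how this can happen, assuming frequencies are compared for exact equality as described above. The node names and frequency values are made up for illustration and are not taken from a real cluster.

```python
from collections import defaultdict

# Hypothetical calibrated TSC frequencies (Hz) on three identical,
# non-scalable nodes. The values differ only by calibration noise.
NODE_FREQUENCIES = {
    "node-a": 2099998000,
    "node-b": 2100001000,
    "node-c": 2099995000,
}


def group_by_exact_frequency(freqs):
    # Exact matching treats every distinct value as its own "CPU type".
    groups = defaultdict(list)
    for name, hz in freqs.items():
        groups[hz].append(name)
    return dict(groups)


def spread_ppm(freqs):
    # Relative spread between the fastest and slowest node, in parts per million.
    lo, hi = min(freqs.values()), max(freqs.values())
    return (hi - lo) / lo * 1_000_000


def schedulable_nodes(freqs):
    # With no scalable nodes, only nodes at the lowest frequency qualify.
    lowest = min(freqs.values())
    return sorted(name for name, hz in freqs.items() if hz == lowest)


print(len(group_by_exact_frequency(NODE_FREQUENCIES)))  # 3
print(f"{spread_ppm(NODE_FREQUENCIES):.2f} ppm")
print(schedulable_nodes(NODE_FREQUENCIES))               # ['node-c']
```

Even though the spread is only a few parts per million, each node ends up in its own frequency group and the VM is confined to a single node.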
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Virtualization 4.11.6 security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2023:5103
Removed this bug from the 4.14 release notes Known issue list and added it as a bug fix.