Bug 2182939 - topologyHints with tscFrequency is set on the VMI when starting a VM from cold, preventing scheduling on some nodes due to NodeSelector
Summary: topologyHints with tscFrequency is set on the VMI when starting a VM from cold, preventing scheduling on some nodes due to NodeSelector
Keywords:
Status: CLOSED DUPLICATE of bug 2184860
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 4.12.2
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.14.0
Assignee: sgott
QA Contact: Kedar Bidarkar
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-03-30 01:17 UTC by Germano Veit Michel
Modified: 2023-04-06 01:57 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-04-06 01:57:56 UTC
Target Upstream Version:
Embargoed:




Links:
Red Hat Issue Tracker CNV-27577 (last updated 2023-03-30 01:20:49 UTC)
Red Hat Knowledge Base (Solution) 7005589 (last updated 2023-04-02 22:11:47 UTC)

Description Germano Veit Michel 2023-03-30 01:17:59 UTC
Description of problem:

1. Prepare a cluster with worker nodes that have different TSC frequencies
2. Create a Windows VM with re-enlightenment enabled (see the sketch after these steps)
3. Start the VM
4. Note it was scheduled on node N and labeled with that node's TSC frequency:
   tsc-frequency-2419200000
5. Stop the VM
6. So far, all OK.
7. Make all nodes with that 2419200000 TSC frequency unschedulable
8. Start the VM
9. Pod fails to schedule
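
For reference, step 2 boils down to enabling the Hyper-V re-enlightenment feature on the VM. A minimal sketch of such a VM, assuming a hypothetical name and memory size (not from this cluster):

    apiVersion: kubevirt.io/v1
    kind: VirtualMachine
    metadata:
      name: win-vm                # hypothetical name
    spec:
      running: false
      template:
        spec:
          domain:
            features:
              hyperv:
                reenlightenment:
                  enabled: true   # the re-enlightenment toggle from step 2
            resources:
              requests:
                memory: 8Gi       # hypothetical size
          # disks and volumes omitted for brevity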

The scheduling failure happens because the VMI has:

    topologyHints:
      tscFrequency: 2419200000

which translates to a nodeSelector on the virt-launcher pod:

    nodeSelector:
      hyperv.node.kubevirt.io/frequencies: "true"
      hyperv.node.kubevirt.io/ipi: "true"
      hyperv.node.kubevirt.io/reenlightenment: "true"
      hyperv.node.kubevirt.io/reset: "true"
      hyperv.node.kubevirt.io/runtime: "true"
      hyperv.node.kubevirt.io/synic: "true"
      hyperv.node.kubevirt.io/synictimer: "true"
      hyperv.node.kubevirt.io/tlbflush: "true"
      hyperv.node.kubevirt.io/vpindex: "true"
      kubevirt.io/schedulable: "true"
      scheduling.node.kubevirt.io/tsc-frequency-2419200000: "true"   <--------
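
For the virt-launcher pod to schedule, a node must carry the matching frequency label. A sketch of node metadata that would satisfy the selector, assuming a hypothetical node name (the other hyperv.node.kubevirt.io labels are elided):

    apiVersion: v1
    kind: Node
    metadata:
      name: example-node          # hypothetical
      labels:
        kubevirt.io/schedulable: "true"
        scheduling.node.kubevirt.io/tsc-frequency-2419200000: "true"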

The only node available has a different TSC:

% oc get nodes
NAME                STATUS                     ROLES                         AGE   VERSION
black.toca.local    Ready,SchedulingDisabled   worker                        9d    v1.25.7+eab9cc9
blue.toca.local     Ready,SchedulingDisabled   control-plane,master,worker   10d   v1.25.7+eab9cc9
green.toca.local    Ready,SchedulingDisabled   control-plane,master,worker   10d   v1.25.7+eab9cc9
indigo.toca.local   Ready,SchedulingDisabled   worker                        10d   v1.25.7+eab9cc9
red.toca.local      Ready,SchedulingDisabled   control-plane,master,worker   10d   v1.25.7+eab9cc9
violet.toca.local   Ready,SchedulingDisabled   worker                        10d   v1.25.7+eab9cc9
white.toca.local    Ready                      worker                        10d   v1.25.7+eab9cc9
yellow.toca.local   Ready,SchedulingDisabled   worker                        10d   v1.25.7+eab9cc9

% oc get nodes white.toca.local -o yaml | grep tsc-frequency
    cpu-timer.node.kubevirt.io/tsc-frequency: "2592000000"
    scheduling.node.kubevirt.io/tsc-frequency-2592000000: "true"

So scheduling fails:

      message: '0/8 nodes are available: 1 node(s) didn''t match Pod''s node affinity/selector,
        7 node(s) were unschedulable. preemption: 0/8 nodes are available: 8 Preemption
        is not helpful for scheduling.'


When starting from cold, this should not be necessary. Apparently this comes from the TopologyHinter, and it effectively concentrates new VM starts on hosts with exactly the same TSC frequency as the hosts they previously ran on.
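
For clarity, this is roughly what the freshly created VMI carries even on a cold start, with the hint already populated (a sketch; the VMI name is hypothetical):

    apiVersion: kubevirt.io/v1
    kind: VirtualMachineInstance
    metadata:
      name: win-vm                    # hypothetical
    spec:
      topologyHints:
        tscFrequency: 2419200000      # set before the pod is scheduled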

Version-Release number of selected component (if applicable):
OCP 4.12.9
CNV 4.12.2

How reproducible:
Always

Steps to Reproduce:
As above

Actual results:
- VM fails to schedule on all nodes

Expected results:
- Fresh VM start can schedule on all nodes

Comment 1 Germano Veit Michel 2023-03-30 01:31:18 UTC
Actually, it's not the frequency of the previous host; it seems to be the lowest TSC frequency in the cluster:

https://github.com/kubevirt/kubevirt/blob/main/pkg/virt-controller/watch/topology/hinter.go#L40

Comment 4 Germano Veit Michel 2023-03-30 09:31:24 UTC
May be of interest: https://listman.redhat.com/archives/libvir-list/2020-November/msg00519.html

