2121631 – Hyper-V enlightenments on NodeSelector makes event message useless to customer if node has no kvm

Keywords:
Status:	CLOSED DEFERRED
Alias:	None
Product:	Container Native Virtualization (CNV)
Classification:	Red Hat
Component:	Virtualization
Sub Component:
Version:	4.10.4
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	sgott
QA Contact:	Kedar Bidarkar
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2022-08-26 05:16 UTC by Germano Veit Michel
Modified:	2022-08-31 20:22 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2022-08-31 20:22:05 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Description Germano Veit Michel 2022-08-26 05:16:40 UTC

Description of problem:

If one has worker nodes without vmx/svm and launches a Windows VM, the virt-launcher pod fails to schedule with:

status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-08-26T04:59:59Z"
    message: '0/7 nodes are available: 3 node(s) had taint {node-role.kubernetes.io/master:
      }, that the pod didn''t tolerate, 4 node(s) didn''t match Pod''s node affinity/selector.'
    reason: Unschedulable
    status: "False"
    type: PodScheduled

If it's a Linux VM, the message is actually useful to the customer:

status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-08-26T05:09:52Z"
    message: '0/7 nodes are available: 3 node(s) had taint {node-role.kubernetes.io/master:
      }, that the pod didn''t tolerate, 4 Insufficient devices.kubevirt.io/kvm.'
    reason: Unschedulable
    status: "False"
    type: PodScheduled

This is because the nodeSelector of the Windows VM has these entries:

  nodeSelector:
    hyperv.node.kubevirt.io/frequencies: "true"
    hyperv.node.kubevirt.io/ipi: "true"
    hyperv.node.kubevirt.io/reenlightenment: "true"
    hyperv.node.kubevirt.io/reset: "true"
    hyperv.node.kubevirt.io/runtime: "true"
    hyperv.node.kubevirt.io/synic: "true"
    hyperv.node.kubevirt.io/synictimer: "true"
    hyperv.node.kubevirt.io/tlbflush: "true"
    hyperv.node.kubevirt.io/vpindex: "true"
    kubevirt.io/schedulable: "true"

These labels are not present on the node object if the node does not have virtualization.

So the selector does not match the node, and scheduling fails before checking whether the kvm resource is available for use, which would in turn produce a useful error event.
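
As a rough illustration of the mismatch described above, the sketch below compares a nodeSelector against a node's labels and lists the selector keys that are not satisfied. It is not the kube scheduler's code. The function name `unmatched_selector_terms` and the node labels are hypothetical; only the selector keys are taken from the Windows VM in this report.

```python
from typing import Dict, List


def unmatched_selector_terms(node_selector: Dict[str, str],
                             node_labels: Dict[str, str]) -> List[str]:
    """Return the nodeSelector keys that the node's labels do not satisfy."""
    return sorted(
        key for key, wanted in node_selector.items()
        if node_labels.get(key) != wanted
    )


# Subset of the selector entries from the Windows VM in this report.
windows_selector = {
    "hyperv.node.kubevirt.io/synic": "true",
    "hyperv.node.kubevirt.io/vpindex": "true",
    "kubevirt.io/schedulable": "true",
}

# Hypothetical labels for a worker without vmx/svm (assumption: none of the
# selector labels are present on such a node).
node_without_kvm = {"kubernetes.io/hostname": "worker-0"}

missing = unmatched_selector_terms(windows_selector, node_without_kvm)
print(missing)
```

A per-label list like this is the level of detail that the generic "didn't match Pod's node affinity/selector" message leaves out.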

This sort of basic problem must be clear to the user with proper event messages.

Version-Release number of selected component (if applicable):
CNV 4.10.4
OCP 4.10.26

How reproducible:
Always

Steps to Reproduce:
1. Disable VMX on Worker Nodes
2. Start a Linux and a Windows VM
3. Windows VM fails to start with unclear message
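
To confirm step 1 took effect on a worker, one option is to check the CPU flags the kernel reports. This is a minimal sketch, assuming an x86_64 Linux node where /proc/cpuinfo has a "flags" line; the helper name `has_hw_virt` and the sample text are hypothetical.

```python
def has_hw_virt(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line lists vmx (Intel) or svm (AMD)."""
    for line in cpuinfo_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == "flags":
            if {"vmx", "svm"} & set(value.split()):
                return True
    return False


# Made-up excerpts in the /proc/cpuinfo format, for illustration only.
sample_with_vmx = "processor\t: 0\nflags\t\t: fpu vme sse2 vmx\n"
sample_without = "processor\t: 0\nflags\t\t: fpu vme sse2\n"

print(has_hw_virt(sample_with_vmx), has_hw_virt(sample_without))
```

On a real node, the contents of /proc/cpuinfo would be passed in instead of the sample strings.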

Comment 2 sgott 2022-08-31 20:22:05 UTC

I agree that the end-user experience here is not ideal, but pods (and their status conditions) are handled entirely by the kube scheduler. I have opened a bug in the OCP project to track this. Closing this BZ as deferred.

https://issues.redhat.com/browse/OCPBUGS-777