Bug 2216276

Summary: virt-launcher pod ignores node placement configuration
Product: Container Native Virtualization (CNV)
Reporter: Denys Shchedrivyi <dshchedr>
Component: Virtualization
Assignee: Itamar Holder <iholder>
Status: CLOSED NOTABUG
QA Contact: Kedar Bidarkar <kbidarka>
Severity: medium
Priority: medium
Version: 4.12.4
CC: acardace
Target Milestone: ---
Target Release: 4.14.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2023-08-06 10:15:13 UTC
Type: Bug

Description Denys Shchedrivyi 2023-06-20 17:17:38 UTC
After applying workload tolerations to the HCO, they were applied to the virt-handler pods but not to the virt-launcher pods. As a result, no VMs can run on tainted nodes.


# tolerations in Kubevirt CR:

$ oc get kubevirt -n openshift-cnv -o json | jq .items[0].spec.workloads
{
  "nodePlacement": {
    "tolerations": [
      {
        "effect": "NoSchedule",
        "key": "key1",
        "operator": "Equal",
        "value": "value1"
      }
    ]
  }
}
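
For context, this toleration would typically be configured on the HyperConverged CR, which propagates it to the KubeVirt CR shown above. A sketch of that configuration (field names per the HCO API; metadata values assume the default CNV install):

```yaml
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
  workloads:
    nodePlacement:
      tolerations:
      - key: "key1"
        operator: "Equal"
        value: "value1"
        effect: "NoSchedule"
```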


# virt-handler pod has the same tolerations and can successfully run on the tainted nodes

$ oc get pod -n openshift-cnv virt-handler-zlm4f -o json | jq .spec.tolerations
[
  {
    "effect": "NoSchedule",
    "key": "key1",
    "operator": "Equal",
    "value": "value1"
  },
  ...


# virt-launcher pods do not have the same tolerations, so they can't run on those nodes:

$ oc describe pod
.
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  10s   default-scheduler  0/3 nodes are available: 3 node(s) had untolerated taint {key1: value1}. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.
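
For context, a taint matching the toleration above would have been applied to the nodes with something like the following (the node name is a placeholder):

```shell
$ oc adm taint nodes <node-name> key1=value1:NoSchedule

# verify the taint is present:
$ oc get node <node-name> -o json | jq .spec.taints
```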

Comment 1 Itamar Holder 2023-08-02 07:29:49 UTC
From looking at the code, it seems pretty clear that the KubeVirt CR's nodePlacement indeed affects core components only (e.g. virt-handler, virt-controller) but does not affect virt-launcher pods.

However, I don't see a clear evidence that this field was meant to affect virt-launcher pods.
The upstream documentation [1] is pretty vague, saying "nodePlacement describes scheduling configuration for specific KubeVirt components".
The doc for Replicas field [2] (which is another co-field in the same ComponentConfig struct) says: "replicas indicates how many replicas should be created for each KubeVirt infrastructure component (like virt-api or virt-controller)".

Are we sure this field is supposed to affect virt-launchers? Was it documented somewhere? Was it asked for by some user? Is there any clear use-case here?

I would also like to note that we already have a NodeSelector field in Kubevirt CR [3] that can set node-selectors to virt-launchers.

So unless I'm missing something, I think this can be closed as not a bug. However, we can indeed improve the documentation.

[1] https://github.com/kubevirt/kubevirt/blob/v1.0.0/staging/src/kubevirt.io/api/core/v1/componentconfig.go#L39
[2] https://github.com/kubevirt/kubevirt/blob/v1.0.0/staging/src/kubevirt.io/api/core/v1/componentconfig.go#L47
[3] https://github.com/kubevirt/kubevirt/blob/v1.0.0/staging/src/kubevirt.io/api/core/v1/types.go#L2543

Comment 2 Denys Shchedrivyi 2023-08-02 15:08:54 UTC
Thanks Itamar, I think you are correct: workloads tolerations affect virt-handler pods only; for VMs we have to set tolerations in the VM spec:

https://docs.openshift.com/container-platform/4.13/virt/virtual_machines/advanced_vm_management/virt-specifying-nodes-for-vms.html#virt-example-vm-node-placement-tolerations_virt-specifying-nodes-for-vms

By the way, the example in the doc does not work. The toleration should be set under .spec.template.spec, so it should be:

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: example-vm-tolerations
spec:
  template:
    spec:
      tolerations:
      - key: "key"
        operator: "Equal"
        value: "virtualization"
        effect: "NoSchedule"
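
Once the corrected spec is applied, the toleration should propagate to the virt-launcher pod. A quick way to check (the file name is a placeholder, and the kubevirt.io/domain label selector is an assumption based on KubeVirt's standard virt-launcher pod labels):

```shell
$ oc apply -f example-vm-tolerations.yaml
$ virtctl start example-vm-tolerations
$ oc get pod -l kubevirt.io/domain=example-vm-tolerations \
    -o jsonpath='{.items[0].spec.tolerations}'
```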


I think we can close this bug as not a bug; we just need to update the example in the documentation.

Comment 3 Itamar Holder 2023-08-06 10:15:13 UTC
Thanks Denys!

While I agree this is not a bug, you have a valid point that both the documentation and the API are not very clear.

What concerns me here is that with "nodePlacement" you can make core components schedule on tainted nodes. virt-handler would then be scheduled there, which causes a `kubevirt.io/schedulable` label to appear on that node. But to actually schedule VMs (i.e. virt-launcher pods) on that node, the user would also need to add a toleration at the VM level.

This is somewhat related to the discussion that took place here: https://github.com/kubevirt/kubevirt/pull/10169#issuecomment-1651537898.

Perhaps an issue can be opened to discuss this, although it is not a bug.

Thank you!