After applying workload tolerations to the HCO, they were applied to virt-handler pods but not to virt-launcher pods. As a result, no VMs can run on the tainted nodes.

# tolerations in the KubeVirt CR:
$ oc get kubevirt -n openshift-cnv -o json | jq .items[0].spec.workloads
{
  "nodePlacement": {
    "tolerations": [
      {
        "effect": "NoSchedule",
        "key": "key1",
        "operator": "Equal",
        "value": "value1"
      }
    ]
  }
}

# the virt-handler pod has the same toleration and can successfully run on the tainted nodes:
$ oc get pod -n openshift-cnv virt-handler-zlm4f -o json | jq .spec.tolerations
[
  ...
  {
    "effect": "NoSchedule",
    "key": "key1",
    "operator": "Equal",
    "value": "value1"
  },

# virt-launcher pods do not have the same toleration, so they cannot run on those nodes:
$ oc describe pod ...
Events:
  Type     Reason            Age  From               Message
  ----     ------            ---  ----               -------
  Warning  FailedScheduling  10s  default-scheduler  0/3 nodes are available: 3 node(s) had untolerated taint {key1: value1}. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.
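For completeness, the nodes in this setup carry a taint matching the toleration above (as the scheduler events show). A taint like that would typically have been applied with something along these lines (the node name is just a placeholder):

# taint a worker node so that only tolerating pods can schedule there
$ oc adm taint nodes <node-name> key1=value1:NoSchedule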
From looking at the code, it seems pretty clear that the KubeVirt CR's nodePlacement indeed affects core components only (e.g. virt-handler, virt-controller, etc.) and does not affect virt-launcher pods. However, I don't see clear evidence that this field was ever meant to affect virt-launcher pods. The upstream documentation [1] is pretty vague, saying "nodePlacement describes scheduling configuration for specific KubeVirt components". The doc for the Replicas field [2] (another field in the same ComponentConfig struct) says: "replicas indicates how many replicas should be created for each KubeVirt infrastructure component (like virt-api or virt-controller)".

Are we sure this field is supposed to affect virt-launchers? Was it documented somewhere? Was it asked for by some user? Is there a clear use case here? I would also like to note that we already have a NodeSelector field in the KubeVirt API [3] that can set node selectors for virt-launchers.

So unless I'm missing something, I think this can be closed as not a bug. However, we can indeed improve the documentation.

[1] https://github.com/kubevirt/kubevirt/blob/v1.0.0/staging/src/kubevirt.io/api/core/v1/componentconfig.go#L39
[2] https://github.com/kubevirt/kubevirt/blob/v1.0.0/staging/src/kubevirt.io/api/core/v1/componentconfig.go#L47
[3] https://github.com/kubevirt/kubevirt/blob/v1.0.0/staging/src/kubevirt.io/api/core/v1/types.go#L2543
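To illustrate the NodeSelector point above, here is a minimal sketch of constraining a single VM's virt-launcher pod with a node selector, assuming this maps to the nodeSelector that can be set per VM under .spec.template.spec (the VM name and label key/value are made up for the example, and the manifest is trimmed to the relevant fields):

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: example-vm-nodeselector
spec:
  template:
    spec:
      # the virt-launcher pod will only be scheduled on nodes carrying this label
      nodeSelector:
        example.com/zone: zone-a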
Thanks Itamar, I think you are correct - the workloads tolerations affect virt-handler pods only; for VMs we have to set tolerations in the VM spec:
https://docs.openshift.com/container-platform/4.13/virt/virtual_machines/advanced_vm_management/virt-specifying-nodes-for-vms.html#virt-example-vm-node-placement-tolerations_virt-specifying-nodes-for-vms

By the way, the example in that doc does not work. The toleration should be set under .spec.template.spec, so it should be:

metadata:
  name: example-vm-tolerations
apiVersion: kubevirt.io/v1
kind: VirtualMachine
spec:
  template:
    spec:
      tolerations:
      - key: "key"
        operator: "Equal"
        value: "virtualization"
        effect: "NoSchedule"

I think we can close this bug as not a bug; we just need to update the example in the documentation.
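With the toleration placed under .spec.template.spec it gets propagated to the VM's virt-launcher pod, which can be checked the same way as for virt-handler; the toleration from the VM spec should show up there alongside the default ones (the pod name below is just an example):

# check that the toleration was propagated to the virt-launcher pod
$ oc get pod virt-launcher-example-vm-tolerations-xxxxx -o json | jq .spec.tolerations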
Thanks Denys! While I agree this is not a bug, I also think you have a valid point that both the documentation and the API are not very clear. What concerns me here is that with "nodePlacement" you can make the core components schedule on tainted nodes. So virt-handler would be scheduled there, which causes a `kubevirt.io/schedulable` label to appear on that node. But to actually schedule VMs (i.e. virt-launcher pods) on that node, the user would also need to add a toleration at the VM level. This is somewhat related to the discussion that took place here: https://github.com/kubevirt/kubevirt/pull/10169#issuecomment-1651537898. Perhaps an issue can be opened to discuss this, although this is not a bug. Thank you!
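To make the interaction concrete (a rough sketch, reusing the example taint from above): once virt-handler tolerates the taint and runs on the node, the node gets marked as schedulable for VMs, which can be listed with a label selector. A VM will still be kept off that node until its own spec tolerates the taint as well.

# nodes that virt-handler has marked as schedulable for VMs
$ oc get nodes -l kubevirt.io/schedulable=true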