Description of problem:

Add tolerations in node placement rules as per
https://docs.openshift.com/container-platform/4.12/virt/install/virt-specifying-nodes-for-virtualization-components.html.
These rules are not propagated to the kubevirt-xxx-jobxxx pod. When all nodes have taints set, the pod is stuck in "Pending" status and virt-operator fails to deploy the virt components.

~~~
2023-05-02T17:48:15.692306010Z {"component":"virt-operator","kind":"","level":"error","msg":"Waiting on install strategy to be posted from job kubevirt-6dd2f5f71de2802413f34578306382b99a65c12f-jobk8c54","name":"kubevirt-6dd2f5f71de2802413f34578306382b99a65c12f-jobk8c54","namespace":"openshift-cnv","pos":"kubevirt.go:948","timestamp":"2023-05-02T17:48:15.692261Z","uid":"2a2df195-2806-4d1a-a5d6-f135f56f44bb"}
~~~

Version-Release number of selected component (if applicable):
OpenShift Virtualization 4.12.2

How reproducible:
100%

Steps to Reproduce:
1. Add node placement rules on the HCO object.
2. Check whether the placement rules are applied to the kubevirt-xxx-jobxxx pod.

Actual results:
The kubevirt-job pod ignores the node placement configuration.

Expected results:
The node placement configuration should be propagated to the kubevirt-job pod.

Additional info:
This looks to be already fixed upstream:
https://github.com/kubevirt/kubevirt/commit/166f5063d1cc839fa1811871418cc528e829ca94
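For reference, step 1 above amounts to something like the following on the HyperConverged CR (a minimal sketch; the taint key/value are placeholders matching the reproduction steps):

~~~
# Sketch: infra node placement with a toleration for a
# hypothetical "key1=value1:NoSchedule" taint.
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
  infra:
    nodePlacement:
      tolerations:
      - key: key1
        operator: Equal
        value: value1
        effect: NoSchedule
~~~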
I verified on CNV-v4.12.4-52 - it seems that not everything works as expected. Two questions to clarify:

1. .spec.infra.nodePlacement.tolerations is successfully applied to virt-controller and virt-api, but was *not applied to virt-operator*

Steps:

1) Set taints on the nodes:

$ oc adm taint nodes cnv-qe-infra-04.cnvqe3.lab.eng.rdu2.redhat.com key1=value1:NoSchedule

2) Added tolerations to HCO:

  infra:
    nodePlacement:
      tolerations:
      - effect: NoSchedule
        key: key1
        operator: Equal
        value: value1

3) Check that it was propagated to kubevirt:

$ oc get kubevirt -o json | jq .items[0].spec.infra
{
  "nodePlacement": {
    "tolerations": [
      {
        "effect": "NoSchedule",
        "key": "key1",
        "operator": "Equal",
        "value": "value1"
      }
    ]
  }
}

4) Check virt pods:

VIRT-API and VIRT-CONTROLLER pods have it:

$ oc get pod virt-api-769645b799-2z5kp -o json | jq .spec.tolerations
  {
    "effect": "NoSchedule",
    "key": "key1",
    "operator": "Equal",
    "value": "value1"
  },

VIRT-OPERATOR does not have it:

$ oc get pod virt-operator-6c675b7888-9vlsx -o json | jq .spec.tolerations
[
  {
    "key": "CriticalAddonsOnly",
    "operator": "Exists"
  },
  {
    "effect": "NoExecute",
    "key": "node.kubernetes.io/not-ready",
    "operator": "Exists",
    "tolerationSeconds": 300
  },
  {
    "effect": "NoExecute",
    "key": "node.kubernetes.io/unreachable",
    "operator": "Exists",
    "tolerationSeconds": 300
  },
  {
    "effect": "NoSchedule",
    "key": "node.kubernetes.io/memory-pressure",
    "operator": "Exists"
  }
]

As a result, if I delete the virt-operator pod, it is re-created and stuck in Pending state:

virt-operator-6c675b7888-rt8sj   0/1   Pending   0   40s

Probably virt-operator pods should also include the necessary tolerations.

2. And another question about .spec.workloads:

When I set workloads tolerations on the HCO, they are applied to virt-handler pods, but not to virt-launcher pods, so no VMs can be created on those nodes:

$ oc describe pod virt-launcher-vm-label-mdfqt
.
  Warning  FailedScheduling  22m  default-scheduler  0/3 nodes are available: 3 node(s) had untolerated taint {key1: value1}. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.
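A quick way to compare which openshift-cnv pods actually received the custom toleration, as in step 4 above (a sketch; selecting on the `kubevirt.io` label key is an assumption about how the virt component pods are labeled):

~~~
# Sketch: print the tolerations of each virt component pod.
for p in $(oc get pods -n openshift-cnv -l kubevirt.io -o name); do
  echo "== $p"
  oc get -n openshift-cnv "$p" -o jsonpath='{.spec.tolerations}'; echo
done
~~~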
(In reply to Denys Shchedrivyi from comment #1)
> 1. .spec.infra.nodePlacement.tolerations is successfully applied to
> virt-controller and virt-api, but was *not applied to virt-operator*
> [...]
> As a result, if I delete the virt-operator pod, it is re-created and stuck
> in Pending state:
>   virt-operator-6c675b7888-rt8sj   0/1   Pending   0   40s
>
> Probably virt-operator pods should also include the necessary tolerations.

You're right, but this is trickier than it looks: KubeVirt only starts "living" once virt-operator is running, so by the time the KubeVirt CR is created, the virt-operator pods have already been scheduled and placed onto nodes. To fix this, HCO would need to install virt-operator with the scheduling hints already present in its Deployment spec, so I guess a separate HCO bug should be filed for that.

> 2. And another question about .spec.workloads:
> When I set workloads tolerations on the HCO, they are applied to
> virt-handler pods, but not to virt-launcher pods, so no VMs can be
> created on those nodes.
> [...]

This sounds like a bug, though a different one. As far as this bug is concerned, I'd say everything works as expected, even with the shortcomings you highlighted.
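For context, "scheduling hints already in place" would mean something like the following fragment baked into the virt-operator Deployment at install time (illustrative only; this Deployment is managed by OLM/HCO, so a manual patch would likely be reconciled away):

~~~
# Illustrative fragment: the toleration has to live in the pod
# template of the virt-operator Deployment, not in the KubeVirt CR.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: virt-operator
  namespace: openshift-cnv
spec:
  template:
    spec:
      tolerations:
      - key: key1
        operator: Equal
        value: value1
        effect: NoSchedule
~~~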
Based on comment #2, closing this bug.

For virt-launcher pods, opened a new one: bug 2216276
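Until bug 2216276 is resolved, tolerations can also be set per VM so the virt-launcher pod inherits them — a sketch, assuming the taint from the reproduction steps; fields unrelated to scheduling are omitted:

~~~
# Sketch of a per-VM workaround: tolerations set on the VMI template
# are carried over to the corresponding virt-launcher pod.
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: vm-label
spec:
  template:
    spec:
      tolerations:
      - key: key1
        operator: Equal
        value: value1
        effect: NoSchedule
~~~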
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Virtualization 4.12.4 Images), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2023:3889