Description of problem:
The virt-handler pod does not have an "operator: Exists" toleration to allow it to run on tainted nodes like other system components.

Version-Release number of selected component (if applicable):
OpenShift 4.4, CNV 2.3

How reproducible:
Very

Steps to Reproduce:
1. Add a HyperConverged CNV deployment to an OpenShift 4.4 cluster from the CNV operator
2. Add a taint to a worker
3. Restart the worker

Actual results:
The virt-handler pod is not schedulable on the worker after the taint is applied.

Expected results:
The virt-handler pod should have an "operator: Exists" toleration so it can run on tainted nodes like other system components. Below is an example from tuned:

  tolerations:
  - operator: Exists

Additional info:
"operator: Exists" is part of a toleration, not a taint. It basically tells OpenShift how a pod should respond when a taint is applied. What we really need to know here is the key/effect of the taint being applied: https://docs.openshift.com/container-platform/3.6/admin_guide/scheduling/taints_tolerations.html That said, KubeVirt components already tolerate the CriticalAddonsOnly/Unschedulable taints.
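For reference, a toleration either matches a specific taint by key/effect or tolerates every taint with a bare Exists. A minimal sketch of both forms (the "worker" key and NoSchedule effect here are placeholder values, not taken from this cluster):

  # Tolerate one specific taint; key/effect must match the taint on the node:
  tolerations:
  - key: worker
    operator: Exists
    effect: NoSchedule

  # Tolerate every taint, the way other system components do:
  tolerations:
  - operator: Exists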
Created attachment 1710998 [details] Usage of toleration and node selector in VM YAML
IIUC the fix for this bug is the completion of https://issues.redhat.com/browse/CNV-5974. David, Jason, can you please confirm?
CNV-5974 addresses placement. This BZ was opened because the virt-handler pods do not tolerate the taint applied to the nodes. They are similar, but slightly different issues. If the placement API includes setting tolerations on the pods, then it would cover this one.
OK, now I am confused, because I am able to make virt-handler tolerate tainted nodes. The toleration also works if I use the Exists operator in the virt-handler spec.
Right. We can patch the daemonset right now to tolerate the taint, but that only works because of a bug[0]. If that bug is fixed, we'll need to make sure the fix either includes that toleration built-in or allows it to be configured through an API. [0] https://bugzilla.redhat.com/show_bug.cgi?id=1868099
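For illustration, the stanza such a patch would add to the virt-handler DaemonSet's pod template might look like the following (a sketch of a temporary workaround only; as noted above, the operator is expected to reconcile direct daemonset edits away once the bug is fixed):

  spec:
    template:
      spec:
        tolerations:
        # Tolerate all taints, like other system daemonsets do
        - operator: Exists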
Added the below taint to the node:

[kbidarka@localhost cnv-tests]$ oc adm taint node kbid25vrm-hnmbm-worker-0-gshsn worker=load-balancer:NoSchedule

[kbidarka@localhost cnv-tests]$ oc get nodes -o yaml | grep -A 3 taints
    taints:
    - effect: NoSchedule
      key: worker
      value: load-balancer

Updated the HyperConverged CR with the tolerations:

[kbidarka@localhost cnv-tests]$ oc get hyperconverged -n openshift-cnv -o yaml | grep -A 5 workloads
  workloads:
    nodePlacement:
      tolerations:
      - effect: NoSchedule
        key: worker
        operator: Exists

The virt-handler DaemonSet got updated with the tolerations:

[kbidarka@localhost cnv-tests]$ oc get daemonset virt-handler -n openshift-cnv -o yaml | grep -A 6 "tolerations:"
      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      - effect: NoSchedule
        key: worker
        operator: Exists
I tainted only 1 node, and still see 3 virt-handler pods running, one on each worker node of a 3-worker-node setup.

[kbidarka@localhost cnv-tests]$ oc get pods -n openshift-cnv | grep virt-handler
virt-handler-2g46b                1/1     Running   0     112m
virt-handler-wkztj                1/1     Running   0     113m
virt-handler-zb8j8                1/1     Running   0     113m

[kbidarka@localhost cnv-tests]$ oc get nodes
NAME                             STATUS   ROLES    AGE   VERSION
kbid25vrm-hnmbm-master-0         Ready    master   12d   v1.19.0+db1fc96
kbid25vrm-hnmbm-master-1         Ready    master   12d   v1.19.0+db1fc96
kbid25vrm-hnmbm-master-2         Ready    master   12d   v1.19.0+db1fc96
kbid25vrm-hnmbm-worker-0-c298l   Ready    worker   12d   v1.19.0+db1fc96
kbid25vrm-hnmbm-worker-0-gshsn   Ready    worker   12d   v1.19.0+db1fc96
kbid25vrm-hnmbm-worker-0-t8644   Ready    worker   12d   v1.19.0+db1fc96