Bug 1861906 - virt-handler pod does not have an "operator exists" toleration.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 2.3.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: 2.5.0
Assignee: sgott
QA Contact: Kedar Bidarkar
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-07-29 21:05 UTC by Jason Huddleston
Modified: 2024-06-13 22:50 UTC (History)

Fixed In Version: 2.5.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-12-07 09:52:30 UTC
Target Upstream Version:
Embargoed:
dcritch: needinfo-


Attachments
Usage of tolerance and node selector on VM YAML (1.19 KB, text/plain)
2020-08-10 17:37 UTC, Igor Bezukh


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | kubevirt kubevirt pull 4092 | 0 | None | closed | Incorporate HCO's nodeselectors logic into KubeVirt | 2021-02-09 14:46:19 UTC
Github | kubevirt kubevirt pull 4225 | 0 | None | closed | Test IDs for Node Placement tests | 2021-02-09 14:46:19 UTC

Description Jason Huddleston 2020-07-29 21:05:50 UTC
Description of problem:

The virt-handler pod does not have an "operator: Exists" toleration, so unlike other system components it cannot run on tainted nodes.

Version-Release number of selected component (if applicable):

OpenShift 4.4
CNV 2.3

How reproducible:

Very

Steps to Reproduce:
1. Add a HyperConverged CNV deployment to an OpenShift 4.4 cluster from the CNV operator
2. Add a taint to a worker
3. Restart the worker

Actual results:

The virt-handler pod is not schedulable on the worker after the taint is applied.

Expected results:

The virt-handler pod should have an "operator: Exists" toleration so that it can run on tainted nodes, like other system components.

Below is an example from tuned:

tolerations:
    - operator: Exists




Additional info:

Comment 1 sgott 2020-07-29 22:01:28 UTC
"operator: Exists" is a toleration operator. It tells OpenShift how a pod should respond when a taint is applied. What we really need to know here is the key and effect of the taint being applied:

https://docs.openshift.com/container-platform/3.6/admin_guide/scheduling/taints_tolerations.html

That said, KubeVirt components tolerate the CriticalAddonsOnly/Unschedulable taint.
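
To make the key/effect question concrete, here is a minimal Python sketch of how a toleration is matched against a taint. It mirrors the documented Kubernetes rules (an empty key or effect in the toleration acts as a wildcard, and Exists ignores the value), but the Taint and Toleration classes and the tolerates() helper are illustrative names, not part of any OpenShift or KubeVirt API. The taint used in the example is the one applied later during verification in comment 21.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    effect: str
    value: str = ""


@dataclass(frozen=True)
class Toleration:
    key: str = ""           # empty key: matches any taint key
    operator: str = "Equal"  # "Equal" (default) or "Exists"
    value: str = ""
    effect: str = ""        # empty effect: matches any taint effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if the toleration covers the taint (hypothetical helper)."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True  # value is ignored for Exists
    if tol.operator in ("", "Equal"):
        return tol.value == taint.value
    return False  # unknown operator


# Taint taken from comment 21: worker=load-balancer:NoSchedule
taint = Taint(key="worker", value="load-balancer", effect="NoSchedule")

# Bare "operator: Exists" (the tuned example) tolerates every taint.
print(tolerates(Toleration(operator="Exists"), taint))  # True

# A CriticalAddonsOnly toleration does not cover an unrelated key.
print(tolerates(Toleration(key="CriticalAddonsOnly", operator="Exists"), taint))  # False

# A toleration scoped to the same key and effect covers it.
print(tolerates(Toleration(key="worker", operator="Exists", effect="NoSchedule"), taint))  # True
```

This is why the key and effect matter: a pod that only tolerates CriticalAddonsOnly is still kept off a node carrying a different taint, while a toleration with no key and "operator: Exists" matches everything.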

Comment 10 Igor Bezukh 2020-08-10 17:37:51 UTC
Created attachment 1710998
Usage of tolerance and node selector on VM YAML

Comment 14 Igor Bezukh 2020-08-23 07:14:41 UTC
IIUC the fix for this bug is the completion of https://issues.redhat.com/browse/CNV-5974.

David, Jason, can you please confirm?

Comment 15 David Critch 2020-08-24 19:53:02 UTC
CNV-5974 addresses placement. This BZ was opened because the virt-handler pods do not tolerate the taint applied to the nodes. The two issues are similar, but slightly different.

If the placement API includes setting tolerations on the pod, then it would cover this BZ as well.

Comment 16 Igor Bezukh 2020-08-25 08:37:25 UTC
OK, now I am confused, because I am able to make virt-handler tolerate nodes with a taint. The toleration also works if I use the Exists operator in the virt-handler spec.

Comment 17 David Critch 2020-08-25 15:35:40 UTC
Right. We can patch the daemonset right now to tolerate the taint, but that is essentially due to a bug[0]. If that bug is fixed, we'll need to make sure it either includes that toleration built-in, or can be configured by an API.


[0] https://bugzilla.redhat.com/show_bug.cgi?id=1868099

Comment 21 Kedar Bidarkar 2020-10-19 17:29:04 UTC
Added the below taint to the node.

]$ oc adm taint node kbid25vrm-hnmbm-worker-0-gshsn worker=load-balancer:NoSchedule

[kbidarka@localhost cnv-tests]$ oc get nodes -o yaml | grep -A 3 taints
taints:
    - effect: NoSchedule
      key: worker
      value: load-balancer

Updated the hyperconverged CR with the Tolerations

[kbidarka@localhost cnv-tests]$ oc get hyperconverged -n openshift-cnv -o yaml | grep -A 5 workloads
    workloads:
      nodePlacement:
        tolerations:
        - effect: NoSchedule
          key: worker
          operator: Exists
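
For reference, the toleration shown above could be expressed as a merge-patch body for the HyperConverged CR. The following Python sketch only builds the JSON. The spec.workloads.nodePlacement.tolerations path is assumed from the grep output above, and the helper name is hypothetical.

```python
import json


def hco_workloads_toleration_patch(key: str, effect: str, operator: str = "Exists") -> dict:
    """Build a merge-patch body that sets one workloads toleration.

    Assumption: the tolerations live under spec.workloads.nodePlacement,
    as suggested by the 'oc get hyperconverged' output in this comment.
    """
    toleration = {"key": key, "operator": operator, "effect": effect}
    return {
        "spec": {
            "workloads": {
                "nodePlacement": {
                    "tolerations": [toleration],
                }
            }
        }
    }


# Same key and effect as the taint applied in this comment.
patch = hco_workloads_toleration_patch("worker", "NoSchedule")
print(json.dumps(patch))
```

The printed JSON could then be passed to something like `oc patch hyperconverged <name> -n openshift-cnv --type=merge -p '<json>'`, where `<name>` is the CR name (not shown in this bug). Note that a merge patch replaces the whole tolerations list rather than appending to it.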

The virt-handler Daemonset got updated with the Tolerations.

[kbidarka@localhost cnv-tests]$ oc get daemonset virt-handler -n openshift-cnv  -o yaml | grep -A 6 "tolerations:"
tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      - effect: NoSchedule
        key: worker
        operator: Exists

Comment 22 Kedar Bidarkar 2020-10-19 17:36:19 UTC
I tainted only 1 node.

Also, I see 3 virt-handler pods running, one on each worker node of a 3-worker-node setup.

[kbidarka@localhost cnv-tests]$ oc get pods -n openshift-cnv | grep virt-handler 
virt-handler-2g46b                                    1/1       Running        0          112m
virt-handler-wkztj                                    1/1       Running        0          113m
virt-handler-zb8j8                                    1/1       Running        0          113m

[kbidarka@localhost cnv-tests]$ oc get nodes
NAME                             STATUS    ROLES     AGE       VERSION
kbid25vrm-hnmbm-master-0         Ready     master    12d       v1.19.0+db1fc96
kbid25vrm-hnmbm-master-1         Ready     master    12d       v1.19.0+db1fc96
kbid25vrm-hnmbm-master-2         Ready     master    12d       v1.19.0+db1fc96
kbid25vrm-hnmbm-worker-0-c298l   Ready     worker    12d       v1.19.0+db1fc96
kbid25vrm-hnmbm-worker-0-gshsn   Ready     worker    12d       v1.19.0+db1fc96
kbid25vrm-hnmbm-worker-0-t8644   Ready     worker    12d       v1.19.0+db1fc96

