Bug 1900322 - metal3 pod's toleration for key: node-role.kubernetes.io/master currently matches on exact value matches but should match on Exists
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Bare Metal Hardware Provisioning
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.7.0
Assignee: Robin Cernin
QA Contact: Amit Ugol
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-11-22 08:42 UTC by Andreas Karis
Modified: 2023-12-15 20:08 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, the value of the metal3 pod's NoSchedule toleration matched exactly on value "true". Now the NoSchedule toleration uses the Exists operator, which makes it match on any value. This removes any confusion for the operator who needs to set the NoSchedule toleration.
Clone Of:
Environment:
Last Closed: 2021-02-24 15:35:25 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-baremetal-operator pull 96 | 0 | None | closed | Bug 1900322: Add Toleration OP Exists to metal3 pod | 2021-01-24 23:33:32 UTC
Red Hat Product Errata | RHSA-2020:5633 | 0 | None | None | None | 2021-02-24 15:35:55 UTC

Description Andreas Karis 2020-11-22 08:42:45 UTC
Description of problem:

> This may be fixed in a later version. I ran across this in a customer environment on 4.4 and do not have a test system with metal3 available at the moment.

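The metal3 pod's toleration for node-role.kubernetes.io/master specifies no operator, so the operator defaults to Equal and the toleration matches only on an exact value.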

https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/
~~~
A toleration "matches" a taint if the keys are the same and the effects are the same, and:

    the operator is Exists (in which case no value should be specified), or
    the operator is Equal and the values are equal.
~~~
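The matching rule quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the scheduler's actual code: the `Taint` and `Toleration` classes and the `tolerates` function are hypothetical names, and the sketch assumes the documented defaults (an unset operator behaves like Equal, and an empty key or effect on the toleration acts as a wildcard).

```python
from dataclasses import dataclass

MASTER = "node-role.kubernetes.io/master"


@dataclass
class Taint:
    key: str
    effect: str
    value: str = ""


@dataclass
class Toleration:
    key: str = ""
    effect: str = ""
    operator: str = ""  # assumption: empty operator is treated as "Equal"
    value: str = ""


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Hypothetical helper: True if the toleration matches the taint."""
    # An empty effect or key on the toleration matches any effect or key.
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator in ("", "Equal"):
        # Equal requires the values to be identical, including both empty.
        return tol.value == taint.value
    # Exists ignores the taint value; any other operator never matches.
    return tol.operator == "Exists"


# Toleration without an operator vs. one with operator: Exists.
equal_tol = Toleration(key=MASTER, effect="NoSchedule")
exists_tol = Toleration(key=MASTER, effect="NoSchedule", operator="Exists")

# Taint without a value vs. a taint carrying value "true".
plain_taint = Taint(key=MASTER, effect="NoSchedule")
valued_taint = Taint(key=MASTER, effect="NoSchedule", value="true")

for tol_name, tol in (("Equal", equal_tol), ("Exists", exists_tol)):
    for taint_name, taint in (("no value", plain_taint), ("value true", valued_taint)):
        print(tol_name, "/", taint_name, "->", tolerates(tol, taint))
```

Under these assumptions, the toleration without an operator matches only the taint that has no value, while the `Exists` toleration matches both taints.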

However, it should match on "operator: Exists", the same as the vast majority of our pods that are allowed to run on masters carrying the NoSchedule taint.
~~~
[kni@provisioner ~]$ oc get pod -n openshift-machine-api metal3-7d8bdb796d-wpt4h  -o yaml | grep -i tolera -A20
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
  - key: CriticalAddonsOnly
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 120
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 120
~~~

Compare this with output from a lab cluster, which shows that we usually match on Exists:
~~~
[akaris@linux sriov-network-operator]$ oc get pods -A -o wide | grep ip-10-0-133-15.eu-west-1.compute.internal | grep Running | awk '{print $1 " " $2}' | while read a b ; do echo === $a/$b === ; oc get pod -n $a $b -o yaml | grep 'key: node-role.kubernetes.io/master' -C1; done
=== openshift-apiserver-operator/openshift-apiserver-operator-7546b84744-b55ms ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-apiserver/apiserver-7c85b978fd-n8h8d ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-authentication-operator/authentication-operator-849d6b8888-lgn5h ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-authentication/oauth-openshift-56cd58fcbf-drxgv ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-cluster-machine-approver/machine-approver-58fc6999c-lmqdp ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-cluster-node-tuning-operator/cluster-node-tuning-operator-7cf7b68cff-7jxf9 ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-cluster-node-tuning-operator/tuned-xfmlt ===
=== openshift-cluster-version/cluster-version-operator-5f4d94dcd9-vpv6d ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-console/console-5c7fd94d5d-gzb4s ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-controller-manager-operator/openshift-controller-manager-operator-6f95cb6dff-cx2s6 ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-controller-manager/controller-manager-hnz5h ===
=== openshift-dns-operator/dns-operator-69b6698b4c-x4sqq ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-dns/dns-default-4fhhx ===
=== openshift-etcd-operator/etcd-operator-7f5bcbf444-nd5zj ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-etcd/etcd-ip-10-0-133-15.eu-west-1.compute.internal ===
=== openshift-image-registry/node-ca-n8jzl ===
=== openshift-kube-apiserver-operator/kube-apiserver-operator-7bb7f6c9db-h7h57 ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-kube-apiserver/kube-apiserver-ip-10-0-133-15.eu-west-1.compute.internal ===
=== openshift-kube-controller-manager-operator/kube-controller-manager-operator-66c98959c7-4d928 ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-kube-controller-manager/kube-controller-manager-ip-10-0-133-15.eu-west-1.compute.internal ===
=== openshift-kube-scheduler-operator/openshift-kube-scheduler-operator-6c7f76d7b4-l9hxf ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-133-15.eu-west-1.compute.internal ===
=== openshift-kube-storage-version-migrator-operator/kube-storage-version-migrator-operator-88df9db45-f5mcg ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-machine-config-operator/etcd-quorum-guard-798955868-jwvfl ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-machine-config-operator/machine-config-daemon-f2xrp ===
=== openshift-machine-config-operator/machine-config-operator-5cdf6fdfdf-l6hp7 ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-machine-config-operator/machine-config-server-rzk76 ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-monitoring/node-exporter-62hh2 ===
=== openshift-multus/multus-admission-controller-4stbg ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-multus/multus-x8ms7 ===
=== openshift-network-operator/network-operator-7c67d58b9b-nrvt7 ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-operator-lifecycle-manager/catalog-operator-7fdbcccd94-8fbp9 ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-operator-lifecycle-manager/olm-operator-69bc9b8675-stjkv ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-ovn-kubernetes/ovnkube-master-b2hft ===
  tolerations:
  - key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-ovn-kubernetes/ovnkube-node-jwrdt ===
=== openshift-ovn-kubernetes/ovs-node-zsrfw ===
=== openshift-service-ca-operator/service-ca-operator-648466c4f4-7w6rm ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-service-catalog-apiserver-operator/openshift-service-catalog-apiserver-operator-5bfc4645f5-kxcc7 ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-service-catalog-controller-manager-operator/openshift-service-catalog-controller-manager-operator-89488bqc8 ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-storage/csi-cephfsplugin-f9zht ===
=== openshift-storage/csi-rbdplugin-c5h8p ===
~~~
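The same audit could be done on parsed JSON rather than with grep. The following is a rough sketch only: it assumes the input is the list document produced by `oc get pods -A -o json` (a top-level `items` array), and the function name `master_tolerations_without_exists` and the `sample` data are hypothetical, made up for illustration.

```python
MASTER_KEY = "node-role.kubernetes.io/master"


def master_tolerations_without_exists(pod_list: dict) -> list:
    """Return (namespace/name, toleration) pairs for tolerations on the
    master key that do not use operator: Exists."""
    findings = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        pod_id = f'{meta.get("namespace", "")}/{meta.get("name", "")}'
        for tol in pod.get("spec", {}).get("tolerations", []):
            if tol.get("key") == MASTER_KEY and tol.get("operator") != "Exists":
                findings.append((pod_id, tol))
    return findings


# Hypothetical sample shaped like the two kinds of pods shown in this report.
sample = {
    "items": [
        {
            "metadata": {"namespace": "openshift-machine-api",
                         "name": "metal3-7d8bdb796d-wpt4h"},
            "spec": {"tolerations": [
                {"effect": "NoSchedule", "key": MASTER_KEY},
                {"key": "CriticalAddonsOnly", "operator": "Exists"},
            ]},
        },
        {
            "metadata": {"namespace": "openshift-console",
                         "name": "console-5c7fd94d5d-gzb4s"},
            "spec": {"tolerations": [
                {"effect": "NoSchedule", "key": MASTER_KEY, "operator": "Exists"},
            ]},
        },
    ]
}

for pod_id, tol in master_tolerations_without_exists(sample):
    print(pod_id, tol)
```

In practice the dictionary would come from something like `json.load(sys.stdin)` with the `oc` output piped in. Pods that tolerate the taint through a key-less `operator: Exists` toleration are not reported, since they carry no entry for this key.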

We have a customer who modified their master node taint to:
~~~
  name: openshift-master-1
  resourceVersion: "24746231"
  selfLink: /api/v1/nodes/openshift-master-1
  uid: efec3896-1250-4b42-be13-dadcd0493479
spec:
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    value: "true"
status:
  addresses:
~~~

It's subtle, but the default is:
~~~
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
~~~

After the customer added `value: true` to the taint of their 3 master nodes, metal3 could not be scheduled on the masters. I agree that administrators should not change the taint, but the vast majority of our pods have a toleration for key existence, not for exact value match, and metal3 should have the same behavior.

Without "operator: Exists", the toleration matches only a taint whose value is exactly equal to its own (empty) value for node-role.kubernetes.io/master. That's why "value: true" stopped the metal3 pod from working:
~~~
77m         Warning   FailedScheduling         pod/machine-api-controllers-7f794c7b-stlf6         0/8 nodes are available: 1 node(s) were unschedulable, 3 node(s) had taints that the pod didn't tolerate, 4 node(s) didn't match node selector
~~~

Comment 6 errata-xmlrpc 2021-02-24 15:35:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

