Description of problem:

> This may be fixed in a later version. I ran across this in a customer environment on 4.4 and do not have a test system with metal3 available at the moment.

The metal3 pod's toleration for node-role.kubernetes.io/master currently specifies no operator, so it defaults to Equal and matches only on an exact value.

https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/
~~~
A toleration "matches" a taint if the keys are the same and the effects are the same, and:

    the operator is Exists (in which case no value should be specified), or
    the operator is Equal and the values are equal.
~~~

However, it should match on "operator: Exists", the same as the vast majority of our pods which are allowed to run on unschedulable masters.

~~~
[kni@provisioner ~]$ oc get pod -n openshift-machine-api metal3-7d8bdb796d-wpt4h -o yaml | grep -i tolera -A20
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
  - key: CriticalAddonsOnly
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 120
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 120
~~~

Compare with a lab cluster, which shows that we usually match on Exists:
~~~
[akaris@linux sriov-network-operator]$ oc get pods -A -o wide | grep ip-10-0-133-15.eu-west-1.compute.internal | grep Running | awk '{print $1 " " $2}' | while read a b ; do echo === $a/$b === ; oc get pod -n $a $b -o yaml | grep 'key: node-role.kubernetes.io/master' -C1; done
=== openshift-apiserver-operator/openshift-apiserver-operator-7546b84744-b55ms ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-apiserver/apiserver-7c85b978fd-n8h8d ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-authentication-operator/authentication-operator-849d6b8888-lgn5h ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-authentication/oauth-openshift-56cd58fcbf-drxgv ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-cluster-machine-approver/machine-approver-58fc6999c-lmqdp ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-cluster-node-tuning-operator/cluster-node-tuning-operator-7cf7b68cff-7jxf9 ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-cluster-node-tuning-operator/tuned-xfmlt ===
=== openshift-cluster-version/cluster-version-operator-5f4d94dcd9-vpv6d ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-console/console-5c7fd94d5d-gzb4s ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-controller-manager-operator/openshift-controller-manager-operator-6f95cb6dff-cx2s6 ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-controller-manager/controller-manager-hnz5h ===
=== openshift-dns-operator/dns-operator-69b6698b4c-x4sqq ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-dns/dns-default-4fhhx ===
=== openshift-etcd-operator/etcd-operator-7f5bcbf444-nd5zj ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-etcd/etcd-ip-10-0-133-15.eu-west-1.compute.internal ===
=== openshift-image-registry/node-ca-n8jzl ===
=== openshift-kube-apiserver-operator/kube-apiserver-operator-7bb7f6c9db-h7h57 ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-kube-apiserver/kube-apiserver-ip-10-0-133-15.eu-west-1.compute.internal ===
=== openshift-kube-controller-manager-operator/kube-controller-manager-operator-66c98959c7-4d928 ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-kube-controller-manager/kube-controller-manager-ip-10-0-133-15.eu-west-1.compute.internal ===
=== openshift-kube-scheduler-operator/openshift-kube-scheduler-operator-6c7f76d7b4-l9hxf ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-kube-scheduler/openshift-kube-scheduler-ip-10-0-133-15.eu-west-1.compute.internal ===
=== openshift-kube-storage-version-migrator-operator/kube-storage-version-migrator-operator-88df9db45-f5mcg ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-machine-config-operator/etcd-quorum-guard-798955868-jwvfl ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-machine-config-operator/machine-config-daemon-f2xrp ===
=== openshift-machine-config-operator/machine-config-operator-5cdf6fdfdf-l6hp7 ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-machine-config-operator/machine-config-server-rzk76 ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-monitoring/node-exporter-62hh2 ===
=== openshift-multus/multus-admission-controller-4stbg ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-multus/multus-x8ms7 ===
=== openshift-network-operator/network-operator-7c67d58b9b-nrvt7 ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-operator-lifecycle-manager/catalog-operator-7fdbcccd94-8fbp9 ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-operator-lifecycle-manager/olm-operator-69bc9b8675-stjkv ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-ovn-kubernetes/ovnkube-master-b2hft ===
  tolerations:
  - key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-ovn-kubernetes/ovnkube-node-jwrdt ===
=== openshift-ovn-kubernetes/ovs-node-zsrfw ===
=== openshift-service-ca-operator/service-ca-operator-648466c4f4-7w6rm ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-service-catalog-apiserver-operator/openshift-service-catalog-apiserver-operator-5bfc4645f5-kxcc7 ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-service-catalog-controller-manager-operator/openshift-service-catalog-controller-manager-operator-89488bqc8 ===
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
=== openshift-storage/csi-cephfsplugin-f9zht ===
=== openshift-storage/csi-rbdplugin-c5h8p ===
~~~

We have a customer who modified their master node taint to:
~~~
  name: openshift-master-1
  resourceVersion: "24746231"
  selfLink: /api/v1/nodes/openshift-master-1
  uid: efec3896-1250-4b42-be13-dadcd0493479
spec:
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    value: "true"
status:
  addresses:
~~~

It's subtle, but the default is:
~~~
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
~~~

After the customer added `value: true` to the taint of their 3 master nodes, metal3 could not be scheduled on the masters. I agree that administrators should not change the taint, but the vast majority of our pods have a toleration for key existence, not for an exact value match, and metal3 should have the same behavior. As it stands, the metal3 toleration matches only on the exact (empty) value of node-role.kubernetes.io/master. That's why "value: true" stopped the metal3 pod from working:
~~~
77m Warning FailedScheduling pod/machine-api-controllers-7f794c7b-stlf6 0/8 nodes are available: 1 node(s) were unschedulable, 3 node(s) had taints that the pod didn't tolerate, 4 node(s) didn't match node selector
~~~
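
To make the difference concrete, the matching rule quoted from the Kubernetes documentation can be sketched in a few lines of Python. This is a simplified model for illustration only, not the scheduler's actual code: the `Taint` and `Toleration` classes and the `tolerates` function are hypothetical names. It assumes the documented defaults that an omitted operator means Equal and that an empty effect matches every effect.

```python
from dataclasses import dataclass


@dataclass
class Taint:
    key: str
    effect: str
    value: str = ""  # the default master taint carries no value


@dataclass
class Toleration:
    key: str = ""
    operator: str = "Equal"  # assumed default when "operator" is omitted
    value: str = ""
    effect: str = ""  # an empty effect matches every effect


def tolerates(toleration: Toleration, taint: Taint) -> bool:
    """Return True if the toleration matches the taint (simplified model)."""
    if toleration.effect and toleration.effect != taint.effect:
        return False
    if toleration.key and toleration.key != taint.key:
        return False
    if toleration.operator == "Exists":
        return True  # key existence is enough, the taint value is ignored
    return toleration.operator == "Equal" and toleration.value == taint.value


MASTER = "node-role.kubernetes.io/master"

default_taint = Taint(key=MASTER, effect="NoSchedule")
customer_taint = Taint(key=MASTER, effect="NoSchedule", value="true")

# metal3 toleration as shown in the pod YAML above: no operator, no value.
metal3 = Toleration(key=MASTER, effect="NoSchedule")
# Toleration used by most other pods on the masters.
exists = Toleration(key=MASTER, operator="Exists", effect="NoSchedule")

if __name__ == "__main__":
    for name, tol in (("metal3", metal3), ("exists", exists)):
        print(name, tolerates(tol, default_taint), tolerates(tol, customer_taint))
```

Under these assumptions, the metal3-style toleration matches the default taint but not the customer's `value: "true"` taint, while the `operator: Exists` toleration matches both.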
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633