Bug 1906254
Summary: | MCDDrainError firing on node.kubernetes.io/unschedulable toleration contention | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | bembery
Component: | Machine Config Operator | Assignee: | Kirsten Garrison <kgarriso>
Status: | CLOSED DUPLICATE | QA Contact: | Michael Nguyen <mnguyen>
Severity: | low | Docs Contact: |
Priority: | low | |
Version: | 4.5 | CC: | kgarriso, travi, wking
Target Milestone: | --- | |
Target Release: | --- | |
Hardware: | x86_64 | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2021-07-15 00:17:59 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description bembery 2020-12-10 02:41:51 UTC
I'm fuzzy on how this gets up into the alert, but the underlying issue seems to be a fight going on between:

* the MCD trying to drain the CVO by setting node.kubernetes.io/unschedulable and then repeatedly killing CVO pods, while
* the CVO's ReplicaSet controller gamely creating replacements which tolerate unschedulable (background on why, from years ago, in [1]).

I don't understand taints and tolerations well enough [2], but if there's a way to say "We'd prefer the CVO to be scheduled on a node that doesn't have the node.kubernetes.io/unschedulable taint, but if going onto a tainted node is the only way to get scheduled, we'll accept that too", that seems like it would at least mitigate the contention.

[1]: https://github.com/openshift/cluster-version-operator/pull/182#discussion_r280948358
[2]: https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/

The unschedulable toleration does not make the cut in [1]. Auditing a recent 4.7 nightly [2]:

```
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.7/1336632578890797056/artifacts/e2e-aws/pods.json | jq -r '.items[] | .metadata as $m | .spec.tolerations[] | select(.key == "node.kubernetes.io/unschedulable") | $m.namespace + " " + $m.name + " " + (. | tostring)'
openshift-cluster-version cluster-version-operator-c4dbbfcbb-vz27g {"effect":"NoSchedule","key":"node.kubernetes.io/unschedulable","operator":"Exists"}
openshift-ingress-canary ingress-canary-4sxd6 {"effect":"NoSchedule","key":"node.kubernetes.io/unschedulable","operator":"Exists"}
openshift-ingress-canary ingress-canary-j57xq {"effect":"NoSchedule","key":"node.kubernetes.io/unschedulable","operator":"Exists"}
openshift-ingress-canary ingress-canary-lpgbd {"effect":"NoSchedule","key":"node.kubernetes.io/unschedulable","operator":"Exists"}
openshift-machine-config-operator machine-config-server-54kkg {"effect":"NoSchedule","key":"node.kubernetes.io/unschedulable","operator":"Exists"}
openshift-machine-config-operator machine-config-server-lnssm {"effect":"NoSchedule","key":"node.kubernetes.io/unschedulable","operator":"Exists"}
openshift-machine-config-operator machine-config-server-w57mj {"effect":"NoSchedule","key":"node.kubernetes.io/unschedulable","operator":"Exists"}
openshift-multus multus-admission-controller-5q62c {"effect":"NoSchedule","key":"node.kubernetes.io/unschedulable","operator":"Exists"}
openshift-multus multus-admission-controller-78px4 {"effect":"NoSchedule","key":"node.kubernetes.io/unschedulable","operator":"Exists"}
openshift-multus multus-admission-controller-vckmv {"effect":"NoSchedule","key":"node.kubernetes.io/unschedulable","operator":"Exists"}
openshift-sdn sdn-controller-msdtk {"effect":"NoSchedule","key":"node.kubernetes.io/unschedulable","operator":"Exists"}
openshift-sdn sdn-controller-x26hs {"effect":"NoSchedule","key":"node.kubernetes.io/unschedulable","operator":"Exists"}
openshift-sdn sdn-controller-znqlz {"effect":"NoSchedule","key":"node.kubernetes.io/unschedulable","operator":"Exists"}
```

[1]: https://github.com/openshift/enhancements/blame/94baf7dd83a909d04a00a99c117bdf90e53c5e63/CONVENTIONS.md#L165-L173
[2]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.7/1336632578890797056

MCD doesn't drain DaemonSet pods, because they can't get rescheduled on an alternate node, so here's a better audit:

```
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.7/1336632578890797056/artifacts/e2e-aws/pods.json | jq -r '.items[] | .metadata as $m | .spec.tolerations[] | select(.key == "node.kubernetes.io/unschedulable") | $m.namespace + " " + $m.name + " " + ($m.ownerReferences[].kind | tostring) + " " + (. | tostring)' | grep -v DaemonSet
openshift-cluster-version cluster-version-operator-c4dbbfcbb-vz27g ReplicaSet {"effect":"NoSchedule","key":"node.kubernetes.io/unschedulable","operator":"Exists"}
```

This shows that the contention is just MCD vs. CVO.

Thanks for all the details Trevor, will look into this.

Closing this for now, as the immediate bug would not occur. We have made improvements to the drain logic such that this alert would not have fired. In 4.7+, drain timeouts are now an hour (your drain took 350s total). See: https://github.com/openshift/machine-config-operator/pull/2605

*** This bug has been marked as a duplicate of bug 1968759 ***
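For reference, the toleration flagged in the audits above would look like this in the CVO Deployment's pod template (a sketch reconstructed from the JSON output; the field placement follows the standard PodSpec, not a confirmed excerpt of the CVO manifest):

```yaml
# With operator: Exists and no tolerationSeconds, the pod tolerates the
# node.kubernetes.io/unschedulable NoSchedule taint indefinitely, so the
# ReplicaSet controller keeps scheduling replacements onto a cordoned node.
tolerations:
- key: node.kubernetes.io/unschedulable
  operator: Exists
  effect: NoSchedule
```

Note that, per the taints-and-tolerations docs linked above, the "soft" variant (PreferNoSchedule) exists only as a taint effect set on the node, not as a preference expressible in a toleration, so the "prefer an untainted node, but accept a tainted one" behavior described in the report isn't directly available from the pod side.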
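The second jq filter above can also be mirrored in a few lines of Python for anyone auditing a locally saved pods.json. This is a sketch, not part of the bug report; the sample data is trimmed to two pods from the audit output, with their owner kinds inferred from the fact that only the CVO pod survived the `grep -v DaemonSet` pass:

```python
# Trimmed stand-in for the CI-gathered pods.json (in practice, load it
# with json.load() from the gcsweb artifact instead).
pods = {
    "items": [
        {
            "metadata": {
                "namespace": "openshift-cluster-version",
                "name": "cluster-version-operator-c4dbbfcbb-vz27g",
                "ownerReferences": [{"kind": "ReplicaSet"}],
            },
            "spec": {"tolerations": [{
                "effect": "NoSchedule",
                "key": "node.kubernetes.io/unschedulable",
                "operator": "Exists",
            }]},
        },
        {
            # DaemonSet-owned pod; the audit excludes these because MCD
            # never drains them.
            "metadata": {
                "namespace": "openshift-sdn",
                "name": "sdn-controller-msdtk",
                "ownerReferences": [{"kind": "DaemonSet"}],
            },
            "spec": {"tolerations": [{
                "effect": "NoSchedule",
                "key": "node.kubernetes.io/unschedulable",
                "operator": "Exists",
            }]},
        },
    ]
}

def audit(pods):
    """Pods tolerating the unschedulable taint, excluding DaemonSet pods."""
    hits = []
    for item in pods["items"]:
        meta = item["metadata"]
        kinds = {ref["kind"] for ref in meta.get("ownerReferences", [])}
        if "DaemonSet" in kinds:
            continue
        for tol in item["spec"].get("tolerations", []):
            if tol.get("key") == "node.kubernetes.io/unschedulable":
                hits.append((meta["namespace"], meta["name"], tol))
    return hits
```

On the full 4.7 artifact this should leave only the CVO pod, matching the `grep -v DaemonSet` result above.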