Description of problem:

The openshift-dns daemonset doesn't include a toleration to run on nodes with taints. After a NoSchedule taint is configured on a node, the daemonset stops managing the pods on that node and two things happen:
- Alerts are shown in the OCP dashboard: "Pods of DaemonSet openshift-dns/dns-default are running where they are not supposed to run."
- If the pods are deleted on nodes with the taint, they won't be recovered.

Version-Release number of selected component (if applicable):

OCP 4.2.20

How reproducible:

Whenever taints are applied to nodes.

Steps to Reproduce:
1. "oc -n openshift-dns get ds" to check the desired node count for the ds.
2. Apply a NoSchedule taint to a node.
3. "oc -n openshift-dns get ds" to check that the desired count decreased by one node.
4. Observe alerts on the OCP dashboard.
5. "oc -n openshift-dns get pods -o wide" to verify that pods are still running on the tainted node.

Actual results:

openshift-dns pods stop being managed by the daemonset on nodes with a taint.

Expected results:

openshift-dns pods should continue to be managed by the daemonset and run on every node.

Additional info:

This change [1] might be related to the issue.

[1] https://github.com/openshift/cluster-dns-operator/commit/6be3d017118b89203f00b9a915ffdfdb9975f145
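For reference, the NoSchedule taint applied in step 2 ends up in the node spec roughly like this (a sketch; the key/value pair "dns-test"/"test" is an illustrative placeholder, not from this report):

```yaml
# Node spec after running something like:
#   oc adm taint node <node-name> dns-test=test:NoSchedule
# (key and value are hypothetical placeholders)
spec:
  taints:
  - key: dns-test
    value: test
    effect: NoSchedule
```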
How do we work around that? For Machine Config Daemon DS and Node CA DS I can patch the DS definition to add the required tolerations. But the DNS operator seems to immediately "fix" the tolerations if I try that.
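For context, this is the kind of change being attempted on the daemonset (a hypothetical sketch of a merge patch; the cluster-dns-operator reconciles the daemonset back to its managed state, so the change does not persist):

```yaml
# Hypothetical merge-patch body, applied with something like:
#   oc -n openshift-dns patch ds/dns-default --type merge -p "$(cat patch.yaml)"
# The dns-operator reverts this almost immediately.
spec:
  template:
    spec:
      tolerations:
      - operator: Exists   # tolerate all taints
```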
https://github.com/openshift/cluster-dns-operator/commit/6be3d017118b89203f00b9a915ffdfdb9975f145

It looks like in 4.3, any taint on a worker node will cause the dns not to run on that worker node:
https://github.com/openshift/cluster-dns-operator/blob/release-4.3/assets/dns/daemonset.yaml#L151-L152

Also, if any taint is added to a master that does not have the key "node-role.kubernetes.io/master", dns pods will not run on that node.

In my 4.2 cluster the dns has the following tolerations:

    tolerations:
    - key: node-role.kubernetes.io/master
      operator: Exists
    - effect: NoExecute
      key: node.kubernetes.io/not-ready
      operator: Exists
    - effect: NoExecute
      key: node.kubernetes.io/unreachable
      operator: Exists
    - effect: NoSchedule
      key: node.kubernetes.io/disk-pressure
      operator: Exists
    - effect: NoSchedule
      key: node.kubernetes.io/pid-pressure
      operator: Exists
    - effect: NoSchedule
      key: node.kubernetes.io/unschedulable
      operator: Exists
    - effect: NoSchedule
      key: node.kubernetes.io/memory-pressure
      operator: Exists

If I set any NoSchedule taint on a worker node, dns will not schedule to that node. If I set a NoSchedule taint on a master with a key other than "node-role.kubernetes.io/master", dns will not schedule to that master either.

I created a documentation bug asking for our docs to state the requirement around NoSchedule taints set on the masters:
https://bugzilla.redhat.com/show_bug.cgi?id=1814325
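For comparison, the release-4.3 asset linked above restricts the daemonset to tolerating only the master taint key, which matches the behavior described here (reproduced as a sketch from memory; see the linked lines for the authoritative contents):

```yaml
# Approximate toleration block from the release-4.3 daemonset asset.
# Only the master taint key is tolerated, so any other taint
# (on a worker or a master) blocks scheduling of dns pods.
tolerations:
- key: node-role.kubernetes.io/master
  operator: Exists
```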
The entire purpose of a taint is to prevent pods from being scheduled to or executed on a node [1]. Per your report, a taint with the "NoSchedule" effect prevents DNS pods from being scheduled to the tainted node. In that respect, the taint feature is behaving correctly. Why do you expect pods to be scheduled to tainted nodes? Why are you tainting nodes if you do not want the taint to have the usual effect? It might make sense to change the alerting rule to suppress the reported alerts.

It is surprising that pods on tainted nodes "stop being managed". Can you clarify what you mean by "stop being managed"? Does deleting the DaemonSet or performing a rolling update fail to affect the pods?

Please note that a taint with effect "NoSchedule" does not evict pods, so pods that are already scheduled and running will continue running; if you want the pods to be evicted, add a taint with the "NoExecute" effect.

Please note also that DaemonSet pods automatically get the following tolerations:

    node.kubernetes.io/not-ready:NoExecute
    node.kubernetes.io/unreachable:NoExecute
    node.kubernetes.io/disk-pressure:NoSchedule
    node.kubernetes.io/memory-pressure:NoSchedule
    node.kubernetes.io/pid-pressure:NoSchedule
    node.kubernetes.io/unschedulable:NoSchedule

1. https://docs.openshift.com/container-platform/4.2/nodes/scheduling/nodes-scheduler-taints-tolerations.html
Hi Miciah,

The purpose of taints is understood and isn't being questioned. What is being questioned is that, after configuring a taint on a worker node, dns pods no longer run on that node. The expectation was that dns pods should still run on nodes with taints because DNS is a required OCP component, similar to what happens with the internal registry and the Machine Config Daemon. As the openshift-dns daemonset is managed by an operator, it doesn't seem possible to change its tolerations (I couldn't find a way).

If it's acceptable to have dns pods running only on master nodes in the extreme situation where all worker nodes have taints, then perhaps this isn't a bug. But I'd still consider it a bit odd: "by default" dns pods run everywhere, hence the expectation that they should run everywhere. I'd say this is the main issue to be clarified or discussed.

Answering your specific question: by "stop being managed" I meant that rogue dns pods are left behind. I didn't try to delete the daemonset or perform a rolling update, but I manually deleted the pods on the nodes with taints and the warnings disappeared. I see that this was only required because the taint was set with the "NoSchedule" effect, as this was the configuration specified by the OpenShift Container Storage documentation.
Daemonsets that are part of the cluster should always have tolerations, so that a customer applying a taint doesn't break things. We have had so many breakages just because we have daemonsets that lack tolerations. Another example: https://bugzilla.redhat.com/show_bug.cgi?id=1780318
Derek, can you respond to the following?
https://github.com/openshift/cluster-dns-operator/pull/171#issuecomment-622110777

> @derekwaynecarr did we get agreement to allow tolerating all taints
>
> Is there an enhancement / update that clarifies official stance of project on this?

My takeaway from an earlier E-mail discussion was that we could go ahead and add the toleration to fix this issue, and someone (who?) in sig-node or sig-network could follow up with an enhancement proposal to add a softAdmitHandler or backpressure to the kubelet to prevent scheduling a pod before networking is ready. Can we merge #171 while that discussion happens separately?
The fix is waiting on https://github.com/openshift/enhancements/pull/335#issuecomment-632165965.
The PR is in flight and once it merges we will consider how far to backport. Moving to 4.6 as not a release blocker or an upgrade blocker for 4.5.
https://github.com/openshift/cluster-dns-operator/pull/171 actually merged. Should we move this BZ back to 4.5? @Eric - looking for guidance here.
Given I see numerous customer cases I think backporting is a good idea, yes.
Because this fix is already merged in 4.5 and comment 15 advises that it should be backported, I am moving the target release back to 4.5.0, and we can discuss how far to backport the fix. Vagner, this is fixed in 4.5; are you requesting a backport to earlier releases? If so, what is the earliest release where a backport is requested?
Hi Miciah, responding in place of Vagner: our customer is on release 4.2 and doesn't plan to update from this version.
Clearing needinfo per Remington's comment.
As of writing, the fix has been tested and verified in the "4.5.0-0.nightly-2020-06-09-223121" release version. The toleration for all taints now works properly. Some excerpts from the test:

* Version:
----
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-06-09-223121   True        False         5h32m   Cluster version is 4.5.0-0.nightly-2020-06-09-223121
----

* The daemonset has no additional "node-role.kubernetes.io/master" keys:
----
tolerations:
- operator: Exists
----

* Adding a "NoSchedule" taint on a worker node has no effect on the daemonset, unlike in the previous release version:
-----
$ oc -n openshift-dns get ds
NAME          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
dns-default   6         6         6       6            6           kubernetes.io/os=linux   5h23m

$ oc get nodes
NAME                                         STATUS   ROLES    AGE     VERSION
ip-10-0-131-0.us-east-2.compute.internal     Ready    worker   4h55m   v1.18.3+a637491   <--
ip-10-0-144-108.us-east-2.compute.internal   Ready    master   5h7m    v1.18.3+a637491
ip-10-0-172-188.us-east-2.compute.internal   Ready    worker   4h55m   v1.18.3+a637491
ip-10-0-183-160.us-east-2.compute.internal   Ready    master   5h7m    v1.18.3+a637491
ip-10-0-203-65.us-east-2.compute.internal    Ready    worker   4h56m   v1.18.3+a637491
ip-10-0-211-202.us-east-2.compute.internal   Ready    master   5h8m    v1.18.3+a637491

$ oc adm taint node ip-10-0-131-0.us-east-2.compute.internal dns-taint=test:NoSchedule
node/ip-10-0-131-0.us-east-2.compute.internal tainted

$ oc describe node ip-10-0-131-0.us-east-2.compute.internal
CreationTimestamp:  Wed, 10 Jun 2020 10:04:47 +0530
Taints:             dns-taint=test:NoSchedule
Unschedulable:      false

$ oc -n openshift-dns get ds
NAME          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
dns-default   6         6         6       6            6           kubernetes.io/os=linux   5h23m

$ oc -n openshift-dns describe ds dns-default
Name:           dns-default
Selector:       dns.operator.openshift.io/daemonset-dns=default
Node-Selector:  kubernetes.io/os=linux
Labels:         dns.operator.openshift.io/owning-dns=default
Annotations:    deprecated.daemonset.template.generation: 1
Desired Number of Nodes Scheduled: 6
Current Number of Nodes Scheduled: 6
Number of Nodes Scheduled with Up-to-date Pods: 6
Number of Nodes Scheduled with Available Pods: 6
Number of Nodes Misscheduled: 0
-----

* For reference, the same test on a v4.4 release version, showing the dns pod being marked out from the tainted worker node:
-----
$ oc -n openshift-dns get ds
NAME          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
dns-default   6         6         6       6            6           kubernetes.io/os=linux   147m

$ oc adm taint node ip-10-0-149-87.us-east-2.compute.internal dns-taint=test:NoSchedule
node/ip-10-0-149-87.us-east-2.compute.internal tainted

$ oc -n openshift-dns describe ds dns-default
Name:           dns-default
Selector:       dns.operator.openshift.io/daemonset-dns=default
Node-Selector:  kubernetes.io/os=linux
Labels:         dns.operator.openshift.io/owning-dns=default
Annotations:    deprecated.daemonset.template.generation: 1
Desired Number of Nodes Scheduled: 5
Current Number of Nodes Scheduled: 5
Number of Nodes Scheduled with Up-to-date Pods: 5
Number of Nodes Scheduled with Available Pods: 5
Number of Nodes Misscheduled: 1

$ oc -n openshift-dns get ds dns-default
NAME          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
dns-default   5         5         5       5            5           kubernetes.io/os=linux   160m
-----
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409
Hello, I realize that this bug has been fixed in 4.5 and that it blocks another bug, https://bugzilla.redhat.com/show_bug.cgi?id=1847197, which tracks it being fixed in 4.4. However, it does not seem to block a 4.3 bug. My customer has asked whether there is a plan to backport this to 4.3. Thanks!
I'm working on a customer problem for Red Hat OpenShift on IBM Cloud v4.3. I'll second the question regarding plans to backport to 4.3. I will also inform them this bug has been fixed in 4.4.
The 4.3 backport is covered by bug 1859685. For completeness, we have the following Bugzilla reports for this issue: • For 4.5.0, Bug 1813479 (this report). • For 4.4.z, Bug 1847197. • For 4.3.z, Bug 1859685. The way the backport process works is that the backport to version 4.y depends on the fix for version 4.(y+1), so Bug 1859685 (the 4.3.z backport) depends on Bug 1847197 (the 4.4.z backport), which depends on this report.