+++ This bug was initially created as a clone of Bug #1813479 +++
Description of problem:
The openshift-dns daemonset does not include tolerations to run on nodes with taints. After a NoSchedule taint is applied to a node, the daemonset stops managing the pods on that node, and two things happen:
- alerts are shown in the OCP dashboard: "Pods of DaemonSet openshift-dns/dns-default are running where they are not supposed to run."
- if the pods are deleted on nodes with the taint, they are not recreated.
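For context, a daemonset opts into running on tainted nodes via a tolerations stanza in its pod template. A minimal sketch of what such a toleration could look like (the exact toleration the fix adds to the dns-default daemonset may differ; this is illustrative only):

```yaml
# Sketch: pod-template fragment of a daemonset that tolerates taints.
# A toleration with operator "Exists" and no key matches every taint,
# so the daemonset keeps scheduling pods onto tainted nodes.
spec:
  template:
    spec:
      tolerations:
      - operator: Exists
```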
Version-Release number of selected component (if applicable):

How reproducible:
Whenever taints are applied to nodes.
Steps to Reproduce:
1. "oc -n openshift-dns get ds" to check desired nodes for the ds.
2. Apply a NoSchedule taint to a node.
3. "oc -n openshift-dns get ds" to check that the desired count has dropped by one.
4. Observe alerts on OCP dashboard
5. "oc -n openshift-dns get pods -o wide" to verify that the pods are still running on the tainted node.
Actual results:
openshift-dns pods stop being managed by the daemonset on nodes with a taint.

Expected results:
The openshift-dns daemonset should continue to manage its pods on tainted nodes and keep a pod running on every node.
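Whether a pod is allowed onto a tainted node comes down to Kubernetes taint/toleration matching. A simplified Python sketch of those matching rules (illustrative only; the authoritative logic lives in the Kubernetes scheduler, and the key/value names below are placeholders):

```python
# Simplified sketch of Kubernetes taint/toleration matching semantics.
def tolerates(toleration: dict, taint: dict) -> bool:
    """Return True if a single toleration matches a single taint."""
    # An empty effect on the toleration matches any taint effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    op = toleration.get("operator", "Equal")
    if op == "Exists":
        # Operator Exists with an empty key tolerates every taint.
        return not toleration.get("key") or toleration["key"] == taint["key"]
    # Operator Equal: key and value must both match.
    return (toleration.get("key") == taint["key"]
            and toleration.get("value", "") == taint.get("value", ""))

# A NoSchedule taint like the one in the reproduction steps (placeholder names).
taint = {"key": "example-key", "value": "example-value", "effect": "NoSchedule"}

# A blanket toleration matches it, so the daemonset keeps its pod on the node...
assert tolerates({"operator": "Exists"}, taint)
# ...while a pod with no matching toleration is not scheduled there.
assert not tolerates({"key": "other", "operator": "Equal", "effect": "NoSchedule"}, taint)
```

This is why a blanket `operator: Exists` toleration keeps a daemonset pod on every node regardless of which taints are applied later.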
This change might be related to the issue.
A PR is posted and awaiting review. We'll try to get it merged next sprint.
*** Bug 1850464 has been marked as a duplicate of this bug. ***
The fix to the master branch has merged. We'll work on the 4.4 backport in the upcoming sprint.
Verified with 4.4.0-0.nightly-2020-07-24-031753; the issue has been fixed.
The DNS pods can now run on nodes with a taint.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (OpenShift Container Platform 4.4.15 bug fix update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.