1859685 – openshift-dns daemonset doesn't include toleration to run on nodes with taints [4.3.x]

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	4.3.z
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	4.3.z
Assignee:	Daneyon Hansen
QA Contact:	Hongan Li
Docs Contact:
URL:
Whiteboard:
Depends On:	1847197
Blocks:

Reported:	2020-07-22 17:04 UTC by Courtney Ruhm
Modified:	2023-12-15 18:33 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-08-19 11:10:15 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift cluster-dns-operator pull 186	0	None	closed	[release-4.3] Bug 1859685: Tolerate all taints	2021-01-15 05:30:03 UTC
Red Hat Product Errata	RHBA-2020:3259	0	None	None	None	2020-08-19 11:10:50 UTC

Description Courtney Ruhm 2020-07-22 17:04:49 UTC

This bug was initially created as a copy of Bug #1847197

I am copying this bug because: the issue was fixed in 4.5 and a backport is approved for 4.4, but the customer is running a 4.3 cluster with OCS. The customer does not wish to upgrade past 4.3 because of issues with OCS in 4.4 and above, and is asking about a 4.3 backport.



+++ This bug was initially created as a clone of Bug #1813479 +++

Description of problem:
openshift-dns daemonset doesn't include a toleration to run on nodes with taints. After a NoSchedule taint is configured for a node, the daemonset stops managing the pods on that node and two things happen:

- alerts are shown in OCP dashboard: Pods of DaemonSet openshift-dns/dns-default are running where they are not supposed to run.

- if the pods on tainted nodes are deleted, they are not recreated.

Version-Release number of selected component (if applicable):
OCP 4.3.x

How reproducible:
Whenever taints are applied to nodes.

Steps to Reproduce:
1. Run "oc -n openshift-dns get ds" to check the desired number of nodes for the ds.
2. Apply a NoSchedule taint to a node.
3. Run "oc -n openshift-dns get ds" again and confirm that the desired count has dropped by one.
4. Observe the alerts on the OCP dashboard.
5. Run "oc -n openshift-dns get pods -o wide" to verify that the pods are still running on the tainted node.
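
The steps above can be sketched as a small script. This is only a sketch: the node name and the taint key/value are hypothetical placeholders, and the daemonset name dns-default is taken from the alert text quoted earlier in this report. The functions are defined but not called, so nothing touches a cluster until repro is run explicitly.

```shell
# Hypothetical defaults; override NODE and TAINT for a real cluster.
NODE="${NODE:-worker-0.example.com}"
TAINT="${TAINT:-example-key=example-value:NoSchedule}"

# Print the number of nodes the dns-default daemonset wants to run on.
desired() {
  oc -n openshift-dns get ds dns-default \
    -o jsonpath='{.status.desiredNumberScheduled}'
}

# Walk through the reproduction: compare the desired count before and
# after tainting one node, then list the DNS pods still on that node.
repro() {
  echo "desired before taint: $(desired)"
  oc adm taint nodes "$NODE" "$TAINT"
  echo "desired after taint:  $(desired)"
  oc -n openshift-dns get pods -o wide \
    --field-selector "spec.nodeName=$NODE"
}
```

Run repro only against a disposable cluster. On an affected version the second count is expected to be one lower than the first while the pod on the tainted node is still listed.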


Actual results:
openshift-dns pods stop being managed by daemonset on nodes with a taint.


Expected results:
openshift-dns should continue to be managed by daemonset and have pods running on every node.

Additional info:

This change[1] might be related to the issue.

[1] https://github.com/openshift/cluster-dns-operator/commit/6be3d017118b89203f00b9a915ffdfdb9975f145

Comment 2 Andrew McDermott 2020-07-23 16:04:25 UTC

We have a fix for 4.4 -- https://github.com/openshift/cluster-dns-operator/pull/179 -- will look at a backport for 4.3.

Comment 4 Daneyon Hansen 2020-07-24 17:10:12 UTC

Waiting for https://bugzilla.redhat.com/show_bug.cgi?id=1847197 to be marked "Verified" and then /bugzilla refresh https://github.com/openshift/cluster-dns-operator/pull/186

Comment 5 Andrew McDermott 2020-07-30 09:57:54 UTC

I’m adding UpcomingSprint, because I was occupied by fixing bugs with
higher priority/severity, developing new features with higher
priority, or developing new features to improve stability at a macro
level. I will revisit this bug next sprint.

Comment 8 Hongan Li 2020-08-11 03:07:11 UTC

Verified with 4.3.0-0.nightly-2020-08-07-040203; the issue has been fixed.

The DNS pod now runs on a node with a taint.
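
For context, the linked pull request is titled "Tolerate all taints". In Kubernetes, a toleration with operator Exists and no key matches every taint, so a daemonset pod template that carries it is scheduled on tainted nodes as well. The fragment below is an illustrative sketch of such a stanza, assuming the fix takes that form; it is not copied from the cluster-dns-operator manifest.

```yaml
# Illustrative fragment of a daemonset spec; assumed shape, not the operator's actual manifest.
spec:
  template:
    spec:
      tolerations:
      # No key plus operator Exists matches every taint key, value, and effect.
      - operator: Exists
```

The tolerations actually in effect can be inspected with "oc -n openshift-dns get ds dns-default -o yaml".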

Comment 10 errata-xmlrpc 2020-08-19 11:10:15 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.3.33 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3259
