Bug 1942228 - DNS operator performs spurious updates in response to API's defaulting of daemonset's terminationGracePeriod and service's clusterIPs
Summary: DNS operator performs spurious updates in response to API's defaulting of daemonset's terminationGracePeriod and service's clusterIPs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: DNS
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Severity: high
Priority: high
Target Milestone: ---
Target Release: 4.7.z
Assignee: Miciah Dashiel Butler Masters
QA Contact: jechen
URL:
Whiteboard:
Depends On: 1936022
Blocks:
Reported: 2021-03-23 21:39 UTC by OpenShift BugZilla Robot
Modified: 2021-04-20 18:53 UTC (History)
3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-04-20 18:52:40 UTC
Target Upstream Version:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-dns-operator pull 250 0 None open [release-4.7] Bug 1942228: Fix spurious reconciliation of DNS daemonset and service 2021-03-30 20:51:49 UTC
Red Hat Product Errata RHBA-2021:1149 0 None None None 2021-04-20 18:53:02 UTC

Description OpenShift BugZilla Robot 2021-03-23 21:39:10 UTC
+++ This bug was initially created as a clone of Bug #1936022 +++

Description of problem:

When the DNS operator reconciles its resources, it gets the DNS daemonset and service objects from the API to determine whether it needs to create or update them.  For each object, if the object does not exist, the operator creates it; if the object does exist, the operator compares it with the object the operator expects in order to determine whether an update is needed.  In this comparison, if the API has set the default value for the service's clusterIPs field or for the daemonset's terminationGracePeriodSeconds field, the operator treats the defaulted value as a difference and tries to set the field back to the empty value.  The operator should not update the daemonset or service in response to API defaulting.


Version-Release number of selected component (if applicable):

The clusterIPs field is new in Kubernetes 1.20 (OpenShift 4.7). The terminationGracePeriodSeconds field was ignored before OpenShift 4.7.  Thus versions of OpenShift before 4.7 are unaffected by this issue.  


Steps to Reproduce:

1. Launch a new cluster.  

2. Check the DNS operator's logs:

    oc -n openshift-dns-operator logs deploy/dns-operator -c dns-operator


Actual results:

The DNS operator's logs have "updated dns daemonset" and "updated dns service" repeated over and over.  In a CI run, I see over 30 occurrences of each.  


Expected results:

The DNS operator should ignore default values that the API sets and should not log "updated dns daemonset" or "updated dns service" unless the objects are updated outside of API defaulting.

Comment 3 jechen 2021-04-12 13:42:08 UTC
Verified in 4.7.0-0.nightly-2021-04-10-082109

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-04-10-082109   True        False         3m13s   Cluster version is 4.7.0-0.nightly-2021-04-10-082109

$ oc -n openshift-dns-operator logs deploy/dns-operator -c dns-operator
I0412 11:51:02.457355       1 request.go:655] Throttling request took 1.009741999s, request: GET:https://172.30.0.1:443/apis/imageregistry.operator.openshift.io/v1?timeout=32s
time="2021-04-12T11:51:06Z" level=info msg="created default dns: default"
time="2021-04-12T11:51:07Z" level=info msg="reconciling request: /default"
time="2021-04-12T11:51:07Z" level=info msg="created dns namespace: openshift-dns"
time="2021-04-12T11:51:07Z" level=info msg="created dns cluster role: /openshift-dns"
time="2021-04-12T11:51:07Z" level=info msg="created dns cluster role binding: openshift-dns"
time="2021-04-12T11:51:07Z" level=info msg="created dns service account: openshift-dns/dns"
time="2021-04-12T11:51:07Z" level=info msg="enforced finalizer for dns: default"
time="2021-04-12T11:51:07Z" level=info msg="created dns daemonset: openshift-dns/dns-default"
time="2021-04-12T11:51:07Z" level=info msg="created configmap: dns-default"
time="2021-04-12T11:51:08Z" level=info msg="created dns service: openshift-dns/dns-default"
time="2021-04-12T11:51:08Z" level=info msg="created dns metrics cluster role dns-monitoring"
time="2021-04-12T11:51:08Z" level=info msg="created dns metrics cluster role binding dns-monitoring"
time="2021-04-12T11:51:08Z" level=info msg="created dns metrics role openshift-dns/prometheus-k8s"
time="2021-04-12T11:51:08Z" level=info msg="created dns metrics role binding openshift-dns/prometheus-k8s"
time="2021-04-12T11:51:08Z" level=info msg="created servicemonitor openshift-dns/dns-default"
time="2021-04-12T11:51:08Z" level=info msg="updated DNS default status: old: v1.DNSStatus{ClusterIP:\"\", ClusterDomain:\"\", Conditions:[]v1.OperatorCondition(nil)}, new: v1.DNSStatus{ClusterIP:\"172.30.0.10\", ClusterDomain:\"cluster.local\", Conditions:[]v1.OperatorCondition{v1.OperatorCondition{Type:\"Degraded\", Status:\"True\", LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63753825068, loc:(*time.Location)(0x2321dc0)}}, Reason:\"NoPodsDesired\", Message:\"No CoreDNS pods are desired (this could mean nodes are tainted)\"}, v1.OperatorCondition{Type:\"Progressing\", Status:\"False\", LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63753825068, loc:(*time.Location)(0x2321dc0)}}, Reason:\"AsExpected\", Message:\"All expected Nodes running DaemonSet pod and IP assigned to DNS service\"}, v1.OperatorCondition{Type:\"Available\", Status:\"False\", LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63753825068, loc:(*time.Location)(0x2321dc0)}}, Reason:\"DaemonSetUnavailable\", Message:\"DaemonSet pod not running on any Nodes\"}}}"


Other than a few normal "updated DNS default status" log entries from the cluster start-up process, the DNS operator does not repeatedly log "updated dns daemonset" or "updated dns service", as expected.

Comment 5 errata-xmlrpc 2021-04-20 18:52:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.7.7 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1149
