Bug 1842741

Summary: DNS operator performs spurious updates in response to API's defaulting of service's session affinity & type or daemonset's volumes' default modes
Product: OpenShift Container Platform Reporter: Miciah Dashiel Butler Masters <mmasters>
Component: Networking Assignee: Miciah Dashiel Butler Masters <mmasters>
Networking sub component: DNS QA Contact: Hongan Li <hongli>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: unspecified CC: amcdermo, aos-bugs
Version: 4.5   
Target Milestone: ---   
Target Release: 4.6.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: When the DNS operator reconciles a DNS or Service object, the operator determines whether it needs to update the object by constructing an expected object in memory, getting the actual object from the API, and comparing the two. The operator leaves some values unspecified in its expected DNS and Service objects. When the API set default values for these unspecified values, the comparison would return a false positive.
Consequence: The operator was repeatedly trying to update DNS and Service objects in response to the API's setting default values.
Fix: The operator now considers unspecified values and default values to be equal when comparing DNS and Service objects.
Result: The operator should no longer update a DNS or Service object in response to API defaulting.
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-10-27 16:03:33 UTC Type: Bug

Description Miciah Dashiel Butler Masters 2020-06-02 02:33:09 UTC
Description of problem:

When the DNS operator reconciles the DNS, the operator gets the DNS's daemonset and service (if they exist) from the API to determine whether it needs to create or update them.  If the daemonset or service does not exist, the operator creates it, leaving some API fields empty, such as the spec.sessionAffinity and spec.type fields on the service.  If the daemonset or service does exist, the operator compares it with the object it expects in order to determine whether an update is needed.  In this comparison, if the API has set the default value for the daemonset's volumes' default mode fields or the service's spec.sessionAffinity and spec.type fields, the operator treats the defaulted value as a change and tries to set the fields back to the empty value.  The operator should not update the daemonset or service in response to API defaulting.
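
To illustrate the kind of comparison that avoids this, here is a minimal Go sketch, not the operator's actual code. The serviceSpec type and the withDefaults and serviceSpecChanged helpers are hypothetical stand-ins introduced for illustration; the only facts assumed about the API are that it defaults a service's session affinity to "None" and its type to "ClusterIP".

```go
package main

import "fmt"

// serviceSpec is a simplified, hypothetical stand-in for the two service
// fields discussed in this bug; it is not the real Kubernetes type.
type serviceSpec struct {
	SessionAffinity string // "" means unspecified; the API defaults it to "None"
	Type            string // "" means unspecified; the API defaults it to "ClusterIP"
}

// withDefaults returns a copy of spec with unspecified fields set to the
// values the API would assign to them.
func withDefaults(spec serviceSpec) serviceSpec {
	if spec.SessionAffinity == "" {
		spec.SessionAffinity = "None"
	}
	if spec.Type == "" {
		spec.Type = "ClusterIP"
	}
	return spec
}

// serviceSpecChanged reports whether current differs from expected once
// unspecified values and API defaults are treated as equal.
func serviceSpecChanged(current, expected serviceSpec) bool {
	return withDefaults(current) != withDefaults(expected)
}

func main() {
	expected := serviceSpec{} // what the operator builds in memory
	defaulted := serviceSpec{SessionAffinity: "None", Type: "ClusterIP"}
	patched := serviceSpec{SessionAffinity: "ClientIP", Type: "ClusterIP"}

	fmt.Println(serviceSpecChanged(defaulted, expected)) // false: only API defaulting
	fmt.Println(serviceSpecChanged(patched, expected))   // true: a real modification
}
```

Normalizing both sides before comparing keeps the expected object free of hard-coded defaults while still detecting genuine changes such as the ClientIP patch in the reproduction steps.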


Steps to Reproduce:

1. Launch a new cluster.

2. Modify the default DNS service's session affinity:

    oc -n openshift-dns patch svc/dns-default --type=strategic --patch='{"spec":{"sessionAffinity":"ClientIP"}}'

3. Check the DNS operator's logs:

    oc -n openshift-dns-operator logs deploy/dns-operator -c dns-operator


Actual results:

The DNS operator's logs repeat "updated dns service" and "updated dns daemonset" multiple times.


Expected results:

The DNS operator should ignore API defaulting and should not log "updated dns daemonset" or "updated dns service" unless the daemonset or service is changed by something other than API defaulting.
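
For the daemonset's volumes, ignoring API defaulting could look like the following Go sketch. This is an illustration under stated assumptions, not the operator's implementation: the volumeSource type, the effectiveMode and volumesChanged helpers, and the "config-volume" name are hypothetical. The one fact assumed about the API is that it sets an unspecified default mode on configMap and secret volumes to 0644.

```go
package main

import "fmt"

// apiDefaultMode is the mode the API assigns to configMap and secret
// volumes that leave their default mode unset.
const apiDefaultMode int32 = 0644

// volumeSource is a simplified, hypothetical stand-in for a configMap or
// secret volume; it is not the real Kubernetes type.
type volumeSource struct {
	Name        string
	DefaultMode *int32 // nil means unspecified
}

// effectiveMode returns the volume's mode, treating nil as the API default.
func effectiveMode(v volumeSource) int32 {
	if v.DefaultMode == nil {
		return apiDefaultMode
	}
	return *v.DefaultMode
}

// volumesChanged reports whether current differs from expected, treating an
// unspecified default mode and the API default as equal.
func volumesChanged(current, expected []volumeSource) bool {
	if len(current) != len(expected) {
		return true
	}
	for i := range current {
		if current[i].Name != expected[i].Name ||
			effectiveMode(current[i]) != effectiveMode(expected[i]) {
			return true
		}
	}
	return false
}

func main() {
	mode := apiDefaultMode
	expected := []volumeSource{{Name: "config-volume"}}                      // mode left unspecified
	defaulted := []volumeSource{{Name: "config-volume", DefaultMode: &mode}} // as returned by the API

	fmt.Println(volumesChanged(defaulted, expected)) // false: only API defaulting
}
```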

Comment 1 Miciah Dashiel Butler Masters 2020-06-18 19:20:55 UTC
A PR is posted and awaiting review.  We'll merge it next sprint.

Comment 2 Andrew McDermott 2020-07-09 12:03:36 UTC
I’m adding UpcomingSprint, because I was occupied by fixing bugs with
higher priority/severity, developing new features with higher
priority, or developing new features to improve stability at a macro
level. I will revisit this bug next sprint.

Comment 6 Hongan Li 2020-07-16 06:45:09 UTC
Verified with 4.6.0-0.nightly-2020-07-15-170241; the issue has been fixed.
Following the reproduction steps, I see only one log entry: "updated dns service: openshift-dns/dns-default".

In another 4.5 cluster without the fix, the log entries repeat, as shown below:
time="2020-07-16T06:36:05Z" level=info msg="updated dns daemonset: openshift-dns/dns-default"
time="2020-07-16T06:36:05Z" level=info msg="updated dns service: openshift-dns/dns-default"
time="2020-07-16T06:36:06Z" level=info msg="updated dns daemonset: openshift-dns/dns-default"
time="2020-07-16T06:36:06Z" level=info msg="updated dns service: openshift-dns/dns-default"

Comment 8 errata-xmlrpc 2020-10-27 16:03:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196