Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2002621

Summary:	DNS operator performs spurious updates in response to API's defaulting of service's internalTrafficPolicy
Product:	OpenShift Container Platform	Reporter:	OpenShift BugZilla Robot <openshift-bugzilla-robot>
Component:	Networking	Assignee:	Miciah Dashiel Butler Masters <mmasters>
Networking sub component:	DNS	QA Contact:	Shudi Li <shudili>
Status:	CLOSED ERRATA	Docs Contact:
Severity:	medium
Priority:	low	CC:	aos-bugs, hongli
Version:	4.9
Target Milestone:	---
Target Release:	4.9.z
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:	Cause: When the DNS operator reconciles its operands, it gets the cluster DNS service object from the API to determine whether it needs to create or update the service. If the service already exists, the operator compares it with the service it expects in order to determine whether an update is needed. Kubernetes 1.22, on which OpenShift 4.9 is based, enables a new spec.internalTrafficPolicy API field for services by default. The operator leaves this field empty when it creates the service, but the API sets a default value for the field. The operator observed this default value and tried to update the field back to the empty value. Consequence: The operator's update logic kept trying to revert the default value that the API set for the service's internal traffic policy. Fix: When comparing services to determine whether an update is required, the operator now treats the empty value and the default value for spec.internalTrafficPolicy as equal. Result: The operator no longer spuriously tries to update the cluster DNS service when the API sets a default value for the service's spec.internalTrafficPolicy field.	Story Points:	---
Clone Of:		Environment:
Last Closed:	2021-11-01 13:44:33 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	2002461
Bug Blocks:

Description OpenShift BugZilla Robot 2021-09-09 11:32:00 UTC

+++ This bug was initially created as a clone of Bug #2002461 +++

Description of problem:

When the DNS operator reconciles its operands, it gets the DNS service from the API to determine whether it needs to create or update the service. If the service already exists, the operator compares it with the service it expects in order to determine whether an update is needed. In this comparison, if the API has set the default value for the service's spec.internalTrafficPolicy field, the operator sees a difference and tries to set the field back to the empty value. The operator should not update the service in response to API defaulting.
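The update loop described above can be illustrated with a minimal Go sketch. It is not the operator's actual code: the types below are simplified, hypothetical stand-ins for the Kubernetes service types, the helper names (applyAPIDefaults, naiveChanged, fixedChanged, simulateReconciles) are invented for illustration, and it assumes "Cluster" is the value the API fills in when the field is left empty. It contrasts a literal comparison, which treats "unset" and "Cluster" as different, with a comparison that normalizes both sides to the default first.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the real service types.
type TrafficPolicy string

const (
	PolicyCluster TrafficPolicy = "Cluster" // assumed API default
	PolicyLocal   TrafficPolicy = "Local"
)

type ServiceSpec struct {
	// nil means the field was left empty (unset).
	InternalTrafficPolicy *TrafficPolicy
}

// applyAPIDefaults mimics the API server filling in an unset policy.
func applyAPIDefaults(spec ServiceSpec) ServiceSpec {
	if spec.InternalTrafficPolicy == nil {
		p := PolicyCluster
		spec.InternalTrafficPolicy = &p
	}
	return spec
}

// effectivePolicy maps an empty (nil) policy to the default value.
func effectivePolicy(p *TrafficPolicy) TrafficPolicy {
	if p == nil {
		return PolicyCluster
	}
	return *p
}

// naiveChanged compares literally, so nil and "Cluster" look different.
func naiveChanged(current, expected ServiceSpec) bool {
	if (current.InternalTrafficPolicy == nil) != (expected.InternalTrafficPolicy == nil) {
		return true
	}
	return current.InternalTrafficPolicy != nil &&
		*current.InternalTrafficPolicy != *expected.InternalTrafficPolicy
}

// fixedChanged treats the empty value and the default value as equal.
func fixedChanged(current, expected ServiceSpec) bool {
	return effectivePolicy(current.InternalTrafficPolicy) !=
		effectivePolicy(expected.InternalTrafficPolicy)
}

// simulateReconciles runs n reconcile passes and returns how many updates were issued.
func simulateReconciles(n int, changed func(current, expected ServiceSpec) bool) int {
	expected := ServiceSpec{}             // the operator leaves the policy empty
	current := applyAPIDefaults(expected) // what the API stores and returns
	updates := 0
	for i := 0; i < n; i++ {
		if changed(current, expected) {
			updates++
			// The operator writes its expected spec; the API defaults it again.
			current = applyAPIDefaults(expected)
		}
	}
	return updates
}

func main() {
	fmt.Println("naive updates:", simulateReconciles(5, naiveChanged))
	fmt.Println("fixed updates:", simulateReconciles(5, fixedChanged))
}
```

With the literal comparison, every pass sees an empty expected value against the defaulted "Cluster" value and issues an update that the API immediately re-defaults, so all 5 passes update. Normalizing both sides to the default breaks that cycle while still detecting a genuine change, such as a service whose policy was set to "Local".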


OpenShift release version:

Kubernetes 1.22, and therefore OpenShift 4.9, enables the new spec.internalTrafficPolicy field by default.


Cluster Platform:

Observed on AWS and GCP; expected to be the same on all platforms.


How reproducible:

100%.


Steps to Reproduce (in detail):

1. Launch a new OpenShift 4.9 or 4.10 cluster.

2. Check the DNS operator's logs: 

    oc -n openshift-dns-operator logs -c dns-operator deploy/dns-operator

3. Restart the operator: 

    oc -n openshift-dns-operator delete pods -l name=dns-operator

4. Check the DNS operator's logs again.  

Actual results:

The operator logs many "updated dns service openshift-dns/dns-default" messages.  


Expected results:

The operator should log only a few such messages when it first starts, and it should not log any such messages when restarted (unless something other than the operator itself or API defaulting modifies the service).
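To compare the operator's log output against the expected results, the matching lines can be counted. The following is a small, hypothetical Go helper (not part of any OpenShift tooling) that reads logs on standard input and counts lines containing the message quoted in the actual results; it assumes the message appears verbatim in each relevant log line. Usage, with an assumed file name: `oc -n openshift-dns-operator logs -c dns-operator deploy/dns-operator | go run countupdates.go`

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

// updateMsg is the message quoted in the bug report's actual results.
const updateMsg = "updated dns service openshift-dns/dns-default"

// countServiceUpdates returns the number of log lines that contain updateMsg.
func countServiceUpdates(logs string) int {
	count := 0
	for _, line := range strings.Split(logs, "\n") {
		if strings.Contains(line, updateMsg) {
			count++
		}
	}
	return count
}

func main() {
	// Read the whole log stream piped in from `oc ... logs`.
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, "reading logs:", err)
		os.Exit(1)
	}
	fmt.Println(countServiceUpdates(string(data)))
}
```

After a restart with the fix in place, the expected count is 0; a steadily growing count on an unchanged cluster indicates the spurious-update loop.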


Impact of the problem:

Extra API load and CPU usage from performing spurious updates.


Additional information:

The fix for this BZ should be backported to OpenShift 4.9.

Comment 1 Shudi Li 2021-09-15 07:13:13 UTC

Verified in 4.9.0-0.ci.test-2021-09-15-053753-ci-ln-3s3plwb-latest

1. Show the cluster version
% oc get clusterversion
NAME      VERSION                                                  AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.ci.test-2021-09-15-053753-ci-ln-3s3plwb-latest   True        False         14m     Cluster version is 4.9.0-0.ci.test-2021-09-15-053753-ci-ln-3s3plwb-latest

2. Delete the dns-operator pod
% oc -n openshift-dns-operator delete pods -l name=dns-operator
pod "dns-operator-59bd9659bd-rmcmj" deleted

3. The dns-operator pod was recreated
% oc -n openshift-dns-operator get pods
NAME                            READY   STATUS    RESTARTS   AGE
dns-operator-59bd9659bd-wpt47   2/2     Running   0          2m10s

4. Wait for some time and check the log again; no "updated dns service openshift-dns" message appears
% oc -n openshift-dns-operator get pods
NAME                            READY   STATUS    RESTARTS   AGE
dns-operator-59bd9659bd-wpt47   2/2     Running   0          18m
 
% oc -n openshift-dns-operator logs -c dns-operator deploy/dns-operator                     
I0915 06:25:19.561393       1 request.go:668] Waited for 1.029170503s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/quota.openshift.io/v1?timeout=32s
time="2021-09-15T06:25:20Z" level=info msg="reconciling request: /default"
time="2021-09-15T06:25:21Z" level=info msg="reconciling request: /default"

Comment 2 Miciah Dashiel Butler Masters 2021-09-23 16:20:31 UTC

Setting priority to low as the PR is blocked on 4.9.0 GA.

Comment 5 Shudi Li 2021-10-25 06:03:47 UTC

Verified with 4.9.0-0.nightly-2021-10-22-102153; passed.

1.
% oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-10-22-102153   True        False         128m    Cluster version is 4.9.0-0.nightly-2021-10-22-102153
%

2.
% oc -n openshift-dns-operator delete pods -l name=dns-operator
pod "dns-operator-5d469849fd-cpsf9" deleted
%

3.
% oc -n openshift-dns-operator get pods                        
NAME                            READY   STATUS    RESTARTS   AGE
dns-operator-5d469849fd-ztcdj   2/2     Running   0          24s
%

4. Wait for some time and check the log; no "updated dns service openshift-dns" message appears
% oc -n openshift-dns-operator logs -c dns-operator deploy/dns-operator
I1025 05:57:51.853656       1 request.go:668] Waited for 1.028907827s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/template.openshift.io/v1?timeout=32s
time="2021-10-25T05:57:53Z" level=info msg="reconciling request: /default"
time="2021-10-25T05:57:53Z" level=info msg="reconciling request: /default"
%

Comment 8 errata-xmlrpc 2021-11-01 13:44:33 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.9.5 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4005