+++ This bug was initially created as a clone of Bug #2002461 +++

Description of problem:

When the DNS operator reconciles its operands, the operator gets the DNS service from the API to determine whether the operator needs to create or update the service. If the service already exists, the operator compares it with what the operator expects to get in order to determine whether an update is needed. In this comparison, if the API has set the default value for the service's spec.internalTrafficPolicy field, the operator detects the update and tries to set the field back to the empty value. The operator should not update the service in response to API defaulting.

OpenShift release version:

Kubernetes 1.22 and OpenShift 4.9 enable the new internalTrafficPolicy field by default.

Cluster Platform:

Observed on AWS and GCP but can be expected to be the same on all platforms.

How reproducible:

100%.

Steps to Reproduce (in detail):

1. Launch a new OpenShift 4.9 or 4.10 cluster.
2. Check the DNS operator's logs:
   oc -n openshift-dns-operator logs -c dns-operator deploy/dns-operator
3. Restart the operator:
   oc -n openshift-dns-operator delete pods -l name=dns-operator
4. Check the DNS operator's logs again.

Actual results:

The operator logs many "updated dns service openshift-dns/dns-default" messages.

Expected results:

The operator should log only a few such messages when it first starts, and it shouldn't log any such messages when restarted (unless something else besides the operator itself or API defaulting modifies the service).

Impact of the problem:

Extra API load and CPU usage performing spurious updates.

Additional information:

The fix for this BZ should be backported to OpenShift 4.9.
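The comparison problem described above can be illustrated with a minimal Go sketch. It is not the operator's actual code: `ServiceSpec` is a hypothetical, trimmed-down stand-in for the real Kubernetes Service spec, and `naiveChanged` and `serviceChanged` are illustrative names. The sketch assumes that a field the operator leaves unset should be treated as "don't care", so a value filled in by API defaulting does not count as a difference.

```go
package main

import (
	"fmt"
	"reflect"
)

// ServiceSpec is a hypothetical stand-in for the Kubernetes Service spec.
// Only the fields needed for the illustration are included.
type ServiceSpec struct {
	Ports                 []int32
	InternalTrafficPolicy *string // nil means the operator does not set it
}

// naiveChanged compares every field, so a value filled in by API
// defaulting looks like a change that must be reverted.
func naiveChanged(current, expected ServiceSpec) bool {
	return !reflect.DeepEqual(current, expected)
}

// serviceChanged ignores internalTrafficPolicy when the expected spec
// leaves it unset, by copying the current value before comparing.
// expected is passed by value, so the caller's copy is not modified.
func serviceChanged(current, expected ServiceSpec) bool {
	if expected.InternalTrafficPolicy == nil {
		expected.InternalTrafficPolicy = current.InternalTrafficPolicy
	}
	return !reflect.DeepEqual(current, expected)
}

func main() {
	defaulted := "Cluster" // value assumed to be set by API defaulting
	expected := ServiceSpec{Ports: []int32{53}}
	current := ServiceSpec{Ports: []int32{53}, InternalTrafficPolicy: &defaulted}

	fmt.Println("naive comparison sees a change:", naiveChanged(current, expected))
	fmt.Println("defaulting-aware comparison sees a change:", serviceChanged(current, expected))
}
```

With a comparison of the second kind, a reconcile loop would issue an update only when a field the operator actually manages differs, which is the behavior the expected results ask for.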
Verified in 4.9.0-0.ci.test-2021-09-15-053753-ci-ln-3s3plwb-latest

1. Show the cluster's version:
   % oc get clusterversion
   NAME      VERSION                                                   AVAILABLE   PROGRESSING   SINCE   STATUS
   version   4.9.0-0.ci.test-2021-09-15-053753-ci-ln-3s3plwb-latest   True        False         14m     Cluster version is 4.9.0-0.ci.test-2021-09-15-053753-ci-ln-3s3plwb-latest

2. Delete the dns-operator pod:
   % oc -n openshift-dns-operator delete pods -l name=dns-operator
   pod "dns-operator-59bd9659bd-rmcmj" deleted

3. The dns-operator pod was recreated:
   % oc -n openshift-dns-operator get pods
   NAME                            READY   STATUS    RESTARTS   AGE
   dns-operator-59bd9659bd-wpt47   2/2     Running   0          2m10s

4. Wait for some time and check the log again; the "updated dns service openshift-dns" message no longer appears:
   % oc -n openshift-dns-operator get pods
   NAME                            READY   STATUS    RESTARTS   AGE
   dns-operator-59bd9659bd-wpt47   2/2     Running   0          18m
   % oc -n openshift-dns-operator logs -c dns-operator deploy/dns-operator
   I0915 06:25:19.561393 1 request.go:668] Waited for 1.029170503s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/quota.openshift.io/v1?timeout=32s
   time="2021-09-15T06:25:20Z" level=info msg="reconciling request: /default"
   time="2021-09-15T06:25:21Z" level=info msg="reconciling request: /default"
Setting priority to low as the PR is blocked on 4.9.0 GA.
Verified with 4.9.0-0.nightly-2021-10-22-102153; the check passed.

1. Show the cluster's version:
   % oc get clusterversion
   NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
   version   4.9.0-0.nightly-2021-10-22-102153   True        False         128m    Cluster version is 4.9.0-0.nightly-2021-10-22-102153

2. Delete the dns-operator pod:
   % oc -n openshift-dns-operator delete pods -l name=dns-operator
   pod "dns-operator-5d469849fd-cpsf9" deleted

3. The dns-operator pod was recreated:
   % oc -n openshift-dns-operator get pods
   NAME                            READY   STATUS    RESTARTS   AGE
   dns-operator-5d469849fd-ztcdj   2/2     Running   0          24s

4. Wait for some time and check the log; the "updated dns service openshift-dns" message does not appear:
   % oc -n openshift-dns-operator logs -c dns-operator deploy/dns-operator
   I1025 05:57:51.853656 1 request.go:668] Waited for 1.028907827s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/template.openshift.io/v1?timeout=32s
   time="2021-10-25T05:57:53Z" level=info msg="reconciling request: /default"
   time="2021-10-25T05:57:53Z" level=info msg="reconciling request: /default"
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.9.5 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:4005