Description of problem:

When the DNS operator reconciles its operands, the operator gets the DNS service from the API to determine whether the operator needs to create or update the service. If the service already exists, the operator compares it with what the operator expects to get in order to determine whether an update is needed. In this comparison, if the API has set the default value for the service's spec.internalTrafficPolicy field, the operator detects the update and tries to set the field back to the empty value. The operator should not update the service in response to API defaulting.

OpenShift release version:

Kubernetes 1.22 and OpenShift 4.9 enable the new internalTrafficPolicy field by default.

Cluster Platform:

Observed on AWS and GCP but can be expected to be the same on all platforms.

How reproducible:

100%.

Steps to Reproduce (in detail):

1. Launch a new OpenShift 4.9 or 4.10 cluster.
2. Check the DNS operator's logs:
   oc -n openshift-dns-operator logs -c dns-operator deploy/dns-operator
3. Restart the operator:
   oc -n openshift-dns-operator delete pods -l name=dns-operator
4. Check the DNS operator's logs again.

Actual results:

The operator logs many "updated dns service openshift-dns/dns-default" messages.

Expected results:

The operator should log only a few such messages when it first starts, and it shouldn't log any such messages when restarted (unless something else besides the operator itself or API defaulting modifies the service).

Impact of the problem:

Extra API load and CPU usage performing spurious updates.

Additional information:

The fix for this BZ should be backported to OpenShift 4.9.
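To illustrate the kind of comparison described above, here is a minimal Go sketch that ignores API defaulting. It is not the operator's actual code or the actual fix: `ServiceSpec`, its `Port` field, and `serviceChanged` are hypothetical, simplified stand-ins for the real Kubernetes types, and the sketch assumes that the API server defaults spec.internalTrafficPolicy to "Cluster" when the field is left empty.

```go
package main

import "fmt"

// ServiceSpec is a hypothetical, trimmed-down stand-in for the Kubernetes
// Service spec. Only the fields needed for this illustration are included.
type ServiceSpec struct {
	Port                  int32  // hypothetical field the operator manages
	InternalTrafficPolicy string // "" means the operator did not set it
}

// serviceChanged reports whether the current service differs from the
// expected one in a way that calls for an update. A field that the operator
// leaves empty is treated as "accept whatever the API defaulted", so API
// defaulting alone never counts as a change.
func serviceChanged(current, expected ServiceSpec) bool {
	if expected.InternalTrafficPolicy == "" {
		// expected is a copy, so the caller's value is not modified.
		expected.InternalTrafficPolicy = current.InternalTrafficPolicy
	}
	return current != expected
}

func main() {
	// What the operator wants: it does not set internalTrafficPolicy.
	expected := ServiceSpec{Port: 53}

	// What the API returns after defaulting (assumed default: "Cluster").
	current := ServiceSpec{Port: 53, InternalTrafficPolicy: "Cluster"}

	fmt.Println("update needed after defaulting:", serviceChanged(current, expected))

	// A real difference in a managed field is still detected.
	current.Port = 5353
	fmt.Println("update needed after port change:", serviceChanged(current, expected))
}
```

With a comparison along these lines, a restart of the operator would find the defaulted service equal to the expected one and would skip the update, while a change to a field the operator does manage would still trigger one.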
Verified in 4.10.0-0.nightly-2021-09-10-083647.

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2021-09-10-083647   True        False         8m46s   Cluster version is 4.10.0-0.nightly-2021-09-10-083647

After deleting the dns-operator pod:

$ oc -n openshift-dns-operator delete pods -l name=dns-operator
pod "dns-operator-598b8b6cc7-vt58d" deleted

The dns-operator pod was recreated:

$ oc -n openshift-dns-operator get pod
NAME                            READY   STATUS    RESTARTS   AGE
dns-operator-598b8b6cc7-xnvc2   2/2     Running   0          19m

Checked the log again:

$ oc -n openshift-dns-operator logs -c dns-operator deploy/dns-operator
I0913 13:07:22.257263       1 request.go:668] Waited for 1.030837528s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/quota.openshift.io/v1?timeout=32s
time="2021-09-13T13:07:23Z" level=info msg="reconciling request: /default"
time="2021-09-13T13:07:23Z" level=info msg="reconciling request: /default"

No "updated dns service" messages appear after 19 minutes, so the issue is fixed.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056