Description of problem:

When the ingress operator's DNS controller reconciles a DNSRecord and computes the new status of the DNSRecord, it performs an equality check on the old and new status conditions to determine whether they have changed, and thus whether the controller should update the DNSRecord's status. This equality check returns false positives (that is, it reports the old and new conditions as equal) for status conditions that are already set but have changed status (for example, from Failed=False to Failed=True). As a result, the controller fails to record success after an earlier failure, the DNS controller endlessly retries publishing the DNSRecord, and the IngressController's status conditions show DNSReady=False and Degraded=True.

Version-Release number of selected component (if applicable):

4.6.0-0.ci.test-2020-08-29-131655-ci-op-rv6xn61c

How reproducible:

Happens when the DNS provider returns an error and subsequently returns success. Observed in some CI runs. For example, <https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_router/170/pull-ci-openshift-router-master-e2e/1299697433579622400>.

Actual results:

Here, the DNS controller initially fails to publish the DNS records:

2020-08-29T13:36:49.275Z ERROR operator.dns_controller dns/controller.go:181 failed to publish DNS record to zone {"record": {"dnsName":"*.apps.ci-op-rv6xn61c-8f0fe.origin-ci-int-gce.dev.openshift.com.","targets":["34.73.141.20"],"recordType":"A","recordTTL":30}, "dnszone": {"id":"ci-op-rv6xn61c-8f0fe-wm6nv-private-zone"}, "error": "Post https://dns.googleapis.com/dns/v1/projects/openshift-gce-devel-ci/managedZones/ci-op-rv6xn61c-8f0fe-wm6nv-private-zone/changes?alt=json&prettyPrint=false: oauth2: cannot fetch token: Post https://oauth2.googleapis.com/token: dial tcp: i/o timeout"}
...
2020-08-29T13:37:09.283Z ERROR operator.dns_controller dns/controller.go:181 failed to publish DNS record to zone {"record": {"dnsName":"*.apps.ci-op-rv6xn61c-8f0fe.origin-ci-int-gce.dev.openshift.com.","targets":["34.73.141.20"],"recordType":"A","recordTTL":30}, "dnszone": {"id":"origin-ci-int-gce-new"}, "error": "Post https://dns.googleapis.com/dns/v1/projects/openshift-gce-devel-ci/managedZones/origin-ci-int-gce-new/changes?alt=json&prettyPrint=false: oauth2: cannot fetch token: Post https://oauth2.googleapis.com/token: dial tcp: lookup oauth2.googleapis.com on 172.30.0.10:53: read udp 10.130.0.22:41182->172.30.0.10:53: read: connection refused"}

Shortly after, the DNS controller succeeds in publishing the records:

2020-08-29T13:37:09.295Z INFO operator.dns_controller controller/controller.go:233 updated dnsrecord {"dnsrecord": {"metadata":{"name":"default-wildcard","namespace":"openshift-ingress-operator","selfLink":"/apis/ingress.operator.openshift.io/v1/namespaces/openshift-ingress-operator/dnsrecords/default-wildcard/status","uid":"9448c7dc-e082-47fc-8f45-062e0ea092b6","resourceVersion":"17508","generation":1,"creationTimestamp":"2020-08-29T13:36:19Z","labels":{"ingresscontroller.operator.openshift.io/owning-ingresscontroller":"default"},"ownerReferences":[{"apiVersion":"operator.openshift.io/v1","kind":"IngressController","name":"default","uid":"6300ae77-9e39-4432-81d2-d8c64dc2340f","controller":true,"blockOwnerDeletion":true}],"finalizers":["operator.openshift.io/ingress-dns"],"managedFields":[{"manager":"ingress-operator","operation":"Update","apiVersion":"ingress.operator.openshift.io/v1","time":"2020-08-29T13:37:09Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:finalizers":{".":{},"v:\"operator.openshift.io/ingress-dns\"":{}},"f:labels":{".":{},"f:ingresscontroller.operator.openshift.io/owning-ingresscontroller":{}},"f:ownerReferences":{".":{},"k:{\"uid\":\"6300ae77-9e39-4432-81d2-d8c64dc2340f\"}":{".":{},"f:apiVersion":{},"f:blockOwnerDeletion":{},"f:controller":{},"f:kind":{},"f:name":{},"f:uid":{}}}},"f:spec":{".":{},"f:dnsName":{},"f:recordTTL":{},"f:recordType":{},"f:targets":{}},"f:status":{".":{},"f:observedGeneration":{},"f:zones":{}}}}]},"spec":{"dnsName":"*.apps.ci-op-rv6xn61c-8f0fe.origin-ci-int-gce.dev.openshift.com.","targets":["34.73.141.20"],"recordType":"A","recordTTL":30},"status":{"zones":[{"dnsZone":{"id":"ci-op-rv6xn61c-8f0fe-wm6nv-private-zone"},"conditions":[{"type":"Failed","status":"True","lastTransitionTime":"2020-08-29T13:36:19Z","reason":"ProviderError","message":"The DNS provider failed to ensure the record: Post https://dns.googleapis.com/dns/v1/projects/openshift-gce-devel-ci/managedZones/ci-op-rv6xn61c-8f0fe-wm6nv-private-zone/changes?alt=json&prettyPrint=false: oauth2: cannot fetch token: Post https://oauth2.googleapis.com/token: dial tcp: i/o timeout"}]},{"dnsZone":{"id":"origin-ci-int-gce-new"},"conditions":[{"type":"Failed","status":"True","lastTransitionTime":"2020-08-29T13:36:49Z","reason":"ProviderError","message":"The DNS provider failed to ensure the record: Post https://dns.googleapis.com/dns/v1/projects/openshift-gce-devel-ci/managedZones/origin-ci-int-gce-new/changes?alt=json&prettyPrint=false: oauth2: cannot fetch token: Post https://oauth2.googleapis.com/token: dial tcp: lookup oauth2.googleapis.com on 172.30.0.10:53: read udp 10.130.0.22:41182->172.30.0.10:53: read: connection refused"}]}],"observedGeneration":1}}}

However, the operator does not update the DNSRecord, and soon the ingress controller reports DNSReady=False:

2020-08-29T13:37:09.330Z ERROR operator.ingress_controller controller/controller.go:233 got retryable error; requeueing {"after": "1m0s", "error": "IngressController is degraded: DeploymentAvailable=False, DeploymentReplicasMinAvailable=False, DNSReady=False"}

The DNS controller continues retrying publishing the records:

2020-08-29T13:37:19.651Z INFO operator.dns_controller dns/controller.go:181 published DNS record to zone {"record": {"dnsName":"*.apps.ci-op-rv6xn61c-8f0fe.origin-ci-int-gce.dev.openshift.com.","targets":["34.73.141.20"],"recordType":"A","recordTTL":30}, "dnszone": {"id":"ci-op-rv6xn61c-8f0fe-wm6nv-private-zone"}}
2020-08-29T13:37:20.137Z INFO operator.dns_controller dns/controller.go:181 published DNS record to zone {"record": {"dnsName":"*.apps.ci-op-rv6xn61c-8f0fe.origin-ci-int-gce.dev.openshift.com.","targets":["34.73.141.20"],"recordType":"A","recordTTL":30}, "dnszone": {"id":"origin-ci-int-gce-new"}}

The IngressController remains degraded because the DNSRecord is never updated from Failed=True to Failed=False:

2020-08-29T13:41:19.061Z ERROR operator.ingress_controller controller/controller.go:233 got retryable error; requeueing {"after": "1m0s", "error": "IngressController is degraded: DNSReady=False"}

Expected results:

The DNS controller should log "updated dnsrecord" with the "Failed" status conditions set to "False" after it succeeds in publishing the records.

Additional info:

The logic error was introduced in 4.5.0 with https://github.com/openshift/cluster-ingress-operator/pull/390/commits/d953fa97c7f90d8ec733fdbf9ba12aa5fb433cc1.
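To illustrate the class of bug described in this report, here is a minimal Go sketch of how a condition equality check can yield a false positive when a condition's status flips. This is not the operator's actual code: the `Condition` type, the function names `typesOnlyEqual` and `conditionsEqual`, and the "ProviderSuccess" reason are hypothetical, chosen only for illustration. The flawed check compares condition types only, so a change from Failed=True to Failed=False looks like "no change" and the status update would be skipped.

```go
package main

import (
	"fmt"
	"sort"
)

// Condition is a simplified, hypothetical stand-in for a DNS zone condition.
type Condition struct {
	Type               string
	Status             string
	Reason             string
	Message            string
	LastTransitionTime string
}

// sortedByType returns a copy of the input sorted by condition type,
// so that comparisons do not depend on slice order.
func sortedByType(in []Condition) []Condition {
	out := append([]Condition(nil), in...)
	sort.Slice(out, func(i, j int) bool { return out[i].Type < out[j].Type })
	return out
}

// typesOnlyEqual is deliberately flawed: it only checks that both slices
// contain the same condition types, so a Status change goes unnoticed.
func typesOnlyEqual(a, b []Condition) bool {
	if len(a) != len(b) {
		return false
	}
	as, bs := sortedByType(a), sortedByType(b)
	for i := range as {
		if as[i].Type != bs[i].Type {
			return false
		}
	}
	return true
}

// conditionsEqual compares Type, Status, Reason, and Message, and ignores
// only LastTransitionTime (an assumption about which fields matter).
func conditionsEqual(a, b []Condition) bool {
	if len(a) != len(b) {
		return false
	}
	as, bs := sortedByType(a), sortedByType(b)
	for i := range as {
		x, y := as[i], bs[i]
		if x.Type != y.Type || x.Status != y.Status ||
			x.Reason != y.Reason || x.Message != y.Message {
			return false
		}
	}
	return true
}

func main() {
	old := []Condition{{Type: "Failed", Status: "True", Reason: "ProviderError"}}
	// "ProviderSuccess" is an illustrative reason string.
	updated := []Condition{{Type: "Failed", Status: "False", Reason: "ProviderSuccess"}}

	fmt.Println(typesOnlyEqual(old, updated))  // true: false positive, update skipped
	fmt.Println(conditionsEqual(old, updated)) // false: update would be written
}
```

With the flawed check, a controller that writes status only when the comparison reports a difference would never record the transition to Failed=False, which matches the symptom in the logs above.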
*** Bug 1874051 has been marked as a duplicate of this bug. ***
Verified with 4.6.0-0.nightly-2020-09-05-015624; the issue has been fixed.

Test steps:
1. Run "oc -n openshift-ingress-operator edit secret cloud-credentials" and change the key.
2. Delete the dnsrecords and wait for them to be recreated.
3. Ensure the DNSRecord status is Failed=True and co/ingress is Degraded.
4. Delete the cloud-credentials secret and wait for it to be recreated.
5. Ensure the DNSRecord status is Failed=False and co/ingress is not Degraded.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196