Description of problem:
If the .spec.privateZone field of the dns.config.openshift.io resource is filled out incorrectly so that the ingress operator cannot find the private hosted zone, the ingress operator goes degraded. That is good. However, even after the .spec.privateZone field is fixed, the ingress operator stays degraded. The ingress operator finds the hosted zone and adds the *.apps resource record, but it does not reset the degraded status.

How reproducible:
I only attempted once, so I cannot say whether it is reproducible.

Steps to Reproduce:
1. openshift-install create manifests
2. Change the .spec.privateZone fields in the manifests/cluster-dns-02-config.yml file.
3. openshift-install create cluster
4. Wait for the cluster to install far enough for the ingress operator to report degraded due to "FailedZones: The record failed to provision in some zones: [...]".
5. Fix the .spec.privateZone fields via `oc edit dns.config.openshift.io cluster`.
6. Wait for the ingress operator to create the *.apps record in the hosted zone.
7. Observe that the ingress operator is still degraded with the same "FailedZones" message.

Actual results:
The ingress operator remains degraded.

Expected results:
The ingress operator should reset its degraded status and report not degraded.
(In reply to Matthew Staebler from comment #0)
> How reproducible:
> I only attempted once, so I cannot say whether it is reproducible.

This is reproducible post-installation by modifying .spec.privateZone in the cluster DNS config to an invalid zone value and then reverting the change. The DNS controller does not remove the status condition for the zone that we no longer care about.

A workaround is to delete the DNSRecord resource and let the DNS operator re-create it (which is inconvenient).
Verified with 4.9.0-0.nightly-2021-08-17-122812 and passed.

Steps:
1. oc adm release extract --command=openshift-install registry.ci.openshift.org/ocp/release:4.9.0-0.nightly-2021-08-17-122812 -a ../pull-secret-full.txt
2. ./openshift-install create manifests --dir ./818/
3. Change the .spec.privateZone fields in the manifests/cluster-dns-02-config.yml file.
4. ./openshift-install create cluster --dir ./818/
5. The ingress operator reports the error below during the installation:

   The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: DNSReady=False (FailedZones: The record failed to provision in some zones: [{ map[Name:hongli-xx-j7tk5-int kubernetes.io/cluster/hongli-bz-j7tk5:owned]}])

6. oc edit dnses.config.openshift.io cluster

   <---snip--->
       privateZone:
         tags:
           Name: hongli-bz-j7tk5-int    <---- updated from hongli-xx-j7tk5-int to the correct value
           kubernetes.io/cluster/hongli-bz-j7tk5: owned

7. Wait for a while and find that ingress is available:

$ oc get co/ingress
NAME      VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
ingress   4.9.0-0.nightly-2021-08-17-122812   True        False         False      3m52s

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-08-17-122812   True        False         11s     Cluster version is 4.9.0-0.nightly-2021-08-17-122812

Install log:
$ ./openshift-install create cluster --dir ./818/
INFO Consuming Worker Machines from target directory
INFO Consuming Common Manifests from target directory
INFO Consuming OpenShift Install (Manifests) from target directory
INFO Consuming Master Machines from target directory
INFO Consuming Openshift Manifests from target directory
INFO Credentials loaded from the "default" profile in file "/home/hongan/.aws/credentials"
INFO Creating infrastructure resources...
INFO Waiting up to 20m0s for the Kubernetes API at https://api.hongli-bz.qe.devcluster.openshift.com:6443...
INFO API v1.22.0-rc.0+3dfed96 up
INFO Waiting up to 30m0s for bootstrapping to complete...
INFO Destroying the bootstrap resources...
INFO Waiting up to 40m0s for the cluster at https://api.hongli-bz.qe.devcluster.openshift.com:6443 to initialize...
INFO Waiting up to 10m0s for the openshift-console route to be created...
INFO Install complete!
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759