Bug 1942657 - ingress operator stays degraded after privateZone fixed in DNS
Summary: ingress operator stays degraded after privateZone fixed in DNS
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.9.0
Assignee: Luigi Mario Zuccarelli
QA Contact: Hongan Li
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-03-24 17:00 UTC by Matthew Staebler
Modified: 2022-08-04 22:32 UTC
CC List: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: If the .spec.privateZone field of the dns.config.openshift.io object is filled out incorrectly, so that the ingress operator cannot find the private hosted zone, the ingress operator goes degraded. Consequence: Even after the .spec.privateZone field is fixed, the ingress operator stays degraded: it finds the hosted zone and adds the *.apps resource record, but it does not reset the Degraded status condition. Fix: The ingress operator now watches the DNS config object for changes to the .spec.privateZone field and updates the operator status accordingly. Result: The operator status returns to Degraded=False once the correct .spec.privateZone field is set.
Clone Of:
Environment:
Last Closed: 2021-10-18 17:29:50 UTC
Target Upstream Version:
Embargoed:




Links:
Github: openshift/cluster-ingress-operator pull 641 (last updated 2021-08-16 15:59:34 UTC)
Red Hat Product Errata: RHSA-2021:3759 (last updated 2021-10-18 17:30:13 UTC)

Description Matthew Staebler 2021-03-24 17:00:54 UTC
Description of problem:
If the .spec.privateZone field of the dns.config.openshift.io object is filled out incorrectly, so that the ingress operator cannot find the private hosted zone, the ingress operator goes degraded. That is good. However, even after the .spec.privateZone field is fixed, the ingress operator stays degraded: it finds the hosted zone and adds the *.apps resource record, but it does not reset its Degraded status.
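
For illustration of the eventual fix described in the Doc Text above (the ingress operator watching the DNS config so that edits to .spec.privateZone re-trigger reconciliation), here is a minimal controller-runtime-style sketch. This is not the actual code from PR 641; the controller-runtime v0.15+ API style, the function names, and the namespace are assumptions:

package sketch

import (
	"context"

	configv1 "github.com/openshift/api/config/v1"
	operatorv1 "github.com/openshift/api/operator/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/handler"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

// dnsConfigToIngressControllers maps any event on the cluster DNS config to
// reconcile requests for every IngressController, so the operator recomputes
// its status (including Degraded) after .spec.privateZone is edited.
func dnsConfigToIngressControllers(c client.Client) handler.MapFunc {
	return func(ctx context.Context, _ client.Object) []reconcile.Request {
		var list operatorv1.IngressControllerList
		if err := c.List(ctx, &list, client.InNamespace("openshift-ingress-operator")); err != nil {
			return nil
		}
		reqs := make([]reconcile.Request, 0, len(list.Items))
		for i := range list.Items {
			reqs = append(reqs, reconcile.Request{
				NamespacedName: client.ObjectKeyFromObject(&list.Items[i]),
			})
		}
		return reqs
	}
}

// setupWithManager wires the watch: reconciles are driven by IngressController
// changes and, additionally, by changes to dnses.config.openshift.io.
func setupWithManager(mgr ctrl.Manager, r reconcile.Reconciler) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&operatorv1.IngressController{}).
		Watches(&configv1.DNS{},
			handler.EnqueueRequestsFromMapFunc(dnsConfigToIngressControllers(mgr.GetClient()))).
		Complete(r)
}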


How reproducible:
I only attempted once, so I cannot say whether it is reproducible.


Steps to Reproduce:
1. openshift-install create manifests
2. Change the .spec.privateZone fields in the manifests/cluster-dns-02-config.yml file.
3. openshift-install create cluster
4. Wait for the cluster to install enough for the ingress operator to report degraded due to "FailedZones: The record failed to provision in some zones: [...]"
5. Fix the .spec.privateZone fields via `oc edit dns.config.openshift.io cluster`.
6. Wait for the ingress operator to create the *.apps record in the hosted zone.
7. Observe that the ingress operator is still degraded with the same "FailedZones" message.

Actual results:
ingress operator remains degraded

Expected results:
ingress operator should clear its Degraded status and report Degraded=False

Comment 1 Stephen Greene 2021-04-23 17:22:04 UTC
(In reply to Matthew Staebler from comment #0)
> How reproducible:
> I only attempted once, so I cannot say whether it is reproducible.

This is reproducible post-installation by modifying .spec.privateZone in the cluster DNS config to an invalid zone value, and then reverting the change.
The DNS controller does not remove the status condition for the zone that we no longer care about.

A workaround is to delete the DNSRecord resource and let the DNS operator re-create it (which is inconvenient).
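
To make the stale-condition behavior concrete, here is a hypothetical, simplified sketch of the pruning step such a fix needs. The Zone and ZoneStatus types are illustrative stand-ins, not the real openshift/api types:

package sketch

import "reflect"

// Zone is a simplified stand-in for configv1.DNSZone: a hosted zone is
// identified either by ID or by a set of tags (as in .spec.privateZone).
type Zone struct {
	ID   string
	Tags map[string]string
}

// ZoneStatus is a simplified stand-in for the per-zone entry kept in a
// DNSRecord's status, carrying conditions such as the Failed condition that
// feeds the ingress operator's "FailedZones" Degraded message.
type ZoneStatus struct {
	Zone       Zone
	Conditions []string
}

// pruneStaleZoneStatuses keeps only the status entries whose zone still
// appears in the zones currently derived from the cluster DNS config.
// Without a step like this, the entry recorded for a mistyped privateZone
// survives the spec being corrected, and the operator keeps reporting
// Degraded.
func pruneStaleZoneStatuses(statuses []ZoneStatus, current []Zone) []ZoneStatus {
	kept := make([]ZoneStatus, 0, len(statuses))
	for _, s := range statuses {
		for _, z := range current {
			if s.Zone.ID == z.ID && reflect.DeepEqual(s.Zone.Tags, z.Tags) {
				kept = append(kept, s)
				break
			}
		}
	}
	return kept
}

Comparing zones by both ID and tags mirrors how .spec.privateZone can identify a hosted zone either way.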

Comment 4 Hongan Li 2021-08-18 04:01:44 UTC
verified with 4.9.0-0.nightly-2021-08-17-122812 and passed.

steps:
1. oc adm release extract --command=openshift-install registry.ci.openshift.org/ocp/release:4.9.0-0.nightly-2021-08-17-122812 -a ../pull-secret-full.txt
2. ./openshift-install create manifests --dir ./818/
3. change the .spec.privateZone fields in the manifests/cluster-dns-02-config.yml file.
4. ./openshift-install create cluster --dir ./818/
5. During installation, the ingress operator reports the error below:

The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: DNSReady=False (FailedZones: The record failed to provision in some zones: [{ map[Name:hongli-xx-j7tk5-int kubernetes.io/cluster/hongli-bz-j7tk5:owned]}])


6. oc edit dnses.config.openshift.io cluster
<---snip--->
  privateZone:
    tags:
      Name: hongli-bz-j7tk5-int                           <---- updated from hongli-xx-j7tk5-int to the correct value
      kubernetes.io/cluster/hongli-bz-j7tk5: owned

7. wait a while, then check that ingress is available and no longer degraded:
$ oc get co/ingress
NAME      VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
ingress   4.9.0-0.nightly-2021-08-17-122812   True        False         False      3m52s   

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-08-17-122812   True        False         11s     Cluster version is 4.9.0-0.nightly-2021-08-17-122812


install log:
$ ./openshift-install create cluster --dir ./818/
INFO Consuming Worker Machines from target directory 
INFO Consuming Common Manifests from target directory 
INFO Consuming OpenShift Install (Manifests) from target directory 
INFO Consuming Master Machines from target directory 
INFO Consuming Openshift Manifests from target directory 
INFO Credentials loaded from the "default" profile in file "/home/hongan/.aws/credentials" 
INFO Creating infrastructure resources...         
INFO Waiting up to 20m0s for the Kubernetes API at https://api.hongli-bz.qe.devcluster.openshift.com:6443... 
INFO API v1.22.0-rc.0+3dfed96 up                  
INFO Waiting up to 30m0s for bootstrapping to complete... 
INFO Destroying the bootstrap resources...        
INFO Waiting up to 40m0s for the cluster at https://api.hongli-bz.qe.devcluster.openshift.com:6443 to initialize... 
INFO Waiting up to 10m0s for the openshift-console route to be created... 
INFO Install complete!

Comment 8 errata-xmlrpc 2021-10-18 17:29:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

