Bug 1942657 - ingress operator stays degraded after privateZone fixed in DNS
Summary: ingress operator stays degraded after privateZone fixed in DNS
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.9.0
Assignee: Luigi Mario Zuccarelli
QA Contact: Hongan Li
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-03-24 17:00 UTC by Matthew Staebler
Modified: 2022-08-04 22:32 UTC
CC List: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: If the .spec.privateZone field of the dns.config.openshift.io object is filled out incorrectly, so that the ingress operator cannot find the private hosted zone, the ingress operator goes degraded. Consequence: Even after the .spec.privateZone field is fixed, the ingress operator stays degraded: it finds the hosted zone and adds the *.apps resource record, but it does not reset the Degraded status condition. Fix: The ingress operator now watches the DNS config object for changes to the .spec.privateZone field and updates the operator status accordingly. Result: The operator status returns to Degraded=False once the correct .spec.privateZone field is set.
Clone Of:
Environment:
Last Closed: 2021-10-18 17:29:50 UTC
Target Upstream Version:
Embargoed:




Links:
Github: openshift/cluster-ingress-operator pull 641 (last updated 2021-08-16 15:59:34 UTC)
Red Hat Product Errata: RHSA-2021:3759 (last updated 2021-10-18 17:30:13 UTC)

Description Matthew Staebler 2021-03-24 17:00:54 UTC
Description of problem:
If the .spec.privateZone field of the dns.config.openshift.io object is filled out incorrectly, so that the ingress operator cannot find the private hosted zone, the ingress operator goes degraded. That is good. However, even after the .spec.privateZone field is fixed, the ingress operator stays degraded: it finds the hosted zone and adds the *.apps resource record, but it does not reset its Degraded status.
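
For illustration of the eventual fix described in the Doc Text above (the ingress operator watching the DNS config so that edits to .spec.privateZone re-trigger reconciliation), here is a minimal controller-runtime-style sketch. This is not the actual code from PR 641; the controller-runtime v0.15+ API style, the function names, and the namespace are assumptions:

package sketch

import (
	"context"

	configv1 "github.com/openshift/api/config/v1"
	operatorv1 "github.com/openshift/api/operator/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/handler"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

// dnsConfigToIngressControllers maps any event on the cluster DNS config to
// reconcile requests for every IngressController, so the operator recomputes
// its status (including Degraded) after .spec.privateZone is edited.
func dnsConfigToIngressControllers(c client.Client) handler.MapFunc {
	return func(ctx context.Context, _ client.Object) []reconcile.Request {
		var list operatorv1.IngressControllerList
		if err := c.List(ctx, &list, client.InNamespace("openshift-ingress-operator")); err != nil {
			return nil
		}
		reqs := make([]reconcile.Request, 0, len(list.Items))
		for i := range list.Items {
			reqs = append(reqs, reconcile.Request{
				NamespacedName: client.ObjectKeyFromObject(&list.Items[i]),
			})
		}
		return reqs
	}
}

// setupWithManager wires the watch: reconciles are driven by IngressController
// changes and, additionally, by changes to dnses.config.openshift.io.
func setupWithManager(mgr ctrl.Manager, r reconcile.Reconciler) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&operatorv1.IngressController{}).
		Watches(&configv1.DNS{},
			handler.EnqueueRequestsFromMapFunc(dnsConfigToIngressControllers(mgr.GetClient()))).
		Complete(r)
}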


How reproducible:
I only attempted once, so I cannot say whether it is reproducible.


Steps to Reproduce:
1. openshift-install create manifests
2. Change the .spec.privateZone fields in the manifests/cluster-dns-02-config.yml file.
3. openshift-install create cluster
4. Wait for the cluster to install enough for the ingress operator to report degraded due to "FailedZones: The record failed to provision in some zones: [...]"
5. Fix the .spec.privateZone fields via `oc edit dns.config.openshift.io cluster`.
6. Wait for the ingress operator to create the *.apps record in the hosted zone.
7. Observe that the ingress operator is still degraded with the same "FailedZones" message.

Actual results:
ingress operator remains degraded

Expected results:
ingress operator should clear its Degraded status and report Degraded=False

Comment 1 Stephen Greene 2021-04-23 17:22:04 UTC
(In reply to Matthew Staebler from comment #0)
> How reproducible:
> I only attempted once, so I cannot say whether it is reproducible.

This is reproducible post-installation by modifying .spec.privateZone in the cluster DNS config to an invalid zone value, and then reverting the change.
The DNS controller does not remove the status condition for the zone that we no longer care about.

A workaround is to delete the DNSRecord resource and let the DNS operator re-create it (which is inconvenient).
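
To make the stale-condition behavior concrete, here is a hypothetical, simplified sketch of the pruning step such a fix needs. The Zone and ZoneStatus types are illustrative stand-ins, not the real openshift/api types:

package sketch

import "reflect"

// Zone is a simplified stand-in for configv1.DNSZone: a hosted zone is
// identified either by ID or by a set of tags (as in .spec.privateZone).
type Zone struct {
	ID   string
	Tags map[string]string
}

// ZoneStatus is a simplified stand-in for the per-zone entry kept in a
// DNSRecord's status, carrying conditions such as the Failed condition that
// feeds the ingress operator's "FailedZones" Degraded message.
type ZoneStatus struct {
	Zone       Zone
	Conditions []string
}

// pruneStaleZoneStatuses keeps only the status entries whose zone still
// appears in the zones currently derived from the cluster DNS config.
// Without a step like this, the entry recorded for a mistyped privateZone
// survives the spec being corrected, and the operator keeps reporting
// Degraded.
func pruneStaleZoneStatuses(statuses []ZoneStatus, current []Zone) []ZoneStatus {
	kept := make([]ZoneStatus, 0, len(statuses))
	for _, s := range statuses {
		for _, z := range current {
			if s.Zone.ID == z.ID && reflect.DeepEqual(s.Zone.Tags, z.Tags) {
				kept = append(kept, s)
				break
			}
		}
	}
	return kept
}

Comparing zones by both ID and tags mirrors how .spec.privateZone can identify a hosted zone either way.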

Comment 4 Hongan Li 2021-08-18 04:01:44 UTC
verified with 4.9.0-0.nightly-2021-08-17-122812 and passed.

steps:
1. oc adm release extract --command=openshift-install registry.ci.openshift.org/ocp/release:4.9.0-0.nightly-2021-08-17-122812 -a ../pull-secret-full.txt
2. ./openshift-install create manifests --dir ./818/
3. change the .spec.privateZone fields in the manifests/cluster-dns-02-config.yml file.
4. ./openshift-install create cluster --dir ./818/
5. During installation, the ingress operator reports the error below:

The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: DNSReady=False (FailedZones: The record failed to provision in some zones: [{ map[Name:hongli-xx-j7tk5-int kubernetes.io/cluster/hongli-bz-j7tk5:owned]}])


6. oc edit dnses.config.openshift.io cluster
<---snip--->
  privateZone:
    tags:
      Name: hongli-bz-j7tk5-int                           <---- updated from hongli-xx-j7tk5-int to the correct value
      kubernetes.io/cluster/hongli-bz-j7tk5: owned

7. wait a while, then check that ingress is available and no longer degraded:
$ oc get co/ingress
NAME      VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
ingress   4.9.0-0.nightly-2021-08-17-122812   True        False         False      3m52s   

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-08-17-122812   True        False         11s     Cluster version is 4.9.0-0.nightly-2021-08-17-122812


install log:
$ ./openshift-install create cluster --dir ./818/
INFO Consuming Worker Machines from target directory 
INFO Consuming Common Manifests from target directory 
INFO Consuming OpenShift Install (Manifests) from target directory 
INFO Consuming Master Machines from target directory 
INFO Consuming Openshift Manifests from target directory 
INFO Credentials loaded from the "default" profile in file "/home/hongan/.aws/credentials" 
INFO Creating infrastructure resources...         
INFO Waiting up to 20m0s for the Kubernetes API at https://api.hongli-bz.qe.devcluster.openshift.com:6443... 
INFO API v1.22.0-rc.0+3dfed96 up                  
INFO Waiting up to 30m0s for bootstrapping to complete... 
INFO Destroying the bootstrap resources...        
INFO Waiting up to 40m0s for the cluster at https://api.hongli-bz.qe.devcluster.openshift.com:6443 to initialize... 
INFO Waiting up to 10m0s for the openshift-console route to be created... 
INFO Install complete!

Comment 8 errata-xmlrpc 2021-10-18 17:29:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

