Bug 1870373 - Ingress Operator reports available when DNS fails to provision
Summary: Ingress Operator reports available when DNS fails to provision
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.7.0
Assignee: Candace Holman
QA Contact: Hongan Li
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-08-19 21:47 UTC by Daneyon Hansen
Modified: 2022-08-04 22:30 UTC
CC List: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Several conditions contribute to ingress availability, but only the deployment availability condition was checked when calculating it.
Consequence: The ingress operator reported Available=True even when DNS was not yet provisioned or a required load balancer was not ready.
Fix: Check that DNS is provisioned and that any required load balancer is ready before reporting availability.
Result: The ingress operator reports Available=False when DNS is not provisioned or a required load balancer is not ready.
Clone Of:
Environment:
Last Closed: 2021-02-24 15:16:08 UTC
Target Upstream Version:
Embargoed:




Links
System ID | Status | Summary | Last Updated
GitHub openshift/cluster-ingress-operator pull 495 | closed | Bug 1870373: Ingress Operator reports available when DNS fails to provision | 2021-02-02 21:10:51 UTC
Red Hat Product Errata RHSA-2020:5633 | None | None | 2021-02-24 15:16:33 UTC

Description Daneyon Hansen 2020-08-19 21:47:01 UTC
Description of problem:
The Ingress Operator is responsible for provisioning a wildcard DNS record for the ingress domain when using an LB endpoint publishing type. If it fails to publish the record, the operator reports Available=True and Degraded=True. An IngressController's Available condition is calculated solely from the availability of the operand deployment. Should Available be False if the operator fails to provision the DNS record for an IngressController?
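
For illustration, a minimal sketch of the pre-fix logic (simplified types from openshift/api and k8s.io/api; this is not the operator's actual code) showing how deriving Available solely from the deployment lets a DNS failure go unnoticed:

package status

import (
	operatorv1 "github.com/openshift/api/operator/v1"
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
)

// Pre-fix sketch: the ingresscontroller's Available condition is derived
// only from the operand deployment's own Available condition, so a DNS
// record that failed to provision never flips Available to False.
func computeIngressAvailableCondition(deployment *appsv1.Deployment) operatorv1.OperatorCondition {
	for _, cond := range deployment.Status.Conditions {
		if cond.Type == appsv1.DeploymentAvailable && cond.Status == corev1.ConditionTrue {
			return operatorv1.OperatorCondition{Type: "Available", Status: operatorv1.ConditionTrue}
		}
	}
	return operatorv1.OperatorCondition{
		Type:   "Available",
		Status: operatorv1.ConditionFalse,
		Reason: "DeploymentUnavailable",
	}
}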

Version-Release number of selected component (if applicable):
4.6

How reproducible:
Always

Steps to Reproduce:
1. Create a cluster
2. Run the ingress operator locally (first introduce a fault that causes the operator to fail to provision the DNS record).
3. Create an ingresscontroller

Actual results:
Available=True ingress operator status condition.

Expected results:
Available=False ingress operator status condition.


Additional info:
This issue was observed in https://github.com/openshift/cluster-ingress-operator/pull/433

Comment 1 Daneyon Hansen 2020-09-09 15:55:59 UTC
I’m adding UpcomingSprint because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

Comment 2 Daneyon Hansen 2020-10-01 16:24:54 UTC
I’m adding UpcomingSprint because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

Comment 3 Daneyon Hansen 2020-10-23 15:53:49 UTC
I’m adding UpcomingSprint because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

Comment 4 Daneyon Hansen 2020-11-12 16:54:05 UTC
I’m adding UpcomingSprint because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

Comment 5 Candace Holman 2020-11-23 22:10:58 UTC
In computeIngressAvailableCondition we could do more than just check for DeploymentAvailable. We can also check DNSReady and LoadBalancerReady, mirroring the checks made in computeIngressDegradedCondition.

https://github.com/openshift/cluster-ingress-operator/pull/495
PR comments:
Check DNSReady and LoadBalancerReady when determining Availability in computeIngressAvailableCondition
Call computeIngressAvailableCondition after computing other conditions in syncIngressControllerStatus
Change unit tests for computeIngressAvailableCondition
Refactor the checks for conditions into checkAnnotatedConditions, to share between computeIngressAvailable and computeIngressDegraded
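
A rough sketch of that approach (simplified; the actual PR shares logic with the Degraded computation via checkAnnotatedConditions and only requires the load-balancer-related conditions when the endpoint publishing strategy uses one):

package status

import (
	"fmt"
	"strings"

	operatorv1 "github.com/openshift/api/operator/v1"
)

// Post-fix sketch: scan the already-computed conditions and report
// Available=False if any required condition is not in its expected state,
// composing a message like the one seen in the verification output below.
func computeIngressAvailableCondition(conditions []operatorv1.OperatorCondition) operatorv1.OperatorCondition {
	required := []struct {
		condType string
		status   operatorv1.ConditionStatus
	}{
		{"DeploymentAvailable", operatorv1.ConditionTrue},
		{"DNSReady", operatorv1.ConditionTrue},
		{"LoadBalancerReady", operatorv1.ConditionTrue},
	}
	var unavailable []string
	for _, want := range required {
		for _, cond := range conditions {
			if cond.Type == want.condType && cond.Status != want.status {
				unavailable = append(unavailable, fmt.Sprintf("%s=%s (%s: %s)",
					cond.Type, cond.Status, cond.Reason, cond.Message))
			}
		}
	}
	if len(unavailable) > 0 {
		return operatorv1.OperatorCondition{
			Type:    "Available",
			Status:  operatorv1.ConditionFalse,
			Reason:  "IngressControllerUnavailable",
			Message: "One or more status conditions indicate unavailable: " + strings.Join(unavailable, ", "),
		}
	}
	return operatorv1.OperatorCondition{Type: "Available", Status: operatorv1.ConditionTrue}
}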

Comment 7 Hongan Li 2020-12-14 08:41:21 UTC
Verified with 4.7.0-0.nightly-2020-12-14-03511 and passed.

Created an ingresscontroller with the domain internal.test.example.com (the record fails to publish):

# oc -n openshift-ingress-operator get ingresscontroller test -oyaml
<---snip---->
  - lastTransitionTime: "2020-12-14T07:47:53Z"
    message: 'The record failed to provision in some zones: [xxxx]'
    reason: FailedZones
    status: "False"
    type: DNSReady
  - lastTransitionTime: "2020-12-14T07:48:13Z"
    message: 'One or more status conditions indicate unavailable: DNSReady=False (FailedZones: The record failed to provision in some zones: [xxxx])'
    reason: IngressControllerUnavailable
    status: "False"
    type: Available
  - lastTransitionTime: "2020-12-14T07:48:23Z"
    message: 'One or more other status conditions indicate a degraded state: DNSReady=False (FailedZones: The record failed to provision in some zones: [xxxx])'
    reason: DegradedConditions
    status: "True"
    type: Degraded
  domain: internal.test.example.com

Comment 10 errata-xmlrpc 2021-02-24 15:16:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

