Bug 1870373

Summary: Ingress Operator reports available when DNS fails to provision
Product: OpenShift Container Platform
Reporter: Daneyon Hansen <dhansen>
Component: Networking
Assignee: Candace Holman <cholman>
Networking sub component: router
QA Contact: Hongan Li <hongli>
Status: CLOSED ERRATA
Docs Contact:
Severity: low
Priority: low
CC: aos-bugs, mmasters
Version: 4.6
Target Milestone: ---
Target Release: 4.7.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Several conditions contribute to ingress availability, but only the deployment availability condition was checked when calculating it.
Consequence: The ingress operator reported Available=True even when DNS was not yet provisioned or a required load balancer was not ready.
Fix: Check that DNS is provisioned and that any required load balancer is ready before reporting availability.
Result: The ingress operator reports Available=False when DNS is not provisioned or a required load balancer is not ready.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-02-24 15:16:08 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Daneyon Hansen 2020-08-19 21:47:01 UTC
Description of problem:
The Ingress Operator is responsible for provisioning a wildcard DNS record for the ingress domain when an IngressController uses a load balancer endpoint publishing strategy. If it fails to publish the record, the operator reports Available=True and Degraded=True, because an IngressController's Available condition is calculated solely from the availability of the operand deployment. Should Available be False when the operator fails to provision the DNS record for an IngressController?
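To make the reported behavior concrete, here is a minimal Go sketch of an availability calculation that looks only at the deployment. The Condition type and the findCondition and availableFromDeploymentOnly functions are hypothetical simplifications for illustration, not the operator's actual code; only the condition type names come from this report.

```go
package main

import "fmt"

// Condition is a simplified, hypothetical stand-in for an
// IngressController status condition.
type Condition struct {
	Type    string
	Status  string // "True", "False", or "Unknown"
	Reason  string
	Message string
}

// findCondition returns the condition with the given type, or nil if absent.
func findCondition(conds []Condition, condType string) *Condition {
	for i := range conds {
		if conds[i].Type == condType {
			return &conds[i]
		}
	}
	return nil
}

// availableFromDeploymentOnly mirrors the behavior described in this report:
// Available depends solely on DeploymentAvailable, so DNSReady is ignored.
func availableFromDeploymentOnly(conds []Condition) string {
	if c := findCondition(conds, "DeploymentAvailable"); c != nil && c.Status == "True" {
		return "True"
	}
	return "False"
}

func main() {
	conds := []Condition{
		{Type: "DeploymentAvailable", Status: "True"},
		{Type: "DNSReady", Status: "False", Reason: "FailedZones"},
	}
	// Reports Available=True even though the DNS record failed to provision.
	fmt.Printf("Available=%s\n", availableFromDeploymentOnly(conds))
}
```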

Version-Release number of selected component (if applicable):
4.6

How reproducible:
Always

Steps to Reproduce:
1. Create a cluster
2. Run the ingress operator locally, after introducing a fault that causes it to fail to provision the DNS record.
3. Create an IngressController.

Actual results:
The ingress operator reports the status condition Available=True.

Expected results:
The ingress operator reports the status condition Available=False.


Additional info:
This issue was observed in https://github.com/openshift/cluster-ingress-operator/pull/433

Comment 1 Daneyon Hansen 2020-09-09 15:55:59 UTC
I’m adding UpcomingSprint because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

Comment 2 Daneyon Hansen 2020-10-01 16:24:54 UTC
I’m adding UpcomingSprint because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

Comment 3 Daneyon Hansen 2020-10-23 15:53:49 UTC
I’m adding UpcomingSprint because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

Comment 4 Daneyon Hansen 2020-11-12 16:54:05 UTC
I’m adding UpcomingSprint because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

Comment 5 Candace Holman 2020-11-23 22:10:58 UTC
In computeIngressAvailableCondition, we could do more than just check DeploymentAvailable. We could also check DNSReady and LoadBalancerReady, mirroring the checks already made in computeIngressDegradedCondition.

https://github.com/openshift/cluster-ingress-operator/pull/495
PR comments:
- Check DNSReady and LoadBalancerReady when determining availability in computeIngressAvailableCondition.
- Call computeIngressAvailableCondition after computing the other conditions in syncIngressControllerStatus.
- Change unit tests for computeIngressAvailableCondition.
- Refactor the condition checks into checkAnnotatedConditions, to share them between computeIngressAvailable and computeIngressDegraded.
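The fix described in the PR can be approximated by a shared helper that compares actual conditions against expected statuses and builds the Available message from whatever does not match. This Go sketch rests on several assumptions: checkConditions and computeAvailable are hypothetical names loosely modeled on checkAnnotatedConditions and computeIngressAvailableCondition; the lbRequired flag assumes (per the description and the Doc Text's "required load balancer") that LoadBalancerReady and DNSReady matter only for a load balancer publishing strategy; and joining multiple failures with ", " is a guess. The single-failure message format follows the verification output in Comment 7.

```go
package main

import (
	"fmt"
	"strings"
)

// Condition is a simplified, hypothetical status condition.
type Condition struct {
	Type, Status, Reason, Message string
}

// expectedCondition pairs a condition type with the status that counts as healthy.
type expectedCondition struct {
	Type   string
	Status string
}

// checkConditions (hypothetical) describes every expected condition that is
// missing or whose status does not match the expected status.
func checkConditions(conds []Condition, expected []expectedCondition) []string {
	byType := make(map[string]Condition, len(conds))
	for _, c := range conds {
		byType[c.Type] = c
	}
	var bad []string
	for _, e := range expected {
		c, ok := byType[e.Type]
		if !ok {
			bad = append(bad, e.Type+" condition not found")
			continue
		}
		if c.Status != e.Status {
			bad = append(bad, fmt.Sprintf("%s=%s (%s: %s)", c.Type, c.Status, c.Reason, c.Message))
		}
	}
	return bad
}

// computeAvailable reports Available=True only if the deployment is available
// and, when a load balancer is required, the load balancer and DNS are ready.
func computeAvailable(conds []Condition, lbRequired bool) Condition {
	expected := []expectedCondition{{"DeploymentAvailable", "True"}}
	if lbRequired {
		expected = append(expected,
			expectedCondition{"LoadBalancerReady", "True"},
			expectedCondition{"DNSReady", "True"})
	}
	if bad := checkConditions(conds, expected); len(bad) > 0 {
		return Condition{
			Type:    "Available",
			Status:  "False",
			Reason:  "IngressControllerUnavailable",
			Message: "One or more status conditions indicate unavailable: " + strings.Join(bad, ", "),
		}
	}
	return Condition{Type: "Available", Status: "True"}
}

func main() {
	conds := []Condition{
		{Type: "DeploymentAvailable", Status: "True"},
		{Type: "LoadBalancerReady", Status: "True"},
		{Type: "DNSReady", Status: "False", Reason: "FailedZones",
			Message: "The record failed to provision in some zones: [xxxx]"},
	}
	fmt.Println(computeAvailable(conds, true).Message)
}
```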

Comment 7 Hongan Li 2020-12-14 08:41:21 UTC
Verified with 4.7.0-0.nightly-2020-12-14-03511 and passed.

Created an IngressController with the domain internal.test.example.com (the DNS record failed to publish):

# oc -n openshift-ingress-operator get ingresscontroller test -oyaml
<---snip---->
  - lastTransitionTime: "2020-12-14T07:47:53Z"
    message: 'The record failed to provision in some zones: [xxxx]'
    reason: FailedZones
    status: "False"
    type: DNSReady
  - lastTransitionTime: "2020-12-14T07:48:13Z"
    message: 'One or more status conditions indicate unavailable: DNSReady=False (FailedZones: The record failed to provision in some zones: [xxxx])'
    reason: IngressControllerUnavailable
    status: "False"
    type: Available
  - lastTransitionTime: "2020-12-14T07:48:23Z"
    message: 'One or more other status conditions indicate a degraded state: DNSReady=False (FailedZones: The record failed to provision in some zones: [xxxx])'
    reason: DegradedConditions
    status: "True"
    type: Degraded
  domain: internal.test.example.com
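A check like the one performed during verification can be scripted. The Go sketch below is a hypothetical helper, not part of the operator or its test suite: it decodes an IngressController as printed with `-o json` and fails if DNSReady=False while Available is anything other than False.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// condition holds the status condition fields shown in the output above.
type condition struct {
	Type    string `json:"type"`
	Status  string `json:"status"`
	Reason  string `json:"reason"`
	Message string `json:"message"`
}

// ingressController decodes only the part of the object this check needs.
type ingressController struct {
	Status struct {
		Conditions []condition `json:"conditions"`
	} `json:"status"`
}

// checkConsistency returns an error if DNSReady is False while Available is
// not False, which is the inconsistency this bug describes.
func checkConsistency(data []byte) error {
	var ic ingressController
	if err := json.Unmarshal(data, &ic); err != nil {
		return fmt.Errorf("decoding IngressController: %w", err)
	}
	status := map[string]string{}
	for _, c := range ic.Status.Conditions {
		status[c.Type] = c.Status
	}
	if status["DNSReady"] == "False" && status["Available"] != "False" {
		return fmt.Errorf("DNSReady=False but Available=%q", status["Available"])
	}
	return nil
}

func main() {
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	if err := checkConsistency(data); err != nil {
		fmt.Fprintln(os.Stderr, "FAIL:", err)
		os.Exit(1)
	}
	fmt.Println("OK: Available is consistent with DNSReady")
}
```

Usage, assuming the sketch is saved as a hypothetical check.go:
oc -n openshift-ingress-operator get ingresscontroller test -o json | go run check.go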

Comment 10 errata-xmlrpc 2021-02-24 15:16:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633