Description of problem:
The Ingress Operator is responsible for provisioning a wildcard DNS record for the ingress domain when using a LoadBalancer endpoint publishing strategy type. If the operator fails to publish the record, it reports Available=True and Degraded=True, because an IngressController's Available condition is calculated solely from the availability of the operand deployment. Should Available be False when the operator fails to provision the DNS record for an IngressController?

Version-Release number of selected component (if applicable):
4.6

How reproducible:
Always

Steps to Reproduce:
1. Create a cluster.
2. Run the ingress operator locally (first introduce a fault that causes the operator to fail to provision the DNS record).
3. Create an IngressController.

Actual results:
Available=True ingress operator status condition.

Expected results:
Available=False ingress operator status condition.

Additional info:
This issue was observed in https://github.com/openshift/cluster-ingress-operator/pull/433
I’m adding UpcomingSprint because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.
In computeIngressAvailableCondition we could do more than just check DeploymentAvailable: we can also check DNSReady and LoadBalancerReady, to balance the checks made in computeIngressDegradedCondition.

https://github.com/openshift/cluster-ingress-operator/pull/495

PR changes:
- Check DNSReady and LoadBalancerReady when determining availability in computeIngressAvailableCondition.
- Call computeIngressAvailableCondition after computing the other conditions in syncIngressControllerStatus.
- Update the unit tests for computeIngressAvailableCondition.
- Refactor the condition checks into checkAnnotatedConditions, shared between computeIngressAvailableCondition and computeIngressDegradedCondition.
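A minimal sketch of the fixed logic, using simplified local types rather than the operator's real operatorv1.OperatorCondition and checkAnnotatedConditions helper (the function name computeAvailable and the Condition struct here are illustrative assumptions, not the actual code in the PR):

```go
package main

import "fmt"

// Condition is a simplified stand-in for an operator status condition.
type Condition struct {
	Type    string
	Status  string // "True", "False", or "Unknown"
	Reason  string
	Message string
}

// computeAvailable mirrors the fix: Available is True only if the operand
// deployment is available AND neither DNSReady nor LoadBalancerReady is False,
// instead of looking at DeploymentAvailable alone.
func computeAvailable(conds []Condition) Condition {
	var unavailable []string
	for _, c := range conds {
		switch c.Type {
		case "DeploymentAvailable":
			if c.Status != "True" {
				unavailable = append(unavailable,
					fmt.Sprintf("%s=%s (%s: %s)", c.Type, c.Status, c.Reason, c.Message))
			}
		case "DNSReady", "LoadBalancerReady":
			if c.Status == "False" {
				unavailable = append(unavailable,
					fmt.Sprintf("%s=%s (%s: %s)", c.Type, c.Status, c.Reason, c.Message))
			}
		}
	}
	if len(unavailable) > 0 {
		return Condition{
			Type:    "Available",
			Status:  "False",
			Reason:  "IngressControllerUnavailable",
			Message: fmt.Sprintf("One or more status conditions indicate unavailable: %v", unavailable),
		}
	}
	return Condition{Type: "Available", Status: "True"}
}

func main() {
	// Deployment is fine, but the wildcard DNS record failed to provision.
	conds := []Condition{
		{Type: "DeploymentAvailable", Status: "True"},
		{Type: "DNSReady", Status: "False", Reason: "FailedZones",
			Message: "The record failed to provision in some zones"},
	}
	fmt.Println(computeAvailable(conds).Status) // Available is now False
}
```

With this shape, the DNSReady=False condition observed in the bug flips Available to False with reason IngressControllerUnavailable, matching the verification output below.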
Verified with 4.7.0-0.nightly-2020-12-14-03511 and passed.

Created an IngressController with the domain internal.test.example.com (the record failed to publish):

# oc -n openshift-ingress-operator get ingresscontroller test -oyaml
<---snip---->
    - lastTransitionTime: "2020-12-14T07:47:53Z"
      message: 'The record failed to provision in some zones: [xxxx]'
      reason: FailedZones
      status: "False"
      type: DNSReady
    - lastTransitionTime: "2020-12-14T07:48:13Z"
      message: 'One or more status conditions indicate unavailable: DNSReady=False
        (FailedZones: The record failed to provision in some zones: [xxxx])'
      reason: IngressControllerUnavailable
      status: "False"
      type: Available
    - lastTransitionTime: "2020-12-14T07:48:23Z"
      message: 'One or more other status conditions indicate a degraded state: DNSReady=False
        (FailedZones: The record failed to provision in some zones: [xxxx])'
      reason: DegradedConditions
      status: "True"
      type: Degraded
  domain: internal.test.example.com
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633