Bug 1955854 - Ingress clusteroperator reports Degraded=True/Available=False if any ingresscontroller is degraded or unavailable
Keywords:
Status: POST
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Routing
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: Miciah Dashiel Butler Masters
QA Contact: Hongan Li
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-05-01 01:15 UTC by Miciah Dashiel Butler Masters
Modified: 2021-05-04 20:57 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:


Links:
System: Github
ID: openshift cluster-ingress-operator pull 607
Private: 0
Priority: None
Status: open
Summary: Bug 1955854: Compute Available and Degraded from default ingress
Last Updated: 2021-05-01 01:35:33 UTC

Description Miciah Dashiel Butler Masters 2021-05-01 01:15:01 UTC
Description of problem:

The ingress operator reports Available=False in the ingress clusteroperator status conditions if any ingresscontroller is unavailable, and Degraded=True if any ingresscontroller is degraded.  

The operator should report Available=False or Degraded=True only if the *default* ingresscontroller is unavailable or degraded, respectively.  

In addition, the ingress operator should expose metrics and have alerting rules that indicate when ingresscontrollers are unavailable or degraded.
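
The requested behavior can be illustrated with a minimal Go sketch. This is not the operator's actual code: the `Condition` and `IngressController` types and the `clusterOperatorStatus` function are hypothetical simplifications, and the handling of a missing default ingresscontroller is an assumption.

```go
package main

import "fmt"

// Condition is a simplified, hypothetical stand-in for a status condition.
type Condition struct {
	Type   string // for example "Available" or "Degraded"
	Status bool
}

// IngressController is a simplified, hypothetical stand-in for the API type.
type IngressController struct {
	Name       string
	Conditions []Condition
}

// conditionTrue reports whether the named condition is present and true.
func conditionTrue(ic IngressController, condType string) bool {
	for _, c := range ic.Conditions {
		if c.Type == condType {
			return c.Status
		}
	}
	return false
}

// clusterOperatorStatus derives Available and Degraded from the
// ingresscontroller named "default" only; all others are ignored.
func clusterOperatorStatus(ics []IngressController) (available, degraded bool) {
	for _, ic := range ics {
		if ic.Name == "default" {
			return conditionTrue(ic, "Available"), conditionTrue(ic, "Degraded")
		}
	}
	// Assumption: with no default ingresscontroller, report unavailable.
	return false, false
}

func main() {
	ics := []IngressController{
		{Name: "default", Conditions: []Condition{{"Available", true}, {"Degraded", false}}},
		{Name: "xyz", Conditions: []Condition{{"Available", false}, {"Degraded", true}}},
	}
	available, degraded := clusterOperatorStatus(ics)
	fmt.Printf("Available=%t Degraded=%t\n", available, degraded)
}
```

In this sketch the unavailable, degraded `xyz` ingresscontroller does not affect the result, which is the behavior this bug asks for.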


Version-Release number of selected component (if applicable):

4.8.0.


How reproducible:

100%.


Steps to Reproduce:

1. Launch a new cluster with fewer than 20 nodes.

2. On the cluster from Step 1, create an ingresscontroller with 1 replica:

    % oc create -f - <<'EOF'
    apiVersion: operator.openshift.io/v1
    kind: IngressController
    metadata:
      name: xyz
      namespace: openshift-ingress-operator
    spec:
      replicas: 1
      domain: xyz.com
      endpointPublishingStrategy:
        type: Private
    EOF

3. Check the ingress clusteroperator:

    % oc get clusteroperators/ingress
    NAME      VERSION                        AVAILABLE   PROGRESSING   DEGRADED   SINCE
    ingress   4.8.0-0.ci-2021-04-30-171538   True        False         False      2m43s

4. Scale the ingresscontroller from Step 2 to 20 replicas:

    % oc -n openshift-ingress-operator scale ingresscontrollers/xyz --replicas=20
    ingresscontroller.operator.openshift.io/xyz scaled

5. Wait about 30 seconds and check the ingress clusteroperator again:

    % oc get clusteroperators/ingress
    NAME      VERSION                        AVAILABLE   PROGRESSING   DEGRADED   SINCE
    ingress   4.8.0-0.ci-2021-04-30-171538   False       True          True       32s


Actual results:

After Step 5, the ingress clusteroperator reports Available=False and Degraded=True.


Expected results:

After Step 5, the ingress clusteroperator should continue to report Available=True and Degraded=False.  However, a metric or alert should indicate that the ingresscontroller from Step 2 is unavailable and degraded.  
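
As a rough illustration of the metric side of this expectation, the following Go sketch renders one sample per ingresscontroller condition in the Prometheus text exposition style. The metric name `ingress_controller_conditions`, its labels, and the simplified types are assumptions made for this example, not something this report specifies.

```go
package main

import "fmt"

// Condition is a simplified, hypothetical stand-in for a status condition.
type Condition struct {
	Type   string
	Status bool
}

// IngressController is a simplified, hypothetical stand-in for the API type.
type IngressController struct {
	Name       string
	Conditions []Condition
}

// conditionSamples returns one sample line per condition of every
// ingresscontroller: value 1 if the condition is true, 0 otherwise.
// The metric name and label names are assumptions for illustration.
func conditionSamples(ics []IngressController) []string {
	var out []string
	for _, ic := range ics {
		for _, c := range ic.Conditions {
			value := 0
			if c.Status {
				value = 1
			}
			out = append(out, fmt.Sprintf(
				"ingress_controller_conditions{name=%q,condition=%q} %d",
				ic.Name, c.Type, value))
		}
	}
	return out
}

func main() {
	ics := []IngressController{
		{Name: "xyz", Conditions: []Condition{{"Available", false}, {"Degraded", true}}},
	}
	for _, line := range conditionSamples(ics) {
		fmt.Println(line)
	}
}
```

An alerting rule could then fire on a sample such as `condition="Degraded"` with value 1 for any ingresscontroller, without changing the clusteroperator status.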


Additional info:

This problem was originally noticed because the extended/router/grpc-interop.go and extended/router/http2.go tests in openshift/origin create custom ingresscontrollers. These ingresscontrollers are briefly degraded and unavailable after their creation, which causes the ingress clusteroperator to report Available=False and Degraded=True.

