Description of problem:

The ingress operator reports Available=False in the ingress clusteroperator status conditions if any ingresscontroller is unavailable, and Degraded=True if any ingresscontroller is degraded. The operator should report Available=False or Degraded=True only if the *default* ingresscontroller is unavailable or degraded, respectively. In addition, the ingress operator should report metrics and have alerting rules to report if ingresscontrollers are unavailable or degraded.

Version-Release number of selected component (if applicable):

4.8.0

How reproducible:

100%

Steps to Reproduce:

1. Launch a new cluster with <20 nodes.

2. On the cluster from Step 1, create an ingresscontroller with 1 replica:

% oc create -f - <<'EOF'
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: xyz
  namespace: openshift-ingress-operator
spec:
  replicas: 1
  domain: xyz.com
  endpointPublishingStrategy:
    type: Private
EOF

3. Check the ingress clusteroperator:

% oc get clusteroperators/ingress
NAME      VERSION                        AVAILABLE   PROGRESSING   DEGRADED   SINCE
ingress   4.8.0-0.ci-2021-04-30-171538   True        False         False      2m43s

4. Scale the ingresscontroller from Step 2 to 20 replicas:

% oc -n openshift-ingress-operator scale ingresscontrollers/xyz --replicas=20
ingresscontroller.operator.openshift.io/xyz scaled

5. Wait about 30 seconds and check the ingress clusteroperator again:

% oc get clusteroperators/ingress
NAME      VERSION                        AVAILABLE   PROGRESSING   DEGRADED   SINCE
ingress   4.8.0-0.ci-2021-04-30-171538   False       True          True       32s

Actual results:

After Step 5, the ingress clusteroperator reports Available=False and Degraded=True.

Expected results:

After Step 5, the ingress clusteroperator should continue to report Available=True and Degraded=False. However, a metric or alert should indicate that the ingresscontroller from Step 2 is unavailable and degraded (see the condition-inspection sketch below).

Additional info:

This problem was originally noticed because the extended/router/grpc-interop.go and extended/router/http2.go tests in openshift/origin create custom ingresscontrollers, which are briefly degraded and unavailable after their creation, causing the ingress clusteroperator to report Available=False and Degraded=True.
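Note: the per-ingresscontroller Available/Degraded state that the clusteroperator should stop aggregating is still recorded on each IngressController's own status. A minimal sketch of inspecting it, assuming the usual Available and Degraded condition types appear under .status.conditions (not a prescribed procedure):

# Print each status condition of the custom ingresscontroller from Step 2.
% oc -n openshift-ingress-operator get ingresscontrollers/xyz \
    -o jsonpath='{range .status.conditions[*]}{.type}{"="}{.status}{" "}{end}{"\n"}'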
Verified in 4.8.0-0.nightly-2021-05-12-122225.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-05-12-122225   True        False         23m     Cluster version is 4.8.0-0.nightly-2021-05-12-122225

1. Create an ingresscontroller named xyz:

$ cat ingresscontroll-BZ1955854
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: xyz
  namespace: openshift-ingress-operator
spec:
  replicas: 1
  domain: xyz.com
  endpointPublishingStrategy:
    type: Private

$ oc create -f ingresscontroll-BZ1955854

2. Check the ingress clusteroperator:

$ oc get clusteroperators/ingress
NAME      VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
ingress   4.8.0-0.nightly-2021-05-12-122225   True        True          False      36m

3. Scale up the xyz ingresscontroller:

$ oc -n openshift-ingress-operator scale ingresscontrollers/xyz --replicas=20
ingresscontroller.operator.openshift.io/xyz scaled

4. Waited more than 15 minutes; 3 of the 20 router-xyz pods became ready, and the rest are still Pending:

$ oc -n openshift-ingress get all
NAME                                  READY   STATUS    RESTARTS   AGE
pod/router-default-5fd58fd757-cbf8k   1/1     Running   0          47m
pod/router-default-5fd58fd757-qmqg5   1/1     Running   0          47m
pod/router-xyz-6bb7549fc9-44xqw       1/1     Running   0          21m
pod/router-xyz-6bb7549fc9-5sblt       0/1     Pending   0          21m
pod/router-xyz-6bb7549fc9-89mvk       0/1     Pending   0          21m
pod/router-xyz-6bb7549fc9-8k9gb       0/1     Pending   0          21m
pod/router-xyz-6bb7549fc9-9thwx       0/1     Pending   0          21m
pod/router-xyz-6bb7549fc9-9z5pn       0/1     Pending   0          21m
pod/router-xyz-6bb7549fc9-b66dr       0/1     Pending   0          21m
pod/router-xyz-6bb7549fc9-bwk6z       1/1     Running   0          21m
pod/router-xyz-6bb7549fc9-c5w4n       1/1     Running   0          23m
pod/router-xyz-6bb7549fc9-cctx4       0/1     Pending   0          21m
pod/router-xyz-6bb7549fc9-gfxkv       0/1     Pending   0          21m
pod/router-xyz-6bb7549fc9-gp4cd       0/1     Pending   0          21m
pod/router-xyz-6bb7549fc9-gwfc7       0/1     Pending   0          21m
pod/router-xyz-6bb7549fc9-hlmls       0/1     Pending   0          21m
pod/router-xyz-6bb7549fc9-nf7h6       0/1     Pending   0          21m
pod/router-xyz-6bb7549fc9-nf99g       0/1     Pending   0          21m
pod/router-xyz-6bb7549fc9-p8g52       0/1     Pending   0          21m
pod/router-xyz-6bb7549fc9-tmzhm       0/1     Pending   0          21m
pod/router-xyz-6bb7549fc9-vcw4q       0/1     Pending   0          21m
pod/router-xyz-6bb7549fc9-z4zxx       0/1     Pending   0          21m

NAME                              TYPE           CLUSTER-IP       EXTERNAL-IP    PORT(S)                      AGE
service/router-default            LoadBalancer   172.30.192.133   35.231.8.178   80:32358/TCP,443:30227/TCP   47m
service/router-internal-default   ClusterIP      172.30.144.6     <none>         80/TCP,443/TCP,1936/TCP      47m
service/router-internal-xyz       ClusterIP      172.30.158.15    <none>         80/TCP,443/TCP,1936/TCP      23m

NAME                             READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/router-default   2/2     2            2           47m
deployment.apps/router-xyz       3/20    20           3           23m

NAME                                        DESIRED   CURRENT   READY   AGE
replicaset.apps/router-default-5fd58fd757   2         2         2       47m
replicaset.apps/router-xyz-6bb7549fc9       20        20        3       23m

5. Check the ingress clusteroperator again; AVAILABLE is still True and DEGRADED is still False:

$ oc get clusteroperators/ingress
NAME      VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
ingress   4.8.0-0.nightly-2021-05-12-122225   True        True          False      46m
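Additional note on step 4: the Pending router-xyz pods are expected on a cluster with fewer schedulable nodes than replicas, and the default ingresscontroller is unaffected. Two hedged checks, assuming the standard FailedScheduling event reason and the Available condition type:

# Scheduling events explaining the Pending router-xyz pods.
$ oc -n openshift-ingress get events --field-selector reason=FailedScheduling

# Confirm the default ingresscontroller itself still reports Available=True.
$ oc -n openshift-ingress-operator get ingresscontrollers/default \
    -o jsonpath='{.status.conditions[?(@.type=="Available")].status}{"\n"}'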
6. Unavailable and Degraded alerts have been received for the xyz ingresscontroller:

May 12, 2021, 6:45 PM  The openshift-ingress-operator/xyz ingresscontroller is degraded: .
May 12, 2021, 6:45 PM  Pod openshift-ingress/router-xyz-6bb7549fc9-gwfc7 has been in a non-ready state for longer than 15 minutes.
<--snip-->
May 12, 2021, 6:45 PM  The openshift-ingress-operator/xyz ingresscontroller is unavailable: .
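For completeness, the alert definitions behind these notifications can be listed from the operator's namespace, assuming they are shipped there as a PrometheusRule (the exact resource name may differ); a quick sketch:

# List the alerting rules defined in the ingress operator's namespace; the
# unavailable/degraded alerts above should appear in the output.
$ oc -n openshift-ingress-operator get prometheusrules -o yaml | grep -B2 -A8 'alert:'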
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438