Description of problem:
When cluster operator console is Degraded, it still reports Available: True even though the console is not accessible.

Version-Release number of selected component (if applicable):
4.8.0-0.nightly-2021-04-21-172405

How reproducible:
Always

Steps to Reproduce:
1. Set openshift-ingress-operator and router-default unmanaged, then scale down deployment/router-default to simulate a network issue:

$ cat > version-patch-first-override.yaml << EOF
- op: add
  path: /spec/overrides
  value:
  - kind: Deployment
    group: apps/v1
    name: router-default
    namespace: openshift-ingress
    unmanaged: true
EOF

$ cat > version-patch-add-override.yaml << EOF
- op: add
  path: /spec/overrides/-
  value:
    kind: Deployment
    group: apps/v1
    name: ingress-operator
    namespace: openshift-ingress-operator
    unmanaged: true
EOF

$ oc patch clusterversion version --type json -p "$(cat version-patch-first-override.yaml)"
clusterversion.config.openshift.io/version patched

$ oc patch clusterversion version --type json -p "$(cat version-patch-add-override.yaml)"
clusterversion.config.openshift.io/version patched

$ oc get clusterversion version -o json | jq .spec.overrides
[
  {
    "group": "apps/v1",
    "kind": "Deployment",
    "name": "router-default",
    "namespace": "openshift-ingress",
    "unmanaged": true
  },
  {
    "group": "apps/v1",
    "kind": "Deployment",
    "name": "ingress-operator",
    "namespace": "openshift-ingress-operator",
    "unmanaged": true
  }
]

$ oc get co | grep -e authentication -e console    // check console, authentication Available status
authentication   4.8.0-0.nightly-2021-04-21-172405   True   False   False   5h45m
console          4.8.0-0.nightly-2021-04-21-172405   True   False   False   5h40m

$ oc scale deployment ingress-operator --replicas=0 -n openshift-ingress-operator
$ oc scale deployment router-default --replicas=0 -n openshift-ingress

$ oc get pods -n openshift-ingress; oc get pods -n openshift-ingress-operator
No resources found in openshift-ingress namespace.
No resources found in openshift-ingress-operator namespace.
// wait until there are no pods left in the openshift-ingress and openshift-ingress-operator namespaces
2. After no router-default pod is available, check the clusteroperator console and authentication status again:

# oc get co | grep -e authentication -e console
authentication   4.8.0-0.nightly-2021-04-21-172405   False   False   False   112s
console          4.8.0-0.nightly-2021-04-21-172405   True    False   False   5h43m

# oc get co authentication -o json | jq .status.conditions
[
  {
    "lastTransitionTime": "2021-04-22T05:53:12Z",
    "message": "OAuthServerRouteEndpointAccessibleControllerDegraded: Get \"https://oauth-openshift.apps.qe-ui48-0422.qe.devcluster.openshift.com/healthz\": EOF",
    "reason": "OAuthServerRouteEndpointAccessibleController_SyncError",
    "status": "True",
    "type": "Degraded"
  },
  {
    "lastTransitionTime": "2021-04-22T00:07:36Z",
    "message": "All is well",
    "reason": "AsExpected",
    "status": "False",
    "type": "Progressing"
  },
  {
    "lastTransitionTime": "2021-04-22T05:51:12Z",
    "message": "OAuthServerRouteEndpointAccessibleControllerAvailable: Get \"https://oauth-openshift.apps.qe-ui48-0422.qe.devcluster.openshift.com/healthz\": EOF",
    "reason": "OAuthServerRouteEndpointAccessibleController_EndpointUnavailable",
    "status": "False",
    "type": "Available"
  },
  {
    "lastTransitionTime": "2021-04-21T23:43:45Z",
    "message": "All is well",
    "reason": "AsExpected",
    "status": "True",
    "type": "Upgradeable"
  }
]

# oc get co console -o json | jq .status.conditions
[
  {
    "lastTransitionTime": "2021-04-22T05:53:25Z",
    "message": "RouteHealthDegraded: failed to GET route (https://console-openshift-console.apps.qe-ui48-0422.qe.devcluster.openshift.com/health): Get \"https://console-openshift-console.apps.qe-ui48-0422.qe.devcluster.openshift.com/health\": EOF",
    "reason": "RouteHealth_FailedGet",
    "status": "True",
    "type": "Degraded"
  },
  {
    "lastTransitionTime": "2021-04-21T23:55:12Z",
    "message": "All is well",
    "reason": "AsExpected",
    "status": "False",
    "type": "Progressing"
  },
  {
    "lastTransitionTime": "2021-04-22T00:09:30Z",
    "message": "All is well",
    "reason": "AsExpected",
    "status": "True",
    "type": "Available"
  },
  {
    "lastTransitionTime": "2021-04-21T23:48:24Z",
    "message": "All is well",
    "reason": "AsExpected",
    "status": "True",
    "type": "Upgradeable"
  }
]

3.

Actual results:
In step 2, co/console reports Degraded: True but Available: True, even though the console is actually not available.

Expected results:
In step 2, when the console is not available, the operator should report Available: False.

Additional info:
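The same Degraded/Available inconsistency can be spotted across all cluster operators with a jq filter over the ClusterOperator conditions. This is only a minimal sketch (the exact filter is illustrative, not part of the original reproduction):

// list operators that report Degraded=True and Available=True at the same time
$ oc get clusteroperators -o json | jq -r '
    .items[]
    | select(
        ([.status.conditions[] | select(.type == "Degraded"  and .status == "True")] | length) > 0
        and
        ([.status.conditions[] | select(.type == "Available" and .status == "True")] | length) > 0
      )
    | .metadata.name'

The route health that the console operator checks can also be probed directly (replace the apps domain with your cluster's). In this reproduction the request fails with EOF because no router pods are running, which is exactly why Available should flip to False:

$ curl -ks https://console-openshift-console.apps.<cluster-domain>/health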
It would be good if we could improve the Deployment_FailedUpdate message as well. Currently it shows up as follows in the installation log:

15:47:12 level=debug msg=Still waiting for the cluster to initialize: Cluster operator console is not available
15:52:01 level=debug msg=Still waiting for the cluster to initialize: Cluster operator console is not available
15:55:27 level=info msg=Cluster operator baremetal Disabled is True with UnsupportedPlatform: Nothing to do on this Platform
15:55:27 level=info msg=Cluster operator console Progressing is True with SyncLoopRefresh_InProgress: SyncLoopRefreshProgressing: Working toward version 4.8.0-0.nightly-2021-04-24-234710
15:55:27 level=info msg=Cluster operator console Available is False with Deployment_FailedUpdate: DeploymentAvailable: 1 replicas ready at version 4.8.0-0.nightly-2021-04-24-234710

and in the console-operator log:

I0425 07:31:42.706485 1 status_controller.go:172] clusteroperator/console diff {"status":{"conditions":[{"lastTransitionTime":"2021-04-25T07:17:22Z","message":"All is well","reason":"AsExpected","status":"False","type":"Degraded"},{"lastTransitionTime":"2021-04-25T07:31:40Z","message":"SyncLoopRefreshProgressing: Working toward version 4.8.0-0.nightly-2021-04-24-234710","reason":"SyncLoopRefresh_InProgress","status":"True","type":"Progressing"},{"lastTransitionTime":"2021-04-25T07:31:42Z","message":"DeploymentAvailable: 1 replicas ready at version 4.8.0-0.nightly-2021-04-24-234710","reason":"Deployment_FailedUpdate","status":"False","type":"Available"},{"lastTransitionTime":"2021-04-25T07:17:22Z","message":"All is well","reason":"AsExpected","status":"True","type":"Upgradeable"}]}}
From the message "Deployment_FailedUpdate: DeploymentAvailable: 1 replicas ready at version 4.8.0-0.nightly-2021-04-24-234710" it is hard to tell what is actually going wrong.
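For comparison, the underlying Deployment status already carries the numbers that would make this message actionable (desired vs. updated/ready/available replicas). A minimal sketch, assuming the deployment managed by the console operator is named console in the openshift-console namespace:

// compare desired replicas against what the rollout actually achieved
$ oc get deployment console -n openshift-console -o json | jq '{
    desired:     .spec.replicas,
    updated:     .status.updatedReplicas,
    ready:       .status.readyReplicas,
    available:   .status.availableReplicas,
    unavailable: .status.unavailableReplicas
  }'

Including both sides in the condition message, for example "1 of 2 replicas ready at version ...", would make it much clearer why the update is considered failed.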
*** Bug 1948081 has been marked as a duplicate of this bug. ***
Mentioning the failing test-case so Sippy can find this bug, now that bug 1948081 has been closed as a dup.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438