Bug 1952405
| Summary: | console-operator is not reporting correct Available status | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Yadan Pei <yapei> |
| Component: | Management Console | Assignee: | Jakub Hadvig <jhadvig> |
| Status: | CLOSED ERRATA | QA Contact: | Yadan Pei <yapei> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 4.8 | CC: | aos-bugs, jokerman, spadgett, wking, yapei |
| Target Milestone: | --- | | |
| Target Release: | 4.8.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | [bz-Management Console] clusteroperator/console should not change condition/Available |
| Last Closed: | 2021-07-27 23:02:52 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
It would be good if we could improve the Deployment_FailedUpdate message as well; currently it shows up as follows in the installation log:
15:47:12 level=debug msg=Still waiting for the cluster to initialize: Cluster operator console is not available
15:52:01 level=debug msg=Still waiting for the cluster to initialize: Cluster operator console is not available
15:55:27 level=info msg=Cluster operator baremetal Disabled is True with UnsupportedPlatform: Nothing to do on this Platform
15:55:27 level=info msg=Cluster operator console Progressing is True with SyncLoopRefresh_InProgress: SyncLoopRefreshProgressing: Working toward version 4.8.0-0.nightly-2021-04-24-234710
15:55:27 level=info msg=Cluster operator console Available is False with Deployment_FailedUpdate: DeploymentAvailable: 1 replicas ready at version 4.8.0-0.nightly-2021-04-24-234710
and in the console-operator log:
I0425 07:31:42.706485 1 status_controller.go:172] clusteroperator/console diff {"status":{"conditions":[{"lastTransitionTime":"2021-04-25T07:17:22Z","message":"All is well","reason":"AsExpected","status":"False","type":"Degraded"},{"lastTransitionTime":"2021-04-25T07:31:40Z","message":"SyncLoopRefreshProgressing: Working toward version 4.8.0-0.nightly-2021-04-24-234710","reason":"SyncLoopRefresh_InProgress","status":"True","type":"Progressing"},{"lastTransitionTime":"2021-04-25T07:31:42Z","message":"DeploymentAvailable: 1 replicas ready at version 4.8.0-0.nightly-2021-04-24-234710","reason":"Deployment_FailedUpdate","status":"False","type":"Available"},{"lastTransitionTime":"2021-04-25T07:17:22Z","message":"All is well","reason":"AsExpected","status":"True","type":"Upgradeable"}]}}
From the message "Deployment_FailedUpdate: DeploymentAvailable: 1 replicas ready at version 4.8.0-0.nightly-2021-04-24-234710" it is hard to tell what is actually going wrong.

*** Bug 1948081 has been marked as a duplicate of this bug. ***

Mentioning the failing test-case so Sippy can find this bug, now that bug 1948081 has been closed as a dup.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438
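For reference, the Available condition whose reason/message pairing is discussed above can be read straight out of the ClusterOperator status; a minimal sketch using oc and jq (the jq filter is illustrative, not part of any tooling referenced in this bug):

```console
$ oc get clusteroperator console -o json \
    | jq -r '.status.conditions[]
             | select(.type == "Available")
             | "\(.reason): \(.message)"'
```

At the point captured in the operator log above, this would have printed the same "Deployment_FailedUpdate: DeploymentAvailable: 1 replicas ready at version 4.8.0-0.nightly-2021-04-24-234710" pairing, which is what makes the combination hard to interpret.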
Description of problem:
When cluster operator console is Degraded, we still report Available: True even though the console is not accessible.

Version-Release number of selected component (if applicable):
4.8.0-0.nightly-2021-04-21-172405

How reproducible:
Always

Steps to Reproduce:
1. Set openshift-ingress-operator and router-default to unmanaged, then scale down deployment/router-default to simulate a network issue

$ cat > version-patch-first-override.yaml << EOF
- op: add
  path: /spec/overrides
  value:
  - kind: Deployment
    group: apps/v1
    name: router-default
    namespace: openshift-ingress
    unmanaged: true
EOF

$ cat > version-patch-add-override.yaml << EOF
- op: add
  path: /spec/overrides/-
  value:
    kind: Deployment
    group: apps/v1
    name: ingress-operator
    namespace: openshift-ingress-operator
    unmanaged: true
EOF

$ oc patch clusterversion version --type json -p "$(cat version-patch-first-override.yaml)"
clusterversion.config.openshift.io/version patched

$ oc patch clusterversion version --type json -p "$(cat version-patch-add-override.yaml)"
clusterversion.config.openshift.io/version patched

$ oc get clusterversion version -o json | jq .spec.overrides
[
  {
    "group": "apps/v1",
    "kind": "Deployment",
    "name": "router-default",
    "namespace": "openshift-ingress",
    "unmanaged": true
  },
  {
    "group": "apps/v1",
    "kind": "Deployment",
    "name": "ingress-operator",
    "namespace": "openshift-ingress-operator",
    "unmanaged": true
  }
]

# oc get co | grep -e authentication -e console    // check console, authentication Available status
authentication   4.8.0-0.nightly-2021-04-21-172405   True   False   False   5h45m
console          4.8.0-0.nightly-2021-04-21-172405   True   False   False   5h40m

$ oc scale deployment ingress-operator --replicas=0 -n openshift-ingress-operator
$ oc scale deployment router-default --replicas=0 -n openshift-ingress

$ oc get pods -n openshift-ingress; oc get pods -n openshift-ingress-operator
No resources found in openshift-ingress namespace.
No resources found in openshift-ingress-operator namespace.
// wait until no pods are left in the openshift-ingress and openshift-ingress-operator namespaces
// (a sketch for undoing these overrides and scale-downs after testing is included under Additional info below)
2. After no router-default pod is available, check the clusteroperator console and authentication status again

# oc get co | grep -e authentication -e console
authentication   4.8.0-0.nightly-2021-04-21-172405   False   False   False   112s
console          4.8.0-0.nightly-2021-04-21-172405   True    False   False   5h43m

# oc get co authentication -o json | jq .status.conditions
[
  {
    "lastTransitionTime": "2021-04-22T05:53:12Z",
    "message": "OAuthServerRouteEndpointAccessibleControllerDegraded: Get \"https://oauth-openshift.apps.qe-ui48-0422.qe.devcluster.openshift.com/healthz\": EOF",
    "reason": "OAuthServerRouteEndpointAccessibleController_SyncError",
    "status": "True",
    "type": "Degraded"
  },
  {
    "lastTransitionTime": "2021-04-22T00:07:36Z",
    "message": "All is well",
    "reason": "AsExpected",
    "status": "False",
    "type": "Progressing"
  },
  {
    "lastTransitionTime": "2021-04-22T05:51:12Z",
    "message": "OAuthServerRouteEndpointAccessibleControllerAvailable: Get \"https://oauth-openshift.apps.qe-ui48-0422.qe.devcluster.openshift.com/healthz\": EOF",
    "reason": "OAuthServerRouteEndpointAccessibleController_EndpointUnavailable",
    "status": "False",
    "type": "Available"
  },
  {
    "lastTransitionTime": "2021-04-21T23:43:45Z",
    "message": "All is well",
    "reason": "AsExpected",
    "status": "True",
    "type": "Upgradeable"
  }
]

# oc get co console -o json | jq .status.conditions
[
  {
    "lastTransitionTime": "2021-04-22T05:53:25Z",
    "message": "RouteHealthDegraded: failed to GET route (https://console-openshift-console.apps.qe-ui48-0422.qe.devcluster.openshift.com/health): Get \"https://console-openshift-console.apps.qe-ui48-0422.qe.devcluster.openshift.com/health\": EOF",
    "reason": "RouteHealth_FailedGet",
    "status": "True",
    "type": "Degraded"
  },
  {
    "lastTransitionTime": "2021-04-21T23:55:12Z",
    "message": "All is well",
    "reason": "AsExpected",
    "status": "False",
    "type": "Progressing"
  },
  {
    "lastTransitionTime": "2021-04-22T00:09:30Z",
    "message": "All is well",
    "reason": "AsExpected",
    "status": "True",
    "type": "Available"
  },
  {
    "lastTransitionTime": "2021-04-21T23:48:24Z",
    "message": "All is well",
    "reason": "AsExpected",
    "status": "True",
    "type": "Upgradeable"
  }
]

3.

Actual results:
2. co/console reports Degraded: True but Available: True, even though the console is actually not available

Expected results:
2. when the console is not available, we should report Available: False

Additional info:
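When verifying the fix, a compact way to compare the two conditions in question is a jq filter like the following sketch (the filter is illustrative, not part of the operator or test-suite tooling):

```console
$ oc get co console -o json \
    | jq -r '.status.conditions[]
             | select(.type == "Available" or .type == "Degraded")
             | "\(.type)=\(.status) \(.reason): \(.message)"'
```

Once the route health check starts failing, the two lines should agree: Degraded=True together with Available=False, rather than Available=True as reported above.

To undo the simulated outage afterwards, one option is to drop the overrides and scale the deployments back up so the cluster-version and ingress operators reconcile them again (a sketch; the replica counts shown are typical defaults and may differ on your cluster):

```console
$ oc patch clusterversion version --type json \
    -p '[{"op": "remove", "path": "/spec/overrides"}]'
$ oc scale deployment ingress-operator --replicas=1 -n openshift-ingress-operator
$ oc scale deployment router-default --replicas=2 -n openshift-ingress
```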