Description of problem:
When cluster operator console is Degraded, it still reports Available: True even though the console is not accessible.

Version-Release number of selected component (if applicable):
4.8.0-0.nightly-2021-04-21-172405

How reproducible:
Always

Steps to Reproduce:
1. Set openshift-ingress-operator and router-default unmanaged, then scale down deployment/router-default to simulate a network issue:

$ cat > version-patch-first-override.yaml << EOF
- op: add
  path: /spec/overrides
  value:
  - kind: Deployment
    group: apps/v1
    name: router-default
    namespace: openshift-ingress
    unmanaged: true
EOF

$ cat > version-patch-add-override.yaml << EOF
- op: add
  path: /spec/overrides/-
  value:
    kind: Deployment
    group: apps/v1
    name: ingress-operator
    namespace: openshift-ingress-operator
    unmanaged: true
EOF

$ oc patch clusterversion version --type json -p "$(cat version-patch-first-override.yaml)"
clusterversion.config.openshift.io/version patched

$ oc patch clusterversion version --type json -p "$(cat version-patch-add-override.yaml)"
clusterversion.config.openshift.io/version patched

$ oc get clusterversion version -o json | jq .spec.overrides
[
  {
    "group": "apps/v1",
    "kind": "Deployment",
    "name": "router-default",
    "namespace": "openshift-ingress",
    "unmanaged": true
  },
  {
    "group": "apps/v1",
    "kind": "Deployment",
    "name": "ingress-operator",
    "namespace": "openshift-ingress-operator",
    "unmanaged": true
  }
]

$ oc get co | grep -e authentication -e console    // check console, authentication Available status
authentication   4.8.0-0.nightly-2021-04-21-172405   True   False   False   5h45m
console          4.8.0-0.nightly-2021-04-21-172405   True   False   False   5h40m

$ oc scale deployment ingress-operator --replicas=0 -n openshift-ingress-operator
$ oc scale deployment router-default --replicas=0 -n openshift-ingress

$ oc get pods -n openshift-ingress; oc get pods -n openshift-ingress-operator
No resources found in openshift-ingress namespace.
No resources found in openshift-ingress-operator namespace.
// wait until there are no pods left in the openshift-ingress and openshift-ingress-operator namespaces
2. After no router-default pod is available, check the clusteroperator console and authentication status again:

# oc get co | grep -e authentication -e console
authentication   4.8.0-0.nightly-2021-04-21-172405   False   False   False   112s
console          4.8.0-0.nightly-2021-04-21-172405   True    False   False   5h43m

# oc get co authentication -o json | jq .status.conditions
[
  {
    "lastTransitionTime": "2021-04-22T05:53:12Z",
    "message": "OAuthServerRouteEndpointAccessibleControllerDegraded: Get \"https://oauth-openshift.apps.qe-ui48-0422.qe.devcluster.openshift.com/healthz\": EOF",
    "reason": "OAuthServerRouteEndpointAccessibleController_SyncError",
    "status": "True",
    "type": "Degraded"
  },
  {
    "lastTransitionTime": "2021-04-22T00:07:36Z",
    "message": "All is well",
    "reason": "AsExpected",
    "status": "False",
    "type": "Progressing"
  },
  {
    "lastTransitionTime": "2021-04-22T05:51:12Z",
    "message": "OAuthServerRouteEndpointAccessibleControllerAvailable: Get \"https://oauth-openshift.apps.qe-ui48-0422.qe.devcluster.openshift.com/healthz\": EOF",
    "reason": "OAuthServerRouteEndpointAccessibleController_EndpointUnavailable",
    "status": "False",
    "type": "Available"
  },
  {
    "lastTransitionTime": "2021-04-21T23:43:45Z",
    "message": "All is well",
    "reason": "AsExpected",
    "status": "True",
    "type": "Upgradeable"
  }
]

# oc get co console -o json | jq .status.conditions
[
  {
    "lastTransitionTime": "2021-04-22T05:53:25Z",
    "message": "RouteHealthDegraded: failed to GET route (https://console-openshift-console.apps.qe-ui48-0422.qe.devcluster.openshift.com/health): Get \"https://console-openshift-console.apps.qe-ui48-0422.qe.devcluster.openshift.com/health\": EOF",
    "reason": "RouteHealth_FailedGet",
    "status": "True",
    "type": "Degraded"
  },
  {
    "lastTransitionTime": "2021-04-21T23:55:12Z",
    "message": "All is well",
    "reason": "AsExpected",
    "status": "False",
    "type": "Progressing"
  },
  {
    "lastTransitionTime": "2021-04-22T00:09:30Z",
    "message": "All is well",
    "reason": "AsExpected",
    "status": "True",
    "type": "Available"
  },
  {
    "lastTransitionTime": "2021-04-21T23:48:24Z",
    "message": "All is well",
    "reason": "AsExpected",
    "status": "True",
    "type": "Upgradeable"
  }
]

3.

Actual results:
In step 2, co/console reports Degraded: True but Available: True, even though the console is actually not available.

Expected results:
In step 2, when the console is not available, the operator should report Available: False.

Additional info:
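The same Degraded/Available inconsistency can be spotted across all cluster operators with a jq filter over the ClusterOperator conditions. This is only a minimal sketch (the exact filter is illustrative, not part of the original reproduction):

// list operators that report Degraded=True and Available=True at the same time
$ oc get clusteroperators -o json | jq -r '
    .items[]
    | select(
        ([.status.conditions[] | select(.type == "Degraded"  and .status == "True")] | length) > 0
        and
        ([.status.conditions[] | select(.type == "Available" and .status == "True")] | length) > 0
      )
    | .metadata.name'

The route health that the console operator checks can also be probed directly (replace the apps domain with your cluster's). In this reproduction the request fails with EOF because no router pods are running, which is exactly why Available should flip to False:

$ curl -ks https://console-openshift-console.apps.<cluster-domain>/health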
It would be good if we could improve the Deployment_FailedUpdate message as well. Currently it shows up as follows in the installation log:

15:47:12 level=debug msg=Still waiting for the cluster to initialize: Cluster operator console is not available
15:52:01 level=debug msg=Still waiting for the cluster to initialize: Cluster operator console is not available
15:55:27 level=info msg=Cluster operator baremetal Disabled is True with UnsupportedPlatform: Nothing to do on this Platform
15:55:27 level=info msg=Cluster operator console Progressing is True with SyncLoopRefresh_InProgress: SyncLoopRefreshProgressing: Working toward version 4.8.0-0.nightly-2021-04-24-234710
15:55:27 level=info msg=Cluster operator console Available is False with Deployment_FailedUpdate: DeploymentAvailable: 1 replicas ready at version 4.8.0-0.nightly-2021-04-24-234710

and in the console-operator log:

I0425 07:31:42.706485 1 status_controller.go:172] clusteroperator/console diff {"status":{"conditions":[{"lastTransitionTime":"2021-04-25T07:17:22Z","message":"All is well","reason":"AsExpected","status":"False","type":"Degraded"},{"lastTransitionTime":"2021-04-25T07:31:40Z","message":"SyncLoopRefreshProgressing: Working toward version 4.8.0-0.nightly-2021-04-24-234710","reason":"SyncLoopRefresh_InProgress","status":"True","type":"Progressing"},{"lastTransitionTime":"2021-04-25T07:31:42Z","message":"DeploymentAvailable: 1 replicas ready at version 4.8.0-0.nightly-2021-04-24-234710","reason":"Deployment_FailedUpdate","status":"False","type":"Available"},{"lastTransitionTime":"2021-04-25T07:17:22Z","message":"All is well","reason":"AsExpected","status":"True","type":"Upgradeable"}]}}
From the message "Deployment_FailedUpdate: DeploymentAvailable: 1 replicas ready at version 4.8.0-0.nightly-2021-04-24-234710" it is hard to tell what is actually going wrong.
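For comparison, the underlying Deployment status already carries the numbers that would make this message actionable (desired vs. updated/ready/available replicas). A minimal sketch, assuming the deployment managed by the console operator is named console in the openshift-console namespace:

// compare desired replicas against what the rollout actually achieved
$ oc get deployment console -n openshift-console -o json | jq '{
    desired:     .spec.replicas,
    updated:     .status.updatedReplicas,
    ready:       .status.readyReplicas,
    available:   .status.availableReplicas,
    unavailable: .status.unavailableReplicas
  }'

Including both sides in the condition message, for example "1 of 2 replicas ready at version ...", would make it much clearer why the update is considered failed.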
*** Bug 1948081 has been marked as a duplicate of this bug. ***
Mentioning the failing test-case so Sippy can find this bug, now that bug 1948081 has been closed as a dup.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438