Bug 1994443 - openshift-console operator incorrectly reports Available=false
Summary: openshift-console operator incorrectly reports Available=false
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Management Console
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.9.0
Assignee: Jakub Hadvig
QA Contact: Yadan Pei
URL:
Whiteboard:
Depends On:
Blocks: 2001268
 
Reported: 2021-08-17 09:41 UTC by Jan Chaloupka
Modified: 2023-09-15 01:13 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
job=periodic-ci-openshift-release-master-ci-4.9-e2e-gcp-upgrade=all job=periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-aws-upgrade=all
Last Closed: 2021-10-18 17:46:55 UTC
Target Upstream Version:
Embargoed:




Links:
- GitHub: openshift/console-operator pull 580 (last updated 2021-09-02 11:32:19 UTC)
- Red Hat Product Errata: RHSA-2021:3759 (last updated 2021-10-18 17:47:06 UTC)

Description Jan Chaloupka 2021-08-17 09:41:30 UTC
From https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.9-e2e-aws-upgrade/1427185491152736256:

```
2 unexpected clusteroperator state transitions during e2e test run

Aug 16 09:39:13.626 - 14s   E clusteroperator/console condition/Available status/False reason/DeploymentAvailable: 1 replicas ready at version 4.9.0-0.nightly-2021-08-16-082143
1 tests failed during this blip (2021-08-16 09:39:13.626741252 +0000 UTC to 2021-08-16 09:39:13.626741252 +0000 UTC): [sig-arch][Feature:ClusterUpgrade] Cluster should remain functional during upgrade [Disruptive] [Serial]
```

Checking build.log:
```
Aug 16 09:23:54.000 I ns/openshift-console-operator deployment/console-operator reason/OperatorStatusChanged Status for clusteroperator/console changed: status.versions changed from [{"operator" "4.9.0-0.nightly-2021-08-14-065522"}] to [{"operator" "4.9.0-0.nightly-2021-08-16-082143"}]
Aug 16 09:23:54.000 I ns/openshift-console-operator deployment/console-operator reason/OperatorVersionChanged clusteroperator/console version "operator" changed from "4.9.0-0.nightly-2021-08-14-065522" to "4.9.0-0.nightly-2021-08-16-082143"
Aug 16 09:23:55.009 I clusteroperator/console versions: operator 4.9.0-0.nightly-2021-08-14-065522 -> 4.9.0-0.nightly-2021-08-16-082143
Aug 16 09:32:30.581 E ns/openshift-console-operator pod/console-operator-7779b75d9c-xshrk node/ip-10-0-130-236.us-west-1.compute.internal container/console-operator reason/ContainerExit code/1 cause/Error 4.980443       1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-console-operator", Name:"console-operator", UID:"b2c8ac7a-74ec-4844-8a1a-5d72605535b6", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/console changed: status.versions changed from [{"operator" "4.9.0-0.nightly-2021-08-14-065522"}] to [{"operator" "4.9.0-0.nightly-2021-08-16-082143"}]\nI0816 09:32:26.311529       1 cmd.go:88] Received SIGTERM or SIGINT signal, shutting down controller.\nI0816 09:32:26.315043       1 reflector.go:225] Stopping reflector *v1.ClusterOperator (10m0s) from github.com/openshift/client-go/config/informers/externalversions/factory.go:101\nI0816 09:32:26.315149       1 reflector.go:225] Stopping reflector *v1.Infrastructure (10m0s) from github.com/openshift/client-go/config/informers/externalversions/factory.go:101\nI0816 09:32:26.315214       1 reflector.go:225] Stopping reflector *v1.Proxy (10m0s) from github.com/openshift/client-go/config/informers/externalversions/factory.go:101\nI0816 09:32:26.315279       1 reflector.go:225] Stopping reflector *v1.Secret (10m0s) from k8s.io/client-go/informers/factory.go:134\nI0816 09:32:26.315355       1 reflector.go:225] Stopping reflector *v1.ConfigMap (10m0s) from k8s.io/client-go/informers/factory.go:134\nI0816 09:32:26.315352       1 reflector.go:225] Stopping reflector *v1.Secret (10m0s) from k8s.io/client-go/informers/factory.go:134\nI0816 09:32:26.315479       1 reflector.go:225] Stopping reflector *v1.Route (10m0s) from github.com/openshift/client-go/route/informers/externalversions/factory.go:101\nI0816 09:32:26.315559       1 reflector.go:225] Stopping reflector *v1.Secret (10m0s) from k8s.io/client-go/informers/factory.go:134\nI0816 09:32:26.315723       1 reflector.go:225] Stopping reflector *v1.Secret (10m0s) from k8s.io/client-go/informers/factory.go:134\nW0816 09:32:26.315743       1 builder.go:99] graceful termination failed, controllers failed with error: stopped\n
Aug 16 09:39:13.000 I ns/openshift-console-operator deployment/console-operator reason/OperatorStatusChanged Status for clusteroperator/console changed: Available changed from True to False ("DeploymentAvailable: 1 replicas ready at version 4.9.0-0.nightly-2021-08-16-082143")
Aug 16 09:39:13.626 E clusteroperator/console condition/Available status/False reason/Deployment_FailedUpdate changed: DeploymentAvailable: 1 replicas ready at version 4.9.0-0.nightly-2021-08-16-082143
Aug 16 09:39:13.626 - 14s   E clusteroperator/console condition/Available status/False reason/DeploymentAvailable: 1 replicas ready at version 4.9.0-0.nightly-2021-08-16-082143
Aug 16 09:39:28.000 I ns/openshift-console-operator deployment/console-operator reason/OperatorStatusChanged Status for clusteroperator/console changed: Available changed from False to True ("All is well")
Aug 16 09:39:28.166 W clusteroperator/console condition/Available status/True reason/AsExpected changed: All is well
```

Important log line:
```
Aug 16 09:39:13.626 E clusteroperator/console condition/Available status/False reason/Deployment_FailedUpdate changed: DeploymentAvailable: 1 replicas ready at version 4.9.0-0.nightly-2021-08-16-082143
```

From loki logs (https://grafana-loki.ci.openshift.org/explore?orgId=1&left=%5B%221629064800000%22,%221629151199000%22,%22Grafana%20Cloud%22,%7B%22expr%22:%22%7Binvoker%3D%5C%22openshift-internal-ci%2Fperiodic-ci-openshift-release-master-nightly-4.9-e2e-aws-upgrade%2F1427185491152736256%5C%22%7D%20%7C%20unpack%20%7C%20namespace%3D%5C%22openshift-console-operator%5C%22%22%7D%5D):

```
I0816 09:39:28.135226 1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-console-operator", Name:"console-operator", UID:"b2c8ac7a-74ec-4844-8a1a-5d72605535b6", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/console changed: Available changed from False to True ("All is well")
I0816 09:39:28.119547 1 status_controller.go:211] clusteroperator/console diff {"status":{"conditions":[{"lastTransitionTime":"2021-08-16T09:03:38Z","message":"All is well","reason":"AsExpected","status":"False","type":"Degraded"},{"lastTransitionTime":"2021-08-16T09:04:17Z","message":"All is well","reason":"AsExpected","status":"False","type":"Progressing"},{"lastTransitionTime":"2021-08-16T09:39:28Z","message":"All is well","reason":"AsExpected","status":"True","type":"Available"},{"lastTransitionTime":"2021-08-16T09:03:47Z","message":"All is well","reason":"AsExpected","status":"True","type":"Upgradeable"}]}}

E0816 09:39:15.008547 1 status.go:78] DeploymentAvailable FailedUpdate 1 replicas ready at version 4.9.0-0.nightly-2021-08-16-082143
I0816 09:39:13.600090 1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-console-operator", Name:"console-operator", UID:"b2c8ac7a-74ec-4844-8a1a-5d72605535b6", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/console changed: Available changed from True to False ("DeploymentAvailable: 1 replicas ready at version 4.9.0-0.nightly-2021-08-16-082143")

I0816 09:39:13.561850 1 status_controller.go:211] clusteroperator/console diff {"status":{"conditions":[{"lastTransitionTime":"2021-08-16T09:03:38Z","message":"All is well","reason":"AsExpected","status":"False","type":"Degraded"},{"lastTransitionTime":"2021-08-16T09:04:17Z","message":"All is well","reason":"AsExpected","status":"False","type":"Progressing"},{"lastTransitionTime":"2021-08-16T09:39:13Z","message":"DeploymentAvailable: 1 replicas ready at version 4.9.0-0.nightly-2021-08-16-082143","reason":"Deployment_FailedUpdate","status":"False","type":"Available"},{"lastTransitionTime":"2021-08-16T09:03:47Z","message":"All is well","reason":"AsExpected","status":"True","type":"Upgradeable"}]}}

E0816 09:39:13.461120 1 status.go:78] DeploymentAvailable FailedUpdate 1 replicas ready at version 4.9.0-0.nightly-2021-08-16-082143

I0816 09:23:54.980443 1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-console-operator", Name:"console-operator", UID:"b2c8ac7a-74ec-4844-8a1a-5d72605535b6", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/console changed: status.versions changed from [{"operator" "4.9.0-0.nightly-2021-08-14-065522"}] to [{"operator" "4.9.0-0.nightly-2021-08-16-082143"}]
```

Important log line (more precise time of the FailedUpdate log line):
```
E0816 09:39:13.461120 1 status.go:78] DeploymentAvailable FailedUpdate 1 replicas ready at version 4.9.0-0.nightly-2021-08-16-082143
```

Meaning only a single ReplicaSet pod was running and ready, and the deployment was not yet fully at the expected version 4.9.0-0.nightly-2021-08-16-082143. See https://github.com/openshift/console-operator/blob/293fc1630197ff8afa54e86a7c5feb53fa62a3a2/pkg/console/operator/sync_v400.go#L158-L160 for more details. In short, IsReadyAndUpdated (https://github.com/openshift/console-operator/blob/293fc1630197ff8afa54e86a7c5feb53fa62a3a2/pkg/console/subresource/deployment/deployment.go#L537) checks whether the console deployment is both updated and ready. It requires the new ReplicaSet to be fully rolled out (i.e. every pod of the deployment belongs to the new RS and is ready). However, that is not yet the case while the rollout is still in progress.
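
For illustration, a rough sketch of the kind of predicate involved (simplified, not the verbatim console-operator source): it only passes once every replica is ready and belongs to the new ReplicaSet, so it necessarily fails partway through a rolling update even though a ready console pod is serving the whole time.
```
// Sketch only, not the actual console-operator source: an
// "IsReadyAndUpdated"-style predicate over a Deployment status.
package deployment

import appsv1 "k8s.io/api/apps/v1"

// isReadyAndUpdated returns true only when every desired replica is ready
// AND every replica already belongs to the updated (new) ReplicaSet.
func isReadyAndUpdated(d *appsv1.Deployment) bool {
	return d.Status.Replicas == d.Status.ReadyReplicas &&
		d.Status.Replicas == d.Status.UpdatedReplicas
}

// During a rolling update a 2-replica deployment typically passes through
// Replicas=2, ReadyReplicas=2, UpdatedReplicas=1, so this predicate is false
// even while a ready console pod keeps serving traffic.
```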

Checking kubelet logs for running console instances of any console RS:
```
ip-10-0-160-242.us-west-1.compute.internal:12077:Aug 16 09:03:45.192057 ip-10-0-160-242 hyperkube[1368]: I0816 09:03:45.192019    1368 kubelet.go:2065] "SyncLoop ADD" source="api" pods=[openshift-console/console-7ccf696fc8-x2qvc]
ip-10-0-160-242.us-west-1.compute.internal:25665:Aug 16 09:45:54.011355 ip-10-0-160-242 hyperkube[1458]: I0816 09:45:54.011327    1458 kubelet.go:2065] "SyncLoop ADD" source="api" pods=[openshift-console/console-7ccf696fc8-57czc]
ip-10-0-252-206.us-west-1.compute.internal:9394:Aug 16 09:03:41.393224 ip-10-0-252-206 hyperkube[1368]: I0816 09:03:41.393154    1368 kubelet.go:2065] "SyncLoop ADD" source="api" pods=[openshift-console/console-6499869746-6qrmm]
ip-10-0-252-206.us-west-1.compute.internal:12240:Aug 16 09:32:26.441870 ip-10-0-252-206 hyperkube[1368]: I0816 09:32:26.441844    1368 kubelet.go:2065] "SyncLoop ADD" source="api" pods=[openshift-console/console-7ccf696fc8-wg49v]
ip-10-0-130-236.us-west-1.compute.internal:8160:Aug 16 09:03:41.365724 ip-10-0-130-236 hyperkube[1366]: I0816 09:03:41.365692    1366 kubelet.go:2065] "SyncLoop ADD" source="api" pods=[openshift-console/console-6499869746-bmrp8]
ip-10-0-130-236.us-west-1.compute.internal:8275:Aug 16 09:04:15.570701 ip-10-0-130-236 hyperkube[1366]: I0816 09:04:15.570675    1366 kubelet.go:2065] "SyncLoop ADD" source="api" pods=[openshift-console/console-7ccf696fc8-nsw68]
ip-10-0-130-236.us-west-1.compute.internal:17895:Aug 16 09:39:13.123526 ip-10-0-130-236 hyperkube[1454]: I0816 09:39:13.123510    1454 kubelet.go:2065] "SyncLoop ADD" source="api" pods=[openshift-console/console-7ccf696fc8-rvql7]
```

Sorted by ADD timestamp:
```
Aug 16 09:03:41.365724 ip-10-0-130-236 hyperkube[1366]: I0816 09:03:41.365692    1366 kubelet.go:2065] "SyncLoop ADD" source="api" pods=[openshift-console/console-6499869746-bmrp8]
Aug 16 09:03:41.393224 ip-10-0-252-206 hyperkube[1368]: I0816 09:03:41.393154    1368 kubelet.go:2065] "SyncLoop ADD" source="api" pods=[openshift-console/console-6499869746-6qrmm]
Aug 16 09:03:45.192057 ip-10-0-160-242 hyperkube[1368]: I0816 09:03:45.192019    1368 kubelet.go:2065] "SyncLoop ADD" source="api" pods=[openshift-console/console-7ccf696fc8-x2qvc]
Aug 16 09:04:15.570701 ip-10-0-130-236 hyperkube[1366]: I0816 09:04:15.570675    1366 kubelet.go:2065] "SyncLoop ADD" source="api" pods=[openshift-console/console-7ccf696fc8-nsw68]
Aug 16 09:32:26.441870 ip-10-0-252-206 hyperkube[1368]: I0816 09:32:26.441844    1368 kubelet.go:2065] "SyncLoop ADD" source="api" pods=[openshift-console/console-7ccf696fc8-wg49v]
Aug 16 09:39:13.123526 ip-10-0-130-236 hyperkube[1454]: I0816 09:39:13.123510    1454 kubelet.go:2065] "SyncLoop ADD" source="api" pods=[openshift-console/console-7ccf696fc8-rvql7]
Aug 16 09:45:54.011355 ip-10-0-160-242 hyperkube[1458]: I0816 09:45:54.011327    1458 kubelet.go:2065] "SyncLoop ADD" source="api" pods=[openshift-console/console-7ccf696fc8-57czc]
```

Checking which pods are running at which intervals:
```
Aug 16 09:32:26.000 I ns/openshift-console replicaset/console-7ccf696fc8 reason/SuccessfulCreate Created pod: console-7ccf696fc8-wg49v
Aug 16 09:32:26.410 I ns/openshift-console pod/console-7ccf696fc8-wg49v node/ reason/Created
Aug 16 09:32:26.420 I ns/openshift-console pod/console-7ccf696fc8-wg49v node/ip-10-0-252-206.us-west-1.compute.internal reason/Scheduled
Aug 16 09:32:28.000 I ns/openshift-console pod/console-7ccf696fc8-wg49v node/ip-10-0-252-206.us-west-1.compute.internal container/console reason/Created
Aug 16 09:32:28.000 I ns/openshift-console pod/console-7ccf696fc8-wg49v node/ip-10-0-252-206.us-west-1.compute.internal container/console reason/Pulled image/quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c565063ed198885b27d0a226c67cdbfbd14981172f7f9e06632b186dfe54189a
Aug 16 09:32:28.000 I ns/openshift-console pod/console-7ccf696fc8-wg49v reason/AddedInterface Add eth0 [10.130.0.56/23] from openshift-sdn
Aug 16 09:32:29.000 I ns/openshift-console pod/console-7ccf696fc8-wg49v node/ip-10-0-252-206.us-west-1.compute.internal container/console reason/Started
Aug 16 09:32:29.406 I ns/openshift-console pod/console-7ccf696fc8-wg49v node/ip-10-0-252-206.us-west-1.compute.internal container/console reason/ContainerStart duration/3.00s
Aug 16 09:32:29.555 I ns/openshift-console pod/console-7ccf696fc8-wg49v node/ip-10-0-252-206.us-west-1.compute.internal container/console reason/Ready
Aug 16 09:45:53.000 I ns/openshift-console pod/console-7ccf696fc8-wg49v node/ip-10-0-252-206.us-west-1.compute.internal container/console reason/Killing
Aug 16 09:45:53.959 I ns/openshift-console pod/console-7ccf696fc8-wg49v node/ip-10-0-252-206.us-west-1.compute.internal reason/GracefulDelete duration/40s
Aug 16 09:46:20.106 E ns/openshift-console pod/console-7ccf696fc8-wg49v node/ip-10-0-252-206.us-west-1.compute.internal container/console reason/ContainerExit code/2 cause/Error W0816 09:32:29.078663       1 main.go:206] Flag inactivity-timeout is set to less then 300 seconds and will be ignored!\nI0816 09:32:29.078769       1 main.go:278] cookies are secure!\nI0816 09:32:29.226020       1 main.go:660] Binding to [::]:8443...\nI0816 09:32:29.226114       1 main.go:662] using TLS\n
Aug 16 09:46:20.148 I ns/openshift-console pod/console-7ccf696fc8-wg49v node/ip-10-0-252-206.us-west-1.compute.internal reason/Deleted
```

```
Aug 16 09:39:13.000 I ns/openshift-console replicaset/console-7ccf696fc8 reason/SuccessfulCreate Created pod: console-7ccf696fc8-rvql7
Aug 16 09:39:13.083 I ns/openshift-console pod/console-7ccf696fc8-rvql7 node/ip-10-0-130-236.us-west-1.compute.internal reason/Scheduled
Aug 16 09:39:13.099 I ns/openshift-console pod/console-7ccf696fc8-rvql7 node/ reason/Created
Aug 16 09:39:16.000 I ns/openshift-console pod/console-7ccf696fc8-rvql7 reason/AddedInterface Add eth0 [10.129.0.21/23] from openshift-sdn
Aug 16 09:39:18.000 I ns/openshift-console pod/console-7ccf696fc8-rvql7 node/ip-10-0-130-236.us-west-1.compute.internal container/console reason/Pulled image/quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c565063ed198885b27d0a226c67cdbfbd14981172f7f9e06632b186dfe54189a
Aug 16 09:39:24.000 I ns/openshift-console pod/console-7ccf696fc8-rvql7 node/ip-10-0-130-236.us-west-1.compute.internal container/console reason/Created
Aug 16 09:39:24.000 I ns/openshift-console pod/console-7ccf696fc8-rvql7 node/ip-10-0-130-236.us-west-1.compute.internal container/console reason/Started
Aug 16 09:39:27.513 I ns/openshift-console pod/console-7ccf696fc8-rvql7 node/ip-10-0-130-236.us-west-1.compute.internal container/console reason/ContainerStart duration/11.00s
Aug 16 09:39:27.902 I ns/openshift-console pod/console-7ccf696fc8-rvql7 node/ip-10-0-130-236.us-west-1.compute.internal container/console reason/Ready
```

```
Aug 16 09:03:52.573 W ns/openshift-console pod/console-7ccf696fc8-nsw68 reason/FailedScheduling 0/6 nodes are available: 3 node(s) didn't match Pod's node affinity/selector, 3 node(s) didn't match pod anti-affinity rules.
Aug 16 09:32:25.000 I ns/openshift-console pod/console-7ccf696fc8-nsw68 node/ip-10-0-130-236.us-west-1.compute.internal container/console reason/Killing
Aug 16 09:32:25.892 I ns/openshift-console pod/console-7ccf696fc8-nsw68 node/ip-10-0-130-236.us-west-1.compute.internal reason/GracefulDelete duration/40s
Aug 16 09:32:52.588 E ns/openshift-console pod/console-7ccf696fc8-nsw68 node/ip-10-0-130-236.us-west-1.compute.internal container/console reason/ContainerExit code/2 cause/Error W0816 09:04:17.732932       1 main.go:206] Flag inactivity-timeout is set to less then 300 seconds and will be ignored!\nI0816 09:04:17.733007       1 main.go:278] cookies are secure!\nI0816 09:04:17.783515       1 main.go:660] Binding to [::]:8443...\nI0816 09:04:17.783535       1 main.go:662] using TLS\n
Aug 16 09:32:52.652 I ns/openshift-console pod/console-7ccf696fc8-nsw68 node/ip-10-0-130-236.us-west-1.compute.internal reason/Deleted
```
+ kubelet logs:
```
ip-10-0-130-236.us-west-1.compute.internal:8275:Aug 16 09:04:15.570701 ip-10-0-130-236 hyperkube[1366]: I0816 09:04:15.570675    1366 kubelet.go:2065] "SyncLoop ADD" source="api" pods=[openshift-console/console-7ccf696fc8-nsw68]
ip-10-0-130-236.us-west-1.compute.internal:8335:Aug 16 09:04:17.692046 ip-10-0-130-236 crio[1326]: time="2021-08-16 09:04:17.692004836Z" level=info msg="Started container d07e7f817d221af35df7c029c7bec6ccda30834f8e7a2580b4aebf79819de047: openshift-console/console-7ccf696fc8-nsw68/console" id=7c939d30-36fa-4814-8f85-019faf12df59 name=/runtime.v1alpha2.RuntimeService/StartContainer
ip-10-0-130-236.us-west-1.compute.internal:8336:Aug 16 09:04:18.500966 ip-10-0-130-236 hyperkube[1366]: I0816 09:04:18.500935    1366 kubelet.go:2103] "SyncLoop (PLEG): event for pod" pod="openshift-console/console-7ccf696fc8-nsw68" event=&{ID:f16c3f18-f6be-42a2-baba-4fb470d8df93 Type:ContainerStarted Data:d07e7f817d221af35df7c029c7bec6ccda30834f8e7a2580b4aebf79819de047}
ip-10-0-130-236.us-west-1.compute.internal:8337:Aug 16 09:04:18.500966 ip-10-0-130-236 hyperkube[1366]: I0816 09:04:18.500981    1366 kubelet.go:2103] "SyncLoop (PLEG): event for pod" pod="openshift-console/console-7ccf696fc8-nsw68" event=&{ID:f16c3f18-f6be-42a2-baba-4fb470d8df93 Type:ContainerStarted Data:e41790cf8a7a353c58db642ce948912a3bfbec5019cdc07d86b39e364c3f0301}
ip-10-0-130-236.us-west-1.compute.internal:8338:Aug 16 09:04:18.501514 ip-10-0-130-236 hyperkube[1366]: I0816 09:04:18.501290    1366 kubelet.go:2175] "SyncLoop (probe)" probe="readiness" status="" pod="openshift-console/console-7ccf696fc8-nsw68"
ip-10-0-130-236.us-west-1.compute.internal:8339:Aug 16 09:04:18.504954 ip-10-0-130-236 hyperkube[1366]: I0816 09:04:18.504932    1366 kubelet.go:2175] "SyncLoop (probe)" probe="readiness" status="ready" pod="openshift-console/console-7ccf696fc8-nsw68"
```

Based on the logs one can deduce:
- console-7ccf696fc8-nsw68 was running&ready in <09:04:18.504954; 09:32:52.588>
- console-7ccf696fc8-wg49v was running&ready in <09:32:29.555; 09:46:20.106>
- console-7ccf696fc8-rvql7 was running&ready in <09:39:27.902; ???>

Summarizing the important timestamps:
- 09:23:54.980443: operator version changed from 4.9.0-0.nightly-2021-08-14-065522 to 4.9.0-0.nightly-2021-08-16-082143
- <09:32:29.555; 09:46:20.106>: console-7ccf696fc8-wg49v was running&ready
- 09:39:13.000: created console-7ccf696fc8-rvql7
- <09:39:27.902; ???>: console-7ccf696fc8-rvql7 was running&ready
- 09:39:13.461120: reporting "DeploymentAvailable FailedUpdate 1 replicas ready at version 4.9.0-0.nightly-2021-08-16-082143"

So between 09:04:18.504954 and 09:39:27.902 there's always at least one console pod running and ready.

From https://github.com/openshift/api/blob/a6156965faae5ce117e3cd3735981a3fc0e27e27/config/v1/types_cluster_operator.go#L146:
```
// Available indicates that the operand (eg: openshift-apiserver for the
// openshift-apiserver-operator), is functional and available in the cluster.
// Available=False means at least part of the component is non-functional,
// and that the condition requires immediate administrator intervention.
```

So changing condition/Available to false is incorrect.
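
For contrast, a minimal sketch (my illustration only, not the actual fix that landed) of an availability predicate matching the quoted semantics: Available stays True as long as at least one replica is ready, regardless of which version it runs.
```
// Illustration only (not the actual console-operator fix): Available should
// track whether the operand is serving, i.e. at least one console replica is
// ready, no matter whether it comes from the old or the new ReplicaSet.
package deployment

import appsv1 "k8s.io/api/apps/v1"

func isAvailable(d *appsv1.Deployment) bool {
	return d.Status.ReadyReplicas >= 1
}
```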

Comment 2 Jan Chaloupka 2021-08-17 09:51:20 UTC
Checking if the operand is upgraded or in the process of upgrading belongs under condition/Progressing. From https://github.com/openshift/api/blob/a6156965faae5ce117e3cd3735981a3fc0e27e27/config/v1/types_cluster_operator.go#L152-L159:
```
// Progressing indicates that the operator is actively rolling out new code,
// propagating config changes, or otherwise moving from one steady state to
// another.
```
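
A rough sketch of how the same rollout state could instead be surfaced through Progressing (the condition type, reason and message format here are assumptions for illustration, not the operator's actual code):
```
// Sketch: report an in-flight rollout via Progressing instead of flipping
// Available to False. Condition type/reason/message are illustrative only.
package deployment

import (
	"fmt"

	operatorv1 "github.com/openshift/api/operator/v1"
	appsv1 "k8s.io/api/apps/v1"
)

func progressingCondition(d *appsv1.Deployment, version string) operatorv1.OperatorCondition {
	if d.Status.UpdatedReplicas < d.Status.Replicas || d.Status.ReadyReplicas < d.Status.Replicas {
		return operatorv1.OperatorCondition{
			Type:   "DeploymentProgressing",
			Status: operatorv1.ConditionTrue,
			Reason: "RollingOut",
			Message: fmt.Sprintf("Working toward version %s, %d/%d replicas updated and ready",
				version, d.Status.UpdatedReplicas, d.Status.Replicas),
		}
	}
	return operatorv1.OperatorCondition{
		Type:    "DeploymentProgressing",
		Status:  operatorv1.ConditionFalse,
		Reason:  "AsExpected",
		Message: "All is well",
	}
}
```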

Comment 5 Yadan Pei 2021-09-06 06:27:09 UTC
Did some regression testing of console-operator, such as the steps in
https://bugzilla.redhat.com/show_bug.cgi?id=1989055#c7
https://bugzilla.redhat.com/show_bug.cgi?id=1952405#c0

In all conditions console-operator is reporting the correct Available status. Also checked console-operator logs:


$ oc logs -f console-operator-56ccbc8575-2tfs5  -n openshift-console-operator | grep '1 replica'
E0906 01:46:48.231736       1 status.go:78] SyncLoopRefreshProgressing InProgress Working toward version 4.9.0-0.nightly-2021-09-05-192114, 1 replicas available
I0906 01:46:48.419484       1 status_controller.go:211] clusteroperator/console diff {"status":{"conditions":[{"lastTransitionTime":"2021-09-05T23:58:11Z","message":"All is well","reason":"AsExpected","status":"False","type":"Degraded"},{"lastTransitionTime":"2021-09-06T01:46:46Z","message":"SyncLoopRefreshProgressing: Working toward version 4.9.0-0.nightly-2021-09-05-192114, 1 replicas available","reason":"SyncLoopRefresh_InProgress","status":"True","type":"Progressing"},{"lastTransitionTime":"2021-09-06T00:00:38Z","message":"All is well","reason":"AsExpected","status":"True","type":"Available"},{"lastTransitionTime":"2021-09-05T23:52:44Z","message":"All is well","reason":"AsExpected","status":"True","type":"Upgradeable"}]}}
I0906 01:46:48.430693       1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-console-operator", Name:"console-operator", UID:"3bd0ea6a-0039-43dd-a1d8-98b98a99e36d", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/console changed: Progressing message changed from "SyncLoopRefreshProgressing: Changes made during sync updates, additional sync expected." to "SyncLoopRefreshProgressing: Working toward version 4.9.0-0.nightly-2021-09-05-192114, 1 replicas available"
E0906 03:02:48.305656       1 status.go:78] SyncLoopRefreshProgressing InProgress Working toward version 4.9.0-0.nightly-2021-09-05-192114, 1 replicas available
I0906 03:02:48.493694       1 status_controller.go:211] clusteroperator/console diff {"status":{"conditions":[{"lastTransitionTime":"2021-09-05T23:58:11Z","message":"All is well","reason":"AsExpected","status":"False","type":"Degraded"},{"lastTransitionTime":"2021-09-06T03:02:46Z","message":"SyncLoopRefreshProgressing: Working toward version 4.9.0-0.nightly-2021-09-05-192114, 1 replicas available","reason":"SyncLoopRefresh_InProgress","status":"True","type":"Progressing"},{"lastTransitionTime":"2021-09-06T00:00:38Z","message":"All is well","reason":"AsExpected","status":"True","type":"Available"},{"lastTransitionTime":"2021-09-05T23:52:44Z","message":"All is well","reason":"AsExpected","status":"True","type":"Upgradeable"}]}}
I0906 03:02:48.507008       1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-console-operator", Name:"console-operator", UID:"3bd0ea6a-0039-43dd-a1d8-98b98a99e36d", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/console changed: Progressing message changed from "SyncLoopRefreshProgressing: Changes made during sync updates, additional sync expected." to "SyncLoopRefreshProgressing: Working toward version 4.9.0-0.nightly-2021-09-05-192114, 1 replicas available"


We can see from the above logs that when only 1 replica is available, console is reporting Available: True, and the Progressing status is reporting True as expected.

However, when I set console to Removed, the console deployment, service and route are entirely removed, yet co/console is still reporting Available: True while the user cannot visit the console at all.

$ oc get console.operator cluster -o json | jq .spec
{
  "logLevel": "Normal",
  "managementState": "Removed",
  "operatorLogLevel": "Debug"
}

$ while true;do oc get all -n openshift-console | grep console; oc get co | grep console; sleep 5; done | tee -a watch-consoleoperator.log
No resources found in openshift-console namespace.
console                                    4.9.0-0.nightly-2021-09-05-192114   True        False         False      27m     
No resources found in openshift-console namespace.
console                                    4.9.0-0.nightly-2021-09-05-192114   True        False         False      27m     
No resources found in openshift-console namespace.
console                                    4.9.0-0.nightly-2021-09-05-192114   True        False         False      27m     
No resources found in openshift-console namespace.
console                                    4.9.0-0.nightly-2021-09-05-192114   True        False         False      27m     
No resources found in openshift-console namespace.
console                                    4.9.0-0.nightly-2021-09-05-192114   True        False         False      28m     
No resources found in openshift-console namespace.
console                                    4.9.0-0.nightly-2021-09-05-192114   True        False         False      28m     
No resources found in openshift-console namespace.
console                                    4.9.0-0.nightly-2021-09-05-192114   True        False         False      28m     
No resources found in openshift-console namespace.
console                                    4.9.0-0.nightly-2021-09-05-192114   True        False         False      28m     
No resources found in openshift-console namespace.
console                                    4.9.0-0.nightly-2021-09-05-192114   True        False         False      28m     
No resources found in openshift-console namespace.
console                                    4.9.0-0.nightly-2021-09-05-192114   True        False         False      28m     
No resources found in openshift-console namespace.
console                                    4.9.0-0.nightly-2021-09-05-192114   True        False         False      28m     
No resources found in openshift-console namespace.
console                                    4.9.0-0.nightly-2021-09-05-192114   True        False         False      28m     
No resources found in openshift-console namespace.
console                                    4.9.0-0.nightly-2021-09-05-192114   True        False         False      29m     
No resources found in openshift-console namespace.
console                                    4.9.0-0.nightly-2021-09-05-192114   True        False         False      29m     
No resources found in openshift-console namespace.
console                                    4.9.0-0.nightly-2021-09-05-192114   True        False         False      29m     

Check console-operator logs:
I0906 06:17:38.219270       1 controller.go:99] console is in a removed state: deleting ConsoleCliDownloads custom resources
I0906 06:17:38.219732       1 controller.go:97] console-operator is in a removed state: deleting "downloads" service
I0906 06:17:38.220376       1 controller.go:95] console is in an removed state: removing synced downloads deployment
I0906 06:17:38.221685       1 operator.go:243] console has been removed.
I0906 06:17:38.221701       1 operator.go:254] deleting console resources
I0906 06:17:38.224760       1 controller.go:97] console-operator is in a removed state: deleting "console" service
I0906 06:17:38.226293       1 controller.go:137] finished deleting ConsoleCliDownloads custom resources
E0906 06:17:38.226724       1 base_controller.go:251] "ConsoleDownloadsDeploymentSyncController" controller failed to sync "key", err: deployments.apps "downloads" not found
I0906 06:17:38.226884       1 reflector.go:381] github.com/openshift/client-go/oauth/informers/externalversions/factory.go:101: forcing resync
I0906 06:17:38.226906       1 reflector.go:381] k8s.io/client-go/informers/factory.go:134: forcing resync
I0906 06:17:38.226928       1 reflector.go:381] k8s.io/client-go/informers/factory.go:134: forcing resync
I0906 06:17:38.226956       1 reflector.go:381] github.com/openshift/client-go/config/informers/externalversions/factory.go:101: forcing resync
I0906 06:17:38.230116       1 controller.go:99] console is in a removed state: deleting ConsoleCliDownloads custom resources
I0906 06:17:38.230311       1 reflector.go:381] github.com/openshift/client-go/config/informers/externalversions/factory.go:101: forcing resync
I0906 06:17:38.237610       1 controller.go:137] finished deleting ConsoleCliDownloads custom resources
I0906 06:17:38.243785       1 reflector.go:381] k8s.io/client-go/informers/factory.go:134: forcing resync
I0906 06:17:38.254859       1 operator.go:276] finished deleting console resources
I0906 06:17:38.254875       1 operator.go:229] finished syncing operator "cluster" (22.519µs)
I0906 06:17:38.395944       1 request.go:600] Waited for 169.098818ms due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/operator.openshift.io/v1/consoles/cluster
I0906 06:17:38.399730       1 controller.go:95] console is in an removed state: removing synced downloads deployment
E0906 06:17:38.402286       1 base_controller.go:251] "ConsoleDownloadsDeploymentSyncController" controller failed to sync "key", err: deployments.apps "downloads" not found
I0906 06:17:38.596052       1 request.go:600] Waited for 366.520012ms due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/operator.openshift.io/v1/consoles/cluster
I0906 06:17:38.602627       1 controller.go:111] console-operator is in a removed state: deleting "console" route
I0906 06:17:38.795633       1 request.go:600] Waited for 565.236754ms due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/operator.openshift.io/v1/consoles/cluster
I0906 06:17:38.800517       1 controller.go:97] console-operator is in a removed state: deleting "downloads" service
I0906 06:17:38.995906       1 request.go:600] Waited for 763.809168ms due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/operator.openshift.io/v1/consoles/cluster
I0906 06:17:38.999645       1 controller.go:111] console-operator is in a removed state: deleting "downloads" route
I0906 06:17:39.195848       1 request.go:600] Waited for 958.181465ms due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/operator.openshift.io/v1/consoles/cluster
I0906 06:17:39.199715       1 controller.go:97] console-operator is in a removed state: deleting "console" service
I0906 06:17:39.395922       1 request.go:600] Waited for 1.140995178s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/operator.openshift.io/v1/consoles/cluster
I0906 06:17:39.395940       1 request.go:668] Waited for 1.140995178s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/operator.openshift.io/v1/consoles/cluster
I0906 06:17:39.407964       1 operator.go:181] started syncing operator "cluster" (2021-09-06 06:17:39.407955857 +0000 UTC m=+22870.770623255)
I0906 06:17:39.421543       1 operator.go:243] console has been removed.
I0906 06:17:39.421563       1 operator.go:254] deleting console resources
I0906 06:17:39.445637       1 operator.go:276] finished deleting console resources
I0906 06:17:39.445653       1 operator.go:229] finished syncing operator "cluster" (28.604µs)
I0906 06:17:39.595886       1 request.go:600] Waited for 1.010927579s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/operator.openshift.io/v1/consoles/cluster
I0906 06:17:39.600289       1 controller.go:97] console-operator is in a removed state: skipping health checks
I0906 06:17:39.795975       1 request.go:600] Waited for 1.176406105s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/operator.openshift.io/v1/consoles/cluster
I0906 06:17:39.799882       1 controller.go:111] console-operator is in a removed state: deleting "console" route
I0906 06:17:44.097611       1 controller.go:95] console is in an removed state: removing synced downloads deployment
E0906 06:17:44.101325       1 base_controller.go:251] "ConsoleDownloadsDeploymentSyncController" controller failed to sync "key", err: deployments.apps "downloads" not found
I0906 06:17:48.107616       1 httplog.go:89] "HTTP" verb="GET" URI="/healthz" latency="136.121µs" userAgent="kube-probe/1.22+" srcIP="10.130.0.1:43888" resp=200
I0906 06:17:48.111398       1 httplog.go:89] "HTTP" verb="GET" URI="/readyz" latency="4.380768ms" userAgent="kube-probe/1.22+" srcIP="10.130.0.1:43890" resp=200
I0906 06:17:55.227930       1 httplog.go:89] "HTTP" verb="GET" URI="/metrics" latency="5.034854ms" userAgent="Prometheus/2.29.2" srcIP="10.128.2.18:33640" resp=200
I0906 06:17:58.110608       1 httplog.go:89] "HTTP" verb="GET" URI="/healthz" latency="145.334µs" userAgent="kube-probe/1.22+" srcIP="10.130.0.1:44070" resp=200
I0906 06:17:58.110868       1 httplog.go:89] "HTTP" verb="GET" URI="/readyz" latency="2.160809ms" userAgent="kube-probe/1.22+" srcIP="10.130.0.1:44068" resp=200
I0906 06:18:07.860377       1 httplog.go:89] "HTTP" verb="GET" URI="/metrics" latency="7.130971ms" userAgent="Prometheus/2.29.2" srcIP="10.129.2.11:50168" resp=200
I0906 06:18:08.106778       1 httplog.go:89] "HTTP" verb="GET" URI="/readyz" latency="430.825µs" userAgent="kube-probe/1.22+" srcIP="10.130.0.1:44270" resp=200
I0906 06:18:08.106966       1 httplog.go:89] "HTTP" verb="GET" URI="/healthz" latency="111.263µs" userAgent="kube-probe/1.22+" srcIP="10.130.0.1:44268" resp=200
I0906 06:18:18.110641       1 httplog.go:89] "HTTP" verb="GET" URI="/healthz" latency="129.717µs" userAgent="kube-probe/1.22+" srcIP="10.130.0.1:44462" resp=200
I0906 06:18:18.112941       1 httplog.go:89] "HTTP" verb="GET" URI="/readyz" latency="4.288098ms" userAgent="kube-probe/1.22+" srcIP="10.130.0.1:44464" resp=200
I0906 06:18:25.232542       1 httplog.go:89] "HTTP" verb="GET" URI="/metrics" latency="9.188576ms" userAgent="Prometheus/2.29.2" srcIP="10.128.2.18:33640" resp=200
I0906 06:18:28.107569       1 httplog.go:89] "HTTP" verb="GET" URI="/healthz" latency="157.81µs" userAgent="kube-probe/1.22+" srcIP="10.130.0.1:44642" resp=200
I0906 06:18:28.108827       1 httplog.go:89] "HTTP" verb="GET" URI="/readyz" latency="2.33758ms" userAgent="kube-probe/1.22+" srcIP="10.130.0.1:44644" resp=200
I0906 06:18:37.864605       1 httplog.go:89] "HTTP" verb="GET" URI="/metrics" latency="11.181203ms" userAgent="Prometheus/2.29.2" srcIP="10.129.2.11:50168" resp=200
I0906 06:18:38.107179       1 httplog.go:89] "HTTP" verb="GET" URI="/healthz" latency="325.339µs" userAgent="kube-probe/1.22+" srcIP="10.130.0.1:44830" resp=200
I0906 06:18:38.107448       1 httplog.go:89] "HTTP" verb="GET" URI="/readyz" latency="97.67µs" userAgent="kube-probe/1.22+" srcIP="10.130.0.1:44832" resp=200
I0906 06:18:38.589726       1 controller.go:97] console-operator is in a removed state: skipping health checks


Shall we update Available from True to False for this case?

Comment 6 Yadan Pei 2021-09-06 09:46:26 UTC
Moving this bug to VERIFIED since the originally reported issue is fixed; the Available status issue when console is removed will be tracked in a separate bug.

Comment 7 Yadan Pei 2021-09-06 09:54:06 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=2001523 is the bug tracking the wrong Available status when console is removed.

Comment 10 errata-xmlrpc 2021-10-18 17:46:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

Comment 11 Red Hat Bugzilla 2023-09-15 01:13:44 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days

