Bug 2089950

Summary: Upgrade fails with message Cluster operator console is not available
Product: OpenShift Container Platform
Reporter: Ian Miller <imiller>
Component: Management Console
Assignee: Jakub Hadvig <jhadvig>
Status: CLOSED ERRATA
QA Contact: Yanping Zhang <yanpzhan>
Severity: medium
Priority: unspecified
Version: 4.9
CC: yapei
Target Release: 4.12.0
Hardware: Unspecified
OS: Unspecified
Doc Type: No Doc Update
Last Closed: 2023-01-17 19:49:26 UTC
Type: Bug

Description Ian Miller 2022-05-24 19:07:21 UTC
Description of problem: Some upgrades failed during scale testing with messages indicating the console operator is not available. In total 5 out of 2200 clusters failed with this pattern.

All of these clusters are configured with the Console operator disabled to reduce overall OCP CPU use in the Telecom environment. The following CR is applied:
apiVersion: operator.openshift.io/v1
kind: Console
metadata:
  annotations:
    include.release.openshift.io/ibm-cloud-managed: "false"
    include.release.openshift.io/self-managed-high-availability: "false"
    include.release.openshift.io/single-node-developer: "false"
    release.openshift.io/create-only: "true"
    ran.openshift.io/ztp-deploy-wave: "10"
  name: cluster
spec:
  logLevel: Normal
  managementState: Removed
  operatorLogLevel: Normal


From one cluster (sno01175) the ClusterVersion conditions show:

# oc get clusterversion version -o jsonpath='{.status.conditions}' | jq
[
  {
    "lastTransitionTime": "2022-05-19T01:44:13Z",
    "message": "Done applying 4.9.26",
    "status": "True",
    "type": "Available"
  },
  {
    "lastTransitionTime": "2022-05-24T14:57:50Z",
    "message": "Cluster operator console is degraded",
    "reason": "ClusterOperatorDegraded",
    "status": "True",
    "type": "Failing"
  },
  {
    "lastTransitionTime": "2022-05-24T13:49:43Z",
    "message": "Unable to apply 4.10.13: wait has exceeded 40 minutes for these operators: console",
    "reason": "ClusterOperatorDegraded",
    "status": "True",
    "type": "Progressing"
  },
  {
    "lastTransitionTime": "2022-05-21T02:07:06Z",
    "status": "True",
    "type": "RetrievedUpdates"
  },
  {
    "lastTransitionTime": "2022-05-24T13:53:05Z",
    "message": "Payload loaded version=\"4.10.13\" image=\"quay.io/openshift-release-dev/ocp-release@sha256:4f516616baed3cf84585e753359f7ef2153ae139c2e80e0191902fbd073c4143\"",
    "reason": "PayloadLoaded",
    "status": "True",
    "type": "ReleaseAccepted"
  },
  {
    "lastTransitionTime": "2022-05-24T13:57:05Z",
    "message": "Cluster operator kube-apiserver should not be upgraded between minor versions: KubeletMinorVersionUpgradeable: Kubelet minor version (1.22.5+5c84e52) on node sno01175 will not be supported in the next OpenShift minor version upgrade.",
    "reason": "KubeletMinorVersion_KubeletMinorVersionUnsupportedNextUpgrade",
    "status": "False",
    "type": "Upgradeable"
  }
]

Another cluster (sno01959) has very similar conditions with slight variation in the Failing and Progressing messages:
  {
    "lastTransitionTime": "2022-05-24T14:32:42Z",
    "message": "Cluster operator console is not available",
    "reason": "ClusterOperatorNotAvailable",
    "status": "True",
    "type": "Failing"
  },
  {
    "lastTransitionTime": "2022-05-24T13:52:04Z",
    "message": "Unable to apply 4.10.13: the cluster operator console has not yet successfully rolled out",
    "reason": "ClusterOperatorNotAvailable",
    "status": "True",
    "type": "Progressing"
  },
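The two Failing conditions above differ only in the state they report for the console operator. As a rough illustration, the following Python sketch pulls the operator name and state out of a ClusterVersion conditions list such as the one returned by the `oc get clusterversion ... | jq` command shown earlier. The function name `failing_operator` and the regular expression are assumptions made for this example; the pattern is derived only from the two messages quoted in this report and will not match other message formats.

```python
import re

# Assumption: pattern covers only the two Failing messages quoted in this
# report ("... is degraded" and "... is not available").
_OPERATOR_RE = re.compile(r"^Cluster operator (\S+) is (degraded|not available)$")


def failing_operator(conditions):
    """Return (operator, state, reason) for the first Failing=True condition
    whose message names a cluster operator, or None if there is none."""
    for cond in conditions:
        if cond.get("type") != "Failing" or cond.get("status") != "True":
            continue
        match = _OPERATOR_RE.match(cond.get("message", ""))
        if match:
            return match.group(1), match.group(2), cond.get("reason")
    return None


# Conditions trimmed from the sno01959 excerpt in this report.
sample = [
    {
        "type": "Failing",
        "status": "True",
        "reason": "ClusterOperatorNotAvailable",
        "message": "Cluster operator console is not available",
    },
    {
        "type": "Progressing",
        "status": "True",
        "reason": "ClusterOperatorNotAvailable",
        "message": "Unable to apply 4.10.13: the cluster operator console "
                   "has not yet successfully rolled out",
    },
]
print(failing_operator(sample))
```

In practice the input list would come from parsing the JSON output with `json.loads`; this could help when triaging many clusters at once, as in the 2200-cluster test described here.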


Version-Release number of selected component (if applicable): 4.9.26 upgrade to 4.10.13


How reproducible: 5 out of 2200


Steps to Reproduce:
1. Disable the console with managementState: Removed
2. Start from OCP version 4.9.26
3. Initiate the upgrade to 4.10.13 via the ClusterVersion CR
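The reproduction steps above could be scripted roughly as follows. This is a minimal sketch, not the method used in the scale test: it only builds the `oc patch` argument lists and does not run them. The helper names are hypothetical, and setting `spec.desiredUpdate.version` alone assumes that 4.10.13 is already listed among the cluster's available updates; the report does not say exactly how the ClusterVersion CR was edited.

```python
import json


def console_removed_patch_cmd():
    """Build an `oc patch` command that sets the console operator to Removed."""
    patch = {"spec": {"managementState": "Removed"}}
    return [
        "oc", "patch", "console.operator.openshift.io", "cluster",
        "--type=merge", "-p", json.dumps(patch),
    ]


def desired_update_patch_cmd(version):
    """Build an `oc patch` command that requests an update via the ClusterVersion CR.

    Assumption: the target version is in the cluster's available updates,
    so the version string alone is enough.
    """
    patch = {"spec": {"desiredUpdate": {"version": version}}}
    return [
        "oc", "patch", "clusterversion", "version",
        "--type=merge", "-p", json.dumps(patch),
    ]


for cmd in (console_removed_patch_cmd(), desired_update_patch_cmd("4.10.13")):
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` on a host where `oc` is logged in to the cluster.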

Actual results: Cluster upgrade is stuck (no longer progressing) for 5+ hours


Expected results: Cluster upgrade completes


Additional info:

Comment 3 Yanping Zhang 2022-08-16 09:06:55 UTC
Steps to verify:
1. Create a cluster with payload 4.12.0-0.nightly-2022-08-12-053438
# oc get clusterversions.config.openshift.io 
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.0-0.nightly-2022-08-12-053438   True        False         89m     Cluster version is 4.12.0-0.nightly-2022-08-12-053438

2. Set managementState to Removed in the console operator config:
spec:
  logLevel: Normal
  managementState: Removed

3. Update the cluster to a new build:
# oc adm upgrade 
info: An upgrade is in progress. Working towards 4.12.0-0.nightly-2022-08-15-092951

Upstream: https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/graph
Channel: stable-4.12

Recommended updates:

  VERSION                            IMAGE
  4.12.0-0.nightly-2022-08-15-150248 registry.ci.openshift.org/ocp/release@sha256:acbff11e154fef25f7244d20b7cda9c3b30c7ef062a23ccccb1c164a45a7f32b

4. Wait for the upgrade to finish successfully.
# oc adm upgrade 
Cluster version is 4.12.0-0.nightly-2022-08-15-092951

Upstream: https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/graph
Channel: stable-4.12

Recommended updates:

  VERSION                            IMAGE
  4.12.0-0.nightly-2022-08-15-150248 registry.ci.openshift.org/ocp/release@sha256:acbff11e154fef25f7244d20b7cda9c3b30c7ef062a23ccccb1c164a45a7f32b
# oc get clusterversions.config.openshift.io 
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.0-0.nightly-2022-08-15-092951   True        False         29m     Cluster version is 4.12.0-0.nightly-2022-08-15-092951
# oc get co
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.12.0-0.nightly-2022-08-15-092951   True        False         False      179m    
baremetal                                  4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h13m   
cloud-controller-manager                   4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h14m   
cloud-credential                           4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h16m   
cluster-autoscaler                         4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h13m   
config-operator                            4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h14m   
console                                    4.12.0-0.nightly-2022-08-15-092951   True        False         False      99s     
csi-snapshot-controller                    4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h14m   
dns                                        4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h13m   
etcd                                       4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h13m   
image-registry                             4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h7m    
ingress                                    4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h7m    
insights                                   4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h8m    
kube-apiserver                             4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h11m   
kube-controller-manager                    4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h12m   
kube-scheduler                             4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h11m   
kube-storage-version-migrator              4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h14m   
machine-api                                4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h8m    
machine-approver                           4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h13m   
machine-config                             4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h13m   
marketplace                                4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h13m   
monitoring                                 4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h5m    
network                                    4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h15m   
node-tuning                                4.12.0-0.nightly-2022-08-15-092951   True        False         False      52m     
openshift-apiserver                        4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h8m    
openshift-controller-manager               4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h8m    
openshift-samples                          4.12.0-0.nightly-2022-08-15-092951   True        False         False      54m     
operator-lifecycle-manager                 4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h14m   
operator-lifecycle-manager-catalog         4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h14m   
operator-lifecycle-manager-packageserver   4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h8m    
service-ca                                 4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h14m   
storage                                    4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h14m   

5. Set the console back to Managed in the console operator. The console can be accessed normally.

The bug is fixed.
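The check in step 4 was done by reading the `oc get co` table by eye. A small Python sketch of the same check is shown here for illustration; the function name `unhealthy_operators` is hypothetical, and the parser assumes the column order shown in the output above (NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED) with no empty cells in those columns.

```python
def unhealthy_operators(table_text, target_version):
    """Return the names of cluster operators that are not at target_version
    with AVAILABLE=True, PROGRESSING=False and DEGRADED=False.

    Assumption: input is the plain-text table printed by `oc get co`,
    with a header row followed by one whitespace-separated row per operator.
    """
    bad = []
    rows = [line for line in table_text.splitlines() if line.strip()]
    for row in rows[1:]:  # skip the header row
        fields = row.split()
        if len(fields) < 5:
            # A short row means a column is missing; treat it as unhealthy.
            bad.append(fields[0])
            continue
        name, version, available, progressing, degraded = fields[:5]
        expected = (target_version, "True", "False", "False")
        if (version, available, progressing, degraded) != expected:
            bad.append(name)
    return bad


# Two rows copied from the output above.
table = """\
NAME      VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
console   4.12.0-0.nightly-2022-08-15-092951   True        False         False      99s
dns       4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h13m
"""
print(unhealthy_operators(table, "4.12.0-0.nightly-2022-08-15-092951"))
```

An empty result means every listed operator reports the target version and a healthy state, which matches the table in this comment.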

Comment 7 errata-xmlrpc 2023-01-17 19:49:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399