Description of problem:

Some upgrades failed during scale testing with messages indicating the console operator is not available. In total 5 out of 2200 clusters failed with this pattern.

These clusters are all configured with the Console operator disabled in order to reduce overall OCP cpu use in the Telecom environment. The following CR is applied:

    apiVersion: operator.openshift.io/v1
    kind: Console
    metadata:
      annotations:
        include.release.openshift.io/ibm-cloud-managed: "false"
        include.release.openshift.io/self-managed-high-availability: "false"
        include.release.openshift.io/single-node-developer: "false"
        release.openshift.io/create-only: "true"
        ran.openshift.io/ztp-deploy-wave: "10"
      name: cluster
    spec:
      logLevel: Normal
      managementState: Removed
      operatorLogLevel: Normal

From one cluster (sno01175) the ClusterVersion conditions show:

    # oc get clusterversion version -o jsonpath='{.status.conditions}' | jq
    [
      {
        "lastTransitionTime": "2022-05-19T01:44:13Z",
        "message": "Done applying 4.9.26",
        "status": "True",
        "type": "Available"
      },
      {
        "lastTransitionTime": "2022-05-24T14:57:50Z",
        "message": "Cluster operator console is degraded",
        "reason": "ClusterOperatorDegraded",
        "status": "True",
        "type": "Failing"
      },
      {
        "lastTransitionTime": "2022-05-24T13:49:43Z",
        "message": "Unable to apply 4.10.13: wait has exceeded 40 minutes for these operators: console",
        "reason": "ClusterOperatorDegraded",
        "status": "True",
        "type": "Progressing"
      },
      {
        "lastTransitionTime": "2022-05-21T02:07:06Z",
        "status": "True",
        "type": "RetrievedUpdates"
      },
      {
        "lastTransitionTime": "2022-05-24T13:53:05Z",
        "message": "Payload loaded version=\"4.10.13\" image=\"quay.io/openshift-release-dev/ocp-release@sha256:4f516616baed3cf84585e753359f7ef2153ae139c2e80e0191902fbd073c4143\"",
        "reason": "PayloadLoaded",
        "status": "True",
        "type": "ReleaseAccepted"
      },
      {
        "lastTransitionTime": "2022-05-24T13:57:05Z",
        "message": "Cluster operator kube-apiserver should not be upgraded between minor versions: KubeletMinorVersionUpgradeable: Kubelet minor version (1.22.5+5c84e52) on node sno01175 will not be supported in the next OpenShift minor version upgrade.",
        "reason": "KubeletMinorVersion_KubeletMinorVersionUnsupportedNextUpgrade",
        "status": "False",
        "type": "Upgradeable"
      }
    ]

Another cluster (sno01959) has very similar conditions with slight variation in the Failing and Progressing messages:

      {
        "lastTransitionTime": "2022-05-24T14:32:42Z",
        "message": "Cluster operator console is not available",
        "reason": "ClusterOperatorNotAvailable",
        "status": "True",
        "type": "Failing"
      },
      {
        "lastTransitionTime": "2022-05-24T13:52:04Z",
        "message": "Unable to apply 4.10.13: the cluster operator console has not yet successfully rolled out",
        "reason": "ClusterOperatorNotAvailable",
        "status": "True",
        "type": "Progressing"
      },

Version-Release number of selected component (if applicable):
4.9.26 upgrade to 4.10.13

How reproducible:
5 out of 2200

Steps to Reproduce:
1. Disable console with managementState: Removed
2. Starting OCP version 4.9.26
3. Initiate upgrade to 4.10.13 via ClusterVersion CR

Actual results:
Cluster upgrade is stuck (no longer progressing) for 5+ hours

Expected results:
Cluster upgrade completes

Additional info:
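When triaging a fleet of this size, it may help to reduce each cluster's ClusterVersion conditions to the operator that is blocking the update. The following is a minimal sketch, not part of the original report: the function name `summarize_stuck_upgrade` is hypothetical, the regular expressions are derived only from the three message shapes quoted above, and splitting multiple operator names on commas is an assumption.

```python
import re

# Excerpts of the Failing and Progressing conditions quoted in this report.
SNO01175 = [
    {"type": "Failing", "status": "True", "reason": "ClusterOperatorDegraded",
     "message": "Cluster operator console is degraded"},
    {"type": "Progressing", "status": "True", "reason": "ClusterOperatorDegraded",
     "message": "Unable to apply 4.10.13: wait has exceeded 40 minutes "
                "for these operators: console"},
]
SNO01959 = [
    {"type": "Failing", "status": "True", "reason": "ClusterOperatorNotAvailable",
     "message": "Cluster operator console is not available"},
    {"type": "Progressing", "status": "True", "reason": "ClusterOperatorNotAvailable",
     "message": "Unable to apply 4.10.13: the cluster operator console "
                "has not yet successfully rolled out"},
]


def summarize_stuck_upgrade(conditions):
    """Return (is_failing, failing_reason, operator_names) for one cluster.

    `conditions` is the list found at .status.conditions of the ClusterVersion.
    Only the Failing and Progressing conditions are inspected.
    """
    by_type = {c["type"]: c for c in conditions}
    failing = by_type.get("Failing", {})
    progressing = by_type.get("Progressing", {})
    operators = set()
    for cond in (failing, progressing):
        msg = cond.get("message", "")
        # Matches "Cluster operator console is ..." and
        # "the cluster operator console has not yet ...".
        operators.update(re.findall(r"[Cc]luster operator (\S+) (?:is|has)", msg))
        # Matches "... for these operators: console".
        # Assumption: several names would be comma-separated.
        found = re.search(r"for these operators: (.+)$", msg)
        if found:
            operators.update(n.strip() for n in found.group(1).split(","))
    is_failing = failing.get("status") == "True"
    return is_failing, failing.get("reason"), sorted(operators)
```

To use it against a live cluster, the JSON printed by the `oc get clusterversion version -o jsonpath='{.status.conditions}'` command shown above could be loaded with `json.loads` and passed in as `conditions`.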
Steps to verify:

1. Create a cluster with payload 4.12.0-0.nightly-2022-08-12-053438

    # oc get clusterversions.config.openshift.io
    NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.12.0-0.nightly-2022-08-12-053438   True        False         89m     Cluster version is 4.12.0-0.nightly-2022-08-12-053438

2. Set console as Removed in console operator.

    spec:
      logLevel: Normal
      managementState: Removed

3. Update the cluster to new build:

    # oc adm upgrade
    info: An upgrade is in progress. Working towards 4.12.0-0.nightly-2022-08-15-092951

    Upstream: https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/graph
    Channel: stable-4.12
    Recommended updates:

      VERSION                              IMAGE
      4.12.0-0.nightly-2022-08-15-150248   registry.ci.openshift.org/ocp/release@sha256:acbff11e154fef25f7244d20b7cda9c3b30c7ef062a23ccccb1c164a45a7f32b

4. Wait for upgrade to finish successfully.

    # oc adm upgrade
    Cluster version is 4.12.0-0.nightly-2022-08-15-092951

    Upstream: https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/graph
    Channel: stable-4.12
    Recommended updates:

      VERSION                              IMAGE
      4.12.0-0.nightly-2022-08-15-150248   registry.ci.openshift.org/ocp/release@sha256:acbff11e154fef25f7244d20b7cda9c3b30c7ef062a23ccccb1c164a45a7f32b

    # oc get clusterversions.config.openshift.io
    NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.12.0-0.nightly-2022-08-15-092951   True        False         29m     Cluster version is 4.12.0-0.nightly-2022-08-15-092951

    # oc get co
    NAME                                      VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
    authentication                            4.12.0-0.nightly-2022-08-15-092951   True        False         False      179m
    baremetal                                 4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h13m
    cloud-controller-manager                  4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h14m
    cloud-credential                          4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h16m
    cluster-autoscaler                        4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h13m
    config-operator                           4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h14m
    console                                   4.12.0-0.nightly-2022-08-15-092951   True        False         False      99s
    csi-snapshot-controller                   4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h14m
    dns                                       4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h13m
    etcd                                      4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h13m
    image-registry                            4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h7m
    ingress                                   4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h7m
    insights                                  4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h8m
    kube-apiserver                            4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h11m
    kube-controller-manager                   4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h12m
    kube-scheduler                            4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h11m
    kube-storage-version-migrator             4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h14m
    machine-api                               4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h8m
    machine-approver                          4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h13m
    machine-config                            4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h13m
    marketplace                               4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h13m
    monitoring                                4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h5m
    network                                   4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h15m
    node-tuning                               4.12.0-0.nightly-2022-08-15-092951   True        False         False      52m
    openshift-apiserver                       4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h8m
    openshift-controller-manager              4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h8m
    openshift-samples                         4.12.0-0.nightly-2022-08-15-092951   True        False         False      54m
    operator-lifecycle-manager                4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h14m
    operator-lifecycle-manager-catalog        4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h14m
    operator-lifecycle-manager-packageserver  4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h8m
    service-ca                                4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h14m
    storage                                   4.12.0-0.nightly-2022-08-15-092951   True        False         False      3h14m

5. Set console to Managed in console operator. Console could be accessed normally.

The bug is fixed.
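The check in step 4 (every operator at the target version with Available=True, Progressing=False, Degraded=False) could be automated roughly as follows. This is only an illustrative sketch: the helper name `operators_not_settled` is hypothetical, the sample rows are copied from the `oc get co` output above, and the whitespace-based parsing assumes the first five columns are always populated, which may not hold while an operator has not yet reported a version.

```python
TARGET = "4.12.0-0.nightly-2022-08-15-092951"

# A few rows copied from the `oc get co` output in this comment.
SAMPLE = """\
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
authentication 4.12.0-0.nightly-2022-08-15-092951 True False False 179m
console 4.12.0-0.nightly-2022-08-15-092951 True False False 99s
storage 4.12.0-0.nightly-2022-08-15-092951 True False False 3h14m
"""


def operators_not_settled(table_text, target_version):
    """Return names of operators that are not healthy at target_version.

    `table_text` is the plain-text output of `oc get co`, header included.
    Assumption: NAME, VERSION, AVAILABLE, PROGRESSING and DEGRADED are all
    non-empty on every row, so splitting on whitespace lines up the columns.
    """
    rows = [line for line in table_text.splitlines() if line.strip()]
    not_settled = []
    for row in rows[1:]:  # skip the header row
        fields = row.split()
        if len(fields) < 5:
            # Cannot tell the columns apart; report the row as unsettled.
            not_settled.append(fields[0])
            continue
        name, version, available, progressing, degraded = fields[:5]
        healthy = (available, progressing, degraded) == ("True", "False", "False")
        if version != target_version or not healthy:
            not_settled.append(name)
    return not_settled
```

An empty result for the target version corresponds to the state shown in step 4. Parsing structured output from `oc get co -o json` would likely be more robust than splitting table text.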
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:7399