The fix for bug 2079803 included a ClusterVersion fetch and history inspection. But until install completion, there will be no completed history entries, and installation failures that include entries like [1]:

  level=error msg=Cluster operator etcd Degraded is True with UpgradeBackupController_Error: UpgradeBackupControllerDegraded: unable to retrieve cluster version, no completed update was found in cluster version status history: [{Partial 2022-06-08 19:26:01 +0000 UTC <nil> 4.11.0-0.ci.test-2022-06-08-190030-ci-op-zq5cm5gx-initial registry.build02.ci.openshift.org/ci-op-zq5cm5gx/release@sha256:e08abf8ba61271954f9b785a4cbdf6571723b925872b05fd9f4d3ecc1dc6e135 false }]

may distract users trying to understand a failed install. We can fix this by getting the current version from the etcd operator's OPERATOR_IMAGE_VERSION environment variable [2] instead.

I'm filing a new bug because bug 2079803 has already been backported to 4.10.z as bug 2091604, and that fix is likely to ship in this week's 4.10.z release (likely to be named 4.10.19, but not yet built).

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/3167/pull-ci-openshift-machine-config-operator-master-e2e-agnostic-upgrade/1534611208269729792#1:build-log.txt%3A84
[2]: https://github.com/openshift/cluster-etcd-operator/blob/28a4ae406ff736b00af68c4f4d249319d62e48dd/manifests/0000_20_etcd-operator_06_deployment.yaml#L71-L72
Today I launched a cluster with the latest payload, 4.11.0-0.nightly-2022-06-16-221335, and installation failed with similar errors:

06-17 03:04:45.610 level=error msg=Cluster operator etcd Degraded is True with UpgradeBackupController_Error: UpgradeBackupControllerDegraded: unable to retrieve cluster version, no completed update was found in cluster version status history: [{Partial 2022-06-17 02:13:35 +0000 UTC <nil> 4.11.0-0.nightly-2022-06-16-221335 registry.ci.openshift.org/ocp/release@sha256:7d6c5e2594bd9d89592712c60f0af8f1ec750951c3ded3a16326551f431c8719 false }]
06-17 03:04:45.610 level=info msg=Cluster operator etcd RecentBackup is Unknown with ControllerStarted: The etcd backup controller is starting, and will decide if recent backups are available or if a backup is required
06-17 03:04:45.610 level=info msg=Cluster operator monitoring Available is False with MultipleTasksFailed: Rollout of the monitoring stack failed and is degraded. Please investigate the degraded status error.
06-17 03:04:45.610 level=info msg=Cluster operator monitoring Progressing is True with RollOutInProgress: Rolling out the stack.
06-17 03:04:45.610 level=error msg=Cluster operator monitoring Degraded is True with MultipleTasksFailed: Failed to rollout the stack. Error: updating alertmanager: waiting for Alertmanager object changes failed: waiting for Alertmanager openshift-monitoring/main: expected 2 replicas, got 0 updated replicas
06-17 03:04:45.611 level=error msg=updating prometheus-k8s: waiting for Prometheus object changes failed: waiting for Prometheus openshift-monitoring/k8s: expected 2 replicas, got 1 updated replicas
06-17 03:04:45.611 level=error msg=Cluster operator network Degraded is True with RolloutHung: DaemonSet "/openshift-sdn/sdn" rollout is not making progress - last change 2022-06-17T02:25:11Z
06-17 03:04:45.611 level=info msg=Cluster operator network ManagementStateDegraded is False with :
06-17 03:04:45.611 level=info msg=Cluster operator network Progressing is True with Deploying: DaemonSet "/openshift-sdn/sdn" is not available (awaiting 6 nodes)

$ oc get co | grep -v "True .*False .*False"
NAME         VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
etcd         4.11.0-0.nightly-2022-06-16-221335   True        False         True       59m     UpgradeBackupControllerDegraded: unable to retrieve cluster version, no completed update was found in cluster version status history: [{Partial 2022-06-17 02:13:35 +0000 UTC <nil> 4.11.0-0.nightly-2022-06-16-221335 registry.ci.openshift.org/ocp/release@sha256:7d6c5e2594bd9d89592712c60f0af8f1ec750951c3ded3a16326551f431c8719 false }]
monitoring                                        False       True          True       40m     Rollout of the monitoring stack failed and is degraded. Please investigate the degraded status error.
network      4.11.0-0.nightly-2022-06-16-221335   True        True          True       62m     DaemonSet "/openshift-sdn/sdn" rollout is not making progress - last change 2022-06-17T02:25:11Z

I checked the other 2 clusteroperators; they seem to show a separate issue, so I'd like to file separate bug 2097954 for them.
Tried several installations covering different platforms and have not hit this issue, so I'm closing it.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:7399