+++ This bug was initially created as a clone of Bug #1779640 +++

Description of problem:

4.3 nightly -> 4.3 nightly update failed: `failed to initialize the cluster: Cluster operator cluster-autoscaler is still updating`

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/11999

The clusteroperators list (https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/11999/artifacts/e2e-aws-upgrade/clusteroperators.json) shows the cluster-autoscaler entry with an empty spec and no status at all:

...
{
  "apiVersion": "config.openshift.io/v1",
  "kind": "ClusterOperator",
  "metadata": {
    "creationTimestamp": "2019-12-04T00:20:33Z",
    "generation": 1,
    "name": "cluster-autoscaler",
    "resourceVersion": "11333",
    "selfLink": "/apis/config.openshift.io/v1/clusteroperators/cluster-autoscaler",
    "uid": "ed891617-6cf2-4c78-9c0e-54d2e86af724"
  },
  "spec": {}
},
...

Version-Release number of selected component (if applicable):

4.3.0-0.nightly-2019-12-03-211441 -> 4.3.0-0.nightly-2019-12-03-234445

How reproducible:

Additional info:

--- Additional comment from Brad Ison on 2019-12-04 15:49:10 UTC ---

The underlying issue here is that etcd was under load and taking multiple seconds to sync its log, which was causing leader elections and, I think, some API write failures. In addition, the cluster-autoscaler-operator was not reporting failures to apply updates to its ClusterOperator resource and, worse, was not retrying when it failed to apply an "Available" status, so the CVO was unaware of its success. The linked PR fixes that, and I'll make sure it's backported to previous releases.
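The fix described above amounts to retrying the status write rather than silently dropping a transient failure. A minimal, self-contained sketch of that retry pattern under stated assumptions: `applyWithRetry` and the simulated etcd timeout are illustrative only, not the actual cluster-autoscaler-operator code, which works against the Kubernetes API via client-go.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// applyFunc stands in for the operator's ClusterOperator status update,
// which can fail transiently when etcd is under load.
type applyFunc func() error

// applyWithRetry retries a status update with simple exponential backoff
// instead of dropping the failure after the first attempt.
func applyWithRetry(apply applyFunc, attempts int, initial time.Duration) error {
	delay := initial
	var err error
	for i := 0; i < attempts; i++ {
		if err = apply(); err == nil {
			return nil
		}
		// Report the failure instead of swallowing it.
		fmt.Printf("status update failed (attempt %d/%d): %v\n", i+1, attempts, err)
		time.Sleep(delay)
		delay *= 2
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	// Simulate two transient etcd failures followed by success.
	err := applyWithRetry(func() error {
		calls++
		if calls < 3 {
			return errors.New("etcdserver: request timed out")
		}
		return nil
	}, 5, 10*time.Millisecond)
	fmt.Println("calls:", calls, "err:", err)
}
```

With a real operator, the equivalent behavior is usually obtained from client-go's retry helpers rather than a hand-rolled loop; the point is only that an "Available" write that fails must be reported and reattempted.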
Verified in 4.3.0-0.nightly-2019-12-10-014919. The cluster-autoscaler operator now reports status:

oc get co cluster-autoscaler -o yaml

apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  creationTimestamp: "2019-12-10T03:39:00Z"
  generation: 1
  name: cluster-autoscaler
  resourceVersion: "96746"
  selfLink: /apis/config.openshift.io/v1/clusteroperators/cluster-autoscaler
  uid: 03097927-57d8-489e-b3db-21c3b2b138b1
spec: {}
status:
  conditions:
  - lastTransitionTime: "2019-12-10T03:39:00Z"
    message: at version 4.3.0-0.nightly-2019-12-10-014919
    status: "True"
    type: Available
  - lastTransitionTime: "2019-12-10T08:14:10Z"
    status: "False"
    type: Progressing
  - lastTransitionTime: "2019-12-10T03:39:00Z"
    status: "False"
    type: Degraded
  - lastTransitionTime: "2019-12-10T03:39:00Z"
    status: "True"
    type: Upgradeable
  extension: null
  relatedObjects:
  - group: machine.openshift.io
    name: ""
    namespace: openshift-machine-api
    resource: machineautoscalers
  - group: machine.openshift.io
    name: ""
    namespace: openshift-machine-api
    resource: clusterautoscalers
  - group: ""
    name: openshift-machine-api
    resource: namespaces
  versions:
  - name: operator
    version: 4.3.0-0.nightly-2019-12-10-014919
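For context on why these conditions unblock the upgrade: an upgrade gate like the CVO's generally treats an operator as settled when Available is True and Progressing and Degraded are False. A minimal sketch of that check against conditions like the ones above; the `condition` type and `conditionTrue` helper are illustrative simplifications, not the CVO's actual code.

```go
package main

import "fmt"

// condition mirrors the shape of the ClusterOperator status
// conditions shown above (fields simplified for illustration).
type condition struct {
	Type   string
	Status string
}

// conditionTrue reports whether the named condition is present
// and has status "True".
func conditionTrue(conds []condition, t string) bool {
	for _, c := range conds {
		if c.Type == t {
			return c.Status == "True"
		}
	}
	return false
}

func main() {
	conds := []condition{
		{Type: "Available", Status: "True"},
		{Type: "Progressing", Status: "False"},
		{Type: "Degraded", Status: "False"},
		{Type: "Upgradeable", Status: "True"},
	}
	// Settled: Available, no longer Progressing, and not Degraded.
	settled := conditionTrue(conds, "Available") &&
		!conditionTrue(conds, "Progressing") &&
		!conditionTrue(conds, "Degraded")
	fmt.Println("operator settled:", settled)
}
```

In the failing run from the original report, the status block was missing entirely, so no Available condition existed and the check above could never pass, which matches the "still updating" error the CVO reported.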
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0062