Description of problem:
4.3 nightly -> 4.3 nightly update failed: `failed to initialize the cluster: Cluster operator cluster-autoscaler is still updating`

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/11999

The clusteroperators list (https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/11999/artifacts/e2e-aws-upgrade/clusteroperators.json) shows it as empty, with no status at all (?):

...
{
  "apiVersion": "config.openshift.io/v1",
  "kind": "ClusterOperator",
  "metadata": {
    "creationTimestamp": "2019-12-04T00:20:33Z",
    "generation": 1,
    "name": "cluster-autoscaler",
    "resourceVersion": "11333",
    "selfLink": "/apis/config.openshift.io/v1/clusteroperators/cluster-autoscaler",
    "uid": "ed891617-6cf2-4c78-9c0e-54d2e86af724"
  },
  "spec": {}
},
...

Version-Release number of selected component (if applicable):
4.3.0-0.nightly-2019-12-03-211441 -> 4.3.0-0.nightly-2019-12-03-234445

How reproducible:

Additional info:
The underlying issue here is that etcd was under load and taking multiple seconds to sync its log, which was causing leader elections and, I think, causing some API writes to fail. In addition, the cluster-autoscaler-operator was not reporting failures to apply updates to its ClusterOperator resource and, worse, was not retrying when it failed to apply an "Available" status. As a result, the CVO never learned that the operator had updated successfully. The linked PR fixes that, and I'll make sure it's backported to previous releases.
> The underlying issue here is that etcd was under load and taking multiple seconds to sync its log, which was causing leader elections, and I think some API writes to fail.

The general tracker for this part of the problem is bug 1775878.
Verified in 4.4.0-0.nightly-2019-12-19-223334.

$ oc get co cluster-autoscaler -o yaml
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  creationTimestamp: "2019-12-20T03:11:49Z"
  generation: 1
  name: cluster-autoscaler
  resourceVersion: "9771"
  selfLink: /apis/config.openshift.io/v1/clusteroperators/cluster-autoscaler
  uid: 99dba483-4ca7-4f50-af40-6ceeddfd0143
spec: {}
status:
  conditions:
  - lastTransitionTime: "2019-12-20T03:11:49Z"
    message: at version 4.4.0-0.nightly-2019-12-19-223334
    status: "True"
    type: Available
  - lastTransitionTime: "2019-12-20T03:11:49Z"
    status: "False"
    type: Progressing
  - lastTransitionTime: "2019-12-20T03:11:49Z"
    status: "False"
    type: Degraded
  - lastTransitionTime: "2019-12-20T03:11:49Z"
    status: "True"
    type: Upgradeable
  extension: null
  relatedObjects:
  - group: machine.openshift.io
    name: ""
    namespace: openshift-machine-api
    resource: machineautoscalers
  - group: machine.openshift.io
    name: ""
    namespace: openshift-machine-api
    resource: clusterautoscalers
  - group: ""
    name: openshift-machine-api
    resource: namespaces
  versions:
  - name: operator
    version: 4.4.0-0.nightly-2019-12-19-223334