Description of problem:

The install log shows:

level=info msg="Cluster operator console Available is False with DeploymentAvailableFailedUpdate: DeploymentAvailable: 1 replicas ready at version 4.3.0-0.ci-2019-11-01-122324"
level=info msg="Cluster operator insights Disabled is False with : "
level=fatal msg="failed to initialize the cluster: Working towards 4.3.0-0.ci-2019-11-01-122324: 100% complete"

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-upgrade-4.3/127

I would expect this to show less than 100% complete. (I am opening a separate bug against console for the fact that it did not become available.)

Similar behavior seen in:
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-upgrade-4.3/127
We didn't incorrectly mark the cluster ready to use, so this is a cosmetic fix.
The cosmetic fix will happen in the CVO, whose message the installer is just passing along.
I would really like to say:

  Working towards 4.3.0-0.ci-2019-11-01-122324: $n of $m objects applied (100%) in the current sync round

with maybe batched updates so we don't push too often (e.g. push when there has been a change but the current ClusterVersion status is >30s old). For the specific job that led to this bug [1]:

  2019-11-01T15:44:07.358978842Z E1101 15:44:07.358590 1 task.go:77] error running apply for clusteroperator "console" (332 of 486): Cluster operator console has not yet reported success
  ...
  2019-11-01T15:44:07.35927455Z I1101 15:44:07.359244 1 task_graph.go:611] Result of work: [Cluster operator console has not yet reported success]

Presumably the ClusterOperator was at the back of its manifest block, and we successfully pushed all 485 other manifests in that sync round. Including "in the current sync round" would also mitigate bug 1690816. I'd also like that 'Result of work' to go into .extensions or some such on the cluster object, for folks who want to dive into the details of the sticking manifests.

[1]: https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-upgrade-4.3/127/artifacts/e2e-gcp-upgrade/must-gather/registry-svc-ci-openshift-org-ocp-4-3-2019-11-01-122324-sha256-dae1257b516a5c177237cfef5a6a3e241962b0d20cf54bcb2b66dc1671c5035e/namespaces/openshift-cluster-version/pods/cluster-version-operator-6c89697849-h9p7t/cluster-version-operator/cluster-version-operator/logs/current.log
*** Bug 1829118 has been marked as a duplicate of this bug. ***
[1] would make collecting information like "which manifests have we failed to push?" easier, but it's unlikely to land during freeze. Punting to UpcomingSprint; hopefully we'll make some progress here once master and 4.6 split off from 4.5. [1]: https://github.com/openshift/cluster-version-operator/pull/264
We do not have time to fix the bug in this sprint as we are working on higher priority bugs and features. Hence we are adding UpcomingSprint now, and we'll revisit it in the next sprint.
We do not have time to fix the bug in this sprint as we are working on higher priority bugs and features. Hence we are adding UpcomingSprint now, and we'll revisit this in the next sprint.
This is more feature-y, and we're past feature freeze for 4.6. Moving this to 4.7.
I still want comment 3, but this is a cosmetic issue, so it's taken a backseat to more impactful bugs. Maybe next sprint...
Comment 12 is still current.
*** Bug 1897612 has been marked as a duplicate of this bug. ***
Replacing Fraction [1] with done/total [2] and rendering done and total (and a locally-computed percent) with comment 3's:

  Working towards 4.3.0-0.ci-2019-11-01-122324: $n of $m objects applied (100%) in the current sync round

seems like it wouldn't be that bad. I don't see a way to get to [3]'s Complete without syncing all the resources. If there is a way, it's probably its own bug, because we don't want to transition to reconciling mode before we have reconciled all the manifests.

[1]: https://github.com/openshift/cluster-version-operator/blob/1e51a0e4750ca110d4659f33bce210a3de6844b9/pkg/cvo/sync_worker.go#L92
[2]: https://github.com/openshift/cluster-version-operator/blob/1e51a0e4750ca110d4659f33bce210a3de6844b9/pkg/cvo/sync_worker.go#L783-L784
[3]: https://github.com/openshift/cluster-version-operator/blob/1e51a0e4750ca110d4659f33bce210a3de6844b9/pkg/cvo/sync_worker.go#L753
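As a rough illustration of carrying done/total instead of a fraction and computing the percent locally, here is a minimal sketch. The progress type and its methods are hypothetical and are not the fields in sync_worker.go. Truncating with integer division is an assumption made here; it is one way to keep an unfinished round (such as the 485 of 486 case from comment 3) from rendering as 100%.

```go
package main

import "fmt"

// progress is a hypothetical stand-in for done/total counters;
// the names are illustrative, not the CVO's actual status fields.
type progress struct {
	Done  int // manifests applied in the current sync round
	Total int // manifests in the payload
}

// percent computes the completion percentage locally. Integer division
// truncates, so anything short of Done == Total reports at most 99.
func (p progress) percent() int {
	if p.Total <= 0 {
		return 0
	}
	return p.Done * 100 / p.Total
}

// message renders the progress using the wording proposed in comment 3.
func (p progress) message(version string) string {
	return fmt.Sprintf("Working towards %s: %d of %d objects applied (%d%%) in the current sync round",
		version, p.Done, p.Total, p.percent())
}

func main() {
	// 485 of 486: every manifest except the console ClusterOperator.
	p := progress{Done: 485, Total: 486}
	fmt.Println(p.message("4.3.0-0.ci-2019-11-01-122324"))
}
```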
Reducing the severity of the bug as this is a cosmetic issue and is not causing the CI jobs to fail.
Installation monitor against 4.7.0-0.nightly-2021-01-21-172657

...
level=debug msg=Still waiting for the cluster to initialize: Working towards 4.7.0-0.nightly-2021-01-21-172657: 640 of 664 done (96% complete)
level=debug msg=Still waiting for the cluster to initialize: Working towards 4.7.0-0.nightly-2021-01-21-172657: 642 of 664 done (96% complete)
level=debug msg=Still waiting for the cluster to initialize: Working towards 4.7.0-0.nightly-2021-01-21-172657: 644 of 664 done (96% complete)
level=debug msg=Still waiting for the cluster to initialize: Working towards 4.7.0-0.nightly-2021-01-21-172657: 649 of 664 done (97% complete)
level=debug msg=Still waiting for the cluster to initialize: Working towards 4.7.0-0.nightly-2021-01-21-172657: 658 of 664 done (99% complete)
level=debug msg=Still waiting for the cluster to initialize: Cluster operator authentication is reporting a failure: WellKnownReadyControllerDegraded: need at least 3 kube-apiservers, got 2
level=debug msg=Cluster is initialized

Upgrade monitor against v4.6 to 4.7.0-0.nightly-2021-01-21-172657

version 4.6.13 True True 3s Working towards 4.7.0-0.nightly-2021-01-21-172657: downloading update
version 4.6.13 True True 63s Working towards 4.7.0-0.nightly-2021-01-21-172657: 70 of 664 done (10% complete)
..
version 4.6.13 True True 4m6s Working towards 4.7.0-0.nightly-2021-01-21-172657: 96 of 664 done (14% complete)
version 4.6.13 True True 18m Working towards 4.7.0-0.nightly-2021-01-21-172657: 116 of 664 done (17% complete)
version 4.6.13 True True 19m Working towards 4.7.0-0.nightly-2021-01-21-172657: 174 of 664 done (26% complete), waiting on machine-api, openshift-apiserver
version 4.6.13 True True 21m Working towards 4.7.0-0.nightly-2021-01-21-172657: 175 of 664 done (26% complete)
version 4.6.13 True True 23m Working towards 4.7.0-0.nightly-2021-01-21-172657: 358 of 664 done (53% complete)
version 4.6.13 True True 24m Working towards 4.7.0-0.nightly-2021-01-21-172657: 497 of 664 done (74% complete)
version 4.6.13 True True 25m Working towards 4.7.0-0.nightly-2021-01-21-172657: 516 of 664 done (77% complete)
version 4.6.13 True True 26m Working towards 4.7.0-0.nightly-2021-01-21-172657: 518 of 664 done (78% complete), waiting on cluster-autoscaler
version 4.6.13 True True 34m Working towards 4.7.0-0.nightly-2021-01-21-172657: 527 of 664 done (79% complete)
version 4.6.13 True True 35m Working towards 4.7.0-0.nightly-2021-01-21-172657: 527 of 664 done (79% complete), waiting on network
version 4.6.13 True True 44m Working towards 4.7.0-0.nightly-2021-01-21-172657: 556 of 664 done (83% complete)
version 4.6.13 True True 45m Working towards 4.7.0-0.nightly-2021-01-21-172657: 191 of 664 done (28% complete)
version 4.6.13 True True 50m Working towards 4.7.0-0.nightly-2021-01-21-172657: 556 of 664 done (83% complete)
version 4.6.13 True True 51m Working towards 4.7.0-0.nightly-2021-01-21-172657: 556 of 664 done (83% complete), waiting on machine-config
version 4.6.13 True True 55m Working towards 4.7.0-0.nightly-2021-01-21-172657: 3 of 664 done (0% complete)
version 4.6.13 True True 56m Working towards 4.7.0-0.nightly-2021-01-21-172657: 175 of 664 done (26% complete)
version 4.7.0-0.nightly-2021-01-21-172657 True False 0s Cluster version is 4.7.0-0.nightly-2021-01-21-172657

The new progress reporting looks good.
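For anyone scripting a similar check, the "N of M done (P% complete)" fragment in the output above can be pulled apart with a small parser. This is a sketch only: parseProgress and the regular expression are hypothetical and are matched against the message format as it appears in this comment, not against any documented, stable format. The consistent method assumes the percent is truncated, which agrees with the samples above (for example, 649 of 664 is reported as 97%).

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// progressRE matches the "N of M done (P% complete)" fragment as it
// appears in the monitor output quoted in this bug.
var progressRE = regexp.MustCompile(`(\d+) of (\d+) done \((\d+)% complete\)`)

// parsed holds the three numbers extracted from one status line.
type parsed struct {
	Done, Total, Percent int
}

// parseProgress extracts done, total, and percent from a status line.
// ok is false when the line carries no progress fragment
// (for example "downloading update").
func parseProgress(line string) (p parsed, ok bool) {
	m := progressRE.FindStringSubmatch(line)
	if m == nil {
		return parsed{}, false
	}
	// The pattern only admits digits; Atoi can still fail on overflow.
	done, err1 := strconv.Atoi(m[1])
	total, err2 := strconv.Atoi(m[2])
	pct, err3 := strconv.Atoi(m[3])
	if err1 != nil || err2 != nil || err3 != nil {
		return parsed{}, false
	}
	return parsed{Done: done, Total: total, Percent: pct}, true
}

// consistent reports whether the printed percent equals the truncated
// ratio done*100/total.
func (p parsed) consistent() bool {
	return p.Total > 0 && p.Done*100/p.Total == p.Percent
}

func main() {
	line := "Working towards 4.7.0-0.nightly-2021-01-21-172657: 649 of 664 done (97% complete)"
	if p, ok := parseProgress(line); ok {
		fmt.Println(p.Done, p.Total, p.Percent, p.consistent())
	}
}
```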
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633