Bug 2009845 - Respect overrides changes during installation
Summary: Respect overrides changes during installation
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.11.0
Assignee: Jack Ottofaro
QA Contact: Yang Yang
URL:
Whiteboard:
Depends On:
Blocks: 2052839
 
Reported: 2021-10-01 18:29 UTC by W. Trevor King
Modified: 2022-08-10 10:38 UTC (History)
CC: 6 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-10 10:37:47 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-version-operator pull 713 0 None Merged Bug 2009845: Respect overrides changes during install 2022-01-26 06:23:14 UTC
Github openshift cluster-version-operator pull 727 0 None Merged Revert "Bug 2009845: Respect overrides changes during install" 2022-01-26 06:23:13 UTC
Github openshift cluster-version-operator pull 728 0 None Merged Bug 2009845: Respect overrides changes during install (second round) 2022-01-26 06:23:12 UTC
Github openshift cluster-version-operator pull 729 0 None Merged Bug 2009845: pkg/cvo/sync_worker: Do not cancel sync workers on install-time target-version change 2022-01-26 06:23:12 UTC
Github openshift cluster-version-operator pull 730 0 None Merged Bug 2009845: pkg/cvo/sync_worker: Log transition to updating 2022-02-01 17:21:32 UTC
Github openshift cluster-version-operator pull 733 0 None Merged Bug 2009845: pkg/cvo/sync_worker: Use current state, not suggested state, for guarding Initializing->Updating 2022-02-01 17:21:31 UTC
Red Hat Product Errata RHSA-2022:5069 0 None None None 2022-08-10 10:38:04 UTC

Description W. Trevor King 2021-10-01 18:29:42 UTC
4.7's bug 1905221, backported to 4.6.19 [1], avoided a too-early transition from install-mode to update-mode by ignoring overrides changes while we were in install-mode.  Touching overrides during installation had been infrequent, but yesterday the installer was updated in 4.10 (bug 2008119) and 4.9 (bug 2009342) to set overrides in the initial ClusterVersion resource and then clear them as bootstrapping wrapped up.  This is causing some pain in CI, like [2]:

  : [sig-arch][Early] Managed cluster should start all core operators [Skipped:Disconnected] [Suite:openshift/conformance/parallel]	1s
  fail [github.com/onsi/ginkgo.0-origin.0+incompatible/internal/leafnodes/runner.go:113]: Oct  1 09:56:05.536: ClusterVersion Progressing=True: : Working towards 4.10.0-0.ci-2021-10-01-091426: 381 of 744 done (51% complete)

where the cluster takes an extra 30s between going Available=True (which unblocks the installer to declare install-complete) and going Progressing=False:

  $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-e2e-aws-serial/1443867156340019200/artifacts/e2e-aws-serial/gather-extra/artifacts/clusterversion.json | jq -r '.items[].status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + (.reason // "-") + ": " + (.message // "-")' 
  2021-10-01T09:55:37Z Available=True -: Done applying 4.10.0-0.ci-2021-10-01-091426
  2021-10-01T09:55:37Z Failing=False -: -
  2021-10-01T09:56:07Z Progressing=False -: Cluster version is 4.10.0-0.ci-2021-10-01-091426
  2021-10-01T09:30:40Z RetrievedUpdates=False NoChannel: The update channel has not been configured.

because of the deferred overrides change:

  $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-e2e-aws-serial/1443867156340019200/artifacts/e2e-aws-serial/gather-extra/artifacts/pods/openshift-cluster-version_cluster-version-operator-5c577f8d67-4wzqm_cluster-version-operator.log | grep 'overrides\|Running sync.*in state'
  W1001 09:37:51.814042       1 sync_worker.go:490] Ignoring detected version and overrides changes during payload initialization
  I1001 09:37:51.814176       1 sync_worker.go:542] Running sync 4.10.0-0.ci-2021-10-01-091426 (force=false) on generation 1 in state Initializing at attempt 0
  W1001 09:37:58.407651       1 sync_worker.go:490] Ignoring detected overrides change during payload initialization
  ...
  W1001 09:55:07.524312       1 sync_worker.go:490] Ignoring detected overrides change during payload initialization
  I1001 09:55:19.050794       1 sync_worker.go:542] Running sync 4.10.0-0.ci-2021-10-01-091426 (force=false) on generation 2 in state Initializing at attempt 4
  W1001 09:55:22.520587       1 sync_worker.go:490] Ignoring detected overrides change during payload initialization
  W1001 09:55:37.520888       1 sync_worker.go:490] Ignoring detected overrides change during payload initialization
  I1001 09:55:37.528893       1 sync_worker.go:493] Detected overrides change
  I1001 09:55:37.529222       1 sync_worker.go:493] Detected overrides change
  I1001 09:55:37.529329       1 sync_worker.go:542] Running sync 4.10.0-0.ci-2021-10-01-091426 (force=false) on generation 2 in state Updating at attempt 0
  I1001 09:58:58.297770       1 sync_worker.go:542] Running sync 4.10.0-0.ci-2021-10-01-091426 (force=false) on generation 2 in state Reconciling at attempt 0

We can be more responsive to users, and avoid the CI issues, if we adjust the CVO to respect 'overrides' changes during installation while remaining in install mode.

We should also consider how we treat 'desiredUpdate'.  Things might break if we use install-mode's flattened manifest graph instead of update-mode's dependency-ordered graph.  So I think we should continue to defer accepting target-release changes until install completes, unless someone gives us a solid use-case for install-time target-release pivots.  But I'm open to having that discussion now, if folks don't agree with my proposal to defer acceptance.

We should also confirm that observedGeneration:

  $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-e2e-aws-serial/1443867156340019200/artifacts/e2e-aws-serial/gather-extra/artifacts/clusterversion.json | jq '.items[].status.observedGeneration'
  2

gets incremented to match metadata.generation when the CVO notices and acts on a spec change.  For example, when we see an overrides change, we should update observedGeneration when we start acting on the new spec.  When we see a desiredUpdate bump, we should increment observedGeneration, even if we have not yet accepted and begun applying the new target.  Users can make the "CVO realizes I've asked for something different, but is still applying an earlier target" vs. "CVO is applying the new target" distinction by checking status.desired.  But there is no analogous status property for overrides, so we need to have accepted an overrides change by the time we bump observedGeneration.
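
A quick way to eyeball that relationship on a live cluster is sketched below (assuming 'oc' access and 'jq'; the selected fields are just for illustration):

  $ oc get clusterversion version -o json | jq '{generation: .metadata.generation, observedGeneration: .status.observedGeneration, desired: .status.desired.version}'

If observedGeneration trails metadata.generation, the CVO has not yet acted on the latest spec change.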

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1921180#c5
[2]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-e2e-aws-serial/1443867156340019200

Comment 1 W. Trevor King 2021-10-01 20:47:57 UTC
[1] is a temporary mitigation to keep CI happy while we work on this.

[1]: https://github.com/openshift/release/pull/22434

Comment 3 liujia 2022-01-11 09:58:41 UTC
I see the error in a recent (<6h) CI job:

# [sig-arch][Early] Managed cluster should start all core operators [Skipped:Disconnected] [Suite:openshift/conformance/parallel]
fail [github.com/onsi/ginkgo.0-origin.0+incompatible/internal/leafnodes/runner.go:113]: Jan 11 07:42:29.166: ClusterVersion Progressing=True: : Working towards 4.10.0-0.nightly-2022-01-10-144202: 369 of 765 done (48% complete)

# curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-e2e-azure-ovn-upgrade/1480795473177481216/artifacts/e2e-azure-ovn-upgrade/gather-extra/artifacts/clusterversion.json| jq -r '.items[].status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + (.reason // "-") + ": " + (.message // "-")' 
2022-01-11T07:42:09Z Available=True -: Done applying 4.10.0-0.nightly-2022-01-10-144202
2022-01-11T07:40:24Z Failing=False -: -
2022-01-11T07:42:54Z Progressing=False -: Cluster version is 4.10.0-0.nightly-2022-01-10-144202
2022-01-11T07:17:26Z RetrievedUpdates=False NoChannel: The update channel has not been configured.

Hi @Trevor, @Jack
Any suggestions for QE to reproduce the issue? Or is it enough to verify the bug via the CI job search above?

Comment 4 Jack Ottofaro 2022-01-11 14:28:28 UTC
(In reply to liujia from comment #3)
> I see the error in a recent (<6h) CI job:
> 
...
> Hi @Trevor, @Jack
> Any suggestions for QE to reproduce the issue? Or is it enough to verify the
> bug via the CI job search above?

I think that's fine. But you can also just capture a cluster's initialization, where you should now see:

I1217 21:21:11.059604       1 sync_worker.go:496] Detected version change
W1217 21:21:11.060024       1 sync_worker.go:499] Ignoring detected version change during payload initialization

rather than:

W0111 07:32:39.299462       1 sync_worker.go:494] Ignoring detected overrides change during payload initialization

And the Progressing=False transition time should no longer trail Available=True by ~30 seconds.
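
As a sketch, that gap can be computed from any gathered clusterversion.json with jq's fromdate; against comment 3's artifact it comes out to 45 seconds:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-e2e-azure-ovn-upgrade/1480795473177481216/artifacts/e2e-azure-ovn-upgrade/gather-extra/artifacts/clusterversion.json | jq '[.items[].status.conditions[] | select(.type == "Available" or .type == "Progressing")] | map({(.type): (.lastTransitionTime | fromdate)}) | add | .Progressing - .Available'
45

With the fix in place, that difference should be near zero.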

Comment 7 liujia 2022-01-14 09:55:06 UTC
Cannot reproduce the deferred transition time on either a normal installation or a hacked installation with cvo-overrides, using 4.10.0-fc.0.

# ./oc get clusterversion -ojson | jq -r '.items[].status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + (.reason // "-") + ": " + (.message // "-")' 
2022-01-14T06:08:08Z Upgradeable=False ClusterVersionOverridesSet: Disabling ownership via cluster version overrides prevents upgrades. Please remove overrides before continuing.
2022-01-14T06:30:14Z Available=True -: Done applying 4.10.0-fc.0
2022-01-14T06:30:14Z Failing=False -: -
2022-01-14T06:30:14Z Progressing=False -: Cluster version is 4.10.0-fc.0

So, following comment 3 and comment 4, I verified the issue against the specific CI job.

No nightly builds (>=4.10.0-0.nightly-2022-01-13-061145) have failed with the same error in the past 3 days.
# w3m -r -dump -cols 200 'https://search.ci.openshift.org/?search=Managed+cluster+should+start+all+core+operators&maxAge=72h&context=2&type=junit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job'|grep -o 'ClusterVersion Progressing=True.*4.10.*nightly.*2022-01.*'
ClusterVersion Progressing=True: : Working towards 4.10.0-0.nightly-2022-01-11-065245: 752 of 765 done (98% complete)
ClusterVersion Progressing=True: : Working towards 4.10.0-0.nightly-2022-01-11-065245: 369 of 765 done (48% complete)
ClusterVersion Progressing=True: : Working towards 4.10.0-0.nightly-2022-01-11-065245: 375 of 765 done (49% complete)
ClusterVersion Progressing=True: : Working towards 4.10.0-0.nightly-2022-01-11-065245: 389 of 765 done (50% complete)
ClusterVersion Progressing=True: : Working towards 4.10.0-0.nightly-2022-01-11-233044: 2 of 765 done (0% complete)

No deferred transition time on 4.10.0-0.nightly-2022-01-13-061145:
# curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-e2e-azure-ovn-upgrade/1481715413581565952/artifacts/e2e-azure-ovn-upgrade/gather-extra/artifacts/clusterversion.json| jq -r '.items[].status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + (.reason // "-") + ": " + (.message // "-")'
2022-01-13T20:12:19Z RetrievedUpdates=False NoChannel: The update channel has not been configured.
2022-01-13T20:38:33Z Available=True -: Done applying 4.10.0-0.nightly-2022-01-13-061145
2022-01-13T20:38:33Z Failing=False -: -
2022-01-13T20:38:33Z Progressing=False -: Cluster version is 4.10.0-0.nightly-2022-01-13-061145

Double-checking the log to confirm the `overrides` churn is gone:
# curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-e2e-azure-ovn-upgrade/1481715413581565952/artifacts/e2e-azure-ovn-upgrade/gather-extra/artifacts/pods/openshift-cluster-version_cluster-version-operator-6895f7f7c4-2gjhd_cluster-version-operator.log|grep -E "Detected|detected|overrides"
I0113 20:25:47.739431       1 sync_worker.go:496] Detected version change
W0113 20:25:47.739436       1 sync_worker.go:499] Ignoring detected version change during payload initialization

Comment 8 W. Trevor King 2022-01-17 21:11:38 UTC
Pulling this back to Assigned, because we're seeing failures like [1]:

level=info msg=Waiting up to 40m0s (until 4:06PM) for the cluster at https://api.ci-op-7bzs3rvq-a9175.**********************:6443 to initialize...
...
level=error msg=Cluster initialization failed because one or more operators are not functioning properly.
...
level=fatal msg=failed to initialize the cluster: Multiple errors are preventing progress:
level=fatal msg=* Could not update clusterrole "cloud-controller-manager" (135 of 770)
level=fatal msg=* Could not update clusterrolebinding "cluster-storage-operator-role" (467 of 770)
level=fatal msg=* Could not update configmap "openshift-machine-config-operator/machine-config-osimageurl" (648 of 770)
...

Seems fairly common:

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=48h&type=build-log&search=failed+to+initialize+the+cluster:+Multiple+errors+are+preventing+progress' | grep '^periodic-.*failures match' | sort
periodic-ci-openshift-aws-efs-csi-driver-operator-release-4.10-nightly-operator-e2e (all) - 4 runs, 100% failed, 50% of failures match = 50% impact
periodic-ci-openshift-aws-efs-csi-driver-operator-release-4.11-nightly-operator-e2e (all) - 4 runs, 100% failed, 25% of failures match = 25% impact
periodic-ci-openshift-driver-toolkit-release-4.11-e2e-aws-driver-toolkit (all) - 6 runs, 33% failed, 50% of failures match = 17% impact
periodic-ci-openshift-insights-operator-release-4.11-insights-operator-e2e-tests-periodic (all) - 2 runs, 50% failed, 100% of failures match = 50% impact
periodic-ci-openshift-openshift-tests-private-release-4.11-extended-vsphere-ipi-parallel (all) - 2 runs, 100% failed, 50% of failures match = 50% impact
periodic-ci-openshift-ovn-kubernetes-release-4.11-e2e-ibmcloud-ipi-ovn-periodic (all) - 2 runs, 100% failed, 50% of failures match = 50% impact
periodic-ci-openshift-release-master-ci-4.10-e2e-aws-crun (all) - 16 runs, 25% failed, 25% of failures match = 6% impact
periodic-ci-openshift-release-master-ci-4.10-e2e-aws-ovn-upgrade (all) - 94 runs, 69% failed, 2% of failures match = 1% impact
periodic-ci-openshift-release-master-ci-4.10-e2e-aws-serial (all) - 10 runs, 10% failed, 100% of failures match = 10% impact
periodic-ci-openshift-release-master-ci-4.10-e2e-azure-ovn-upgrade (all) - 51 runs, 82% failed, 7% of failures match = 6% impact
periodic-ci-openshift-release-master-ci-4.10-e2e-gcp-cilium (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.10-e2e-gcp-techpreview-serial (all) - 5 runs, 40% failed, 50% of failures match = 20% impact
periodic-ci-openshift-release-master-ci-4.10-e2e-gcp-upgrade (all) - 109 runs, 56% failed, 8% of failures match = 5% impact
periodic-ci-openshift-release-master-nightly-4.10-console-aws (all) - 5 runs, 80% failed, 25% of failures match = 20% impact
periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-cgroupsv2 (all) - 16 runs, 38% failed, 17% of failures match = 6% impact
periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-fips (all) - 5 runs, 20% failed, 100% of failures match = 20% impact
periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-fips-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-nightly-4.10-e2e-azure-csi (all) - 6 runs, 50% failed, 67% of failures match = 33% impact
periodic-ci-openshift-release-master-nightly-4.10-e2e-azure-deploy-cnv (all) - 7 runs, 57% failed, 25% of failures match = 14% impact
periodic-ci-openshift-release-master-nightly-4.10-e2e-gcp-csi-migration (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-nightly-4.10-e2e-vsphere-techpreview-serial (all) - 10 runs, 40% failed, 25% of failures match = 10% impact
periodic-ci-openshift-release-master-nightly-4.10-e2e-vsphere-upi-serial (all) - 10 runs, 40% failed, 25% of failures match = 10% impact
periodic-ci-openshift-verification-tests-master-stable-4.10-e2e-vsphere-cucushift-upi (all) - 2 runs, 100% failed, 50% of failures match = 50% impact
periodic-ci-openshift-windows-machine-config-operator-release-4.10-vsphere-e2e-periodic (all) - 6 runs, 33% failed, 50% of failures match = 17% impact
periodic-ci-openshift-windows-machine-config-operator-release-4.11-vsphere-e2e-periodic (all) - 6 runs, 50% failed, 33% of failures match = 17% impact

although that query will also pick up jobs which fail around this point for other reasons.  Back to [1]:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-e2e-gcp-upgrade/1483093759888461824/artifacts/e2e-gcp-upgrade/gather-extra/artifacts/pods/openshift-cluster-version_cluster-version-operator-675f8bcf96-q7mlx_cluster-version-operator.log | grep -B3 'Notify the sync worker that new work is available' | tail -n8
I0117 16:10:31.122027       1 sync_worker.go:496] Detected overrides change
I0117 16:10:31.122036       1 sync_worker.go:253] Cancel the sync worker's current loop
I0117 16:10:31.122097       1 sync_worker.go:259] Notify the sync worker that new work is available
--
I0117 16:10:46.121492       1 cvo.go:552] Desired version from operator is v1.Update{Version:"4.10.0-0.ci-2022-01-17-080901", Image:"registry.build03.ci.openshift.org/ci-op-7bzs3rvq/release@sha256:bed94c1e285b46597091bdd130a13716bf898fba036747d89fbbcf2103c5f2bc", Force:false}
I0117 16:10:46.121509       1 sync_worker.go:496] Detected overrides change
I0117 16:10:46.121515       1 sync_worker.go:253] Cancel the sync worker's current loop
I0117 16:10:46.121560       1 sync_worker.go:259] Notify the sync worker that new work is available

Strangely, there are no overrides in that run:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-e2e-gcp-upgrade/1483093759888461824/artifacts/e2e-gcp-upgrade/gather-extra/artifacts/clusterversion.json | jq -r '.items[].spec'
{
  "clusterID": "86304ee7-13ab-4229-87fd-d3493e4f83c5"
}

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-e2e-gcp-upgrade/1483093759888461824

Comment 10 W. Trevor King 2022-01-19 06:47:20 UTC
#729 (just attached) should get test coverage like:

1. Begin installing version A.
2. During installation, run 'oc adm upgrade --to-image "${VERSION_B_PULLSPEC}"' or similar (might need some --allow-explicit-upgrade and similar to make oc happy enough; see the sketch after this list) to retarget to version B.
3. Confirm:

     $ oc -n openshift-cluster-version logs -l k8s-app=cluster-version-operator --tail -1 | grep 'Running sync.*in state'

   gives output like:

     ...Running sync ${VERSION_A} (force=false) on generation ... in state Initializing at attempt ...

   to confirm that the attempt to retarget is being ignored.

4. Insert some overrides [1] (still during install).  Look at the CVO logs to ensure that the overridden objects are no longer being reconciled:

$ oc -n openshift-cluster-version logs -l k8s-app=cluster-version-operator --tail -1 | grep "Running sync for ${KIND} \"${NAMESPACE}/${NAME}\""

   which should give output up until the new override was inserted, but stop giving output after.

5. Once install completes, confirm that the cluster immediately transitions to an update to version B, with 'oc adm upgrade' or similar.
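
For step 2, the retarget command might look like the following sketch, where ${VERSION_B_PULLSPEC} is a placeholder for the version-B release pullspec:

     $ oc adm upgrade --allow-explicit-upgrade --to-image "${VERSION_B_PULLSPEC}"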

[1]: https://github.com/openshift/enhancements/blob/12546519c04f3a25245352d722466c7786fb44b3/dev-guide/cluster-version-operator/dev/clusterversion.md#setting-objects-unmanaged

Comment 11 W. Trevor King 2022-01-19 07:40:26 UTC
For clusters that don't retarget the version during install, #729 is irrelevant, and we can look at existing nightly CI to see how #728 is doing (it was built into [1]).  e2e-aws-serial [2]:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-serial/1483666868551028736/artifacts/e2e-aws-serial/gather-extra/artifacts/pods/openshift-cluster-version_cluster-version-operator-79bd96bf66-jhxpm_cluster-version-operator.log | grep 'Running sync.*in state\|Detected while\|Ignoring detected version\|Propagating initial target version'
I0119 05:30:50.439864       1 sync_worker.go:255] Propagating initial target version {4.10.0-0.nightly-2022-01-19-050236 registry.build04.ci.openshift.org/ci-op-0h0k3h7d/release@sha256:a7fd0f26d9a6bc21f1e749bf17b261cd479ea19354d0bda86673c732b53a9b6c false} to sync worker loop in state Initializing.
I0119 05:30:50.440381       1 sync_worker.go:507] Detected while calculating next work: version change (from {  false} to {4.10.0-0.nightly-2022-01-19-050236 registry.build04.ci.openshift.org/ci-op-0h0k3h7d/release@sha256:a7fd0f26d9a6bc21f1e749bf17b261cd479ea19354d0bda86673c732b53a9b6c false})
I0119 05:30:50.440433       1 sync_worker.go:555] Running sync 4.10.0-0.nightly-2022-01-19-050236 (force=false) on generation 2 in state Initializing at attempt 0
I0119 05:33:37.206016       1 sync_worker.go:555] Running sync 4.10.0-0.nightly-2022-01-19-050236 (force=false) on generation 2 in state Initializing at attempt 1
I0119 05:36:41.726962       1 sync_worker.go:555] Running sync 4.10.0-0.nightly-2022-01-19-050236 (force=false) on generation 2 in state Initializing at attempt 2
I0119 05:40:33.255167       1 sync_worker.go:555] Running sync 4.10.0-0.nightly-2022-01-19-050236 (force=false) on generation 2 in state Initializing at attempt 3
I0119 05:44:46.367113       1 sync_worker.go:555] Running sync 4.10.0-0.nightly-2022-01-19-050236 (force=false) on generation 2 in state Reconciling at attempt 0
...

That looks good to me, and apparently nothing touched overrides while that pod was running.  But the installer's overrides touches are on the bootstrap node, so let's see if we can dig those out of Loki for this run.  Following [2]'s custom-link-tools.html to Loki and running:

  {invoker="openshift-internal-ci/periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-serial/1483666868551028736"} | unpack | container_name="cluster-version-operator" |~ "Running sync.*in state|Detected while|Ignoring detected version|Propagating initial target version"

only gives hits for cluster-version-operator-79bd96bf66-jhxpm, so I guess no access to the static bootstrap pod.

Luckily (ish) console-aws [3] failed during bootstrapping on bug 2040533.  Looking at the gathered log bundle [4]:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.10-console-aws/1483666878021767168/artifacts/console-aws/ipi-install-install/artifacts/log-bundle-20220119054933.tar >log-bundle.tar.gz
$ tar tvz <log-bundle.tar.gz | grep bootkube
-rw------- 1003220000/root  115809 2022-01-18 21:50 log-bundle-20220119054933/bootstrap/journals/bootkube.log
-rw------- 1003220000/root    3397 2022-01-18 21:50 log-bundle-20220119054933/bootstrap/services/bootkube.json
$ tar xOz log-bundle-20220119054933/bootstrap/journals/bootkube.log <log-bundle.tar.gz | grep -A1 overrides | tail -n2
Jan 19 05:25:56 ip-10-0-41-215 bootkube.sh[2338]: Sending bootstrap-finished event.Restoring CVO overrides
Jan 19 05:26:21 ip-10-0-41-215 bootkube.sh[2338]: clusterversion.config.openshift.io/version patched

So the overrides were cleared at 05:26:21.  Checking the overrides that existed before:

$ tar tvz <log-bundle.tar.gz | grep 'rendered-assets.*override'
-rw------- 1003220000/root       0 2022-01-18 21:50 log-bundle-20220119054933/rendered-assets/openshift/cvo-overrides.done
-rw------- 1003220000/root    2098 2022-01-18 21:50 log-bundle-20220119054933/rendered-assets/openshift/manifests/cvo-overrides.yaml
-rw------- 1003220000/root      27 2022-01-18 21:50 log-bundle-20220119054933/rendered-assets/openshift/original_cvo_overrides.patch
$ tar xOz log-bundle-20220119054933/rendered-assets/openshift/manifests/cvo-overrides.yaml <log-bundle.tar.gz
apiVersion: config.openshift.io/v1
kind: ClusterVersion
metadata:
  name: version
  namespace: openshift-cluster-version
spec:
  clusterID: e0fabcfa-c14f-4a29-b122-cc7dc7a03563
  overrides:
  - group: ""
    kind: ConfigMap
    name: cluster-config-v1
    namespace: kube-system
    unmanaged: true
  - group: config.openshift.io
    kind: DNS
    name: cluster
    namespace: ""
    unmanaged: true
...
$ tar xOz log-bundle-20220119054933/rendered-assets/openshift/original_cvo_overrides.patch <log-bundle.tar.gz
{"spec":{"overrides":null}}

Checking CRI-O for details about the bootstrap CVO pod:

$ tar tvz <log-bundle.tar.gz | grep 'bootstrap.*crio.log'
-rw------- 1003220000/root   95625 2022-01-18 21:50 log-bundle-20220119054933/bootstrap/journals/crio.log
$ tar xOz log-bundle-20220119054933/bootstrap/journals/crio.log <log-bundle.tar.gz | grep cluster-version-operator
Jan 19 05:18:55 ip-10-0-41-215 crio[2039]: time="2022-01-19 05:18:55.020873287Z" level=info msg="Running pod sandbox: openshift-cluster-version/bootstrap-cluster-version-operator-ip-10-0-41-215/POD" id=0b545baa-a393-4d9e-ab36-90ea6b4afc1f name=/runtime.v1alpha2.RuntimeService/RunPodSandbox
Jan 19 05:18:55 ip-10-0-41-215 crio[2039]: time="2022-01-19 05:18:55.040700182Z" level=info msg="Ran pod sandbox 493e6975cb94bc535f599b901ec247b3bd872b92060fe87cb882acd2cd2dbae4 with infra container: openshift-cluster-version/bootstrap-cluster-version-operator-ip-10-0-41-215/POD" id=0b545baa-a393-4d9e-ab36-90ea6b4afc1f name=/runtime.v1alpha2.RuntimeService/RunPodSandbox
Jan 19 05:18:55 ip-10-0-41-215 crio[2039]: time="2022-01-19 05:18:55.399877167Z" level=info msg="Creating container: openshift-cluster-version/bootstrap-cluster-version-operator-ip-10-0-41-215/cluster-version-operator" id=b4eb70ba-8878-4857-9ef1-df610ddff4cf name=/runtime.v1alpha2.RuntimeService/CreateContainer
Jan 19 05:18:55 ip-10-0-41-215 crio[2039]: time="2022-01-19 05:18:55.464250669Z" level=info msg="Created container b179297c2037711f22dc9c2e765efb99e44186f9a92a0b4484ddd0d5cb034630: openshift-cluster-version/bootstrap-cluster-version-operator-ip-10-0-41-215/cluster-version-operator" id=b4eb70ba-8878-4857-9ef1-df610ddff4cf name=/runtime.v1alpha2.RuntimeService/CreateContainer
Jan 19 05:18:55 ip-10-0-41-215 crio[2039]: time="2022-01-19 05:18:55.473935725Z" level=info msg="Started container" PID=6988 containerID=b179297c2037711f22dc9c2e765efb99e44186f9a92a0b4484ddd0d5cb034630 description=openshift-cluster-version/bootstrap-cluster-version-operator-ip-10-0-41-215/cluster-version-operator id=24f4a512-0b1a-4806-b116-f3b55b8e5d81 name=/runtime.v1alpha2.RuntimeService/StartContainer sandboxID=493e6975cb94bc535f599b901ec247b3bd872b92060fe87cb882acd2cd2dbae4
Jan 19 05:26:22 ip-10-0-41-215 crio[2039]: time="2022-01-19 05:26:22.604051678Z" level=info msg="Stopped container b179297c2037711f22dc9c2e765efb99e44186f9a92a0b4484ddd0d5cb034630: openshift-cluster-version/bootstrap-cluster-version-operator-ip-10-0-41-215/cluster-version-operator" id=ced43978-f3e1-4335-b816-16754f916234 name=/runtime.v1alpha2.RuntimeService/StopContainer
Jan 19 05:27:22 ip-10-0-41-215 crio[2039]: time="2022-01-19 05:27:22.358425543Z" level=info msg="Removed container b179297c2037711f22dc9c2e765efb99e44186f9a92a0b4484ddd0d5cb034630: openshift-cluster-version/bootstrap-cluster-version-operator-ip-10-0-41-215/cluster-version-operator" id=e654c8c8-5a6e-45f1-b660-0a80b426558a name=/runtime.v1alpha2.RuntimeService/RemoveContainer

Ok, so the static pod was removed shortly after the overrides were cleared.  Did we gather logs from that static CVO pod?

$ tar tvz <log-bundle.tar.gz | grep bootstrap/containers
drwxr-xr-x 1003220000/root   0 2022-01-18 21:50 log-bundle-20220119054933/bootstrap/containers
drwxr-xr-x 1003220000/root   0 2022-01-18 21:50 log-bundle-20220119054933/bootstrap/containers/bootstrap-control-plane
-rw------- 1003220000/root 93449 2022-01-18 21:50 log-bundle-20220119054933/bootstrap/containers/bootstrap-control-plane/cluster-policy-controller.log
-rw------- 1003220000/root 1435023 2022-01-18 21:50 log-bundle-20220119054933/bootstrap/containers/bootstrap-control-plane/kube-apiserver.log
-rw------- 1003220000/root  930242 2022-01-18 21:50 log-bundle-20220119054933/bootstrap/containers/bootstrap-control-plane/kube-controller-manager.log
-rw------- 1003220000/root   97390 2022-01-18 21:50 log-bundle-20220119054933/bootstrap/containers/bootstrap-control-plane/kube-scheduler.log
-rw------- 1003220000/root   45868 2022-01-18 21:50 log-bundle-20220119054933/bootstrap/containers/etcd-4799c9eb4370c2bd4e561f8f8bae4c50a8304923f8cf9d03737fbf73fd4077dc.inspect
-rw------- 1003220000/root  841840 2022-01-18 21:50 log-bundle-20220119054933/bootstrap/containers/etcd-4799c9eb4370c2bd4e561f8f8bae4c50a8304923f8cf9d03737fbf73fd4077dc.log
-rw------- 1003220000/root   43520 2022-01-18 21:50 log-bundle-20220119054933/bootstrap/containers/etcdctl-cd9bdd7b98ef5f3f516c3ce6e996c4cdf4f27cecc3c125d212faf295d12deeb4.inspect
-rw------- 1003220000/root       0 2022-01-18 21:50 log-bundle-20220119054933/bootstrap/containers/etcdctl-cd9bdd7b98ef5f3f516c3ce6e996c4cdf4f27cecc3c125d212faf295d12deeb4.log

Nope.  Ah well.  There were clearly overrides.  The overrides were cleared.  Ah, I guess we can look at the control-plane CVO logs to confirm that the clearance was active:

$ tar tvz <log-bundle.tar.gz | grep 'cluster-version-operator.*log'
-rw------- 1003220000/root 2123471 2022-01-18 21:50 log-bundle-20220119054933/control-plane/10.0.216.199/containers/cluster-version-operator-7f580d7bcd19a0c9d9d6e02141679fc765d4b5e911c7f1bb3050912fed839922.log
$ tar xOz log-bundle-20220119054933/control-plane/10.0.216.199/containers/cluster-version-operator-7f580d7bcd19a0c9d9d6e02141679fc765d4b5e911c7f1bb3050912fed839922.log <log-bundle.tar.gz | grep 'Running sync.*in state\|Running sync.*dns.*cluster\|Detected while\|Ignoring detected version\|Propagating initial target version'
I0119 05:26:33.055769       1 sync_worker.go:255] Propagating initial target version {4.10.0-0.nightly-2022-01-19-050236 registry.build04.ci.openshift.org/ci-op-k7bw6ht7/release@sha256:a7fd0f26d9a6bc21f1e749bf17b261cd479ea19354d0bda86673c732b53a9b6c false} to sync worker loop in state Initializing.
I0119 05:26:33.056453       1 sync_worker.go:507] Detected while calculating next work: version change (from {  false} to {4.10.0-0.nightly-2022-01-19-050236 registry.build04.ci.openshift.org/ci-op-k7bw6ht7/release@sha256:a7fd0f26d9a6bc21f1e749bf17b261cd479ea19354d0bda86673c732b53a9b6c false})
I0119 05:26:33.056512       1 sync_worker.go:555] Running sync 4.10.0-0.nightly-2022-01-19-050236 (force=false) on generation 2 in state Initializing at attempt 0
I0119 05:26:40.138413       1 sync_worker.go:768] Running sync for dns "cluster" (23 of 769)
I0119 05:29:15.098599       1 sync_worker.go:555] Running sync 4.10.0-0.nightly-2022-01-19-050236 (force=false) on generation 2 in state Initializing at attempt 1
I0119 05:29:18.903251       1 sync_worker.go:768] Running sync for dns "cluster" (23 of 769)
I0119 05:32:16.364124       1 sync_worker.go:555] Running sync 4.10.0-0.nightly-2022-01-19-050236 (force=false) on generation 2 in state Initializing at attempt 2
I0119 05:32:22.664397       1 sync_worker.go:768] Running sync for dns "cluster" (23 of 769)
I0119 05:35:51.226422       1 sync_worker.go:555] Running sync 4.10.0-0.nightly-2022-01-19-050236 (force=false) on generation 2 in state Initializing at attempt 3
I0119 05:35:57.484032       1 sync_worker.go:768] Running sync for dns "cluster" (23 of 769)
I0119 05:38:53.968561       1 sync_worker.go:555] Running sync 4.10.0-0.nightly-2022-01-19-050236 (force=false) on generation 2 in state Reconciling at attempt 0
I0119 05:38:54.187795       1 sync_worker.go:768] Running sync for dns "cluster" (23 of 769)
I0119 05:46:25.927309       1 sync_worker.go:555] Running sync 4.10.0-0.nightly-2022-01-19-050236 (force=false) on generation 2 in state Reconciling at attempt 1
I0119 05:46:53.449977       1 sync_worker.go:768] Running sync for dns "cluster" (23 of 769)

So it's clear that the cluster DNS override did get cleared and the CVO began reconciling it while staying in Initializing, which is good.  So all signs point to things working well for these overrides-touching installs, but it would be nice to have the static bootstrap CVO pod logs to confirm things were working as expected.  Probably not worth blocking this bug's verification on access to logs like that, though.

[1]: https://amd64.ocp.releases.ci.openshift.org/releasestream/4.10.0-0.nightly/release/4.10.0-0.nightly-2022-01-19-050236
[2]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-serial/1483666868551028736
[3]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.10-console-aws/1483666878021767168
[4]: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.10-console-aws/1483666878021767168/artifacts/console-aws/ipi-install-install/artifacts/

Comment 14 liujia 2022-01-20 10:07:50 UTC
Hi, Trevor

I want to confirm the verification steps for this bug, especially for the new PRs.

For PR #729, should I follow https://bugzilla.redhat.com/show_bug.cgi?id=2009845#c10 to verify it? It seems like a hacked way of verifying things from the logs, right? I don't think we support a scenario that allows overrides or oc adm upgrade during installation.

For PR #728, it addresses the original issue that was fixed earlier in #713. Should I verify it the same way as https://bugzilla.redhat.com/show_bug.cgi?id=2009845#c7, checking that no regression in installation failures occurs, or follow the steps you recorded in https://bugzilla.redhat.com/show_bug.cgi?id=2009845#c11?

Comment 16 W. Trevor King 2022-01-20 22:31:29 UTC
(In reply to liujia from comment #14)
> I don't think we support a scenario that allows overrides or oc adm upgrade during installation.

It's not install-specific; we don't support overrides at all [1]:

  Setting a CVO override puts the entire cluster in an unsupported state. Reported issues must be reproduced after removing any overrides for support to proceed.

But, since bug 2008119 [2], the installer will set overrides for some fraction of bootstrapping, and we support those installs.  You can probably supply some manifests to the installer and watch the installer's injected overrides to confirm that the CVO's adjusted overrides handling is working, without having to inject your own overrides, but the timing there would be pretty tight, because the bootstrap CVO will exit very shortly after the installer clears those overrides.  Maybe you could get at it by streaming down the bootstrap CVO's logs?
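
For example, streaming the bootstrap CVO's logs during bootstrapping might look like the following sketch (${BOOTSTRAP_IP} is a placeholder, and it assumes SSH access to the bootstrap node; the crictl usage mirrors the later comments):

  $ ssh core@"${BOOTSTRAP_IP}" 'sudo crictl ps -a -q --name cluster-version-operator | head -n1 | xargs -r sudo crictl logs -f'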

I don't think we explicitly talk in the docs about folks asking for updates mid-install, probably because it seems like it would fall under the common-sense "this is probably a bad idea" umbrella.  So while I expect us to gracefully ignore these mid-install retarget requests, I don't have a problem with us deciding that it's so unsupported that it's not worth QEing.

> For PR #728, it addresses the original issue that was fixed earlier in #713. Should I verify it the same way as https://bugzilla.redhat.com/show_bug.cgi?id=2009845#c7, checking that no regression in installation failures occurs, or follow the steps you recorded in https://bugzilla.redhat.com/show_bug.cgi?id=2009845#c11?

I'm still a bit fuzzy on why #713 played poorly in CI.  The fact that #728 recovered CI confirms that there was a problem with #713 hotlooping on overrides-change detection.  The fact that #728 doesn't seem to have caused CI trouble is a good sign that we actually did fix that hotlooping.  #729 isn't exercised by CI.  So I'm fine with a repeat of the earlier verification steps, or with something like comment 11 that pulls down bootstrap-CVO logs, either way.

[1]: https://docs.openshift.com/container-platform/4.9/updating/understanding-the-update-service.html#unmanaged-operators_understanding-the-update-service
[2]: https://github.com/openshift/installer/pull/5258

Comment 17 liujia 2022-01-21 08:32:20 UTC
Version: 4.10.0-0.nightly-2022-01-20-082726

1. Trigger a cluster installation with a cv.overrides setting in a manifest before the installation:
...
    overrides:
    - group: imageregistry.operator.openshift.io
      kind: Config
      name: cluster
      namespace: ''
      unmanaged: true
...

2. SSH into the bootstrap VM to check the CVO behavior during the installation stage.
# cat /var/opt/openshift/original_cvo_overrides.patch
{"spec":{"overrides":[{"group":"imageregistry.operator.openshift.io","kind":"Config","name":"cluster","namespace":"","unmanaged":true}]}}

// This proves the cv.overrides content is as expected.

# cat /var/opt/openshift/manifests/cvo-overrides.yaml|grep -A4 "DNS"
    kind: DNS
    name: cluster
    namespace: ""
    unmanaged: true
  - group: config.openshift.io
// From the above we can see DNS/cluster comes from the installer-provided overrides.

# journalctl -u bootkube.service |grep "overrides\|patched"
Jan 21 07:09:02 jliu-410-8wrv8-bootstrap.c.openshift-qe.internal bootkube.sh[2503]: Created "cvo-overrides.yaml" clusterversions.v1.config.openshift.io/version -n openshift-cluster-version
Jan 21 07:14:41 jliu-410-8wrv8-bootstrap.c.openshift-qe.internal bootkube.sh[2503]: Skipped "cvo-overrides.yaml" clusterversions.v1.config.openshift.io/version -n openshift-cluster-version as it already exists
Jan 21 07:14:46 jliu-410-8wrv8-bootstrap.c.openshift-qe.internal bootkube.sh[2503]: Sending bootstrap-finished event.Restoring CVO overrides
Jan 21 07:16:02 jliu-410-8wrv8-bootstrap.c.openshift-qe.internal bootkube.sh[2503]: clusterversion.config.openshift.io/version patched

// Bootkube said it patched the ClusterVersion at 07:16:02 (cluster-bootstrap completed and cleared the installer's cvo-overrides), but from the container log of the CVO pod on the bootstrap VM, we see that before the patch (07:16:02) the CVO was still running sync for resources listed in cvo-overrides.
? I thought the overrides should be respected before the ClusterVersion was restored by the patch, but it seems they were not.

$ sudo crictl logs 75234b8d0a708 2>&1|grep 'Running sync.*in state\|Running sync.*dns.*cluster\|Detected while\|Ignoring detected version\|Propagating initial target version\|Running sync.* config .*cluster'
I0121 07:09:02.705099       1 sync_worker.go:255] Propagating initial target version {4.10.0-0.nightly-2022-01-20-082726 registry.ci.openshift.org/ocp/release@sha256:bdc27b9ff4a1a482d00fc08924f1157d782ded9f3e91af09fe9f3596bcea877c false} to sync worker loop in state Initializing.
I0121 07:09:02.706104       1 sync_worker.go:510] Detected while calculating next work: version (from {  false} to {4.10.0-0.nightly-2022-01-20-082726 registry.ci.openshift.org/ocp/release@sha256:bdc27b9ff4a1a482d00fc08924f1157d782ded9f3e91af09fe9f3596bcea877c false}) and overrides changes
I0121 07:09:02.706166       1 sync_worker.go:558] Running sync 4.10.0-0.nightly-2022-01-20-082726 (force=false) on generation 1 in state Initializing at attempt 0
I0121 07:09:06.387267       1 sync_worker.go:771] Running sync for config "cluster" (155 of 769)
I0121 07:09:21.034533       1 sync_worker.go:771] Running sync for dns "cluster" (23 of 769)
I0121 07:11:38.447158       1 sync_worker.go:558] Running sync 4.10.0-0.nightly-2022-01-20-082726 (force=false) on generation 1 in state Initializing at attempt 1
I0121 07:11:39.243413       1 sync_worker.go:771] Running sync for config "cluster" (155 of 769)
I0121 07:11:45.743278       1 sync_worker.go:771] Running sync for dns "cluster" (23 of 769)
I0121 07:14:30.519224       1 sync_worker.go:558] Running sync 4.10.0-0.nightly-2022-01-20-082726 (force=false) on generation 1 in state Initializing at attempt 2
I0121 07:14:31.876562       1 sync_worker.go:771] Running sync for config "cluster" (155 of 769)

3. After the installation finishes, check the final CVO log (not from the bootstrap VM) through oc.
? It is still syncing the overridden resource Config/cluster.

# ./oc logs cluster-version-operator-674c64b446-hfh7q|grep 'Running sync.*in state\|Running sync.*dns.*cluster\|Detected while\|Ignoring detected version\|Propagating initial target version\|Running sync.* config .*cluster'
I0121 07:16:07.192798       1 sync_worker.go:255] Propagating initial target version {4.10.0-0.nightly-2022-01-20-082726 registry.ci.openshift.org/ocp/release@sha256:bdc27b9ff4a1a482d00fc08924f1157d782ded9f3e91af09fe9f3596bcea877c false} to sync worker loop in state Initializing.
I0121 07:16:07.193745       1 sync_worker.go:510] Detected while calculating next work: version (from {  false} to {4.10.0-0.nightly-2022-01-20-082726 registry.ci.openshift.org/ocp/release@sha256:bdc27b9ff4a1a482d00fc08924f1157d782ded9f3e91af09fe9f3596bcea877c false}) and overrides changes
I0121 07:16:07.193790       1 sync_worker.go:558] Running sync 4.10.0-0.nightly-2022-01-20-082726 (force=false) on generation 2 in state Initializing at attempt 0
I0121 07:16:09.825347       1 sync_worker.go:771] Running sync for config "cluster" (155 of 769)
I0121 07:16:15.451087       1 sync_worker.go:771] Running sync for dns "cluster" (23 of 769)
I0121 07:20:14.316656       1 sync_worker.go:558] Running sync 4.10.0-0.nightly-2022-01-20-082726 (force=false) on generation 2 in state Initializing at attempt 1
I0121 07:20:16.624234       1 sync_worker.go:771] Running sync for config "cluster" (155 of 769)
I0121 07:20:21.099399       1 sync_worker.go:771] Running sync for dns "cluster" (23 of 769)
I0121 07:24:50.373206       1 sync_worker.go:558] Running sync 4.10.0-0.nightly-2022-01-20-082726 (force=false) on generation 2 in state Initializing at attempt 2
I0121 07:24:51.829592       1 sync_worker.go:771] Running sync for config "cluster" (155 of 769)
I0121 07:24:56.554058       1 sync_worker.go:771] Running sync for dns "cluster" (23 of 769)
I0121 07:30:17.812583       1 sync_worker.go:558] Running sync 4.10.0-0.nightly-2022-01-20-082726 (force=false) on generation 2 in state Initializing at attempt 3
I0121 07:30:19.752220       1 sync_worker.go:771] Running sync for config "cluster" (155 of 769)
I0121 07:30:24.176248       1 sync_worker.go:771] Running sync for dns "cluster" (23 of 769)
I0121 07:34:13.980893       1 sync_worker.go:558] Running sync 4.10.0-0.nightly-2022-01-20-082726 (force=false) on generation 2 in state Reconciling at attempt 0
I0121 07:34:14.177649       1 sync_worker.go:771] Running sync for dns "cluster" (23 of 769)
I0121 07:34:24.537535       1 sync_worker.go:771] Running sync for config "cluster" (155 of 769)
I0121 07:38:20.102591       1 sync_worker.go:558] Running sync 4.10.0-0.nightly-2022-01-20-082726 (force=false) on generation 2 in state Reconciling at attempt 0
I0121 07:38:20.303590       1 sync_worker.go:771] Running sync for dns "cluster" (23 of 769)
I0121 07:38:30.709982       1 sync_worker.go:771] Running sync for config "cluster" (155 of 769)
I0121 07:42:26.234541       1 sync_worker.go:558] Running sync 4.10.0-0.nightly-2022-01-20-082726 (force=false) on generation 2 in state Reconciling at attempt 0
...

Based on the above checks, this looks unexpected.

Trevor, could you help correct my verification?

Comment 18 W. Trevor King 2022-01-24 21:44:00 UTC
(In reply to liujia from comment #17)
> I0121 07:09:06.387267       1 sync_worker.go:771] Running sync for config "cluster" (155 of 769)

Config "cluster" isn't very specific without the group information, which the CVO doesn't log today.  See bug 2022509 for some of the CVO manifests that have a 'Config' kind, a 'cluster' name, and assorted groups.

> I0121 07:09:21.034533       1 sync_worker.go:771] Running sync for dns "cluster" (23 of 769)

This one is more surprising to me.  Checking the source [1], I see that we are actually still logging that, but then following up with "Skipping ... as unmanaged".  Can we try again with:

  $ ... | grep 'Running sync.*in state\|Running sync.*dns.*cluster\|Detected while\|Ignoring detected version\|Propagating initial target version\|Running sync.* config .*cluster\|Skipping .* as unmanaged'

And possibly with an override that's not a Config cluster entry?  Maybe:

  overrides:
  - group: monitoring.coreos.com
    kind: PrometheusRule
    namespace: openshift-machine-api
    name: machine-api-operator-prometheus-rules
    unmanaged: true

or similar?

[1]: https://github.com/openshift/cluster-version-operator/blob/404b1970876b64f7adec4e38710c69594e42e317/pkg/cvo/sync_worker.go#L771-L775

Comment 19 Yang Yang 2022-01-25 12:32:46 UTC
Attempting with an override on 4.10.0-0.nightly-2022-01-24-070025

  overrides:
  - group: monitoring.coreos.com
    kind: PrometheusRule
    namespace: openshift-machine-api
    name: machine-api-operator-prometheus-rules
    unmanaged: true

[core@yanyang0125e-lcfd7-bootstrap ~]$ cat /var/opt/openshift/original_cvo_overrides.patch | jq
{
  "spec": {
    "overrides": [
      {
        "group": "monitoring.coreos.com",
        "kind": "PrometheusRule",
        "name": "machine-api-operator-prometheus-rules",
        "namespace": "openshift-machine-api",
        "unmanaged": true
      }
    ]
  }
}
[core@yanyang0125e-lcfd7-bootstrap ~]$ cat /var/opt/openshift/manifests/cvo-overrides.yaml
apiVersion: config.openshift.io/v1
kind: ClusterVersion
metadata:
  name: version
  namespace: openshift-cluster-version
spec:
  channel: stable-4.10
  clusterID: b985ec34-98f9-4e1e-b55e-c8f9f9e54542
  overrides:
  - group: ""
    kind: ConfigMap
    name: ingress-uid
    namespace: kube-system
    unmanaged: true
  - group: ""
    kind: ConfigMap
    name: cloud-provider-config
    namespace: openshift-config
    unmanaged: true
  - group: ""
    kind: ConfigMap
    name: cluster-config-v1
    namespace: kube-system
    unmanaged: true
  - group: config.openshift.io
    kind: DNS
    name: cluster
    namespace: ""
    unmanaged: true
  - group: config.openshift.io
    kind: Infrastructure
    name: cluster
    namespace: ""
    unmanaged: true
  - group: config.openshift.io
    kind: Ingress
    name: cluster
    namespace: ""
    unmanaged: true
  - group: apiextensions.k8s.io
    kind: CustomResourceDefinition
    name: networks.config.openshift.io
    namespace: ""
    unmanaged: true
  - group: config.openshift.io
    kind: Network
    name: cluster
    namespace: ""
    unmanaged: true
  - group: config.openshift.io
    kind: Proxy
    name: cluster
    namespace: ""
    unmanaged: true
  - group: config.openshift.io
    kind: Scheduler
    name: cluster
    namespace: ""
    unmanaged: true
  - group: ""
    kind: Secret
    name: kube-cloud-cfg
    namespace: kube-system
    unmanaged: true
  - group: ""
    kind: ConfigMap
    name: root-ca
    namespace: kube-system
    unmanaged: true
  - group: ""
    kind: Secret
    name: machine-config-server-tls
    namespace: openshift-machine-config-operator
    unmanaged: true
  - group: ""
    kind: Secret
    name: pull-secret
    namespace: openshift-config
    unmanaged: true
  - group: ""
    kind: Secret
    name: gcp-credentials
    namespace: kube-system
    unmanaged: true
  - group: ""
    kind: Secret
    name: kubeadmin
    namespace: kube-system
    unmanaged: true
  - group: rbac.authorization.k8s.io
    kind: Role
    name: gcp-creds-secret-reader
    namespace: kube-system
    unmanaged: true
  - group: ""
    kind: ConfigMap
    name: openshift-install-manifests
    namespace: openshift-config
    unmanaged: true
  - group: monitoring.coreos.com
    kind: PrometheusRule
    name: machine-api-operator-prometheus-rules
    namespace: openshift-machine-api
    unmanaged: true

[core@yanyang0125e-lcfd7-bootstrap ~]$ journalctl -u bootkube.service |grep "overrides\|patched"
Jan 25 11:25:19 yanyang0125e-lcfd7-bootstrap.c.openshift-qe.internal bootkube.sh[2423]: Created "cvo-overrides.yaml" clusterversions.v1.config.openshift.io/version -n openshift-cluster-version
Jan 25 11:34:18 yanyang0125e-lcfd7-bootstrap.c.openshift-qe.internal bootkube.sh[2423]: Skipped "cvo-overrides.yaml" clusterversions.v1.config.openshift.io/version -n openshift-cluster-version as it already exists
Jan 25 11:34:23 yanyang0125e-lcfd7-bootstrap.c.openshift-qe.internal bootkube.sh[2423]: Sending bootstrap-finished event.Restoring CVO overrides
Jan 25 11:35:01 yanyang0125e-lcfd7-bootstrap.c.openshift-qe.internal bootkube.sh[2423]: clusterversion.config.openshift.io/version patched

[core@yanyang0125e-lcfd7-bootstrap ~]$ sudo crictl logs beee526925a7e 2>&1 | grep 'Running sync.*in state\|Running sync.*dns.*cluster\|Detected while\|Ignoring detected version\|Propagating initial target version\|Running sync.* config .*cluster\|Skipping .* as unmanaged'
I0125 11:25:19.920164       1 sync_worker.go:255] Propagating initial target version {4.10.0-0.nightly-2022-01-24-070025 registry.ci.openshift.org/ocp/release@sha256:87c1f0019d44107031c662a3c387bba46d1f2710909b583e5f4c0ad61cdba36e false} to sync worker loop in state Initializing.
I0125 11:25:19.920940       1 sync_worker.go:510] Detected while calculating next work: version (from {  false} to {4.10.0-0.nightly-2022-01-24-070025 registry.ci.openshift.org/ocp/release@sha256:87c1f0019d44107031c662a3c387bba46d1f2710909b583e5f4c0ad61cdba36e false}) and overrides changes
I0125 11:25:19.921017       1 sync_worker.go:558] Running sync 4.10.0-0.nightly-2022-01-24-070025 (force=false) on generation 1 in state Initializing at attempt 0
I0125 11:25:23.462739       1 sync_worker.go:771] Running sync for config "cluster" (155 of 769)
I0125 11:25:37.633450       1 sync_worker.go:771] Running sync for dns "cluster" (23 of 769)
I0125 11:25:37.633465       1 sync_worker.go:775] Skipping dns "cluster" (23 of 769) as unmanaged
I0125 11:25:45.658020       1 sync_worker.go:775] Skipping customresourcedefinition "networks.config.openshift.io" (47 of 769) as unmanaged
I0125 11:25:50.259162       1 sync_worker.go:775] Skipping infrastructure "cluster" (26 of 769) as unmanaged
I0125 11:25:50.259180       1 sync_worker.go:775] Skipping ingress "cluster" (27 of 769) as unmanaged
I0125 11:25:50.259195       1 sync_worker.go:775] Skipping network "cluster" (28 of 769) as unmanaged
I0125 11:25:58.934214       1 sync_worker.go:775] Skipping proxy "cluster" (32 of 769) as unmanaged
I0125 11:25:58.934233       1 sync_worker.go:775] Skipping scheduler "cluster" (33 of 769) as unmanaged
I0125 11:26:03.934560       1 sync_worker.go:775] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (736 of 769) as unmanaged
I0125 11:29:46.323880       1 sync_worker.go:558] Running sync 4.10.0-0.nightly-2022-01-24-070025 (force=false) on generation 1 in state Initializing at attempt 1
I0125 11:29:46.446552       1 sync_worker.go:771] Running sync for config "cluster" (155 of 769)
I0125 11:29:50.557416       1 sync_worker.go:775] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (736 of 769) as unmanaged
I0125 11:29:55.981236       1 sync_worker.go:771] Running sync for dns "cluster" (23 of 769)
I0125 11:29:55.981247       1 sync_worker.go:775] Skipping dns "cluster" (23 of 769) as unmanaged
I0125 11:29:57.855967       1 sync_worker.go:775] Skipping infrastructure "cluster" (26 of 769) as unmanaged
I0125 11:29:57.855999       1 sync_worker.go:775] Skipping ingress "cluster" (27 of 769) as unmanaged
I0125 11:29:57.856018       1 sync_worker.go:775] Skipping network "cluster" (28 of 769) as unmanaged
I0125 11:29:58.707702       1 sync_worker.go:775] Skipping customresourcedefinition "networks.config.openshift.io" (47 of 769) as unmanaged
I0125 11:30:02.181421       1 sync_worker.go:775] Skipping proxy "cluster" (32 of 769) as unmanaged
I0125 11:30:02.181437       1 sync_worker.go:775] Skipping scheduler "cluster" (33 of 769) as unmanaged

Looks right to me. Bootstrap CVO skips prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules".

Looking at the CVO log after installation:

# oc -n openshift-cluster-version logs cluster-version-operator-5c6ff84ff4-lqczv | grep 'Running sync.*in state\|Running sync.*dns.*cluster\|Detected while\|Ignoring detected version\|Propagating initial target version\|Running sync.* config .*cluster\|Skipping .* as unmanaged'
I0125 11:35:00.432109       1 sync_worker.go:255] Propagating initial target version {4.10.0-0.nightly-2022-01-24-070025 registry.ci.openshift.org/ocp/release@sha256:87c1f0019d44107031c662a3c387bba46d1f2710909b583e5f4c0ad61cdba36e false} to sync worker loop in state Initializing.
I0125 11:35:00.433405       1 sync_worker.go:510] Detected while calculating next work: version (from {  false} to {4.10.0-0.nightly-2022-01-24-070025 registry.ci.openshift.org/ocp/release@sha256:87c1f0019d44107031c662a3c387bba46d1f2710909b583e5f4c0ad61cdba36e false}) and overrides changes
I0125 11:35:00.433568       1 sync_worker.go:558] Running sync 4.10.0-0.nightly-2022-01-24-070025 (force=false) on generation 1 in state Initializing at attempt 0
I0125 11:35:01.050566       1 sync_worker.go:510] Detected while considering cluster version generation 2: overrides change ([{ConfigMap  kube-system ingress-uid true} {ConfigMap  openshift-config cloud-provider-config true} {ConfigMap  kube-system cluster-config-v1 true} {DNS config.openshift.io  cluster true} {Infrastructure config.openshift.io  cluster true} {Ingress config.openshift.io  cluster true} {CustomResourceDefinition apiextensions.k8s.io  networks.config.openshift.io true} {Network config.openshift.io  cluster true} {Proxy config.openshift.io  cluster true} {Scheduler config.openshift.io  cluster true} {Secret  kube-system kube-cloud-cfg true} {ConfigMap  kube-system root-ca true} {Secret  openshift-machine-config-operator machine-config-server-tls true} {Secret  openshift-config pull-secret true} {Secret  kube-system gcp-credentials true} {Secret  kube-system kubeadmin true} {Role rbac.authorization.k8s.io kube-system gcp-creds-secret-reader true} {ConfigMap  openshift-config openshift-install-manifests true} {PrometheusRule monitoring.coreos.com openshift-machine-api machine-api-operator-prometheus-rules true}] to [{PrometheusRule monitoring.coreos.com openshift-machine-api machine-api-operator-prometheus-rules true}])
I0125 11:35:01.216362       1 sync_worker.go:510] Detected while calculating next work: overrides change ([{ConfigMap  kube-system ingress-uid true} {ConfigMap  openshift-config cloud-provider-config true} {ConfigMap  kube-system cluster-config-v1 true} {DNS config.openshift.io  cluster true} {Infrastructure config.openshift.io  cluster true} {Ingress config.openshift.io  cluster true} {CustomResourceDefinition apiextensions.k8s.io  networks.config.openshift.io true} {Network config.openshift.io  cluster true} {Proxy config.openshift.io  cluster true} {Scheduler config.openshift.io  cluster true} {Secret  kube-system kube-cloud-cfg true} {ConfigMap  kube-system root-ca true} {Secret  openshift-machine-config-operator machine-config-server-tls true} {Secret  openshift-config pull-secret true} {Secret  kube-system gcp-credentials true} {Secret  kube-system kubeadmin true} {Role rbac.authorization.k8s.io kube-system gcp-creds-secret-reader true} {ConfigMap  openshift-config openshift-install-manifests true} {PrometheusRule monitoring.coreos.com openshift-machine-api machine-api-operator-prometheus-rules true}] to [{PrometheusRule monitoring.coreos.com openshift-machine-api machine-api-operator-prometheus-rules true}])
I0125 11:35:01.216430       1 sync_worker.go:558] Running sync 4.10.0-0.nightly-2022-01-24-070025 (force=false) on generation 2 in state Updating at attempt 0
I0125 11:35:33.509703       1 sync_worker.go:771] Running sync for dns "cluster" (23 of 769)
I0125 11:43:25.104619       1 sync_worker.go:558] Running sync 4.10.0-0.nightly-2022-01-24-070025 (force=false) on generation 2 in state Updating at attempt 1
I0125 11:43:28.469841       1 sync_worker.go:771] Running sync for dns "cluster" (23 of 769)
I0125 11:48:32.337140       1 sync_worker.go:771] Running sync for config "cluster" (155 of 769)
I0125 11:52:37.232509       1 sync_worker.go:558] Running sync 4.10.0-0.nightly-2022-01-24-070025 (force=false) on generation 2 in state Updating at attempt 2
I0125 11:52:40.290296       1 sync_worker.go:771] Running sync for dns "cluster" (23 of 769)
I0125 11:52:47.195681       1 sync_worker.go:771] Running sync for config "cluster" (155 of 769)
I0125 11:56:16.943332       1 sync_worker.go:775] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (736 of 769) as unmanaged
I0125 12:00:01.607790       1 sync_worker.go:558] Running sync 4.10.0-0.nightly-2022-01-24-070025 (force=false) on generation 2 in state Reconciling at attempt 0
I0125 12:00:05.719082       1 sync_worker.go:771] Running sync for dns "cluster" (23 of 769)
I0125 12:00:24.066865       1 sync_worker.go:771] Running sync for config "cluster" (155 of 769)
I0125 12:00:31.319361       1 sync_worker.go:775] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (736 of 769) as unmanaged
I0125 12:04:27.084960       1 sync_worker.go:558] Running sync 4.10.0-0.nightly-2022-01-24-070025 (force=false) on generation 2 in state Reconciling at attempt 0
I0125 12:04:31.144522       1 sync_worker.go:771] Running sync for dns "cluster" (23 of 769)
I0125 12:04:45.844369       1 sync_worker.go:771] Running sync for config "cluster" (155 of 769)
I0125 12:04:52.896325       1 sync_worker.go:775] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (736 of 769) as unmanaged
I0125 12:08:48.659965       1 sync_worker.go:558] Running sync 4.10.0-0.nightly-2022-01-24-070025 (force=false) on generation 2 in state Reconciling at attempt 0
I0125 12:08:52.717931       1 sync_worker.go:771] Running sync for dns "cluster" (23 of 769)
I0125 12:09:07.366991       1 sync_worker.go:771] Running sync for config "cluster" (155 of 769)
I0125 12:09:18.368424       1 sync_worker.go:775] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (736 of 769) as unmanaged
I0125 12:13:14.133998       1 sync_worker.go:558] Running sync 4.10.0-0.nightly-2022-01-24-070025 (force=false) on generation 2 in state Reconciling at attempt 0
I0125 12:13:18.194508       1 sync_worker.go:771] Running sync for dns "cluster" (23 of 769)
I0125 12:13:32.843531       1 sync_worker.go:771] Running sync for config "cluster" (155 of 769)
I0125 12:13:39.944947       1 sync_worker.go:775] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (736 of 769) as unmanaged
I0125 12:17:35.716483       1 sync_worker.go:558] Running sync 4.10.0-0.nightly-2022-01-24-070025 (force=false) on generation 2 in state Reconciling at attempt 0
I0125 12:17:39.775264       1 sync_worker.go:771] Running sync for dns "cluster" (23 of 769)
I0125 12:17:54.425639       1 sync_worker.go:771] Running sync for config "cluster" (155 of 769)
I0125 12:18:01.527160       1 sync_worker.go:775] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (736 of 769) as unmanaged


Well, we can still see syncs running for dns "cluster" and config "cluster".

Trevor, could you please help confirm whether this is expected? Thanks!

Comment 20 W. Trevor King 2022-01-26 00:04:35 UTC
Comment 19's bootstrapping looks good to me, up through the 11:35:01 patch that removed all of the installer-injected overrides. The logs from cluster-version-operator-5c6ff84ff4-lqczv show it successfully detecting the overrides change and beginning to sync everything except the user-injected PrometheusRule override, which is good, because that one wasn't cleared.

But I don't understand:

  Running sync ... on generation 2 in state Updating at attempt 0

after the overrides change.  We want to stay in state Initializing, per bug 1905221.  I've opened [1] to poke around.

[1]: https://github.com/openshift/cluster-version-operator/pull/730
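
As an aside for anyone skimming the logs: the "Detected while calculating next work: ... overrides change ([old] to [new])" lines are the worker comparing the override list it last acted on against the one on the incoming ClusterVersion. A rough, hypothetical Go sketch of that comparison (illustrative names only, not the actual cluster-version-operator code):

package main

import (
    "fmt"
    "reflect"
)

// ComponentOverride mirrors the fields shown in the spec.overrides stanzas
// in this bug; the type and function names here are illustrative.
type ComponentOverride struct {
    Kind, Group, Namespace, Name string
    Unmanaged                    bool
}

// overridesChanged reports whether the override list on the incoming
// ClusterVersion differs from the last one the sync worker acted on.
func overridesChanged(last, incoming []ComponentOverride) bool {
    return !reflect.DeepEqual(last, incoming)
}

func main() {
    last := []ComponentOverride{
        {Kind: "ConfigMap", Namespace: "kube-system", Name: "ingress-uid", Unmanaged: true},
        {Kind: "PrometheusRule", Group: "monitoring.coreos.com",
            Namespace: "openshift-machine-api",
            Name:      "machine-api-operator-prometheus-rules", Unmanaged: true},
    }
    // At the end of bootstrap the installer clears everything except the
    // user-injected PrometheusRule override.
    incoming := last[1:]
    fmt.Println(overridesChanged(last, incoming)) // true
}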

Comment 21 Yang Yang 2022-01-26 03:57:19 UTC
Thanks, Trevor, for helping me understand it. I'm re-opening the bug for now. Please move it back to ON_QA if you'd like to address the issue in comment#20 separately. Thanks.

Comment 22 Yang Yang 2022-01-26 08:17:35 UTC
# oc get clusterversion -oyaml
apiVersion: v1
items:
- apiVersion: config.openshift.io/v1
  kind: ClusterVersion
  metadata:
    creationTimestamp: "2022-01-26T06:44:25Z"
    generation: 2
    name: version
    resourceVersion: "26635"
    uid: e6e23de2-6358-48be-bd00-b7fc3a3a8bf7
  spec:
    channel: stable-4.10
    clusterID: 2704b169-c64b-4534-ad83-5a53edb4d163
    overrides:
    - group: monitoring.coreos.com
      kind: PrometheusRule
      name: machine-api-operator-prometheus-rules
      namespace: openshift-machine-api
      unmanaged: true
  status:
    availableUpdates: null
    conditions:
    - lastTransitionTime: "2022-01-26T06:44:25Z"
      message: Disabling ownership via cluster version overrides prevents upgrades.
        Please remove overrides before continuing.
      reason: ClusterVersionOverridesSet
      status: "False"
      type: Upgradeable
    - lastTransitionTime: "2022-01-26T07:02:28Z"
      message: Done applying 4.10.0-0.nightly-2022-01-25-023600
      status: "True"
      type: Available
    - lastTransitionTime: "2022-01-26T07:00:57Z"
      status: "False"
      type: Failing
    - lastTransitionTime: "2022-01-26T07:02:28Z"
      message: Cluster version is 4.10.0-0.nightly-2022-01-25-023600
      status: "False"
      type: Progressing
    - lastTransitionTime: "2022-01-26T06:44:25Z"
      message: 'Unable to retrieve available updates: currently reconciling cluster
        version 4.10.0-0.nightly-2022-01-25-023600 not found in the "stable-4.10"
        channel'
      reason: VersionNotFound
      status: "False"
      type: RetrievedUpdates
    desired:
      image: registry.ci.openshift.org/ocp/release@sha256:19fd4b9a313f2dfcdc982f088cffcc5859484af3cb8966cde5ec0be90b262dbc
      version: 4.10.0-0.nightly-2022-01-25-023600
    history:
    - completionTime: "2022-01-26T07:02:28Z"
      image: registry.ci.openshift.org/ocp/release@sha256:19fd4b9a313f2dfcdc982f088cffcc5859484af3cb8966cde5ec0be90b262dbc
      startedTime: "2022-01-26T06:44:25Z"
      state: Completed
      verified: false
      version: 4.10.0-0.nightly-2022-01-25-023600
    observedGeneration: 2
    versionHash: 1rTvIGx_lXw=
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Comment 24 W. Trevor King 2022-01-26 08:35:55 UTC
Looking in Loki for non-update jobs that still had a CVO transition to Updating:

  {invoker!~"|.*upgrade.*"} | unpack | container_name="cluster-version-operator" |~ "Running sync .* in state Updating at attempt 0"

This turned up a ...-e2e-aws-serial/1486077917686403072 invoker (among a handful of others). Finding the run-detail page for that job while Prow still remembers it:

  $ curl -s https://prow.ci.openshift.org/prowjobs.js | jq . | grep 'https://.*1486077917686403072'
          "url": "https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_oc/851/pull-ci-openshift-oc-master-e2e-aws-serial/1486077917686403072",

And it does indeed look like it reproduces the comment 19 transition to Updating:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_oc/851/pull-ci-openshift-oc-master-e2e-aws-serial/1486077917686403072/artifacts/e2e-aws-serial/gather-extra/artifacts/pods/openshift-cluster-version_cluster-version-operator-6f8f6947c4-pfqqz_cluster-version-operator.log | grep 'ClusterVersionOperator\|became leader\|Running sync.*in state\|Result of work\|Running sync.*dns.*cluster\|Detected while\|Ignoring detected version\|Propagating initial target version\|Running sync.* config .*cluster\|Skipping .* as unmanaged'
I0125 21:16:07.740960       1 start.go:23] ClusterVersionOperator v1.0.0-752-g037f1986-dirty
I0125 21:19:07.639366       1 event.go:285] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"openshift-cluster-version", Name:"version", UID:"5f934228-6ba4-4843-8abe-4f9a2c2dfb96", APIVersion:"v1", ResourceVersion:"14083", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' ip-10-0-177-211_32d9affd-d048-4d50-a5f5-16778218d71a became leader
I0125 21:19:07.639536       1 cvo.go:338] Starting ClusterVersionOperator with minimum reconcile period 2m42.77595144s
I0125 21:19:07.640196       1 sync_worker.go:255] Propagating initial target version {4.10.0-0.ci.test-2022-01-25-205921-ci-op-42ntw0i2-latest registry.build02.ci.openshift.org/ci-op-42ntw0i2/release@sha256:8929f52954b25a8ce26c4fb62a9268b6fbf3d11b8676887b9fe908ae7930c318 false} to sync worker loop in state Initializing.
I0125 21:19:07.641298       1 sync_worker.go:510] Detected while calculating next work: version (from {  false} to {4.10.0-0.ci.test-2022-01-25-205921-ci-op-42ntw0i2-latest registry.build02.ci.openshift.org/ci-op-42ntw0i2/release@sha256:8929f52954b25a8ce26c4fb62a9268b6fbf3d11b8676887b9fe908ae7930c318 false}) and overrides changes
I0125 21:19:07.641364       1 sync_worker.go:558] Running sync 4.10.0-0.ci.test-2022-01-25-205921-ci-op-42ntw0i2-latest (force=false) on generation 1 in state Initializing at attempt 0
I0125 21:19:08.201327       1 sync_worker.go:771] Running sync for config "cluster" (155 of 770)
I0125 21:19:14.772127       1 sync_worker.go:771] Running sync for dns "cluster" (23 of 770)
I0125 21:19:14.772134       1 sync_worker.go:775] Skipping dns "cluster" (23 of 770) as unmanaged
I0125 21:19:16.800720       1 sync_worker.go:775] Skipping infrastructure "cluster" (26 of 770) as unmanaged
I0125 21:19:16.800728       1 sync_worker.go:775] Skipping ingress "cluster" (27 of 770) as unmanaged
I0125 21:19:16.800737       1 sync_worker.go:775] Skipping network "cluster" (28 of 770) as unmanaged
I0125 21:19:17.417768       1 sync_worker.go:510] Detected while considering cluster version generation 2: overrides change ([...blah blah lots... to [])
I0125 21:19:17.457437       1 task_graph.go:546] Result of work: [Could not update oauth "cluster" (29 of 770) ...blah blah lots...]
I0125 21:19:17.458850       1 sync_worker.go:510] Detected while calculating next work: overrides change ([...blah blah lots...] to [])
I0125 21:19:17.458933       1 sync_worker.go:558] Running sync 4.10.0-0.ci.test-2022-01-25-205921-ci-op-42ntw0i2-latest (force=false) on generation 2 in state Updating at attempt 0
I0125 21:19:18.378231       1 sync_worker.go:771] Running sync for dns "cluster" (23 of 770)
I0125 21:24:43.012398       1 task_graph.go:546] Result of work: [Cluster operator kube-apiserver is not available]
I0125 21:25:24.602709       1 sync_worker.go:558] Running sync 4.10.0-0.ci.test-2022-01-25-205921-ci-op-42ntw0i2-latest (force=false) on generation 2 in state Updating at attempt 1
I0125 21:25:24.692595       1 sync_worker.go:771] Running sync for dns "cluster" (23 of 770)
I0125 21:30:21.270700       1 sync_worker.go:771] Running sync for config "cluster" (155 of 770)
I0125 21:30:50.155119       1 task_graph.go:546] Result of work: [Cluster operator authentication is updating versions Cluster operator console is not available]
I0125 21:32:23.824640       1 sync_worker.go:558] Running sync 4.10.0-0.ci.test-2022-01-25-205921-ci-op-42ntw0i2-latest (force=false) on generation 2 in state Updating at attempt 2
I0125 21:32:23.946566       1 sync_worker.go:771] Running sync for dns "cluster" (23 of 770)
I0125 21:32:28.329871       1 sync_worker.go:771] Running sync for config "cluster" (155 of 770)
I0125 21:32:54.329864       1 task_graph.go:546] Result of work: []
I0125 21:35:37.109595       1 sync_worker.go:558] Running sync 4.10.0-0.ci.test-2022-01-25-205921-ci-op-42ntw0i2-latest (force=false) on generation 2 in state Reconciling at attempt 0
...

I still don't understand why it's making that transition, though...

Comment 25 Yang Yang 2022-01-27 12:07:14 UTC
Logging the test result for #729. The cluster starts from 4.10.0-0.nightly-2022-01-25-023600 and upgrades to 4.10.0-0.nightly-2022-01-26-234447.

***While the cluster is bootstrapping, upgrade it

# oc adm upgrade --to-image=registry.ci.openshift.org/ocp/release@sha256:694c18617c3ba2d8b9fa2cfd5ddf19919603f6179fe630ccf1a5f25b84d74d1d --allow-explicit-upgrade --force
warning: The requested upgrade image is not one of the available updates.You have used --allow-explicit-upgrade for the update to proceed anyway
warning: --force overrides cluster verification of your supplied release image and waives any update precondition failures.
error: the cluster is already upgrading:

  Reason: MultipleErrors
  Message: Unable to apply 4.10.0-0.nightly-2022-01-25-023600: an unknown error has occurred: MultipleErrors



If you want to upgrade anyway, use --allow-upgrade-with-warnings.

***Adding --allow-upgrade-with-warnings to the oc adm upgrade command line.

# oc adm upgrade --to-image=registry.ci.openshift.org/ocp/release@sha256:694c18617c3ba2d8b9fa2cfd5ddf19919603f6179fe630ccf1a5f25b84d74d1d --allow-explicit-upgrade --force --allow-upgrade-with-warnings
warning: The requested upgrade image is not one of the available updates.You have used --allow-explicit-upgrade for the update to proceed anyway
warning: --force overrides cluster verification of your supplied release image and waives any update precondition failures.
warning: --allow-upgrade-with-warnings is bypassing: the cluster is already upgrading:

  Reason: 
  Message: Working towards 4.10.0-0.nightly-2022-01-25-023600: 637 of 769 done (82% complete)

Updating to release image registry.ci.openshift.org/ocp/release@sha256:694c18617c3ba2d8b9fa2cfd5ddf19919603f6179fe630ccf1a5f25b84d74d1d


# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          6m24s   Working towards 4.10.0-0.nightly-2022-01-25-023600: 643 of 769 done (83% complete)

# oc -n openshift-cluster-version logs -l k8s-app=cluster-version-operator --tail -1 | grep 'Running sync.*in state'
I0127 11:35:53.245929       1 sync_worker.go:558] Running sync registry.ci.openshift.org/ocp/release@sha256:694c18617c3ba2d8b9fa2cfd5ddf19919603f6179fe630ccf1a5f25b84d74d1d (force=true) on generation 3 in state Initializing at attempt 0

...

# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          14m     Unable to apply 4.10.0-0.nightly-2022-01-26-234447: some cluster operators have not yet rolled out

# oc -n openshift-cluster-version logs -l k8s-app=cluster-version-operator --tail -1 | grep 'Running sync.*in state'
I0127 11:36:29.968216       1 sync_worker.go:558] Running sync registry.ci.openshift.org/ocp/release@sha256:694c18617c3ba2d8b9fa2cfd5ddf19919603f6179fe630ccf1a5f25b84d74d1d (force=true) on generation 3 in state Initializing at attempt 0
I0127 11:37:46.119764       1 sync_worker.go:558] Running sync registry.ci.openshift.org/ocp/release@sha256:694c18617c3ba2d8b9fa2cfd5ddf19919603f6179fe630ccf1a5f25b84d74d1d (force=true) on generation 4 in state Updating at attempt 0

# oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2022-01-26-234447   True        False         5m44s   Cluster version is 4.10.0-0.nightly-2022-01-26-234447

# oc -n openshift-cluster-version logs -l k8s-app=cluster-version-operator --tail -1 | grep 'Running sync.*in state'
I0127 11:52:55.748789       1 sync_worker.go:558] Running sync registry.ci.openshift.org/ocp/release@sha256:694c18617c3ba2d8b9fa2cfd5ddf19919603f6179fe630ccf1a5f25b84d74d1d (force=true) on generation 4 in state Initializing at attempt 0
I0127 11:55:41.909815       1 sync_worker.go:558] Running sync registry.ci.openshift.org/ocp/release@sha256:694c18617c3ba2d8b9fa2cfd5ddf19919603f6179fe630ccf1a5f25b84d74d1d (force=true) on generation 4 in state Reconciling at attempt 0
I0127 11:59:22.383694       1 sync_worker.go:558] Running sync registry.ci.openshift.org/ocp/release@sha256:694c18617c3ba2d8b9fa2cfd5ddf19919603f6179fe630ccf1a5f25b84d74d1d (force=true) on generation 4 in state Reconciling at attempt 0


Well, after bootstrapping completes, the cluster installation moves on to the cluster-initialization stage. At that point the cluster is upgrading to 4.10.0-0.nightly-2022-01-26-234447. See the sketch below.
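
That is the behavior #729 is after: an install-time target-version change swaps the payload without cancelling the in-flight install, so the 11:35:53 and 11:36:29 lines above run the new image while still in state Initializing. A minimal, hypothetical sketch of that idea (illustrative names, not the actual sync_worker.go code):

package main

import "fmt"

type State string

const (
    Initializing State = "Initializing"
    Updating     State = "Updating"
)

// Work is a hypothetical stand-in for the sync worker's current task.
type Work struct {
    Target string
    State  State
}

// applyNewTarget swaps in a new release payload; while the worker is still
// Initializing (install mode) it keeps that state instead of jumping to
// Updating and cancelling the in-flight install.
func applyNewTarget(w Work, target string) Work {
    w.Target = target
    if w.State != Initializing {
        w.State = Updating
    }
    return w
}

func main() {
    w := Work{Target: "4.10.0-0.nightly-2022-01-25-023600", State: Initializing}
    w = applyNewTarget(w, "4.10.0-0.nightly-2022-01-26-234447")
    fmt.Printf("%s in state %s\n", w.Target, w.State) // stays Initializing
}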

Comment 29 Yang Yang 2022-01-28 04:12:05 UTC
Attempting to verify #728 with 4.11.0-0.nightly-2022-01-28-013835.

$ journalctl -u bootkube.service |grep "overrides\|patched"
Jan 28 03:01:44 yanyang0128a-d6vjr-bootstrap.c.openshift-qe.internal bootkube.sh[2405]: Created "cvo-overrides.yaml" clusterversions.v1.config.openshift.io/version -n openshift-cluster-version
Jan 28 03:10:32 yanyang0128a-d6vjr-bootstrap.c.openshift-qe.internal bootkube.sh[2405]: Skipped "cvo-overrides.yaml" clusterversions.v1.config.openshift.io/version -n openshift-cluster-version as it already exists
Jan 28 03:10:37 yanyang0128a-d6vjr-bootstrap.c.openshift-qe.internal bootkube.sh[2405]: Sending bootstrap-finished event.Restoring CVO overrides

$ sudo crictl logs c53539a62c0f1 2>&1 | grep 'Running sync.*in state\|Running sync.*dns.*cluster\|Detected while\|Ignoring detected version\|Propagating initial target version\|Running sync.* config .*cluster\|Skipping .* as unmanaged'
I0128 03:01:44.129776       1 sync_worker.go:255] Propagating initial target version {4.11.0-0.nightly-2022-01-28-013835 registry.ci.openshift.org/ocp/release@sha256:5b00033968582c10546c1c3c1b6b3fc7836e088da36570b19f9a1f38555594c2 false} to sync worker loop in state Initializing.
I0128 03:01:44.130076       1 sync_worker.go:511] Detected while calculating next work: version (from {  false} to {4.11.0-0.nightly-2022-01-28-013835 registry.ci.openshift.org/ocp/release@sha256:5b00033968582c10546c1c3c1b6b3fc7836e088da36570b19f9a1f38555594c2 false}) and overrides changes
I0128 03:01:44.130128       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-01-28-013835 (force=false) on generation 1 in state Initializing at attempt 0
I0128 03:01:47.595537       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0128 03:02:05.749177       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0128 03:02:05.749187       1 sync_worker.go:776] Skipping dns "cluster" (23 of 770) as unmanaged
I0128 03:02:10.774614       1 sync_worker.go:776] Skipping customresourcedefinition "networks.config.openshift.io" (47 of 770) as unmanaged
I0128 03:02:14.041505       1 sync_worker.go:776] Skipping infrastructure "cluster" (26 of 770) as unmanaged
I0128 03:02:14.041519       1 sync_worker.go:776] Skipping ingress "cluster" (27 of 770) as unmanaged
I0128 03:02:14.041531       1 sync_worker.go:776] Skipping network "cluster" (28 of 770) as unmanaged
I0128 03:02:19.691541       1 sync_worker.go:776] Skipping proxy "cluster" (32 of 770) as unmanaged
I0128 03:02:19.691557       1 sync_worker.go:776] Skipping scheduler "cluster" (33 of 770) as unmanaged
I0128 03:02:24.808726       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0128 03:05:01.601783       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-01-28-013835 (force=false) on generation 1 in state Initializing at attempt 1
I0128 03:05:02.435896       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0128 03:05:04.886746       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0128 03:05:10.135932       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0128 03:05:10.135942       1 sync_worker.go:776] Skipping dns "cluster" (23 of 770) as unmanaged
I0128 03:05:14.812545       1 sync_worker.go:776] Skipping customresourcedefinition "networks.config.openshift.io" (47 of 770) as unmanaged
I0128 03:05:15.160240       1 sync_worker.go:776] Skipping infrastructure "cluster" (26 of 770) as unmanaged
I0128 03:05:15.160260       1 sync_worker.go:776] Skipping ingress "cluster" (27 of 770) as unmanaged
I0128 03:05:15.160278       1 sync_worker.go:776] Skipping network "cluster" (28 of 770) as unmanaged
I0128 03:05:17.010804       1 sync_worker.go:776] Skipping proxy "cluster" (32 of 770) as unmanaged
I0128 03:05:17.010819       1 sync_worker.go:776] Skipping scheduler "cluster" (33 of 770) as unmanaged
I0128 03:08:45.839430       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-01-28-013835 (force=false) on generation 1 in state Initializing at attempt 2
I0128 03:08:47.048941       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0128 03:08:49.674497       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0128 03:08:55.873266       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0128 03:08:55.873279       1 sync_worker.go:776] Skipping dns "cluster" (23 of 770) as unmanaged
I0128 03:08:58.650169       1 sync_worker.go:776] Skipping customresourcedefinition "networks.config.openshift.io" (47 of 770) as unmanaged
I0128 03:09:09.787946       1 sync_worker.go:776] Skipping infrastructure "cluster" (26 of 770) as unmanaged
I0128 03:09:09.787967       1 sync_worker.go:776] Skipping ingress "cluster" (27 of 770) as unmanaged
I0128 03:09:09.787990       1 sync_worker.go:776] Skipping network "cluster" (28 of 770) as unmanaged
I0128 03:09:09.839780       1 sync_worker.go:776] Skipping proxy "cluster" (32 of 770) as unmanaged
I0128 03:09:09.841401       1 sync_worker.go:776] Skipping scheduler "cluster" (33 of 770) as unmanaged

# oc -n openshift-cluster-version logs cluster-version-operator-7f585665db-vd4nv | grep 'Running sync.*in state\|Running sync.*dns.*cluster\|Detected while\|Ignoring detected version\|Propagating initial target version\|Running sync.* config .*cluster\|Skipping .* as unmanaged\|Result of work'
I0128 03:19:50.073278       1 sync_worker.go:255] Propagating initial target version {4.11.0-0.nightly-2022-01-28-013835 registry.ci.openshift.org/ocp/release@sha256:5b00033968582c10546c1c3c1b6b3fc7836e088da36570b19f9a1f38555594c2 false} to sync worker loop in state Initializing.
I0128 03:19:50.073545       1 sync_worker.go:511] Detected while calculating next work: version (from {  false} to {4.11.0-0.nightly-2022-01-28-013835 registry.ci.openshift.org/ocp/release@sha256:5b00033968582c10546c1c3c1b6b3fc7836e088da36570b19f9a1f38555594c2 false}) and overrides changes
I0128 03:19:50.073694       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-01-28-013835 (force=false) on generation 2 in state Initializing at attempt 0
I0128 03:19:51.813710       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0128 03:19:59.699347       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0128 03:20:05.191465       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0128 03:23:48.449944       1 task_graph.go:546] Result of work: [Cluster operator authentication is not available Cluster operator monitoring is not available]
I0128 03:24:21.659487       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-01-28-013835 (force=false) on generation 2 in state Initializing at attempt 1
I0128 03:24:23.192583       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0128 03:24:25.518616       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0128 03:24:27.869683       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0128 03:26:51.460456       1 task_graph.go:546] Result of work: []
I0128 03:30:49.824637       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-01-28-013835 (force=false) on generation 2 in state Reconciling at attempt 0
I0128 03:30:50.263254       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0128 03:31:00.385524       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0128 03:31:05.702367       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0128 03:31:18.635183       1 task_graph.go:546] Result of work: []
I0128 03:35:16.999976       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-01-28-013835 (force=false) on generation 2 in state Reconciling at attempt 0
I0128 03:35:17.206185       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0128 03:35:27.608428       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0128 03:35:32.928148       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0128 03:35:45.861166       1 task_graph.go:546] Result of work: []
I0128 03:39:44.226120       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-01-28-013835 (force=false) on generation 2 in state Reconciling at attempt 0
I0128 03:39:44.462258       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0128 03:39:54.785176       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0128 03:40:00.109708       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0128 03:40:13.013744       1 task_graph.go:546] Result of work: []
I0128 03:44:11.378505       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-01-28-013835 (force=false) on generation 2 in state Reconciling at attempt 0
I0128 03:44:11.614544       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0128 03:44:21.987465       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0128 03:44:27.311034       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0128 03:44:40.165856       1 task_graph.go:546] Result of work: []
I0128 03:48:38.530995       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-01-28-013835 (force=false) on generation 2 in state Reconciling at attempt 0
I0128 03:48:38.779480       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0128 03:48:49.138897       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0128 03:48:54.460949       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0128 03:49:07.287722       1 task_graph.go:546] Result of work: []
I0128 03:53:05.651942       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-01-28-013835 (force=false) on generation 2 in state Reconciling at attempt 0
I0128 03:53:05.909901       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0128 03:53:16.211026       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0128 03:53:21.526338       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0128 03:53:34.396584       1 task_graph.go:546] Result of work: []
I0128 03:57:32.762925       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-01-28-013835 (force=false) on generation 2 in state Reconciling at attempt 0
I0128 03:57:32.987048       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0128 03:57:43.369621       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0128 03:57:48.692071       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0128 03:58:01.497199       1 task_graph.go:546] Result of work: []
I0128 04:01:59.861504       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-01-28-013835 (force=false) on generation 2 in state Reconciling at attempt 0
I0128 04:02:00.107528       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0128 04:02:10.469386       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0128 04:02:15.797134       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0128 04:02:28.601345       1 task_graph.go:546] Result of work: []
I0128 04:06:26.966923       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-01-28-013835 (force=false) on generation 2 in state Reconciling at attempt 0
I0128 04:06:27.192029       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0128 04:06:37.574027       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0128 04:06:42.897887       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0128 04:06:55.717041       1 task_graph.go:546] Result of work: []


We see the delay between the bootstrap CVO and the post-bootstrap CVO, and "state Updating" is not observed. The installation passed; the cluster installation in comment#19 passed as well.

Comment 30 Yang Yang 2022-01-28 06:28:11 UTC
My bad, I used an incorrect build in comment#29.

Re-testing it with 4.10.0-0.nightly-2022-01-28-042103 and observing "state Updating" once more!

# oc -n openshift-cluster-version logs cluster-version-operator-7c678c568b-j2j46 | grep 'Running sync.*in state\|Running sync.*dns.*cluster\|Detected while\|Ignoring detected version\|Propagating initial target version\|Running sync.* config .*cluster\|Skipping .* as unmanaged\|Result of work\|Work changed'
I0128 05:23:50.171958       1 sync_worker.go:255] Propagating initial target version {4.10.0-0.nightly-2022-01-28-042103 registry.ci.openshift.org/ocp/release@sha256:2f40c0fb545eb81b2bb4dbc87d381a1ab33beb9b6f42c33b3fb28806f63dea84 false} to sync worker loop in state Initializing.
I0128 05:23:50.181040       1 sync_worker.go:511] Detected while calculating next work: version (from {  false} to {4.10.0-0.nightly-2022-01-28-042103 registry.ci.openshift.org/ocp/release@sha256:2f40c0fb545eb81b2bb4dbc87d381a1ab33beb9b6f42c33b3fb28806f63dea84 false}) and overrides changes
I0128 05:23:50.181148       1 sync_worker.go:559] Running sync 4.10.0-0.nightly-2022-01-28-042103 (force=false) on generation 1 in state Initializing at attempt 0
I0128 05:23:53.967907       1 sync_worker.go:511] Detected while considering cluster version generation 2: overrides change ([{ConfigMap  kube-system ingress-uid true} {ConfigMap  openshift-config cloud-provider-config true} {ConfigMap  kube-system cluster-config-v1 true} {DNS config.openshift.io  cluster true} {Infrastructure config.openshift.io  cluster true} {Ingress config.openshift.io  cluster true} {CustomResourceDefinition apiextensions.k8s.io  networks.config.openshift.io true} {Network config.openshift.io  cluster true} {Proxy config.openshift.io  cluster true} {Scheduler config.openshift.io  cluster true} {Secret  kube-system kube-cloud-cfg true} {ConfigMap  kube-system root-ca true} {Secret  openshift-machine-config-operator machine-config-server-tls true} {Secret  openshift-config pull-secret true} {Secret  kube-system gcp-credentials true} {Secret  kube-system kubeadmin true} {Role rbac.authorization.k8s.io kube-system gcp-creds-secret-reader true} {ConfigMap  openshift-config openshift-install-manifests true} {PrometheusRule monitoring.coreos.com openshift-machine-api machine-api-operator-prometheus-rules true}] to [{PrometheusRule monitoring.coreos.com openshift-machine-api machine-api-operator-prometheus-rules true}])
I0128 05:23:54.501144       1 task_graph.go:546] Result of work: [Could not update configmap "openshift-config-managed/admin-gates" (3 of 770) Could not update servicemonitor "openshift-cluster-version/cluster-version-operator" (10 of 770) Could not update apiserver "cluster" (19 of 770) Could not update customresourcedefinition "builds.config.openshift.io" (37 of 770) Could not update etcd "cluster" (64 of 770) Could not update clusterrole "system:openshift:scc:anyuid" (75 of 770) Could not update namespace "openshift-kube-controller-manager-operator" (101 of 770) Could not update namespace "openshift-kube-scheduler-operator" (110 of 770) Could not update namespace "openshift-cloud-controller-manager-operator" (119 of 770) Could not update customresourcedefinition "bmceventsubscriptions.metal3.io" (150 of 770) Could not update namespace "openshift-config-operator" (154 of 770) Could not update credentialsrequest "openshift-cloud-credential-operator/openshift-machine-api-aws" (164 of 770) Could not update configmap "openshift-machine-api/cbo-trusted-ca" (208 of 770) Could not update clusterrole "system:openshift:cloud-credential-operator:cluster-reader" (224 of 770) Could not update namespace "openshift-authentication-operator" (241 of 770) Could not update customresourcedefinition "clusterautoscalers.autoscaling.openshift.io" (256 of 770) Could not update customresourcedefinition "csisnapshotcontrollers.operator.openshift.io" (278 of 770) Could not update customresourcedefinition "configs.imageregistry.operator.openshift.io" (296 of 770) Could not update clusterrole "openshift-ingress-operator" (318 of 770) Could not update customresourcedefinition "storageversionmigrations.migration.k8s.io" (335 of 770) Could not update namespace "openshift-cluster-machine-approver" (345 of 770) Could not update customresourcedefinition "alertmanagerconfigs.monitoring.coreos.com" (357 of 770) Could not update credentialsrequest "openshift-cloud-credential-operator/openshift-cloud-network-config-controller-gcp" (378 of 770) Could not update namespace "openshift-cluster-node-tuning-operator" (381 of 770) Could not update namespace "openshift-apiserver-operator" (398 of 770) Could not update namespace "openshift-controller-manager-operator" (410 of 770) Could not update namespace "openshift-cluster-samples-operator" (421 of 770) Could not update namespace "openshift-cluster-csi-drivers" (451 of 770) Could not update priorityclass "openshift-user-critical" (471 of 770) Could not update helmchartrepository "openshift-helm-charts" (472 of 770) Could not update imagestream "openshift/driver-toolkit" (526 of 770) Could not update namespace "openshift-insights" (527 of 770) Could not update customresourcedefinition "catalogsources.operators.coreos.com" (556 of 770) Could not update namespace "openshift-marketplace" (589 of 770) Could not update clusterrolebinding "system:openshift:operator:service-ca-operator" (604 of 770) Could not update namespace "openshift-network-operator" (613 of 770) Could not update clusterrole "openshift-dns-operator" (622 of 770) Could not update clusterrole "system:openshift:machine-config-operator:cluster-reader" (632 of 770) Could not update rolebinding "openshift-cloud-credential-operator/prometheus-k8s" (651 of 770) Could not update role "openshift-authentication/prometheus-k8s" (655 of 770) Could not update clusterrolebinding "registry-monitoring" (665 of 770) Could not update rolebinding "openshift-cluster-machine-approver/prometheus-k8s" (671 of 770) Could not update 
operatorgroup "openshift-monitoring/openshift-cluster-monitoring" (674 of 770) Could not update rolebinding "openshift-cluster-storage-operator/prometheus" (689 of 770) Could not update role "openshift-console-operator/prometheus-k8s" (692 of 770) Could not update role "openshift-console/prometheus-k8s" (695 of 770) Could not update rolebinding "openshift-dns-operator/prometheus-k8s" (699 of 770) Could not update rolebinding "openshift-etcd-operator/prometheus-k8s" (703 of 770) Could not update rolebinding "openshift-ingress-operator/prometheus-k8s" (707 of 770) Could not update rolebinding "openshift-kube-apiserver-operator/prometheus-k8s" (711 of 770) Could not update rolebinding "openshift-kube-controller-manager-operator/prometheus-k8s" (720 of 770) Could not update rolebinding "openshift-kube-scheduler-operator/prometheus-k8s" (727 of 770) Could not update servicemonitor "openshift-machine-api/machine-api-operator" (735 of 770) Could not update servicemonitor "openshift-machine-config-operator/machine-config-daemon" (738 of 770) Could not update rolebinding "openshift-operator-lifecycle-manager/operator-lifecycle-manager-metrics" (741 of 770) Could not update rolebinding "openshift-apiserver-operator/prometheus-k8s" (746 of 770) Could not update rolebinding "openshift-controller-manager-operator/prometheus-k8s" (754 of 770) Could not update rolebinding "openshift-service-ca-operator/prometheus-k8s" (760 of 770) Could not update credentialsrequest "openshift-cloud-credential-operator/openshift-machine-api" (762 of 770)]
I0128 05:23:54.503066       1 sync_worker.go:511] Detected while calculating next work: overrides change ([{ConfigMap  kube-system ingress-uid true} {ConfigMap  openshift-config cloud-provider-config true} {ConfigMap  kube-system cluster-config-v1 true} {DNS config.openshift.io  cluster true} {Infrastructure config.openshift.io  cluster true} {Ingress config.openshift.io  cluster true} {CustomResourceDefinition apiextensions.k8s.io  networks.config.openshift.io true} {Network config.openshift.io  cluster true} {Proxy config.openshift.io  cluster true} {Scheduler config.openshift.io  cluster true} {Secret  kube-system kube-cloud-cfg true} {ConfigMap  kube-system root-ca true} {Secret  openshift-machine-config-operator machine-config-server-tls true} {Secret  openshift-config pull-secret true} {Secret  kube-system gcp-credentials true} {Secret  kube-system kubeadmin true} {Role rbac.authorization.k8s.io kube-system gcp-creds-secret-reader true} {ConfigMap  openshift-config openshift-install-manifests true} {PrometheusRule monitoring.coreos.com openshift-machine-api machine-api-operator-prometheus-rules true}] to [{PrometheusRule monitoring.coreos.com openshift-machine-api machine-api-operator-prometheus-rules true}])
I0128 05:23:54.503129       1 sync_worker.go:442] Work changed, transitioning from Initializing and the suggested incoming Updating state to Updating
I0128 05:23:54.503328       1 sync_worker.go:559] Running sync 4.10.0-0.nightly-2022-01-28-042103 (force=false) on generation 2 in state Updating at attempt 0
I0128 05:23:55.805780       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0128 05:31:15.138816       1 task_graph.go:546] Result of work: [Cluster operator kube-apiserver is not available]
I0128 05:32:14.583090       1 sync_worker.go:559] Running sync 4.10.0-0.nightly-2022-01-28-042103 (force=false) on generation 2 in state Updating at attempt 1
I0128 05:32:14.757400       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0128 05:34:41.490702       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0128 05:39:08.311104       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0128 05:39:11.898243       1 task_graph.go:546] Result of work: []
I0128 05:42:52.216291       1 sync_worker.go:559] Running sync 4.10.0-0.nightly-2022-01-28-042103 (force=false) on generation 2 in state Reconciling at attempt 0
I0128 05:42:52.417818       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0128 05:43:02.775162       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0128 05:43:08.096067       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0128 05:43:20.953035       1 task_graph.go:546] Result of work: []
I0128 05:47:01.273745       1 sync_worker.go:559] Running sync 4.10.0-0.nightly-2022-01-28-042103 (force=false) on generation 2 in state Reconciling at attempt 0
I0128 05:47:01.499238       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0128 05:47:11.880847       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0128 05:47:17.200723       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0128 05:47:30.021159       1 task_graph.go:546] Result of work: []
I0128 05:51:10.338927       1 sync_worker.go:559] Running sync 4.10.0-0.nightly-2022-01-28-042103 (force=false) on generation 2 in state Reconciling at attempt 0
I0128 05:51:10.539055       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0128 05:51:20.946109       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0128 05:51:26.270105       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0128 05:51:39.078904       1 task_graph.go:546] Result of work: []
I0128 05:55:19.397703       1 sync_worker.go:559] Running sync 4.10.0-0.nightly-2022-01-28-042103 (force=false) on generation 2 in state Reconciling at attempt 0
I0128 05:55:19.597123       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0128 05:55:30.007108       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0128 05:55:35.247921       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0128 05:55:48.161754       1 task_graph.go:546] Result of work: []
I0128 05:59:28.479700       1 sync_worker.go:559] Running sync 4.10.0-0.nightly-2022-01-28-042103 (force=false) on generation 2 in state Reconciling at attempt 0
I0128 05:59:28.685770       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0128 05:59:39.087118       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0128 05:59:44.405217       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0128 05:59:57.216546       1 task_graph.go:546] Result of work: []
I0128 06:03:37.535971       1 sync_worker.go:559] Running sync 4.10.0-0.nightly-2022-01-28-042103 (force=false) on generation 2 in state Reconciling at attempt 0
I0128 06:03:37.740963       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0128 06:03:48.155910       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0128 06:03:53.536760       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0128 06:04:06.300595       1 task_graph.go:546] Result of work: []
I0128 06:07:46.618448       1 sync_worker.go:559] Running sync 4.10.0-0.nightly-2022-01-28-042103 (force=false) on generation 2 in state Reconciling at attempt 0
I0128 06:07:46.813701       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0128 06:07:57.276383       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0128 06:08:02.595566       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0128 06:08:15.466740       1 task_graph.go:546] Result of work: []
I0128 06:11:55.785389       1 sync_worker.go:559] Running sync 4.10.0-0.nightly-2022-01-28-042103 (force=false) on generation 2 in state Reconciling at attempt 0
I0128 06:11:55.985479       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0128 06:12:06.395114       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0128 06:12:11.716093       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0128 06:12:24.527108       1 task_graph.go:546] Result of work: []


# oc get clusterversion/version -oyaml
apiVersion: config.openshift.io/v1
kind: ClusterVersion
metadata:
  creationTimestamp: "2022-01-28T05:11:13Z"
  generation: 2
  name: version
  resourceVersion: "28283"
  uid: 790e4038-e956-4ed6-942f-b41a3ce4491f
spec:
  channel: stable-4.10
  clusterID: d44c2e22-4544-4c74-a0d1-a035cb847f96
  overrides:
  - group: monitoring.coreos.com
    kind: PrometheusRule
    name: machine-api-operator-prometheus-rules
    namespace: openshift-machine-api
    unmanaged: true
status:
  availableUpdates: null
  conditions:
  - lastTransitionTime: "2022-01-28T05:11:13Z"
    message: Disabling ownership via cluster version overrides prevents upgrades.
      Please remove overrides before continuing.
    reason: ClusterVersionOverridesSet
    status: "False"
    type: Upgradeable
  - lastTransitionTime: "2022-01-28T05:39:23Z"
    message: Done applying 4.10.0-0.nightly-2022-01-28-042103
    status: "True"
    type: Available
  - lastTransitionTime: "2022-01-28T05:32:23Z"
    status: "False"
    type: Failing
  - lastTransitionTime: "2022-01-28T05:39:23Z"
    message: Cluster version is 4.10.0-0.nightly-2022-01-28-042103
    status: "False"
    type: Progressing
  - lastTransitionTime: "2022-01-28T05:11:13Z"
    message: 'Unable to retrieve available updates: currently reconciling cluster
      version 4.10.0-0.nightly-2022-01-28-042103 not found in the "stable-4.10" channel'
    reason: VersionNotFound
    status: "False"
    type: RetrievedUpdates
  desired:
    image: registry.ci.openshift.org/ocp/release@sha256:2f40c0fb545eb81b2bb4dbc87d381a1ab33beb9b6f42c33b3fb28806f63dea84
    version: 4.10.0-0.nightly-2022-01-28-042103
  history:
  - completionTime: "2022-01-28T05:39:23Z"
    image: registry.ci.openshift.org/ocp/release@sha256:2f40c0fb545eb81b2bb4dbc87d381a1ab33beb9b6f42c33b3fb28806f63dea84
    startedTime: "2022-01-28T05:11:13Z"
    state: Completed
    verified: false
    version: 4.10.0-0.nightly-2022-01-28-042103
  observedGeneration: 2
  versionHash: OEdVOFmx3QQ=

Trevor, it doesn't bother me too much, since the installation passed. I'm re-opening it in case it worries you.

Comment 32 Yang Yang 2022-01-28 07:01:04 UTC
#729 gets a second attempt:

Starting the installation with 4.10.0-0.nightly-2022-01-28-042103 (version_A) and then upgrading to 4.10.0-fc.3 (version_B).

# oc get clusterversion/version -oyaml
apiVersion: config.openshift.io/v1
kind: ClusterVersion
metadata:
  creationTimestamp: "2022-01-28T05:40:27Z"
  generation: 4
  name: version
  resourceVersion: "47727"
  uid: 2701bfea-2cd4-4f70-9935-acff3fd996ab
spec:
  channel: stable-4.10
  clusterID: bc00530c-7732-4106-9d66-d0475a31b82e
  desiredUpdate:
    force: false
    image: quay.io/openshift-release-dev/ocp-release@sha256:33a77dfec9665e932183b36541dbaba5f6b873c37d4cc3cc7d094ff2dff057f2
    version: ""
  overrides:
  - group: monitoring.coreos.com
    kind: PrometheusRule
    name: machine-api-operator-prometheus-rules
    namespace: openshift-machine-api
    unmanaged: true
status:
  availableUpdates: null
  conditions:
  - lastTransitionTime: "2022-01-28T05:40:27Z"
    message: Disabling ownership via cluster version overrides prevents upgrades.
      Please remove overrides before continuing.
    reason: ClusterVersionOverridesSet
    status: "False"
    type: Upgradeable
  - lastTransitionTime: "2022-01-28T06:00:16Z"
    message: Done applying 4.10.0-fc.3
    status: "True"
    type: Available
  - lastTransitionTime: "2022-01-28T06:21:58Z"
    status: "False"
    type: Failing
  - lastTransitionTime: "2022-01-28T06:00:16Z"
    message: Cluster version is 4.10.0-fc.3
    status: "False"
    type: Progressing
  - lastTransitionTime: "2022-01-28T05:40:27Z"
    message: 'Unable to retrieve available updates: currently reconciling cluster
      version 4.10.0-fc.3 not found in the "stable-4.10" channel'
    reason: VersionNotFound
    status: "False"
    type: RetrievedUpdates
  desired:
    image: quay.io/openshift-release-dev/ocp-release@sha256:33a77dfec9665e932183b36541dbaba5f6b873c37d4cc3cc7d094ff2dff057f2
    version: 4.10.0-fc.3
  history:
  - completionTime: "2022-01-28T06:00:16Z"
    image: quay.io/openshift-release-dev/ocp-release@sha256:33a77dfec9665e932183b36541dbaba5f6b873c37d4cc3cc7d094ff2dff057f2
    startedTime: "2022-01-28T05:48:13Z"
    state: Completed
    verified: true
    version: 4.10.0-fc.3
  - completionTime: "2022-01-28T05:48:13Z"
    image: registry.ci.openshift.org/ocp/release@sha256:2f40c0fb545eb81b2bb4dbc87d381a1ab33beb9b6f42c33b3fb28806f63dea84
    startedTime: "2022-01-28T05:40:27Z"
    state: Partial
    verified: false
    version: 4.10.0-0.nightly-2022-01-28-042103
  observedGeneration: 4
  versionHash: kPcwc_Mohu4=


Well, once bootstrapping completes, the cluster transitions to the update to version_B, so we see version_A recorded as Partial in the history. This seems different from comment#10.

# grep -r 'Running sync.*in state' cvo/ | sort
cvo/2022-01-28T05:45:55+0000-cluster-version-operator-7c678c568b-gvmb8.log:I0128 05:48:13.408538       1 sync_worker.go:559] Running sync quay.io/openshift-release-dev/ocp-release@sha256:33a77dfec9665e932183b36541dbaba5f6b873c37d4cc3cc7d094ff2dff057f2 (force=false) on generation 2 in state Initializing at attempt 0
cvo/2022-01-28T05:45:55+0000-cluster-version-operator-7c678c568b-gvmb8.log:I0128 05:49:11.049978       1 sync_worker.go:559] Running sync quay.io/openshift-release-dev/ocp-release@sha256:33a77dfec9665e932183b36541dbaba5f6b873c37d4cc3cc7d094ff2dff057f2 (force=false) on generation 3 in state Updating at attempt 0
cvo/2022-01-28T06:00:31+0000-cluster-version-operator-9b946b49d-9sgd7.log:I0128 05:57:45.919323       1 sync_worker.go:558] Running sync quay.io/openshift-release-dev/ocp-release@sha256:33a77dfec9665e932183b36541dbaba5f6b873c37d4cc3cc7d094ff2dff057f2 (force=false) on generation 4 in state Initializing at attempt 0
cvo/2022-01-28T06:00:31+0000-cluster-version-operator-9b946b49d-9sgd7.log:I0128 06:02:59.124324       1 sync_worker.go:558] Running sync quay.io/openshift-release-dev/ocp-release@sha256:33a77dfec9665e932183b36541dbaba5f6b873c37d4cc3cc7d094ff2dff057f2 (force=false) on generation 4 in state Reconciling at attempt 0
cvo/2022-01-28T06:01:59+0000-cluster-version-operator-9b946b49d-hvjkv.log:I0128 06:05:34.050689       1 sync_worker.go:558] Running sync quay.io/openshift-release-dev/ocp-release@sha256:33a77dfec9665e932183b36541dbaba5f6b873c37d4cc3cc7d094ff2dff057f2 (force=false) on generation 4 in state Reconciling at attempt 0
cvo/2022-01-28T06:01:59+0000-cluster-version-operator-9b946b49d-hvjkv.log:I0128 06:10:48.006766       1 sync_worker.go:558] Running sync quay.io/openshift-release-dev/ocp-release@sha256:33a77dfec9665e932183b36541dbaba5f6b873c37d4cc3cc7d094ff2dff057f2 (force=false) on generation 4 in state Reconciling at attempt 1
cvo/2022-01-28T06:12:49+0000-cluster-version-operator-9b946b49d-65299.log:I0128 06:16:25.533610       1 sync_worker.go:558] Running sync quay.io/openshift-release-dev/ocp-release@sha256:33a77dfec9665e932183b36541dbaba5f6b873c37d4cc3cc7d094ff2dff057f2 (force=false) on generation 4 in state Reconciling at attempt 0
cvo/2022-01-28T06:12:49+0000-cluster-version-operator-9b946b49d-65299.log:I0128 06:21:07.328686       1 sync_worker.go:558] Running sync quay.io/openshift-release-dev/ocp-release@sha256:33a77dfec9665e932183b36541dbaba5f6b873c37d4cc3cc7d094ff2dff057f2 (force=false) on generation 4 in state Reconciling at attempt 1
cvo/2022-01-28T06:12:49+0000-cluster-version-operator-9b946b49d-65299.log:I0128 06:24:04.521057       1 sync_worker.go:558] Running sync quay.io/openshift-release-dev/ocp-release@sha256:33a77dfec9665e932183b36541dbaba5f6b873c37d4cc3cc7d094ff2dff057f2 (force=false) on generation 4 in state Reconciling at attempt 0
cvo/2022-01-28T06:12:49+0000-cluster-version-operator-9b946b49d-65299.log:I0128 06:27:01.714045       1 sync_worker.go:558] Running sync quay.io/openshift-release-dev/ocp-release@sha256:33a77dfec9665e932183b36541dbaba5f6b873c37d4cc3cc7d094ff2dff057f2 (force=false) on generation 4 in state Reconciling at attempt 0
cvo/2022-01-28T06:12:49+0000-cluster-version-operator-9b946b49d-65299.log:I0128 06:29:58.905327       1 sync_worker.go:558] Running sync quay.io/openshift-release-dev/ocp-release@sha256:33a77dfec9665e932183b36541dbaba5f6b873c37d4cc3cc7d094ff2dff057f2 (force=false) on generation 4 in state Reconciling at attempt 0

# grep -r 'prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules"' cvo/ | sort
cvo/2022-01-28T06:00:31+0000-cluster-version-operator-9b946b49d-9sgd7.log:I0128 05:57:50.366409       1 sync_worker.go:771] Running sync for prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (736 of 769)
cvo/2022-01-28T06:00:31+0000-cluster-version-operator-9b946b49d-9sgd7.log:I0128 05:57:50.366421       1 sync_worker.go:775] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (736 of 769) as unmanaged
cvo/2022-01-28T06:00:31+0000-cluster-version-operator-9b946b49d-9sgd7.log:I0128 06:03:35.770675       1 sync_worker.go:771] Running sync for prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (736 of 769)
cvo/2022-01-28T06:00:31+0000-cluster-version-operator-9b946b49d-9sgd7.log:I0128 06:03:35.770688       1 sync_worker.go:775] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (736 of 769) as unmanaged
cvo/2022-01-28T06:01:59+0000-cluster-version-operator-9b946b49d-hvjkv.log:I0128 06:06:01.200271       1 sync_worker.go:771] Running sync for prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (736 of 769)
cvo/2022-01-28T06:01:59+0000-cluster-version-operator-9b946b49d-hvjkv.log:I0128 06:06:01.200311       1 sync_worker.go:775] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (736 of 769) as unmanaged
cvo/2022-01-28T06:01:59+0000-cluster-version-operator-9b946b49d-hvjkv.log:I0128 06:11:08.817030       1 sync_worker.go:771] Running sync for prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (736 of 769)
cvo/2022-01-28T06:01:59+0000-cluster-version-operator-9b946b49d-hvjkv.log:I0128 06:11:08.817040       1 sync_worker.go:775] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (736 of 769) as unmanaged
cvo/2022-01-28T06:12:49+0000-cluster-version-operator-9b946b49d-65299.log:I0128 06:16:54.530675       1 sync_worker.go:771] Running sync for prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (736 of 769)
cvo/2022-01-28T06:12:49+0000-cluster-version-operator-9b946b49d-65299.log:I0128 06:16:54.530683       1 sync_worker.go:775] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (736 of 769) as unmanaged
cvo/2022-01-28T06:12:49+0000-cluster-version-operator-9b946b49d-65299.log:I0128 06:21:28.190978       1 sync_worker.go:771] Running sync for prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (736 of 769)
cvo/2022-01-28T06:12:49+0000-cluster-version-operator-9b946b49d-65299.log:I0128 06:21:28.190987       1 sync_worker.go:775] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (736 of 769) as unmanaged
cvo/2022-01-28T06:12:49+0000-cluster-version-operator-9b946b49d-65299.log:I0128 06:24:34.230948       1 sync_worker.go:771] Running sync for prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (736 of 769)
cvo/2022-01-28T06:12:49+0000-cluster-version-operator-9b946b49d-65299.log:I0128 06:24:34.230962       1 sync_worker.go:775] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (736 of 769) as unmanaged
cvo/2022-01-28T06:12:49+0000-cluster-version-operator-9b946b49d-65299.log:I0128 06:27:27.526164       1 sync_worker.go:771] Running sync for prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (736 of 769)
cvo/2022-01-28T06:12:49+0000-cluster-version-operator-9b946b49d-65299.log:I0128 06:27:27.526185       1 sync_worker.go:775] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (736 of 769) as unmanaged
cvo/2022-01-28T06:12:49+0000-cluster-version-operator-9b946b49d-65299.log:I0128 06:30:24.714642       1 sync_worker.go:771] Running sync for prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (736 of 769)
cvo/2022-01-28T06:12:49+0000-cluster-version-operator-9b946b49d-65299.log:I0128 06:30:24.714652       1 sync_worker.go:775] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (736 of 769) as unmanaged

The overridden resource is skipped, which looks good to me.
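
For reference, the skip decision is just a match of each manifest against the spec.overrides entries carrying unmanaged: true. A small, hypothetical sketch (illustrative names, not the actual cluster-version-operator code):

package main

import "fmt"

// ComponentOverride mirrors the fields shown in the ClusterVersion
// spec.overrides stanzas above; the names here are illustrative.
type ComponentOverride struct {
    Kind, Group, Namespace, Name string
    Unmanaged                    bool
}

type Manifest struct {
    Kind, Group, Namespace, Name string
}

// isUnmanaged reports whether a manifest matches an override with
// unmanaged: true, in which case the sync loop logs
// "Skipping ... as unmanaged" instead of applying it.
func isUnmanaged(m Manifest, overrides []ComponentOverride) bool {
    for _, o := range overrides {
        if o.Unmanaged && o.Kind == m.Kind && o.Group == m.Group &&
            o.Namespace == m.Namespace && o.Name == m.Name {
            return true
        }
    }
    return false
}

func main() {
    overrides := []ComponentOverride{{
        Kind:      "PrometheusRule",
        Group:     "monitoring.coreos.com",
        Namespace: "openshift-machine-api",
        Name:      "machine-api-operator-prometheus-rules",
        Unmanaged: true,
    }}
    rule := Manifest{
        Kind:      "PrometheusRule",
        Group:     "monitoring.coreos.com",
        Namespace: "openshift-machine-api",
        Name:      "machine-api-operator-prometheus-rules",
    }
    fmt.Println(isUnmanaged(rule, overrides)) // true -> skipped
}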

Comment 37 W. Trevor King 2022-02-01 23:42:38 UTC
No hits in Loki:

  {invoker!~"|.*upgrade.*"} | unpack | container_name="cluster-version-operator" |= "Work changed, transitioning from"

since [1] landed, outside of a few older releases still emitting the "and the suggested incoming" phrasing that I dropped in [1]. Not a lot of data yet, but that's a promising start.

[1]: https://github.com/openshift/cluster-version-operator/pull/733
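
To make the fix in [1] concrete: the comment 30 line "Work changed, transitioning from Initializing and the suggested incoming Updating state to Updating" shows the old guard consulting the state suggested by the incoming work, which for a work change is always Updating, so the guard never held the worker in Initializing. A compact, hypothetical before-and-after sketch (illustrative names, not the actual sync_worker.go code):

package main

import "fmt"

type State string

const (
    Initializing State = "Initializing"
    Updating     State = "Updating"
)

// guardBuggy consults the suggested incoming state, so it never fires:
// a changed-work notification always suggests Updating.
func guardBuggy(current, suggested State) State {
    if suggested == Initializing {
        return Initializing
    }
    return suggested
}

// guardFixed consults the state the worker is actually in, holding
// Initializing until the install completes.
func guardFixed(current, suggested State) State {
    if current == Initializing {
        return Initializing
    }
    return suggested
}

func main() {
    fmt.Println(guardBuggy(Initializing, Updating)) // Updating (too early)
    fmt.Println(guardFixed(Initializing, Updating)) // Initializing
}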

Comment 39 Yang Yang 2022-02-09 11:42:31 UTC
Verifying #728, #730, and #733 with 4.11.0-0.nightly-2022-02-08-180554.

$ journalctl -u bootkube.service |grep "overrides\|patched"
Feb 09 09:31:58 yanyang0209a-s6hkw-bootstrap.c.openshift-qe.internal bootkube.sh[2431]: Created "cvo-overrides.yaml" clusterversions.v1.config.openshift.io/version -n openshift-cluster-version
Feb 09 09:42:51 yanyang0209a-s6hkw-bootstrap.c.openshift-qe.internal bootkube.sh[2431]: Skipped "cvo-overrides.yaml" clusterversions.v1.config.openshift.io/version -n openshift-cluster-version as it already exists
Feb 09 09:42:56 yanyang0209a-s6hkw-bootstrap.c.openshift-qe.internal bootkube.sh[2431]: Sending bootstrap-finished event.Restoring CVO overrides
Feb 09 09:45:36 yanyang0209a-s6hkw-bootstrap.c.openshift-qe.internal bootkube.sh[2431]: Trying again to restore CVO overrides
Feb 09 09:45:37 yanyang0209a-s6hkw-bootstrap.c.openshift-qe.internal bootkube.sh[2431]: clusterversion.config.openshift.io/version patched

$ sudo crictl logs e4dde355b988e 2>&1 | grep 'Running sync.*in state\|Running sync.*dns.*cluster\|Detected while\|Ignoring detected version\|Propagating initial target version\|Running sync.* config .*cluster\|Skipping .* as unmanaged'
I0209 09:31:58.859133       1 sync_worker.go:255] Propagating initial target version {4.11.0-0.nightly-2022-02-08-180554 registry.ci.openshift.org/ocp/release@sha256:5c3731a7afc9a8dc31fcafabe71dd4b83b6ac7273cd1fa10d12f916445a64709 false} to sync worker loop in state Initializing.
I0209 09:31:58.859526       1 sync_worker.go:511] Detected while calculating next work: version (from {  false} to {4.11.0-0.nightly-2022-02-08-180554 registry.ci.openshift.org/ocp/release@sha256:5c3731a7afc9a8dc31fcafabe71dd4b83b6ac7273cd1fa10d12f916445a64709 false}) and overrides changes
I0209 09:31:58.859600       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-02-08-180554 (force=false) on generation 1 in state Initializing at attempt 0
I0209 09:32:02.265415       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0209 09:32:20.915218       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0209 09:32:20.915230       1 sync_worker.go:776] Skipping dns "cluster" (23 of 770) as unmanaged
I0209 09:32:24.098689       1 sync_worker.go:776] Skipping customresourcedefinition "networks.config.openshift.io" (47 of 770) as unmanaged
I0209 09:32:30.996565       1 sync_worker.go:776] Skipping infrastructure "cluster" (26 of 770) as unmanaged
I0209 09:32:30.996582       1 sync_worker.go:776] Skipping ingress "cluster" (27 of 770) as unmanaged
I0209 09:32:30.996600       1 sync_worker.go:776] Skipping network "cluster" (28 of 770) as unmanaged
I0209 09:32:38.728092       1 sync_worker.go:776] Skipping proxy "cluster" (32 of 770) as unmanaged
I0209 09:32:38.728107       1 sync_worker.go:776] Skipping scheduler "cluster" (33 of 770) as unmanaged
I0209 09:32:43.841755       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0209 09:34:51.521114       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-02-08-180554 (force=false) on generation 1 in state Initializing at attempt 1
I0209 09:34:51.976528       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0209 09:34:52.102641       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0209 09:34:57.377693       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0209 09:34:57.377703       1 sync_worker.go:776] Skipping dns "cluster" (23 of 770) as unmanaged
I0209 09:35:02.152195       1 sync_worker.go:776] Skipping infrastructure "cluster" (26 of 770) as unmanaged
I0209 09:35:02.152209       1 sync_worker.go:776] Skipping ingress "cluster" (27 of 770) as unmanaged
I0209 09:35:02.152227       1 sync_worker.go:776] Skipping network "cluster" (28 of 770) as unmanaged
I0209 09:35:03.354162       1 sync_worker.go:776] Skipping customresourcedefinition "networks.config.openshift.io" (47 of 770) as unmanaged
I0209 09:35:08.901910       1 sync_worker.go:776] Skipping proxy "cluster" (32 of 770) as unmanaged
I0209 09:35:08.901934       1 sync_worker.go:776] Skipping scheduler "cluster" (33 of 770) as unmanaged
I0209 09:38:05.251105       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-02-08-180554 (force=false) on generation 1 in state Initializing at attempt 2
I0209 09:38:06.509014       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0209 09:38:09.457928       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0209 09:38:15.811300       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0209 09:38:15.811310       1 sync_worker.go:776] Skipping dns "cluster" (23 of 770) as unmanaged
I0209 09:38:17.584653       1 sync_worker.go:776] Skipping customresourcedefinition "networks.config.openshift.io" (47 of 770) as unmanaged
I0209 09:38:20.639361       1 sync_worker.go:776] Skipping infrastructure "cluster" (26 of 770) as unmanaged
I0209 09:38:20.639380       1 sync_worker.go:776] Skipping ingress "cluster" (27 of 770) as unmanaged
I0209 09:38:20.639398       1 sync_worker.go:776] Skipping network "cluster" (28 of 770) as unmanaged
I0209 09:38:22.140200       1 sync_worker.go:776] Skipping proxy "cluster" (32 of 770) as unmanaged
I0209 09:38:22.141445       1 sync_worker.go:776] Skipping scheduler "cluster" (33 of 770) as unmanaged
I0209 09:41:59.924559       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-02-08-180554 (force=false) on generation 1 in state Initializing at attempt 3
I0209 09:42:01.232882       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0209 09:42:06.365187       1 sync_worker.go:776] Skipping customresourcedefinition "networks.config.openshift.io" (47 of 770) as unmanaged
I0209 09:42:08.391388       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged

The bootstrap CVO looks good.

# oc -n openshift-cluster-version logs cluster-version-operator-98bcdcc86-cpn69 | grep 'Running sync.*in state\|Running sync.*dns.*cluster\|Detected while\|Ignoring detected version\|Propagating initial target version\|Running sync.* config .*cluster\|Skipping .* as unmanaged\|Result of work\|Work changed'
I0209 09:43:25.819298       1 sync_worker.go:255] Propagating initial target version {4.11.0-0.nightly-2022-02-08-180554 registry.ci.openshift.org/ocp/release@sha256:5c3731a7afc9a8dc31fcafabe71dd4b83b6ac7273cd1fa10d12f916445a64709 false} to sync worker loop in state Initializing.
I0209 09:43:25.822631       1 sync_worker.go:511] Detected while calculating next work: version (from {  false} to {4.11.0-0.nightly-2022-02-08-180554 registry.ci.openshift.org/ocp/release@sha256:5c3731a7afc9a8dc31fcafabe71dd4b83b6ac7273cd1fa10d12f916445a64709 false}) and overrides changes
I0209 09:43:25.822700       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-02-08-180554 (force=false) on generation 1 in state Initializing at attempt 0
I0209 09:43:28.205027       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0209 09:43:30.432653       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0209 09:43:32.939159       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0209 09:43:32.939170       1 sync_worker.go:776] Skipping dns "cluster" (23 of 770) as unmanaged
I0209 09:43:35.011136       1 sync_worker.go:776] Skipping infrastructure "cluster" (26 of 770) as unmanaged
I0209 09:43:35.011154       1 sync_worker.go:776] Skipping ingress "cluster" (27 of 770) as unmanaged
I0209 09:43:35.011171       1 sync_worker.go:776] Skipping network "cluster" (28 of 770) as unmanaged
I0209 09:43:37.698614       1 sync_worker.go:776] Skipping proxy "cluster" (32 of 770) as unmanaged
I0209 09:43:37.698631       1 sync_worker.go:776] Skipping scheduler "cluster" (33 of 770) as unmanaged
I0209 09:43:38.158585       1 sync_worker.go:776] Skipping customresourcedefinition "networks.config.openshift.io" (47 of 770) as unmanaged
I0209 09:45:38.549286       1 sync_worker.go:511] Detected while considering cluster version generation 2: overrides change ([{ConfigMap  kube-system ingress-uid true} {ConfigMap  openshift-config cloud-provider-config true} {ConfigMap  kube-system cluster-config-v1 true} {DNS config.openshift.io  cluster true} {Infrastructure config.openshift.io  cluster true} {Ingress config.openshift.io  cluster true} {CustomResourceDefinition apiextensions.k8s.io  networks.config.openshift.io true} {Network config.openshift.io  cluster true} {Proxy config.openshift.io  cluster true} {Scheduler config.openshift.io  cluster true} {Secret  kube-system kube-cloud-cfg true} {ConfigMap  kube-system root-ca true} {Secret  openshift-machine-config-operator machine-config-server-tls true} {Secret  openshift-config pull-secret true} {Secret  kube-system gcp-credentials true} {Secret  kube-system kubeadmin true} {Role rbac.authorization.k8s.io kube-system gcp-creds-secret-reader true} {ConfigMap  openshift-config openshift-install-manifests true} {PrometheusRule monitoring.coreos.com openshift-machine-api machine-api-operator-prometheus-rules true}] to [{PrometheusRule monitoring.coreos.com openshift-machine-api machine-api-operator-prometheus-rules true}])
I0209 09:45:38.630583       1 task_graph.go:546] Result of work: [Cluster operator kube-apiserver is not available Cluster operator authentication is not available Cluster operator image-registry is not available Cluster operator ingress is not available Cluster operator monitoring is not available Cluster operator openshift-samples is not available Cluster operator console is not available]
I0209 09:45:38.648484       1 sync_worker.go:511] Detected while calculating next work: overrides change ([{ConfigMap  kube-system ingress-uid true} {ConfigMap  openshift-config cloud-provider-config true} {ConfigMap  kube-system cluster-config-v1 true} {DNS config.openshift.io  cluster true} {Infrastructure config.openshift.io  cluster true} {Ingress config.openshift.io  cluster true} {CustomResourceDefinition apiextensions.k8s.io  networks.config.openshift.io true} {Network config.openshift.io  cluster true} {Proxy config.openshift.io  cluster true} {Scheduler config.openshift.io  cluster true} {Secret  kube-system kube-cloud-cfg true} {ConfigMap  kube-system root-ca true} {Secret  openshift-machine-config-operator machine-config-server-tls true} {Secret  openshift-config pull-secret true} {Secret  kube-system gcp-credentials true} {Secret  kube-system kubeadmin true} {Role rbac.authorization.k8s.io kube-system gcp-creds-secret-reader true} {ConfigMap  openshift-config openshift-install-manifests true} {PrometheusRule monitoring.coreos.com openshift-machine-api machine-api-operator-prometheus-rules true}] to [{PrometheusRule monitoring.coreos.com openshift-machine-api machine-api-operator-prometheus-rules true}])
I0209 09:45:38.648660       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-02-08-180554 (force=false) on generation 2 in state Initializing at attempt 1
I0209 09:45:41.023666       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0209 09:45:42.895609       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0209 09:45:44.436761       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0209 09:48:18.350215       1 task_graph.go:546] Result of work: [Cluster operator kube-apiserver is not available Cluster operator authentication is not available Cluster operator ingress is not available Cluster operator monitoring is not available Cluster operator console is not available]
I0209 09:49:06.242723       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-02-08-180554 (force=false) on generation 2 in state Initializing at attempt 2
I0209 09:49:07.569792       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0209 09:49:10.087267       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0209 09:49:15.283317       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0209 09:51:45.944372       1 task_graph.go:546] Result of work: [Cluster operator kube-apiserver is not available Cluster operator authentication is not available Cluster operator console is not available]
I0209 09:53:07.442439       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-02-08-180554 (force=false) on generation 2 in state Initializing at attempt 3
I0209 09:53:09.476108       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0209 09:53:11.302903       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0209 09:53:13.755557       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0209 09:55:24.942482       1 task_graph.go:546] Result of work: []
I0209 09:58:04.644601       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-02-08-180554 (force=false) on generation 2 in state Reconciling at attempt 0
I0209 09:58:04.845645       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0209 09:58:15.253443       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0209 09:58:20.583140       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0209 09:58:33.388160       1 task_graph.go:546] Result of work: []
I0209 10:01:13.089653       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-02-08-180554 (force=false) on generation 2 in state Reconciling at attempt 0
I0209 10:01:13.290250       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0209 10:01:23.698845       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0209 10:01:29.070506       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0209 10:01:41.828826       1 task_graph.go:546] Result of work: []
I0209 10:04:21.530633       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-02-08-180554 (force=false) on generation 2 in state Reconciling at attempt 0
I0209 10:04:21.728784       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0209 10:04:32.138515       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0209 10:04:37.457063       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0209 10:04:50.365215       1 task_graph.go:546] Result of work: []
I0209 10:07:30.066659       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-02-08-180554 (force=false) on generation 2 in state Reconciling at attempt 0
I0209 10:07:30.274922       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0209 10:07:40.676567       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0209 10:07:45.995535       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0209 10:07:58.814672       1 task_graph.go:546] Result of work: []
I0209 10:10:38.515931       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-02-08-180554 (force=false) on generation 2 in state Reconciling at attempt 0
I0209 10:10:38.711635       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0209 10:10:49.124265       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0209 10:10:54.445850       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0209 10:11:07.254945       1 task_graph.go:546] Result of work: []
I0209 10:13:46.957555       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-02-08-180554 (force=false) on generation 2 in state Reconciling at attempt 0
I0209 10:13:47.161303       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0209 10:13:57.516142       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0209 10:14:02.840961       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0209 10:14:15.695227       1 task_graph.go:546] Result of work: []
I0209 10:16:55.398579       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-02-08-180554 (force=false) on generation 2 in state Reconciling at attempt 0
I0209 10:16:55.606493       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0209 10:17:05.958672       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0209 10:17:11.277708       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0209 10:17:24.162985       1 task_graph.go:546] Result of work: []
I0209 10:20:03.864557       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-02-08-180554 (force=false) on generation 2 in state Reconciling at attempt 0
I0209 10:20:04.062980       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0209 10:20:14.471667       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0209 10:20:19.746068       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0209 10:20:32.604947       1 task_graph.go:546] Result of work: []
I0209 10:23:12.306493       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-02-08-180554 (force=false) on generation 2 in state Reconciling at attempt 0
I0209 10:23:12.503724       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0209 10:23:22.914980       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0209 10:23:28.187451       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0209 10:23:41.037732       1 task_graph.go:546] Result of work: []
I0209 10:26:20.738867       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-02-08-180554 (force=false) on generation 2 in state Reconciling at attempt 0
I0209 10:26:20.935367       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0209 10:26:31.346095       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0209 10:26:36.669632       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0209 10:26:49.492239       1 task_graph.go:546] Result of work: []
I0209 10:29:29.194558       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-02-08-180554 (force=false) on generation 2 in state Reconciling at attempt 0
I0209 10:29:29.391015       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0209 10:29:39.801679       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0209 10:29:45.123273       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0209 10:29:57.942004       1 task_graph.go:546] Result of work: []
I0209 10:32:37.643452       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-02-08-180554 (force=false) on generation 2 in state Reconciling at attempt 0
I0209 10:32:37.843874       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0209 10:32:48.251471       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0209 10:32:53.576501       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0209 10:33:06.380358       1 task_graph.go:546] Result of work: []
I0209 10:35:46.081615       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-02-08-180554 (force=false) on generation 2 in state Reconciling at attempt 0
I0209 10:35:46.280946       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0209 10:35:56.688856       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0209 10:36:02.009086       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0209 10:36:14.822498       1 task_graph.go:546] Result of work: []
I0209 10:38:54.524957       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-02-08-180554 (force=false) on generation 2 in state Reconciling at attempt 0
I0209 10:38:54.718921       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0209 10:39:05.082343       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0209 10:39:10.401286       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0209 10:39:23.262575       1 task_graph.go:546] Result of work: []
I0209 10:42:02.963625       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-02-08-180554 (force=false) on generation 2 in state Reconciling at attempt 0
I0209 10:42:03.161844       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0209 10:42:13.577747       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0209 10:42:18.895183       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0209 10:42:31.721121       1 task_graph.go:546] Result of work: []
I0209 10:45:11.422724       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-02-08-180554 (force=false) on generation 2 in state Reconciling at attempt 0
I0209 10:45:11.624706       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0209 10:45:21.981476       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0209 10:45:27.301605       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0209 10:45:40.203716       1 task_graph.go:546] Result of work: []
I0209 10:48:19.905441       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-02-08-180554 (force=false) on generation 2 in state Reconciling at attempt 0
I0209 10:48:20.139336       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0209 10:48:30.513643       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0209 10:48:35.832514       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0209 10:48:48.640087       1 task_graph.go:546] Result of work: []
I0209 10:51:28.341580       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-02-08-180554 (force=false) on generation 2 in state Reconciling at attempt 0
I0209 10:51:28.539252       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0209 10:51:38.949724       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0209 10:51:44.270843       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0209 10:51:57.082121       1 task_graph.go:546] Result of work: []
I0209 10:54:36.787532       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-02-08-180554 (force=false) on generation 2 in state Reconciling at attempt 0
I0209 10:54:36.988823       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0209 10:54:47.395701       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0209 10:54:52.715829       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0209 10:55:05.531716       1 task_graph.go:546] Result of work: []
I0209 10:57:45.234883       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-02-08-180554 (force=false) on generation 2 in state Reconciling at attempt 0
I0209 10:57:45.437206       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0209 10:57:55.844700       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0209 10:58:01.261798       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0209 10:58:14.072135       1 task_graph.go:546] Result of work: []
I0209 11:00:53.773560       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-02-08-180554 (force=false) on generation 2 in state Reconciling at attempt 0
I0209 11:00:53.973840       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0209 11:01:04.381726       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0209 11:01:09.702085       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0209 11:01:22.518212       1 task_graph.go:546] Result of work: []
I0209 11:04:02.219243       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-02-08-180554 (force=false) on generation 2 in state Reconciling at attempt 0
I0209 11:04:02.428450       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0209 11:04:12.829033       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0209 11:04:18.160905       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0209 11:04:30.965196       1 task_graph.go:546] Result of work: []
I0209 11:07:10.673942       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-02-08-180554 (force=false) on generation 2 in state Reconciling at attempt 0
I0209 11:07:10.941351       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0209 11:07:21.301089       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0209 11:07:26.667742       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0209 11:07:39.431591       1 task_graph.go:546] Result of work: []
I0209 11:10:19.132591       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-02-08-180554 (force=false) on generation 2 in state Reconciling at attempt 0
I0209 11:10:19.339973       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0209 11:10:29.692298       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0209 11:10:35.016510       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0209 11:10:47.873403       1 task_graph.go:546] Result of work: []
I0209 11:13:27.574900       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-02-08-180554 (force=false) on generation 2 in state Reconciling at attempt 0
I0209 11:13:27.770469       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0209 11:13:38.183337       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0209 11:13:43.502852       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0209 11:13:56.314521       1 task_graph.go:546] Result of work: []
I0209 11:16:36.015924       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-02-08-180554 (force=false) on generation 2 in state Reconciling at attempt 0
I0209 11:16:36.221870       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0209 11:16:46.624936       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0209 11:16:51.942818       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0209 11:17:04.760443       1 task_graph.go:546] Result of work: []
I0209 11:19:44.462988       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-02-08-180554 (force=false) on generation 2 in state Reconciling at attempt 0
I0209 11:19:44.662915       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0209 11:19:55.071746       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0209 11:20:00.392989       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0209 11:20:13.201607       1 task_graph.go:546] Result of work: []
I0209 11:22:52.902555       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-02-08-180554 (force=false) on generation 2 in state Reconciling at attempt 0
I0209 11:22:53.104947       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0209 11:23:03.512205       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0209 11:23:08.831249       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0209 11:23:21.655319       1 task_graph.go:546] Result of work: []
I0209 11:26:01.359717       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-02-08-180554 (force=false) on generation 2 in state Reconciling at attempt 0
I0209 11:26:01.557560       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0209 11:26:11.966429       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0209 11:26:17.287877       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0209 11:26:30.122640       1 task_graph.go:546] Result of work: []
I0209 11:29:09.825291       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-02-08-180554 (force=false) on generation 2 in state Reconciling at attempt 0
I0209 11:29:10.036769       1 sync_worker.go:772] Running sync for dns "cluster" (23 of 770)
I0209 11:29:20.433540       1 sync_worker.go:772] Running sync for config "cluster" (155 of 770)
I0209 11:29:25.755072       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
I0209 11:29:38.557356       1 task_graph.go:546] Result of work: []

The control-plane CVO looks good as well; no transition to the Updating state shows up in its logs.
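
The generation 1 -> 2 bump in the logs above, with the state staying Initializing, is the behavior #728 was after. A minimal sketch of that handling (simplified; not the actual sync_worker code):

  package main

  import (
      "fmt"
      "reflect"
  )

  type State string

  const (
      Initializing State = "Initializing"
      Updating     State = "Updating"
      Reconciling  State = "Reconciling"
  )

  // Work is a simplified stand-in for the sync worker's desired work:
  // a target generation plus the set of unmanaged-resource overrides.
  type Work struct {
      Generation int64
      Overrides  []string
  }

  // applyOverridesChange folds an overrides-only change into the current
  // work: adopt the new overrides and generation, but keep the current
  // state, so an in-flight install is not restarted as an update.
  func applyOverridesChange(current Work, state State, desired Work) (Work, State) {
      if !reflect.DeepEqual(current.Overrides, desired.Overrides) {
          fmt.Printf("Detected while calculating next work: overrides change (%v to %v)\n",
              current.Overrides, desired.Overrides)
          current.Overrides = desired.Overrides
          current.Generation = desired.Generation
      }
      return current, state
  }

  func main() {
      current := Work{Generation: 1, Overrides: []string{"dns/cluster", "machine-api-operator-prometheus-rules"}}
      desired := Work{Generation: 2, Overrides: []string{"machine-api-operator-prometheus-rules"}}
      next, state := applyOverridesChange(current, Initializing, desired)
      fmt.Printf("Running sync on generation %d in state %s\n", next.Generation, state)
  }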

I think verification of #728 passed.

For #729, Trevor, could you please take a look at comment #32?

Thanks.

Comment 40 W. Trevor King 2022-02-10 06:39:33 UTC
Comment 39 looks good to me too.

Going back to comment 32:

(In reply to Yang Yang from comment #32)
>   - completionTime: "2022-01-28T05:48:13Z"
>     image:
> registry.ci.openshift.org/ocp/release@sha256:
> 2f40c0fb545eb81b2bb4dbc87d381a1ab33beb9b6f42c33b3fb28806f63dea84
>     startedTime: "2022-01-28T05:40:27Z"
>     state: Partial
>     verified: false
>     version: 4.10.0-0.nightly-2022-01-28-042103

Yeah, we want 'state: Completed' here.

> # grep -r 'Running sync.*in state' cvo/ | sort
> cvo/2022-01-28T05:45:55+0000-cluster-version-operator-7c678c568b-gvmb8.log: I0128 05:48:13.408538       1 sync_worker.go:559] Running sync quay.io/openshift-release-dev/ocp-release@sha256:33a77dfec9665e932183b36541dbaba5f6b873c37d4cc3cc7d094ff2dff057f2 (force=false) on generation 2 in state Initializing at attempt 0
> cvo/2022-01-28T05:45:55+0000-cluster-version-operator-7c678c568b-gvmb8.log: I0128 05:49:11.049978       1 sync_worker.go:559] Running sync quay.io/openshift-release-dev/ocp-release@sha256:33a77dfec9665e932183b36541dbaba5f6b873c37d4cc3cc7d094ff2dff057f2 (force=false) on generation 3 in state Updating at attempt 0

Hmm.  The full logs from this run would be helpful.  The 'Result of work' lines would tell us whether the Initializing attempt completed without error (unlikely, but possible); and if that round did not complete successfully, there may be other context that explains the CVO's reasoning for the switch.  And maybe #729 needs #733, or some later change, to avoid the update transition?
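
For context, a sketch of the guard #733 is aiming at (simplified; not the actual sync_worker code): the Initializing->Updating decision keys off the worker's current state, not the suggested incoming state, so a target-version change that lands mid-install keeps installing rather than transitioning to Updating:

  package main

  import "fmt"

  type State string

  const (
      Initializing State = "Initializing"
      Updating     State = "Updating"
      Reconciling  State = "Reconciling"
  )

  // stateForVersionChange decides the next state on a target-version
  // change using the worker's current state. While still Initializing,
  // the new target is folded into the in-flight install.
  func stateForVersionChange(current State) State {
      if current == Initializing {
          return Initializing
      }
      return Updating
  }

  func main() {
      for _, s := range []State{Initializing, Reconciling} {
          fmt.Printf("version change while %s -> next state %s\n", s, stateForVersionChange(s))
      }
  }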

Comment 41 Yang Yang 2022-02-11 08:41:30 UTC
Verifying #729 with 4.11.0-0.nightly-2022-02-10-031822

1) Install a cluster with 4.11.0-0.nightly-2022-02-10-031822

2) Override prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules"

3) Upgrade to 4.11.0-0.nightly-2022-02-10-180317 during installation

# oc adm upgrade --to-image=registry.ci.openshift.org/ocp/release@sha256:2df40e16d9933cbf68737aee761604c22d1d4d1635c3a13a4c34b8efaafb7755 --allow-explicit-upgrade --force --allow-upgrade-with-warnings
warning: The requested upgrade image is not one of the available updates.You have used --allow-explicit-upgrade for the update to proceed anyway
warning: --force overrides cluster verification of your supplied release image and waives any update precondition failures.
warning: --allow-upgrade-with-warnings is bypassing: the cluster is already upgrading:

  Reason: 
  Message: Working towards 4.11.0-0.nightly-2022-02-10-031822: 656 of 770 done (85% complete)

Updating to release image registry.ci.openshift.org/ocp/release@sha256:2df40e16d9933cbf68737aee761604c22d1d4d1635c3a13a4c34b8efaafb7755

After installation completes,

# oc get clusterversion -oyaml
apiVersion: v1
items:
- apiVersion: config.openshift.io/v1
  kind: ClusterVersion
  metadata:
    creationTimestamp: "2022-02-11T07:48:15Z"
    generation: 3
    name: version
    resourceVersion: "45975"
    uid: cb868617-7052-4168-a25d-e05356e113ee
  spec:
    channel: stable-4.10
    clusterID: 56b6f356-b7a6-4559-a8cb-d333f05d7554
    desiredUpdate:
      force: true
      image: registry.ci.openshift.org/ocp/release@sha256:2df40e16d9933cbf68737aee761604c22d1d4d1635c3a13a4c34b8efaafb7755
      version: ""
    overrides:
    - group: monitoring.coreos.com
      kind: PrometheusRule
      name: machine-api-operator-prometheus-rules
      namespace: openshift-machine-api
      unmanaged: true
  status:
    availableUpdates: null
    capabilities: {}
    conditions:
    - lastTransitionTime: "2022-02-11T07:48:15Z"
      message: Disabling ownership via cluster version overrides prevents upgrades.
        Please remove overrides before continuing.
      reason: ClusterVersionOverridesSet
      status: "False"
      type: Upgradeable
    - lastTransitionTime: "2022-02-11T08:15:49Z"
      message: Done applying 4.11.0-0.nightly-2022-02-10-180317
      status: "True"
      type: Available
    - lastTransitionTime: "2022-02-11T08:15:49Z"
      status: "False"
      type: Failing
    - lastTransitionTime: "2022-02-11T08:15:49Z"
      message: Cluster version is 4.11.0-0.nightly-2022-02-10-180317
      status: "False"
      type: Progressing
    - lastTransitionTime: "2022-02-11T07:48:15Z"
      message: 'Unable to retrieve available updates: currently reconciling cluster
        version 4.11.0-0.nightly-2022-02-10-180317 not found in the "stable-4.10"
        channel'
      reason: VersionNotFound
      status: "False"
      type: RetrievedUpdates
    desired:
      image: registry.ci.openshift.org/ocp/release@sha256:2df40e16d9933cbf68737aee761604c22d1d4d1635c3a13a4c34b8efaafb7755
      version: 4.11.0-0.nightly-2022-02-10-180317
    history:
    - completionTime: "2022-02-11T08:15:49Z"
      image: registry.ci.openshift.org/ocp/release@sha256:2df40e16d9933cbf68737aee761604c22d1d4d1635c3a13a4c34b8efaafb7755
      startedTime: "2022-02-11T08:07:07Z"
      state: Completed
      verified: true
      version: 4.11.0-0.nightly-2022-02-10-180317
    - completionTime: "2022-02-11T08:07:07Z"
      image: registry.ci.openshift.org/ocp/release@sha256:fcbbfdc2d06fe1587222dc2f69317a4031d113894fe913c3fd4fde76507169cd
      startedTime: "2022-02-11T07:48:15Z"
      state: Partial
      verified: false
      version: 4.11.0-0.nightly-2022-02-10-031822
    observedGeneration: 3
    versionHash: BpkwDMmFa2M=
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Well, it seems the install-time version change is not a big issue.
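
As a quick check of the shape comment 40 asked about, a sketch that asserts the newest history entry (first in the list, as in the YAML above) is a Completed record of the final target, while earlier install-time targets may stay Partial:

  package main

  import "fmt"

  // HistoryEntry carries the ClusterVersion status.history fields this
  // check needs (simplified; not the real API type).
  type HistoryEntry struct {
      Version string
      State   string // "Completed" or "Partial"
  }

  // verifyHistory requires the newest entry to be a Completed record of
  // the final target version.
  func verifyHistory(history []HistoryEntry, target string) error {
      if len(history) == 0 {
          return fmt.Errorf("no history entries")
      }
      newest := history[0]
      if newest.Version != target || newest.State != "Completed" {
          return fmt.Errorf("expected %s Completed, got %s %s",
              target, newest.Version, newest.State)
      }
      return nil
  }

  func main() {
      history := []HistoryEntry{
          {Version: "4.11.0-0.nightly-2022-02-10-180317", State: "Completed"},
          {Version: "4.11.0-0.nightly-2022-02-10-031822", State: "Partial"},
      }
      fmt.Println(verifyHistory(history, "4.11.0-0.nightly-2022-02-10-180317"))
  }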

$ sudo crictl logs ffa028b2a3c94 2>&1 | grep 'Running sync.*in state'
I0211 07:48:15.362013       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-02-10-031822 (force=false) on generation 1 in state Initializing at attempt 0
I0211 07:50:58.479015       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-02-10-031822 (force=false) on generation 1 in state Initializing at attempt 1
I0211 07:53:59.227421       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-02-10-031822 (force=false) on generation 1 in state Initializing at attempt 2
I0211 07:57:39.202030       1 sync_worker.go:559] Running sync 4.11.0-0.nightly-2022-02-10-031822 (force=false) on generation 2 in state Initializing at attempt 3

# grep -r 'Running sync.*in state' | sort -k 4,5
2022-02-11T08:04:32+0000-cluster-version-operator-55974f4bd6-s9sd6.log:I0211 08:07:07.223373       1 sync_worker.go:559] Running sync registry.ci.openshift.org/ocp/release@sha256:2df40e16d9933cbf68737aee761604c22d1d4d1635c3a13a4c34b8efaafb7755 (force=true) on generation 3 in state Initializing at attempt 0
2022-02-11T08:07:09+0000-cluster-version-operator-85485ddd46-4xgrz.log:I0211 08:07:51.476199       1 sync_worker.go:559] Running sync registry.ci.openshift.org/ocp/release@sha256:2df40e16d9933cbf68737aee761604c22d1d4d1635c3a13a4c34b8efaafb7755 (force=true) on generation 3 in state Initializing at attempt 0
2022-02-11T08:07:09+0000-cluster-version-operator-85485ddd46-4xgrz.log:I0211 08:10:44.573443       1 sync_worker.go:559] Running sync registry.ci.openshift.org/ocp/release@sha256:2df40e16d9933cbf68737aee761604c22d1d4d1635c3a13a4c34b8efaafb7755 (force=true) on generation 3 in state Initializing at attempt 1
2022-02-11T08:10:48+0000-cluster-version-operator-85485ddd46-g9mxm.log:I0211 08:11:31.922276       1 sync_worker.go:559] Running sync registry.ci.openshift.org/ocp/release@sha256:2df40e16d9933cbf68737aee761604c22d1d4d1635c3a13a4c34b8efaafb7755 (force=true) on generation 3 in state Initializing at attempt 0
2022-02-11T08:10:48+0000-cluster-version-operator-85485ddd46-g9mxm.log:I0211 08:15:25.815035       1 sync_worker.go:559] Running sync registry.ci.openshift.org/ocp/release@sha256:2df40e16d9933cbf68737aee761604c22d1d4d1635c3a13a4c34b8efaafb7755 (force=true) on generation 3 in state Initializing at attempt 1
2022-02-11T08:10:48+0000-cluster-version-operator-85485ddd46-g9mxm.log:I0211 08:19:13.117953       1 sync_worker.go:559] Running sync registry.ci.openshift.org/ocp/release@sha256:2df40e16d9933cbf68737aee761604c22d1d4d1635c3a13a4c34b8efaafb7755 (force=true) on generation 3 in state Reconciling at attempt 0
2022-02-11T08:20:49+0000-cluster-version-operator-85485ddd46-8xcc7.log:I0211 08:21:34.292985       1 sync_worker.go:559] Running sync registry.ci.openshift.org/ocp/release@sha256:2df40e16d9933cbf68737aee761604c22d1d4d1635c3a13a4c34b8efaafb7755 (force=true) on generation 3 in state Reconciling at attempt 0

The state transitions go Initializing (Version_A) -> Initializing (Version_B) -> Reconciling (Version_B). Looks good to me.
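
Encoding that expectation as a tiny check (a sketch, reusing the simplified State type from the snippets above):

  package main

  import "fmt"

  type State string

  const (
      Initializing State = "Initializing"
      Reconciling  State = "Reconciling"
  )

  // allowed lists the install-time transitions this verification expects:
  // Initializing may repeat (the retarget), and the worker moves on to
  // Reconciling only after the install completes.
  func allowed(from, to State) bool {
      switch {
      case from == Initializing && (to == Initializing || to == Reconciling):
          return true
      case from == Reconciling && to == Reconciling:
          return true
      }
      return false
  }

  func main() {
      observed := []State{Initializing, Initializing, Reconciling, Reconciling}
      for i := 1; i < len(observed); i++ {
          fmt.Printf("%s -> %s allowed=%v\n", observed[i-1], observed[i], allowed(observed[i-1], observed[i]))
      }
  }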

# grep -r 'prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules"'
2022-02-11T08:20:49+0000-cluster-version-operator-85485ddd46-8xcc7.log:I0211 08:22:02.108384       1 sync_worker.go:772] Running sync for prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770)
2022-02-11T08:20:49+0000-cluster-version-operator-85485ddd46-8xcc7.log:I0211 08:22:02.108395       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
2022-02-11T08:10:48+0000-cluster-version-operator-85485ddd46-g9mxm.log:I0211 08:11:39.228554       1 sync_worker.go:772] Running sync for prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770)
2022-02-11T08:10:48+0000-cluster-version-operator-85485ddd46-g9mxm.log:I0211 08:11:39.228652       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
2022-02-11T08:10:48+0000-cluster-version-operator-85485ddd46-g9mxm.log:I0211 08:15:29.279041       1 sync_worker.go:772] Running sync for prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770)
2022-02-11T08:10:48+0000-cluster-version-operator-85485ddd46-g9mxm.log:I0211 08:15:29.279054       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
2022-02-11T08:10:48+0000-cluster-version-operator-85485ddd46-g9mxm.log:I0211 08:19:29.049442       1 sync_worker.go:772] Running sync for prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770)
2022-02-11T08:10:48+0000-cluster-version-operator-85485ddd46-g9mxm.log:I0211 08:19:29.049450       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
2022-02-11T08:07:09+0000-cluster-version-operator-85485ddd46-4xgrz.log:I0211 08:07:58.020101       1 sync_worker.go:772] Running sync for prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770)
2022-02-11T08:07:09+0000-cluster-version-operator-85485ddd46-4xgrz.log:I0211 08:07:58.020533       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
2022-02-11T08:07:09+0000-cluster-version-operator-85485ddd46-4xgrz.log:I0211 08:10:49.209627       1 sync_worker.go:772] Running sync for prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770)
2022-02-11T08:07:09+0000-cluster-version-operator-85485ddd46-4xgrz.log:I0211 08:10:49.209637       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged
2022-02-11T08:04:32+0000-cluster-version-operator-55974f4bd6-s9sd6.log:I0211 08:07:18.010050       1 sync_worker.go:772] Running sync for prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770)
2022-02-11T08:04:32+0000-cluster-version-operator-55974f4bd6-s9sd6.log:I0211 08:07:18.010061       1 sync_worker.go:776] Skipping prometheusrule "openshift-machine-api/machine-api-operator-prometheus-rules" (737 of 770) as unmanaged

The overridden resource is skipped. Looks good to me.

So, based on this comment and comment #39, I'm moving the bug to VERIFIED.

Comment 44 errata-xmlrpc 2022-08-10 10:37:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069
