+++ This bug was initially created as a clone of Bug #2050946 +++ Poking around in rc.1 to look at the tech-preview, cluster-API bits: $ oc adm release extract --to manifests quay.io/openshift-release-dev/ocp-release:4.10.0-rc.1-x86_64 Extracted release payload from digest sha256:eadfb8892c466438995e979f9a112c454286b4c0888eeb90da19354e8410fb24 created at 2022-02-04T20:00:57Z $ jq . manifests/image-references | jq -r '.spec.tags[] | select(.name | contains("cluster-api")) | .name + " " + .from.name + " " + .annotations["io.openshift.build.source-location"]' cluster-api quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a11772593573d97529cd243bca90d963acb094a0640417608f7f863ac0f8714c https://github.com/openshift/cluster-api $ grep -r a11772593573d97529cd243bca90d963acb094a0640417608f7f863ac0f8714c manifests manifests/image-references: "name": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a11772593573d97529cd243bca90d963acb094a0640417608f7f863ac0f8714c" nothing consuming that image? Anyhow, there are some tech-preview manifests: $ grep -r feature-gate manifests | grep namespace manifests/0000_30_cluster-api_capi-operator_00_namespace.yaml: release.openshift.io/feature-gate: "TechPreviewNoUpgrade" which seems to be from [1]. $ grep name: manifests/0000_30_cluster-api_capi-operator_00_namespace.yaml name: openshift-cluster-api Looking in a recent tech-preview CI job [2], I don't see that namespace: $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-e2e-aws-techpreview/1489537240471179264/artifacts/e2e-aws-techpreview/gather-extra/artifacts/namespaces.json | jq -r '.items[].metadata.name' | grep cluster openshift-cluster-csi-drivers ... Because the CVO doen't think it's a tech-preview cluster? $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-e2e-aws-techpreview/1489537240471179264/artifacts/e2e-aws-techpreview/gather-extra/artifacts/pods/openshift-cluster-version_cluster-version-operator-76cd65b7bb-4p945_cluster-version-operator.log | grep tech I0204 10:19:53.097129 1 techpreviewchangestopper.go:97] Starting stop-on-techpreview-change controller with TechPreviewNoUpgrade false. Checking the gate: $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-e2e-aws-techpreview/1489537240471179264/artifacts/e2e-aws-techpreview/gather-must-gather/artifacts/must-gather.tar | tar tzv | grep featuregates.yaml -rw------- 1015690000/root 594 2022-02-04 02:59 quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-90979cdd2bba6c5a63b62d3fa2cc3aca15e3e7c684198b40e915be82f4fdd175/cluster-scoped-resources/config.openshift.io/featuregates.yaml $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-e2e-aws-techpreview/1489537240471179264/artifacts/e2e-aws-techpreview/gather-must-gather/artifacts/must-gather.tar | tar xOz quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-90979cdd2bba6c5a63b62d3fa2cc3aca15e3e7c684198b40e915be82f4fdd175/cluster-scoped-resources/config.openshift.io/featuregates.yaml | yaml2json | jq -r '.items[] | {creationTimestamp: .metadata.creationTimestamp, featureSet: .spec.featureSet}' { "creationTimestamp": "2022-02-04T10:05:23Z", "featureSet": "TechPreviewNoUpgrade" } Aha, it's because we are edge triggered, but we're missing the initial state sync: $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-e2e-aws-techpreview/1489537240471179264/artifacts/e2e-aws-techpreview/gather-extra/artifacts/pods/openshift-cluster-version_cluster-version-operator-76cd65b7bb-4p945_cluster-version-operator.log | grep 'Error getting featuregate value\|tech' W0204 10:10:40.126809 1 start.go:142] Error getting featuregate value: Get "https://127.0.0.1:6443/apis/config.openshift.io/v1/featuregates/cluster": dial tcp 127.0.0.1:6443: connect: connection refused I0204 10:19:53.097129 1 techpreviewchangestopper.go:97] Starting stop-on-techpreview-change controller with TechPreviewNoUpgrade false. Not clear to me why our informer logic doesn't eventually resync and let us know it was confused. Perhaps we need some sort of post-cache-sync check before we enter the event-driven loop? Or, if we can't figure that graceful approach out, we could revert [3], and just crash-loop until the initial FeatureGate fetch succeeds. [1]: https://github.com/openshift/cluster-capi-operator/blob/ef673ccce35453a8ae8a9ef9016e322c93d8c03a/manifests/0000_30_cluster-api_capi-operator_00_namespace.yaml [2]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-e2e-aws-techpreview/1489537240471179264 [3]: https://github.com/openshift/cluster-version-operator/commit/90b14542aed4134eb4054780f8550191dde01381 --- Additional comment from W. Trevor King on 2022-02-11 04:27:35 UTC --- Fix landed in [1], which ran TechPreview CI like [2]. Checking its CVO logs: $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.11-e2e-aws-techpreview/1491953215070015488/artifacts/e2e-aws-techpreview/gather-extra/artifacts/pods/openshift-cluster-version_cluster-version-operator-7785c85b54-l9t4s_cluster-version-operator_previous.log | grep 'ClusterVersionOperator\|Error getting featuregate value\|techpreview' I0211 02:11:23.577531 1 start.go:23] ClusterVersionOperator 4.11.0-202202102052.p0.gb133d82.assembly.stream-b133d82 W0211 02:11:23.579038 1 start.go:142] Error getting featuregate value: Get "https://127.0.0.1:6443/apis/config.openshift.io/v1/featuregates/cluster": dial tcp 127.0.0.1:6443: connect: connection refused I0211 02:30:49.795387 1 techpreviewchangestopper.go:101] Starting stop-on-techpreview-change controller with TechPreviewNoUpgrade false. I0211 02:30:49.795936 1 techpreviewchangestopper.go:78] TechPreviewNoUpgrade was false, but the current feature set is "TechPreviewNoUpgrade"; requesting shutdown. I0211 02:30:49.796010 1 techpreviewchangestopper.go:109] Shutting down stop-on-techpreview-change controller I0211 02:30:49.795882 1 cvo.go:338] Starting ClusterVersionOperator with minimum reconcile period 3m8.377710572s I0211 02:30:49.796588 1 start.go:282] Collected stop-on-techpreview-change controller goroutine. I0211 02:30:49.807000 1 cvo.go:440] Shutting down ClusterVersionOperator I0211 02:30:49.814421 1 start.go:31] Graceful shutdown complete for ClusterVersionOperator 4.11.0-202202102052.p0.gb133d82.assembly.stream-b133d82. So that's the initial control-plane CVO, and hit the early-install Kube-API hiccup, assumed the gate was off, Kube-API came up, CVO got a lease, launched controllers, noticed the gate was actually TechPreviewNoUpgrade, and gracefully shutdown. Lovely. When the replacement container comes up: $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.11-e2e-aws-techpreview/1491953215070015488/artifacts/e2e-aws-techpreview/gather-extra/artifacts/pods/openshift-cluster-version_cluster-version-operator-7785c85b54-l9t4s_cluster-version-operator.log | grep 'ClusterVersionOperator\|Error getting featuregate value\|techpreview' I0211 02:30:50.795928 1 start.go:23] ClusterVersionOperator 4.11.0-202202102052.p0.gb133d82.assembly.stream-b133d82 I0211 02:30:51.413209 1 techpreviewchangestopper.go:101] Starting stop-on-techpreview-change controller with TechPreviewNoUpgrade true. I0211 02:30:51.413803 1 cvo.go:338] Starting ClusterVersionOperator with minimum reconcile period 2m15.086131489s So it noticed when it was starting up that it was TechPreviewNoUpgrade, no reboot required. Also lovely. Checking the namespace: $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.11-e2e-aws-techpreview/1491953215070015488/artifacts/e2e-aws-techpreview/gather-extra/artifacts/namespaces.json | jq -r '.items[].metadata.name' | grep cluster openshift-cluster-api openshift-cluster-csi-drivers ... Hooray :). And the ClusterOperator for good measure: $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.11-e2e-aws-techpreview/1491953215070015488/artifacts/e2e-aws-techpreview/gather-extra/artifacts/clusteroperators.json | jq -r '.items[] | select(.metadata.name == "cluster-api").status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message' 2022-02-11T02:09:50Z Available=True AsExpected: Cluster CAPI Operator is available at 4.11.0-0.nightly-2022-02-11-014337 2022-02-11T02:09:50Z Progressing=False AsExpected: 2022-02-11T02:09:50Z Degraded=False AsExpected: 2022-02-11T02:09:50Z Upgradeable=True AsExpected: Hooray :). Looks verified to me, but I'll wait for a second set of eyes from QE :) [1]: https://amd64.ocp.releases.ci.openshift.org/releasestream/4.11.0-0.nightly/release/4.11.0-0.nightly-2022-02-11-014337 [2]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.11-e2e-aws-techpreview/1491953215070015488
Version:4.10.0-0.nightly-2022-03-11-131656 TechPreviewNoUpgrade featureset was created 13:40:00Z. # curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-e2e-aws-techpreview/1502274584513089536/artifacts/e2e-aws-techpreview/gather-must-gather/artifacts/must-gather.tar | tar xOz quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-de53ed4750def9ed611ca460c780cc78eece525162747213cfdbb4c26687460c/cluster-scoped-resources/config.openshift.io/featuregates.yaml | yaml2json | jq -r '.items[] | {creationTimestamp: .metadata.creationTimestamp, featureSet: .spec.featureSet}' { "creationTimestamp": "2022-03-11T13:40:00Z", "featureSet": "TechPreviewNoUpgrade" } During initial stage, no tp operator installed due to kube-api is not reday even the TechPreviewNoUpgrade featureset is enabled. # curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-e2e-aws-techpreview/1502274584513089536/artifacts/e2e-aws-techpreview/gather-extra/artifacts/pods/openshift-cluster-version_cluster-version-operator-68577f66bd-vkl26_cluster-version-operator_previous.log|grep 'ClusterVersionOperator\|TechPreviewNoUpgrade\|featuregate' I0311 13:42:34.298586 1 start.go:23] ClusterVersionOperator 4.10.0-202203110747.p0.g50246cb.assembly.stream-50246cb W0311 13:42:34.299262 1 start.go:142] Error getting featuregate value: Get "https://127.0.0.1:6443/apis/config.openshift.io/v1/featuregates/cluster": dial tcp 127.0.0.1:6443: connect: connection refused ... I0311 13:47:30.989340 1 techpreviewchangestopper.go:101] Starting stop-on-techpreview-change controller with TechPreviewNoUpgrade false. I0311 13:47:30.989386 1 techpreviewchangestopper.go:78] TechPreviewNoUpgrade was false, but the current feature set is "TechPreviewNoUpgrade"; requesting shutdown. I0311 13:47:30.989471 1 cvo.go:338] Starting ClusterVersionOperator with minimum reconcile period 2m15.951374588s I0311 13:47:30.995186 1 cvo.go:440] Shutting down ClusterVersionOperator I0311 13:47:31.000454 1 start.go:31] Graceful shutdown complete for ClusterVersionOperator 4.10.0-202203110747.p0.g50246cb.assembly.stream-50246cb. After initial finish, tp operator is installed successfully. # curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-e2e-aws-techpreview/1502274584513089536/artifacts/e2e-aws-techpreview/gather-extra/artifacts/pods/openshift-cluster-version_cluster-version-operator-68577f66bd-vkl26_cluster-version-operator.log|grep 'ClusterVersionOperator\|TechPreviewNoUpgrade\|featuregate'|head -n7 I0311 13:47:32.320511 1 start.go:23] ClusterVersionOperator 4.10.0-202203110747.p0.g50246cb.assembly.stream-50246cb I0311 13:47:32.927407 1 techpreviewchangestopper.go:101] Starting stop-on-techpreview-change controller with TechPreviewNoUpgrade true. I0311 13:47:32.932766 1 cvo.go:338] Starting ClusterVersionOperator with minimum reconcile period 3m57.431406309s I0311 13:47:41.589515 1 sync_worker.go:772] Running sync for customresourcedefinition "featuregates.config.openshift.io" (41 of 835) I0311 13:47:41.637945 1 sync_worker.go:784] Done syncing for customresourcedefinition "featuregates.config.openshift.io" (41 of 835) I0311 13:47:42.536406 1 sync_worker.go:772] Running sync for featuregate "cluster" (24 of 835) I0311 13:47:43.508424 1 sync_worker.go:784] Done syncing for featuregate "cluster" (24 of 835)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.10.5 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:0928