Description of problem:

Version-Release number of the following components:
4.11.0-0.nightly-2022-05-25-123329

How reproducible:
Always

Steps to Reproduce:
1. Set up a cluster in a restricted network using 4.11.0-0.nightly-2022-05-25-123329
2. Mirror 4.11.0-0.nightly-2022-05-25-193227 to the private registry
3. Upgrade the cluster to 4.11.0-0.nightly-2022-05-25-193227 without the --force option

$ oc adm upgrade --allow-explicit-upgrade --to-image registry.ci.openshift.org/ocp/release@sha256:83ca476a63dfafa49e35cab2ded1fbf3991cc3483875b1bf639eabda31faadfd

Actual results:
After waiting 3+ hours, there is no upgrade history in the ClusterVersion at all; the event log only shows "Retrieving and verifying payload".

[root@preserve-jialiu-ansible ~]# oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-05-25-123329   True        False         160m    Cluster version is 4.11.0-0.nightly-2022-05-25-123329

[root@preserve-jialiu-ansible ~]# oc get clusterversion -o yaml
apiVersion: v1
items:
- apiVersion: config.openshift.io/v1
  kind: ClusterVersion
  metadata:
    creationTimestamp: "2022-05-26T03:51:28Z"
    generation: 3
    name: version
    resourceVersion: "62069"
    uid: b5674b4b-7295-4287-904c-94fe1112659b
  spec:
    channel: stable-4.11
    clusterID: 027285eb-b4ea-4127-85b6-031c1af7db72
    desiredUpdate:
      force: false
      image: registry.ci.openshift.org/ocp/release@sha256:83ca476a63dfafa49e35cab2ded1fbf3991cc3483875b1bf639eabda31faadfd
      version: ""
  status:
    availableUpdates: null
    capabilities:
      enabledCapabilities:
      - baremetal
      - marketplace
      - openshift-samples
      knownCapabilities:
      - baremetal
      - marketplace
      - openshift-samples
    conditions:
    - lastTransitionTime: "2022-05-26T03:51:31Z"
      message: Capabilities match configured spec
      reason: AsExpected
      status: "False"
      type: ImplicitlyEnabledCapabilities
    - lastTransitionTime: "2022-05-26T03:51:31Z"
      message: Payload loaded version="4.11.0-0.nightly-2022-05-25-123329" image="registry.ci.openshift.org/ocp/release@sha256:13bfc31eb4a284ce691e848c25d9120dbde3f0852d4be64be4b90953ac914bf1"
      reason: PayloadLoaded
      status: "True"
      type: ReleaseAccepted
    - lastTransitionTime: "2022-05-26T04:23:06Z"
      message: Done applying 4.11.0-0.nightly-2022-05-25-123329
      status: "True"
      type: Available
    - lastTransitionTime: "2022-05-26T04:21:21Z"
      status: "False"
      type: Failing
    - lastTransitionTime: "2022-05-26T04:23:06Z"
      message: Cluster version is 4.11.0-0.nightly-2022-05-25-123329
      status: "False"
      type: Progressing
    - lastTransitionTime: "2022-05-26T03:51:31Z"
      message: 'Unable to retrieve available updates: Get "https://api.openshift.com/api/upgrades_info/v1/graph?arch=amd64&channel=stable-4.11&id=027285eb-b4ea-4127-85b6-031c1af7db72&version=4.11.0-0.nightly-2022-05-25-123329": dial tcp 34.228.45.157:443: connect: connection timed out'
      reason: RemoteFailed
      status: "False"
      type: RetrievedUpdates
    desired:
      image: registry.ci.openshift.org/ocp/release@sha256:13bfc31eb4a284ce691e848c25d9120dbde3f0852d4be64be4b90953ac914bf1
      version: 4.11.0-0.nightly-2022-05-25-123329
    history:
    - completionTime: "2022-05-26T04:23:06Z"
      image: registry.ci.openshift.org/ocp/release@sha256:13bfc31eb4a284ce691e848c25d9120dbde3f0852d4be64be4b90953ac914bf1
      startedTime: "2022-05-26T03:51:31Z"
      state: Completed
      verified: false
      version: 4.11.0-0.nightly-2022-05-25-123329
    observedGeneration: 2
    versionHash: jOIXVtM5Y-g=
kind: List
metadata:
  resourceVersion: ""

[root@preserve-jialiu-ansible ~]# oc get event -n openshift-cluster-version
LAST SEEN   TYPE      REASON              OBJECT   MESSAGE
3h11m   Warning   FailedScheduling    pod/cluster-version-operator-b4b6c5f9b-p7fjq   no nodes available to schedule pods
3h9m    Warning   FailedScheduling    pod/cluster-version-operator-b4b6c5f9b-p7fjq   no nodes available to schedule pods
3h4m    Normal    Scheduled           pod/cluster-version-operator-b4b6c5f9b-p7fjq   Successfully assigned openshift-cluster-version/cluster-version-operator-b4b6c5f9b-p7fjq to jialiu411a-5nb8n-master-2 by jialiu411a-5nb8n-bootstrap
3h2m    Warning   FailedMount         pod/cluster-version-operator-b4b6c5f9b-p7fjq   MountVolume.SetUp failed for volume "serving-cert" : secret "cluster-version-operator-serving-cert" not found
3h1m    Warning   FailedMount         pod/cluster-version-operator-b4b6c5f9b-p7fjq   Unable to attach or mount volumes: unmounted volumes=[serving-cert], unattached volumes=[etc-ssl-certs etc-cvo-updatepayloads serving-cert service-ca kube-api-access]: timed out waiting for the condition
3h1m    Normal    Pulling             pod/cluster-version-operator-b4b6c5f9b-p7fjq   Pulling image "registry.ci.openshift.org/ocp/release@sha256:13bfc31eb4a284ce691e848c25d9120dbde3f0852d4be64be4b90953ac914bf1"
3h1m    Normal    Pulled              pod/cluster-version-operator-b4b6c5f9b-p7fjq   Successfully pulled image "registry.ci.openshift.org/ocp/release@sha256:13bfc31eb4a284ce691e848c25d9120dbde3f0852d4be64be4b90953ac914bf1" in 1.384468759s
3h1m    Normal    Created             pod/cluster-version-operator-b4b6c5f9b-p7fjq   Created container cluster-version-operator
3h1m    Normal    Started             pod/cluster-version-operator-b4b6c5f9b-p7fjq   Started container cluster-version-operator
3h11m   Normal    SuccessfulCreate    replicaset/cluster-version-operator-b4b6c5f9b   Created pod: cluster-version-operator-b4b6c5f9b-p7fjq
3h11m   Normal    ScalingReplicaSet   deployment/cluster-version-operator   Scaled up replica set cluster-version-operator-b4b6c5f9b to 1
3h12m   Normal    LeaderElection      configmap/version   jialiu411a-5nb8n-bootstrap_0a3ff57f-66cf-4f93-bbe0-484effcc4383 became leader
3h12m   Normal    RetrievePayload     clusterversion/version   Retrieving and verifying payload version="4.11.0-0.nightly-2022-05-25-123329" image="registry.ci.openshift.org/ocp/release@sha256:13bfc31eb4a284ce691e848c25d9120dbde3f0852d4be64be4b90953ac914bf1"
3h12m   Normal    LoadPayload         clusterversion/version   Loading payload version="4.11.0-0.nightly-2022-05-25-123329" image="registry.ci.openshift.org/ocp/release@sha256:13bfc31eb4a284ce691e848c25d9120dbde3f0852d4be64be4b90953ac914bf1"
3h12m   Normal    PayloadLoaded       clusterversion/version   Payload loaded version="4.11.0-0.nightly-2022-05-25-123329" image="registry.ci.openshift.org/ocp/release@sha256:13bfc31eb4a284ce691e848c25d9120dbde3f0852d4be64be4b90953ac914bf1"
166m    Normal    LeaderElection      configmap/version   jialiu411a-5nb8n-master-2_83752e0b-1ef4-4c69-814f-8eeb54d50781 became leader
166m    Normal    RetrievePayload     clusterversion/version   Retrieving and verifying payload version="4.11.0-0.nightly-2022-05-25-123329" image="registry.ci.openshift.org/ocp/release@sha256:13bfc31eb4a284ce691e848c25d9120dbde3f0852d4be64be4b90953ac914bf1"
166m    Normal    LoadPayload         clusterversion/version   Loading payload version="4.11.0-0.nightly-2022-05-25-123329" image="registry.ci.openshift.org/ocp/release@sha256:13bfc31eb4a284ce691e848c25d9120dbde3f0852d4be64be4b90953ac914bf1"
166m    Normal    PayloadLoaded       clusterversion/version   Payload loaded version="4.11.0-0.nightly-2022-05-25-123329" image="registry.ci.openshift.org/ocp/release@sha256:13bfc31eb4a284ce691e848c25d9120dbde3f0852d4be64be4b90953ac914bf1"
77m     Normal    RetrievePayload     clusterversion/version   Retrieving and verifying payload version="" image="registry.ci.openshift.org/ocp/release@sha256:83ca476a63dfafa49e35cab2ded1fbf3991cc3483875b1bf639eabda31faadfd"

Expected results:
The CVO and `oc adm upgrade` should clearly tell the user what went wrong, rather than sitting pending for a long time without any info.

Additional info:
Trying the same upgrade path against a connected cluster, the upgrade kicks off quickly; no such issues.
After waiting 7+ hours, the image validation error message was finally printed:

$ oc adm upgrade
<--snip-->
ReleaseAccepted=False

  Reason: RetrievePayload
  Message: Retrieving payload failed version="" image="registry.ci.openshift.org/ocp/release@sha256:83ca476a63dfafa49e35cab2ded1fbf3991cc3483875b1bf639eabda31faadfd" failure=The update cannot be verified: unable to verify sha256:83ca476a63dfafa49e35cab2ded1fbf3991cc3483875b1bf639eabda31faadfd against keyrings: verifier-public-key-redhat
It's not hung. The CVO will continue to attempt to load the desired payload and update the ReleaseAccepted condition, which indicates that the retrieval failed:

{
  "lastTransitionTime": "2022-05-26T13:58:55Z",
  "message": "Retrieving payload failed version=\"\" image=\"registry.ci.openshift.org/ocp/release:4.11.0-0.nightly-2022-05-20-21392\" failure=Unable to download and prepare the update: deadline exceeded, reason: \"DeadlineExceeded\", message: \"Job was active longer than specified deadline\"",
  "reason": "RetrievePayload",
  "status": "False",
  "type": "ReleaseAccepted"
}

If this were a temporary issue, e.g. a network connection flake that clears, the retrieval would succeed and the update would continue. This also occurs if the user specifies a non-existent release, e.g. due to a typo in the "--to-image" name. The user can simply clear the current upgrade request and then specify the correct one.
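The retry-and-record behavior described above can be sketched in Go. This is a minimal illustration, not the actual CVO code: `retrievePayload` and the transient failure schedule are hypothetical stand-ins, showing how a condition records each failed attempt until a transient error clears on its own.

```go
package main

import (
	"errors"
	"fmt"
)

// retrievePayload stands in for the CVO's payload retrieval; here the first
// two attempts hit a transient error that then clears (hypothetical schedule).
func retrievePayload(attempt int) error {
	if attempt < 3 {
		return errors.New("deadline exceeded")
	}
	return nil
}

func main() {
	for attempt := 1; attempt <= 3; attempt++ {
		if err := retrievePayload(attempt); err != nil {
			// Analogue of ReleaseAccepted=False: record the failure and try
			// again on the next sync interval instead of giving up.
			fmt.Printf("attempt %d: ReleaseAccepted=False (Retrieving payload failed: %v)\n", attempt, err)
			continue
		}
		fmt.Printf("attempt %d: ReleaseAccepted=True (Payload loaded)\n", attempt)
	}
}
```

Once the third attempt succeeds, the condition flips to True and the update proceeds, which is why a flake that clears needs no operator intervention.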
From a user-experience perspective, waiting 7+ hours for an error message is not much different from a hang. And this is not a new test: when I was running 4.9 -> 4.10 testing, an image validation failure would be returned within 5 minutes and shown to the user. The behaviour has obviously changed.
The waiting time is far too long (from 05:46:29 to 11:01:03), and neither `oc get clusterversion` nor `oc adm upgrade` shows any status indicating that the CVO is attempting to retrieve and verify the payload. So to the user who triggered the upgrade, it looks as if the upgrade was never triggered at all.

# cat cvo.log | grep "ca476a63dfafa49e35cab2ded1fbf3991cc3483875b1bf639eabda31faadfd" | head -n3
I0526 05:46:29.770637       1 cvo.go:553] Desired version from spec is v1.Update{Version:"", Image:"registry.ci.openshift.org/ocp/release@sha256:83ca476a63dfafa49e35cab2ded1fbf3991cc3483875b1bf639eabda31faadfd", Force:false}
I0526 05:46:29.770673       1 sync_worker.go:761] Detected while considering cluster version generation 3: version changed (from {4.11.0-0.nightly-2022-05-25-123329 registry.ci.openshift.org/ocp/release@sha256:13bfc31eb4a284ce691e848c25d9120dbde3f0852d4be64be4b90953ac914bf1 false} to { registry.ci.openshift.org/ocp/release@sha256:83ca476a63dfafa49e35cab2ded1fbf3991cc3483875b1bf639eabda31faadfd false})
I0526 05:46:29.770679       1 sync_worker.go:244] syncPayload: registry.ci.openshift.org/ocp/release@sha256:83ca476a63dfafa49e35cab2ded1fbf3991cc3483875b1bf639eabda31faadfd (force=false)

# cat cvo.log | grep "The update cannot be verified" | head -n1
I0526 11:01:03.176815       1 sync_worker.go:397] loadUpdatedPayload syncPayload err=The update cannot be verified: unable to verify sha256:83ca476a63dfafa49e35cab2ded1fbf3991cc3483875b1bf639eabda31faadfd against keyrings: verifier-public-key-redh
Setting this as a medium-severity bug, as it will affect a very small percentage of clusters and will not impact cluster availability.
When we separated payload load from payload apply (https://github.com/openshift/cluster-version-operator/pull/683), the context used for the retrieval changed as well. It went from one constrained by syncTimeout (2-4 minutes) [1] to the unconstrained shutdownContext [2]. However, if "force" is specified, we explicitly set a 2-minute timeout in RetrievePayload. We should create a new context with a reasonable timeout for RetrievePayload regardless of "force". I'm not sure what was going on with the connection during the 7+ hours and why it didn't time out or error sooner.

[1] https://github.com/openshift/cluster-version-operator/blob/57ffa7c610fb92ef4ccd9e9c49e75915e86e9296/pkg/cvo/sync_worker.go#L605
[2] https://github.com/openshift/cluster-version-operator/blob/57ffa7c610fb92ef4ccd9e9c49e75915e86e9296/pkg/cvo/cvo.go#L413
I did not see anything in the must-gather's CVO log to help us understand what was going on during the 7+ hours in the verification step of RetrievePayload. If you still have the complete CVO log file that contained the "The update cannot be verified" log entry referenced in Comment 5, could you please attach it. If not, would it be possible to repeat this same test and attach the resulting CVO log. Thanks.
Verifying on 4.13.0-0.nightly-2022-12-23-174810 to 4.13.0-0.nightly-2022-12-23-223710, disconnected private cluster.

Mirrored unsigned 4.13.0-0.nightly-2022-12-23-223710 into the registry, then sent the upgrade command without --force (time: 12:16:34):

# oc adm upgrade --allow-explicit-upgrade --to-image registry.ci.openshift.org/ocp/release@sha256:76259072b4df3c6abbfc66f84795b94cd7bd632198783e4a24f8295ffd928f23

Monitored "oc adm upgrade" for changes. 4 minutes later (time: 12:20:36) received:

Cluster version is 4.13.0-0.nightly-2022-12-23-174810

ReleaseAccepted=False

  Reason: RetrievePayload
  Message: Retrieving payload failed version="" image="registry.ci.openshift.org/ocp/release@sha256:76259072b4df3c6abbfc66f84795b94cd7bd632198783e4a24f8295ffd928f23" failure=The update cannot be verified: unable to verify sha256:76259072b4df3c6abbfc66f84795b94cd7bd632198783e4a24f8295ffd928f23 against keyrings: verifier-public-key-redhat

Upstream is unset, so the cluster will use an appropriate default.
Channel: stable-4.12
warning: Cannot display available updates:
  Reason: RemoteFailed
  Message: Unable to retrieve available updates: Get "https://api.openshift.com/api/upgrades_info/v1/graph?arch=amd64&channel=stable-4.12&id=ea18d1b1-9c41-412c-8ff2-6f8d87bb99bd&version=4.13.0-0.nightly-2022-12-23-174810": dial tcp 54.234.217.153:443: connect: connection timed out

In the CVO log:

I1226 12:16:36.486777       1 cvo.go:596] Desired version from spec is v1.Update{Architecture:"", Version:"", Image:"registry.ci.openshift.org/ocp/release@sha256:76259072b4df3c6abbfc66f84795b94cd7bd632198783e4a24f8295ffd928f23", Force:false}
I1226 12:16:36.486815       1 sync_worker.go:811] Detected while considering cluster version generation 3: version changed (from {4.13.0-0.nightly-2022-12-23-174810 registry.ci.openshift.org/ocp/release@sha256:83765f4d554616fd80e1c833855f206c317731cbb498d6098c4897237e77086c false} to { registry.ci.openshift.org/ocp/release@sha256:76259072b4df3c6abbfc66f84795b94cd7bd632198783e4a24f8295ffd928f23 false})
I1226 12:16:36.486822       1 sync_worker.go:261] syncPayload: registry.ci.openshift.org/ocp/release@sha256:76259072b4df3c6abbfc66f84795b94cd7bd632198783e4a24f8295ffd928f23 (force=false)
I1226 12:16:36.487123       1 event.go:285] Event(v1.ObjectReference{Kind:"ClusterVersion", Namespace:"openshift-cluster-version", Name:"version", UID:"", APIVersion:"config.openshift.io/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'RetrievePayload' Retrieving and verifying payload version="" image="registry.ci.openshift.org/ocp/release@sha256:76259072b4df3c6abbfc66f84795b94cd7bd632198783e4a24f8295ffd928f23"
I1226 12:20:36.487304       1 sync_worker.go:437] loadUpdatedPayload syncPayload err=The update cannot be verified: unable to verify sha256:76259072b4df3c6abbfc66f84795b94cd7bd632198783e4a24f8295ffd928f23 against keyrings: verifier-public-key-redhat

In events:

2m1s   Normal    RetrievePayload         clusterversion/version   Retrieving and verifying payload version="" image="registry.ci.openshift.org/ocp/release@sha256:76259072b4df3c6abbfc66f84795b94cd7bd632198783e4a24f8295ffd928f23"
2m1s   Warning   RetrievePayloadFailed   clusterversion/version   Retrieving payload failed version="" image="registry.ci.openshift.org/ocp/release@sha256:76259072b4df3c6abbfc66f84795b94cd7bd632198783e4a24f8295ffd928f23" failure=The update cannot be verified: unable to verify sha256:76259072b4df3c6abbfc66f84795b94cd7bd632198783e4a24f8295ffd928f23 against keyrings: verifier-public-key-redhat

As expected.
Oh great! I made a typo while trying to mention Jack, and now Bugzilla is not letting me edit the comment! Anyway, the error was:

ReleaseAccepted=False

  Reason: RetrievePayload
  Message: Retrieving payload failed version="" image="registry.ci.openshift.org/ocp/release@sha256:48f9326c2defcd5a801725246eb3efdc6d3a1013e6cee8f379aa0a757ebfaba9" failure=Unable to download and prepare the update: client rate limiter Wait returned an error: context deadline exceeded
Per Evgeni's test result, it seems the fix PR introduced a critical regression which blocks disconnected cluster upgrades even with the --force option. It is going to block many test cases, so I am raising its severity.
OK, tested on the same configuration from 4.13.0-0.nightly-2022-12-22-120609 to 4.13.0-0.nightly-2022-12-23-174810 (22-120609 is from before the PR), and it upgrades successfully. So there is indeed a regression caused by the fix, in the outgoing CVO by the looks of it.
@evakhoni I finally managed to polish https://github.com/openshift/cluster-version-operator/pull/896 enough. I added unit tests that should simulate both the original (this) bug and the regression from https://bugzilla.redhat.com/show_bug.cgi?id=2090680#c14. Still, this part of the CVO tends to cause surprises, so it would be really great if you could give that PR a shot, targeting both the original bug and the regression.
@pmuller np! I'll try to pre-merge verify it, including regression testing.
Pre-merge verifying using a 4.13 cluster-bot build from https://github.com/openshift/cluster-version-operator/pull/896.

Mirrored the unsigned build and applied the upgrade without --force:

date
Mon 06 Feb 2023 12:45:48 IST
╰─ oc adm upgrade --to-image registry.build05.ci.openshift.org/ci-ln-dl17ckk/release@sha256:a348e4ecf7d7dc175246aefb80db9711fc4737438822b90676dcb549ad3920ae --allow-explicit-upgrade
warning: The requested upgrade image is not one of the available updates. You have used --allow-explicit-upgrade for the update to proceed anyway
Requested update to release image registry.build05.ci.openshift.org/ci-ln-dl17ckk/release@sha256:a348e4ecf7d7dc175246aefb80db9711fc4737438822b90676dcb549ad3920ae

╰─ oc adm upgrade
Cluster version is 4.13.0-0.ci.test-2023-02-06-062603-ci-ln-dkrq3mk-latest

Upstream is unset, so the cluster will use an appropriate default.
Channel: stable-4.12
warning: Cannot display available updates:
  Reason: RemoteFailed
  Message: Unable to retrieve available updates: Get "https://api.openshift.com/api/upgrades_info/v1/graph?arch=amd64&channel=stable-4.12&id=080d4192-46bc-496e-876a-01c5ff6d4c4a&version=4.13.0-0.ci.test-2023-02-06-062603-ci-ln-dkrq3mk-latest": dial tcp 44.199.160.73:443: connect: connection timed out

------------------ 2 minutes later --------
Mon 06 Feb 2023 12:47:49 IST
Cluster version is 4.13.0-0.ci.test-2023-02-06-062603-ci-ln-dkrq3mk-latest

ReleaseAccepted=False

  Reason: RetrievePayload
  Message: Retrieving payload failed version="" image="registry.build05.ci.openshift.org/ci-ln-dl17ckk/release@sha256:a348e4ecf7d7dc175246aefb80db9711fc4737438822b90676dcb549ad3920ae" failure=The update cannot be verified: unable to verify sha256:a348e4ecf7d7dc175246aefb80db9711fc4737438822b90676dcb549ad3920ae against keyrings: verifier-public-key-redhat

Upstream is unset, so the cluster will use an appropriate default.
Channel: stable-4.12
warning: Cannot display available updates:
  Reason: RemoteFailed
  Message: Unable to retrieve available updates: Get "https://api.openshift.com/api/upgrades_info/v1/graph?arch=amd64&channel=stable-4.12&id=080d4192-46bc-496e-876a-01c5ff6d4c4a&version=4.13.0-0.ci.test-2023-02-06-062603-ci-ln-dkrq3mk-latest": dial tcp 44.199.160.73:443: connect: connection timed out

As expected. In the CVO log:

I0206 10:45:47.783591       1 event.go:285] Event(v1.ObjectReference{Kind:"ClusterVersion", Namespace:"openshift-cluster-version", Name:"version", UID:"", APIVersion:"config.openshift.io/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'RetrievePayload' Retrieving and verifying payload version="" image="registry.build05.ci.openshift.org/ci-ln-dl17ckk/release@sha256:a348e4ecf7d7dc175246aefb80db9711fc4737438822b90676dcb549ad3920ae"
I0206 10:47:47.783625       1 sync_worker.go:437] loadUpdatedPayload syncPayload err=The update cannot be verified: unable to verify sha256:a348e4ecf7d7dc175246aefb80db9711fc4737438822b90676dcb549ad3920ae against keyrings: verifier-public-key-redhat

Events:

oc get event -n openshift-cluster-version
.........
75s    Normal    RetrievePayload         clusterversion/version   Retrieving and verifying payload version="" image="registry.build05.ci.openshift.org/ci-ln-dl17ckk/release@sha256:a348e4ecf7d7dc175246aefb80db9711fc4737438822b90676dcb549ad3920ae"
75s    Warning   RetrievePayloadFailed   clusterversion/version   Retrieving payload failed version="" image="registry.build05.ci.openshift.org/ci-ln-dl17ckk/release@sha256:a348e4ecf7d7dc175246aefb80db9711fc4737438822b90676dcb549ad3920ae" failure=The update cannot be verified: unable to verify sha256:a348e4ecf7d7dc175246aefb80db9711fc4737438822b90676dcb549ad3920ae against keyrings: verifier-public-key-redhat

Regression testing: applying the same upgrade target with --force, without an ImageContentSourcePolicy for the target repository path:

date
Mon 06 Feb 2023 12:54:32 IST
╰─ oc adm upgrade --to-image registry.build05.ci.openshift.org/ci-ln-dl17ckk/release@sha256:a348e4ecf7d7dc175246aefb80db9711fc4737438822b90676dcb549ad3920ae --allow-explicit-upgrade --force
warning: The requested upgrade image is not one of the available updates. You have used --allow-explicit-upgrade for the update to proceed anyway
warning: --force overrides cluster verification of your supplied release image and waives any update precondition failures.
Requested update to release image registry.build05.ci.openshift.org/ci-ln-dl17ckk/release@sha256:a348e4ecf7d7dc175246aefb80db9711fc4737438822b90676dcb549ad3920ae

╰─ oc adm upgrade
Cluster version is 4.13.0-0.ci.test-2023-02-06-062603-ci-ln-dkrq3mk-latest

ReleaseAccepted=False

  Reason: RetrievePayload
  Message: Retrieving payload failed version="" image="registry.build05.ci.openshift.org/ci-ln-dl17ckk/release@sha256:a348e4ecf7d7dc175246aefb80db9711fc4737438822b90676dcb549ad3920ae" failure=The update cannot be verified: unable to verify sha256:a348e4ecf7d7dc175246aefb80db9711fc4737438822b90676dcb549ad3920ae against keyrings: verifier-public-key-redhat

Upstream is unset, so the cluster will use an appropriate default.
Channel: stable-4.12
warning: Cannot display available updates:
  Reason: RemoteFailed
  Message: Unable to retrieve available updates: Get "https://api.openshift.com/api/upgrades_info/v1/graph?arch=amd64&channel=stable-4.12&id=080d4192-46bc-496e-876a-01c5ff6d4c4a&version=4.13.0-0.ci.test-2023-02-06-062603-ci-ln-dkrq3mk-latest": dial tcp 44.209.252.60:443: connect: connection timed out

-------------------- after 5+ minutes
Mon 06 Feb 2023 12:59:48 IST
Cluster version is 4.13.0-0.ci.test-2023-02-06-062603-ci-ln-dkrq3mk-latest

ReleaseAccepted=False

  Reason: RetrievePayload
  Message: Retrieving payload failed version="" image="registry.build05.ci.openshift.org/ci-ln-dl17ckk/release@sha256:a348e4ecf7d7dc175246aefb80db9711fc4737438822b90676dcb549ad3920ae" failure=Unable to download and prepare the update: timed out waiting for the condition

Upstream is unset, so the cluster will use an appropriate default.
Channel: stable-4.12
warning: Cannot display available updates:
  Reason: RemoteFailed
  Message: Unable to retrieve available updates: Get "https://api.openshift.com/api/upgrades_info/v1/graph?arch=amd64&channel=stable-4.12&id=080d4192-46bc-496e-876a-01c5ff6d4c4a&version=4.13.0-0.ci.test-2023-02-06-062603-ci-ln-dkrq3mk-latest": dial tcp 44.209.252.60:443: connect: connection timed out

---------------------- after another 4 minutes
Mon 06 Feb 2023 13:03:48 IST
Failing=True:

  Reason: UpdatePayloadFailed
  Message: Could not update serviceaccount "openshift-apiserver-operator/openshift-apiserver-operator" (445 of 833)

Error while reconciling 4.13.0-0.ci.test-2023-02-06-062603-ci-ln-dkrq3mk-latest: the update could not be applied

ReleaseAccepted=False

  Reason: RetrievePayload
  Message: Retrieving payload failed version="" image="registry.build05.ci.openshift.org/ci-ln-dl17ckk/release@sha256:a348e4ecf7d7dc175246aefb80db9711fc4737438822b90676dcb549ad3920ae" failure=Unable to download and prepare the update: timed out waiting for the condition

Upstream is unset, so the cluster will use an appropriate default.
Channel: stable-4.12
warning: Cannot display available updates:
  Reason: RemoteFailed
  Message: Unable to retrieve available updates: Get "https://api.openshift.com/api/upgrades_info/v1/graph?arch=amd64&channel=stable-4.12&id=080d4192-46bc-496e-876a-01c5ff6d4c4a&version=4.13.0-0.ci.test-2023-02-06-062603-ci-ln-dkrq3mk-latest": dial tcp 44.199.160.73:443: connect: connection timed out

------------------------ updated ImageContentSourcePolicy with the proper repository path
Mon 06 Feb 2023 13:07:20 IST
╰─ oc edit ImageContentSourcePolicy image-policy-1
imagecontentsourcepolicy.operator.openshift.io/image-policy-1 edited

---------------------- <3 minutes later
Mon 06 Feb 2023 13:10:02 IST
info: An upgrade is in progress. Working towards 4.13.0-0.ci.test-2023-02-06-071146-ci-ln-dl17ckk-latest: 22 of 833 done (2% complete)

Upstream is unset, so the cluster will use an appropriate default.
Channel: stable-4.12
warning: Cannot display available updates:
  Reason: RemoteFailed
  Message: Unable to retrieve available updates: Get "https://api.openshift.com/api/upgrades_info/v1/graph?arch=amd64&channel=stable-4.12&id=080d4192-46bc-496e-876a-01c5ff6d4c4a&version=4.13.0-0.ci.test-2023-02-06-062603-ci-ln-dkrq3mk-latest": context canceled

The upgrade is progressing as expected.

Another regression test: reverted and upgraded straight away with the ImageContentSourcePolicy already in place:

date
Mon 06 Feb 2023 15:05:47 IST
╰─ oc adm upgrade --to-image registry.build05.ci.openshift.org/ci-ln-dkrq3mk/release@sha256:f650a0e99457836880f2b6a4f451084b612aaf2bf05f30946906f09903389c0f --allow-explicit-upgrade --force
warning: The requested upgrade image is not one of the available updates. You have used --allow-explicit-upgrade for the update to proceed anyway
warning: --force overrides cluster verification of your supplied release image and waives any update precondition failures.
Requested update to release image registry.build05.ci.openshift.org/ci-ln-dkrq3mk/release@sha256:f650a0e99457836880f2b6a4f451084b612aaf2bf05f30946906f09903389c0f

╰─ oc adm upgrade
Cluster version is 4.13.0-0.ci.test-2023-02-06-071146-ci-ln-dl17ckk-latest

Upstream is unset, so the cluster will use an appropriate default.
Channel: stable-4.12
warning: Cannot display available updates:
  Reason: RemoteFailed
  Message: Unable to retrieve available updates: Get "https://api.openshift.com/api/upgrades_info/v1/graph?arch=amd64&channel=stable-4.12&id=080d4192-46bc-496e-876a-01c5ff6d4c4a&version=4.13.0-0.ci.test-2023-02-06-071146-ci-ln-dl17ckk-latest": dial tcp 44.209.252.60:443: connect: connection timed out

--------------- 2 minutes later:
Mon 06 Feb 2023 15:07:48 IST
info: An upgrade is in progress. Working towards 4.13.0-0.ci.test-2023-02-06-062603-ci-ln-dkrq3mk-latest: 9 of 833 done (1% complete)

Upstream is unset, so the cluster will use an appropriate default.
Channel: stable-4.12
warning: Cannot display available updates:
  Reason: RemoteFailed
  Message: Unable to retrieve available updates: Get "https://api.openshift.com/api/upgrades_info/v1/graph?arch=amd64&channel=stable-4.12&id=080d4192-46bc-496e-876a-01c5ff6d4c4a&version=4.13.0-0.ci.test-2023-02-06-071146-ci-ln-dl17ckk-latest": dial tcp 44.199.160.73:443: connect: connection timed out

The upgrade is progressing as expected.

@pmuller the fix looks successful; the previous regression is not reproduced, as expected. However, two questions:

1) Failing=True: Reason: UpdatePayloadFailed Message: Could not update serviceaccount "openshift-apiserver-operator/openshift-apiserver-operator" (445 of 833)
Is this expected behavior when trying to upgrade to an image whose repository is not in the ImageContentSourcePolicy? I expected to see only ReleaseAccepted=False with "timed out waiting for the condition".

2) There was a 5:16-minute delay between the two "payload failed version" messages (keyrings to timeout), plus another 4:00 minutes until the Failing condition. Is this an expected delay in 4.13?
Thanks for looking into this! We may be onto something interesting here. This is the way I'm reading the timeline of the second scenario. It shows some unexpected behavior, but likely unrelated to the PR; maybe we are seeing a new bug, or possibly a behavior that is "unlocked" by us proceeding further into the scenario after the fix.

- Start: Cluster is at <ci-ln-dkrq3mk>
  - ~12:54:32 IST: oc adm upgrade --to-image <ci-ln-dl17ckk> --allow-explicit-upgrade --force
  - ~12:54:32 IST: oc adm upgrade shows ReleaseAccepted=False Reason: RetrievePayload Message: Retrieving payload failed image="<ci-ln-dl17ckk>" failure=The update cannot be verified...
    --> This ReleaseAccepted=False is there as a remnant of the previous test scenario (I assume)
- Start + ~5m
  - ~12:59:48 IST: Cluster is (still) at <ci-ln-dkrq3mk>
  - ~12:59:48 IST: oc adm upgrade shows ReleaseAccepted=False Reason: RetrievePayload Message: Retrieving payload failed image="<ci-ln-dl17ckk>" failure=Unable to download and prepare the update...
    --> So after this ~5m, we already went through: 1) try to verify and time out, 2) proceed because --force, 3) try to download and time out
    --> IMO, the actual test scenario ends here. We attempted to upgrade, it failed to download, and the cluster should go back to business as usual until the download is fixed or the upgrade is aborted, etc. We expected nothing else to happen (right?).
    --> But you continued, and things got potentially interesting:
- Start + ~9m
  - ~13:03:48 IST: oc adm upgrade shows Failing=True Reason: UpdatePayloadFailed Message: Could not update serviceaccount "openshift-apiserver-operator/openshift-apiserver-operator" (445 of 833)
    --> This is the interesting part: "Error while reconciling 4.13.0-0.ci.test-2023-02-06-062603-ci-ln-dkrq3mk-latest: the update could not be applied". Note that <ci-ln-dkrq3mk> is the *current* cluster version, not the update target. This is not the cluster somehow trying to actually update to the new version.
I talked with Lala today, and he suspects that the "fail to verify but force, fail to download" scenario can result in the CVO running in a confused state / wrong mode where it attempts to forcibly "upgrade" into the current version, which would be a yet-undiscovered regression from a past change in the CVO.

---

So to answer your questions:

1) Is `Could not update serviceaccount "openshift-apiserver-operator/openshift-apiserver-operator"` expected behavior when trying to upgrade to an image whose repository is not in the ImageContentSourcePolicy?

No, but this happens *after* we try to upgrade to the undownloadable image, fail, and see the expected behavior (correct message in the ReleaseAccepted=False condition). It fails during reconciliation of the *current* payload, which is definitely suspicious.

2) A 5:16-minute delay between the two "payload failed version" messages (keyrings to timeout), plus another 4:00 minutes for the Failing condition: is this an expected delay in 4.13?

The ~5m delay is fine; we saw what we wanted after these 5m. The failure afterwards should not happen, so it is not part of the "expected delay".

And to propose next steps:

1) It would be useful to see the CVO log from the second scenario, to see (a) why the CVO was reconciling to the current version's manifest for that OAO ServiceAccount, and (b) what the actual failure was that the CVO saw.
2) It would be useful to test whether this behavior is deterministic (can you try to reproduce it?).
3) It would be useful to test this scenario with a past OCP version from before a *certain* change: I don't know which one, but Lala knows, so I'm NEEDINFOing him.
OK, I have reproduced it again in a build from the same PR.

Installed an ipi-on-aws disconnected private cluster:
Server Version: 4.13.0-0.ci.test-2023-02-08-064507-ci-ln-vtrvh42-latest

Patched the CCO (needed because this one uses manual-mode cloud credentials):

╰─ oc patch cloudcredential.operator.openshift.io/cluster --patch '{"metadata":{"annotations": {"cloudcredential.openshift.io/upgradeable-to": "99.99"}}}' --type=merge
cloudcredential.operator.openshift.io/cluster patched

Upgrade without --force:

╰─ oc adm upgrade --to-image registry.build05.ci.openshift.org/ci-ln-dkrq3mk/release@sha256:0ce4ae5cb1eade573f95edae6c2cec05580f94609d02cd3bf85b450be0d362ca --allow-explicit-upgrade

Wed 08 Feb 2023 13:12:57 IST
ReleaseAccepted=False

  Reason: RetrievePayload
  Message: Retrieving payload failed version="" image="registry.build05.ci.openshift.org/ci-ln-dkrq3mk/release@sha256:0ce4ae5cb1eade573f95edae6c2cec05580f94609d02cd3bf85b450be0d362ca" failure=The update cannot be verified: unable to verify sha256:0ce4ae5cb1eade573f95edae6c2cec05580f94609d02cd3bf85b450be0d362ca against keyrings: verifier-public-key-redhat

With --force:

╰─ oc adm upgrade --to-image registry.build05.ci.openshift.org/ci-ln-dkrq3mk/release@sha256:0ce4ae5cb1eade573f95edae6c2cec05580f94609d02cd3bf85b450be0d362ca --allow-explicit-upgrade --force

Wed 08 Feb 2023 13:18:49 IST
Cluster version is 4.13.0-0.ci.test-2023-02-08-064507-ci-ln-vtrvh42-latest

ReleaseAccepted=False

  Reason: RetrievePayload
  Message: Retrieving payload failed version="" image="registry.build05.ci.openshift.org/ci-ln-dkrq3mk/release@sha256:0ce4ae5cb1eade573f95edae6c2cec05580f94609d02cd3bf85b450be0d362ca" failure=The update cannot be verified: unable to verify sha256:0ce4ae5cb1eade573f95edae6c2cec05580f94609d02cd3bf85b450be0d362ca against keyrings: verifier-public-key-redhat

Wed 08 Feb 2023 13:22:58 IST
Cluster version is 4.13.0-0.ci.test-2023-02-08-064507-ci-ln-vtrvh42-latest

ReleaseAccepted=False

  Reason: RetrievePayload
  Message: Retrieving payload failed version="" image="registry.build05.ci.openshift.org/ci-ln-dkrq3mk/release@sha256:0ce4ae5cb1eade573f95edae6c2cec05580f94609d02cd3bf85b450be0d362ca" failure=Unable to download and prepare the update: timed out waiting for the condition

Wed 08 Feb 2023 13:26:57 IST
Failing=True:

  Reason: UpdatePayloadFailed
  Message: Could not update role "openshift-marketplace/marketplace-operator" (648 of 834)

Error while reconciling 4.13.0-0.ci.test-2023-02-08-064507-ci-ln-vtrvh42-latest: the update could not be applied

ReleaseAccepted=False

  Reason: RetrievePayload
  Message: Retrieving payload failed version="" image="registry.build05.ci.openshift.org/ci-ln-dkrq3mk/release@sha256:0ce4ae5cb1eade573f95edae6c2cec05580f94609d02cd3bf85b450be0d362ca" failure=Unable to download and prepare the update: timed out waiting for the condition
OK, I think I know what is going on. It seems that the 2m/4m timeouts in `RetrievePayload` are starving out the actual manifest reconciliation operations *somehow*, which makes the whole loop iteration take too much time and eventually get canceled:

115514:I0208 11:10:57.056651 1 sync_worker.go:261] syncPayload: registry.build05.ci.openshift.org/ci-ln-dkrq3mk/release@sha256:0ce4ae5cb1eade573f95edae6c2cec05580f94609d02cd3bf85b450be0d362ca (force=false)
115527:I0208 11:12:57.057460 1 sync_worker.go:437] loadUpdatedPayload syncPayload err=The update cannot be verified: unable to verify sha256:0ce4ae5cb1eade573f95edae6c2cec05580f94609d02cd3bf85b450be0d362ca against keyrings: verifier-public-key-redhat
115539:I0208 11:12:57.070591 1 sync_worker.go:261] syncPayload: registry.build05.ci.openshift.org/ci-ln-dkrq3mk/release@sha256:0ce4ae5cb1eade573f95edae6c2cec05580f94609d02cd3bf85b450be0d362ca (force=false)
115558:I0208 11:14:57.071738 1 sync_worker.go:437] loadUpdatedPayload syncPayload err=The update cannot be verified: unable to verify sha256:0ce4ae5cb1eade573f95edae6c2cec05580f94609d02cd3bf85b450be0d362ca against keyrings: verifier-public-key-redhat

^^^ These are the two 2m-timeout syncPayload runs (this is before --force was issued around 11:18:49Z)

115563:I0208 11:14:57.072226 1 sync_worker.go:888] apply: 4.13.0-0.ci.test-2023-02-08-064507-ci-ln-vtrvh42-latest on generation 3 in state Reconciling at attempt 0
115564:I0208 11:14:57.074037 1 task_graph.go:477] Running 0 on worker 1
115565:I0208 11:14:57.074057 1 task_graph.go:477] Running 2 on worker 1
115566:I0208 11:14:57.074075 1 sync_worker.go:993] Running sync for namespace "openshift-cluster-storage-operator" (253 of 834)
115567:I0208 11:14:57.074130 1 task_graph.go:477] Running 1 on worker 0
115568:I0208 11:14:57.078122 1 sync_worker.go:1013] Done syncing for namespace "openshift-cluster-storage-operator" (253 of 834)
115569:I0208 11:14:57.078149 1 task_graph.go:477] Running 3 on worker 1
115571:I0208 11:14:57.082282 1 sync_worker.go:993] Running sync for namespace "openshift-apiserver-operator" (441 of 834)
115573:I0208 11:14:57.085137 1 sync_worker.go:993] Running sync for namespace "openshift-marketplace" (645 of 834)
115574:I0208 11:14:57.089063 1 sync_worker.go:1013] Done syncing for namespace "openshift-apiserver-operator" (441 of 834)
115575:I0208 11:14:57.089169 1 sync_worker.go:993] Running sync for openshiftapiserver "cluster" (442 of 834)
115576:I0208 11:14:57.089221 1 sync_worker.go:1013] Done syncing for namespace "openshift-marketplace" (645 of 834)
115577:I0208 11:14:57.089299 1 sync_worker.go:993] Running sync for serviceaccount "openshift-marketplace/marketplace-operator" (646 of 834)

^^^ CVO starts reconciling manifests, gets some stuff done

115586:I0208 11:14:57.092496 1 sync_worker.go:261] syncPayload: registry.build05.ci.openshift.org/ci-ln-dkrq3mk/release@sha256:0ce4ae5cb1eade573f95edae6c2cec05580f94609d02cd3bf85b450be0d362ca (force=false)
115591:I0208 11:14:57.093052 1 sync_worker.go:1013] Done syncing for serviceaccount "openshift-marketplace/marketplace-operator" (646 of 834)
115606:I0208 11:16:57.093557 1 sync_worker.go:437] loadUpdatedPayload syncPayload err=The update cannot be verified: unable to verify sha256:0ce4ae5cb1eade573f95edae6c2cec05580f94609d02cd3bf85b450be0d362ca against keyrings: verifier-public-key-redhat

^^^ Another 2m timeout

115609:I0208 11:16:57.093889 1 sync_worker.go:993] Running sync for clusterrole "marketplace-operator" (647 of 834)

^^^ We haven't started any manifest sync while we waited for 2m inside `syncPayload`

115611:I0208 11:16:57.093962 1 sync_worker.go:1013] Done syncing for openshiftapiserver "cluster" (442 of 834)
115612:I0208 11:16:57.094051 1 sync_worker.go:993] Running sync for configmap "openshift-apiserver-operator/openshift-apiserver-operator-config" (443 of 834)
115624:I0208 11:16:57.105504 1 sync_worker.go:261] syncPayload: registry.build05.ci.openshift.org/ci-ln-dkrq3mk/release@sha256:0ce4ae5cb1eade573f95edae6c2cec05580f94609d02cd3bf85b450be0d362ca (force=false)
115628:I0208 11:16:57.109993 1 sync_worker.go:1013] Done syncing for configmap "openshift-apiserver-operator/openshift-apiserver-operator-config" (443 of 834)
115641:I0208 11:18:57.105749 1 sync_worker.go:437] loadUpdatedPayload syncPayload err=The update cannot be verified: unable to verify sha256:0ce4ae5cb1eade573f95edae6c2cec05580f94609d02cd3bf85b450be0d362ca against keyrings: verifier-public-key-redhat

^^^ Another 2m timeout

115645:I0208 11:18:57.105911 1 sync_worker.go:993] Running sync for configmap "openshift-apiserver-operator/trusted-ca-bundle" (444 of 834)
115649:I0208 11:18:57.106115 1 sync_worker.go:1013] Done syncing for clusterrole "marketplace-operator" (647 of 834)

^^^ Again we did not start syncing while waiting in syncPayload. Also, for some reason the marketplace-operator ClusterRole took 2m to sync as well (maybe we log before we start to wait?)

115651:I0208 11:18:57.106448 1 sync_worker.go:261] syncPayload: registry.build05.ci.openshift.org/ci-ln-dkrq3mk/release@sha256:0ce4ae5cb1eade573f95edae6c2cec05580f94609d02cd3bf85b450be0d362ca (force=true)
115720:I0208 11:22:57.107405 1 sync_worker.go:437] loadUpdatedPayload syncPayload err=Unable to download and prepare the update: timed out waiting for the condition

^^^ This ran after --force, so it took 4m to finish instead of 2m

115724:I0208 11:22:57.107819 1 sync_worker.go:993] Running sync for role "openshift-marketplace/marketplace-operator" (648 of 834)

^^^ So this waited for 4m until it was allowed to start.
115725:I0208 11:22:57.107887 1 task_graph.go:474] Canceled worker 1 while waiting for work
115726:I0208 11:22:57.107900 1 sync_worker.go:1013] Done syncing for configmap "openshift-apiserver-operator/trusted-ca-bundle" (444 of 834)
115727:I0208 11:22:57.107911 1 task_graph.go:474] Canceled worker 0 while waiting for work

^^^ Something got fed up waiting

115728:I0208 11:22:57.107918 1 task_graph.go:523] Workers finished
115729:I0208 11:22:57.107925 1 task_graph.go:546] Result of work: [update context deadline exceeded at 8 of 834 Could not update role "openshift-marketplace/marketplace-operator" (648 of 834)]

^^^ This was canceled mid-flight?

115744:I0208 11:22:57.117233 1 sync_worker.go:261] syncPayload: registry.build05.ci.openshift.org/ci-ln-dkrq3mk/release@sha256:0ce4ae5cb1eade573f95edae6c2cec05580f94609d02cd3bf85b450be0d362ca (force=true)
115814:I0208 11:26:57.117407 1 sync_worker.go:437] loadUpdatedPayload syncPayload err=Unable to download and prepare the update: timed out waiting for the condition
115829:I0208 11:26:57.128766 1 sync_worker.go:261] syncPayload: registry.build05.ci.openshift.org/ci-ln-dkrq3mk/release@sha256:0ce4ae5cb1eade573f95edae6c2cec05580f94609d02cd3bf85b450be0d362ca (force=true)

I have put an annotated log with further notes here: https://gist.github.com/petr-muller/6ca65f2a3be175937e0a558313b23957

I think this is not a new bug; the timeouts were just never tuned well enough to keep the CVO out of an even buggier state when it cannot verify and download the image.
The above is Evgeni's CVO log, grepped as follows (lines before 11:10:00, which are irrelevant, stripped):

$ grep -e 'apply: ' -e 'Running sync for' -e 'Done syncing for' -e syncPayload -e 'task_graph.go' cluster-version-operator-688849d8d9-8hwnp.log
I have investigated the concerns above and filed https://issues.redhat.com/browse/OCPBUGS-7714 for the problem. That behavior is just "uncovered" by this bugfix, not caused by it. The fix actually improves the state for clusters that hit it: I believe that previously the manifest reconciliation was completely broken while the unbounded retrieval was timing out for hours; we just did not notice.
Verifying again, since there's a commit merged after the last pre-merge verification.

4.13.0-0.nightly-2023-02-20-182956 to 4.13.0-0.nightly-2023-02-21-014524, disconnected private on AWS.

❯ date && oc adm upgrade
Tue 21 Feb 2023 19:02:41 IST
Cluster version is 4.13.0-0.nightly-2023-02-20-182956

Upstream is unset, so the cluster will use an appropriate default.
Channel: stable-4.13
warning: Cannot display available updates:
  Reason: RemoteFailed
  Message: Unable to retrieve available updates: Get "https://api.openshift.com/api/upgrades_info/v1/graph?arch=amd64&channel=stable-4.13&id=d1588271-2bff-4b23-835c-b23446f3845e&version=4.13.0-0.nightly-2023-02-20-182956": dial tcp 34.227.168.110:443: connect: connection timed out

Mirrored and sent the upgrade command without force:

❯ date && oc adm upgrade --allow-explicit-upgrade --to-image registry.ci.openshift.org/ocp/release@sha256:ddb6974a1afe869e79bbaf25fe125e23adc38b552e44a105983fb41fdfc09031
Tue 21 Feb 2023 19:05:34 IST
warning: The requested upgrade image is not one of the available updates. You have used --allow-explicit-upgrade for the update to proceed anyway
Requested update to release image registry.ci.openshift.org/ocp/release@sha256:ddb6974a1afe869e79bbaf25fe125e23adc38b552e44a105983fb41fdfc09031

-----------2 minutes later--------------

Tue 21 Feb 2023 19:07:36 IST
Cluster version is 4.13.0-0.nightly-2023-02-20-182956

ReleaseAccepted=False
  Reason: RetrievePayload
  Message: Retrieving payload failed version="" image="registry.ci.openshift.org/ocp/release@sha256:ddb6974a1afe869e79bbaf25fe125e23adc38b552e44a105983fb41fdfc09031" failure=The update cannot be verified: unable to verify sha256:ddb6974a1afe869e79bbaf25fe125e23adc38b552e44a105983fb41fdfc09031 against keyrings: verifier-public-key-redhat

Upstream is unset, so the cluster will use an appropriate default.
Channel: stable-4.13
warning: Cannot display available updates:
  Reason: RemoteFailed
  Message: Unable to retrieve available updates: Get "https://api.openshift.com/api/upgrades_info/v1/graph?arch=amd64&channel=stable-4.13&id=d1588271-2bff-4b23-835c-b23446f3845e&version=4.13.0-0.nightly-2023-02-20-182956": dial tcp 34.227.168.110:443: connect: connection timed out

As expected. In events:

90s Normal RetrievePayload clusterversion/version Retrieving and verifying payload version="" image="registry.ci.openshift.org/ocp/release@sha256:ddb6974a1afe869e79bbaf25fe125e23adc38b552e44a105983fb41fdfc09031"
90s Warning RetrievePayloadFailed clusterversion/version Retrieving payload failed version="" image="registry.ci.openshift.org/ocp/release@sha256:ddb6974a1afe869e79bbaf25fe125e23adc38b552e44a105983fb41fdfc09031" failure=The update cannot be verified: unable to verify sha256:ddb6974a1afe869e79bbaf25fe125e23adc38b552e44a105983fb41fdfc09031 against keyrings: verifier-public-key-redhat

CVO log:

I0221 17:05:34.855322 1 cvo.go:598] Desired version from spec is v1.Update{Architecture:"", Version:"", Image:"registry.ci.openshift.org/ocp/release@sha256:ddb6974a1afe869e79bbaf25fe125e23adc38b552e44a105983fb41fdfc09031", Force:false}
I0221 17:05:34.855373 1 sync_worker.go:811] Detected while considering cluster version generation 3: version changed (from { 4.13.0-0.nightly-2023-02-20-182956 registry.ci.openshift.org/ocp/release@sha256:9593a6e1b9455e9e388b0ae3d17a90abba4c0eba2c4e2db4184dadda412624de false} to { registry.ci.openshift.org/ocp/release@sha256:ddb6974a1afe869e79bbaf25fe125e23adc38b552e44a105983fb41fdfc09031 false})
I0221 17:05:34.855382 1 sync_worker.go:261] syncPayload: registry.ci.openshift.org/ocp/release@sha256:ddb6974a1afe869e79bbaf25fe125e23adc38b552e44a105983fb41fdfc09031 (force=false)
I0221 17:05:34.855474 1 event.go:285] Event(v1.ObjectReference{Kind:"ClusterVersion", Namespace:"openshift-cluster-version", Name:"version", UID:"", APIVersion:"config.openshift.io/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'RetrievePayload' Retrieving and verifying payload version="" image="registry.ci.openshift.org/ocp/release@sha256:ddb6974a1afe869e79bbaf25fe125e23adc38b552e44a105983fb41fdfc09031"
I0221 17:07:34.855614 1 sync_worker.go:437] loadUpdatedPayload syncPayload err=The update cannot be verified: unable to verify sha256:ddb6974a1afe869e79bbaf25fe125e23adc38b552e44a105983fb41fdfc09031 against keyrings: verifier-public-key-redhat

As expected.

Regression testing: retrying with --force.

❯ date && oc adm upgrade --allow-explicit-upgrade --to-image registry.ci.openshift.org/ocp/release@sha256:ddb6974a1afe869e79bbaf25fe125e23adc38b552e44a105983fb41fdfc09031 --force
Tue 21 Feb 2023 19:08:50 IST
warning: The requested upgrade image is not one of the available updates. You have used --allow-explicit-upgrade for the update to proceed anyway
warning: --force overrides cluster verification of your supplied release image and waives any update precondition failures.
Requested update to release image registry.ci.openshift.org/ocp/release@sha256:ddb6974a1afe869e79bbaf25fe125e23adc38b552e44a105983fb41fdfc09031

-----------3 minutes later------------------

Tue 21 Feb 2023 19:11:48 IST
info: An upgrade is in progress. Working towards 4.13.0-0.nightly-2023-02-21-014524: 20 of 838 done (2% complete)

Upstream is unset, so the cluster will use an appropriate default.
Channel: stable-4.13
warning: Cannot display available updates:
  Reason: RemoteFailed
  Message: Unable to retrieve available updates: Get "https://api.openshift.com/api/upgrades_info/v1/graph?arch=amd64&channel=stable-4.13&id=d1588271-2bff-4b23-835c-b23446f3845e&version=4.13.0-0.nightly-2023-02-20-182956": context canceled

The upgrade is progressing as expected.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.13.0 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2023:1326