Disconnected OpenShift CLI-based upgrades seem to be failing more regularly now. In disconnected environments the cluster has no access to the api.openshift.com endpoint, mirror.openshift.com, or storage.googleapis.com to load the signatures, so we run the upgrade against the sha with the --force flag and the --allow-explicit-upgrade option:

# oc version
Client Version: 4.6.6
Server Version: 4.6.1
Kubernetes Version: v1.19.0+d59ce34

# oc adm upgrade --allow-explicit-upgrade --force --allow-upgrade-with-warnings --to-image=quay.io/openshift-release-dev/ocp-release@sha256:c7e8f18e8116356701bd23ae3a23fb9892dd5ea66c8300662ef30563d7104f39

In the past this would grab the image, extract the version tag, and then proceed with the remainder of the upgrade logic. What we're seeing now (4.5.2+ and 4.6.z) is that the upgrade fails somewhere in downloading the image and loops with a "failed to download" message. The CVO log complains that it cannot validate the signature (as it did in the past) and seems to loop after downloading the update, complaining that the download failed:

I1209 15:50:51.579895       1 cvo.go:406] Started syncing cluster version "openshift-cluster-version/version" (2020-12-09 15:50:51.579874617 +0000 UTC m=+1375900.207690373)
I1209 15:50:51.579991       1 cvo.go:435] Desired version from spec is v1.Update{Version:"", Image:"quay.io/openshift-release-dev/ocp-release@sha256:c7e8f18e8116356701bd23ae3a23fb9892dd5ea66c8300662ef30563d7104f39", Force:true}
I1209 15:50:51.580041       1 sync_worker.go:222] Update work is equal to current target; no change required
I1209 15:50:51.580057       1 status.go:159] Synchronizing errs=field.ErrorList{} status=&cvo.SyncWorkerStatus{Generation:12, Step:"RetrievePayload", Failure:error(nil), Fraction:0, Completed:0, Reconciling:false, Initial:false, VersionHash:"", LastProgress:time.Time{wall:0xbfec5a2365dd8863, ext:1375566263091063, loc:(*time.Location)(0x26b0400)}, Actual:v1.Release{Version:"", Image:"quay.io/openshift-release-dev/ocp-release@sha256:c7e8f18e8116356701bd23ae3a23fb9892dd5ea66c8300662ef30563d7104f39", URL:"", Channels:[]string(nil)}, Verified:false}
I1209 15:50:51.580163       1 status.go:79] merge into existing history completed=false desired=v1.Release{Version:"", Image:"quay.io/openshift-release-dev/ocp-release@sha256:c7e8f18e8116356701bd23ae3a23fb9892dd5ea66c8300662ef30563d7104f39", URL:"", Channels:[]string(nil)} last=&v1.UpdateHistory{State:"Partial", StartedTime:v1.Time{Time:time.Time{wall:0x0, ext:63743125521, loc:(*time.Location)(0x26b0400)}}, CompletionTime:(*v1.Time)(nil), Version:"", Image:"quay.io/openshift-release-dev/ocp-release@sha256:c7e8f18e8116356701bd23ae3a23fb9892dd5ea66c8300662ef30563d7104f39", Verified:false}
I1209 15:50:51.580293       1 cvo.go:408] Finished syncing cluster version "openshift-cluster-version/version" (416.053µs)
I1209 15:50:55.260829       1 leaderelection.go:273] successfully renewed lease openshift-cluster-version/version
I1209 15:50:59.548377       1 sigstore.go:95] unable to load signature: Get "https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release/sha256=c7e8f18e8116356701bd23ae3a23fb9892dd5ea66c8300662ef30563d7104f39/signature-1": context deadline exceeded
I1209 15:50:59.548369       1 sigstore.go:95] unable to load signature: Get "https://storage.googleapis.com/openshift-release/official/signatures/openshift/release/sha256=c7e8f18e8116356701bd23ae3a23fb9892dd5ea66c8300662ef30563d7104f39/signature-1": dial tcp 172.217.15.112:443: i/o timeout
I1209 15:50:59.548432       1 verify.go:154] error retrieving signature for sha256:c7e8f18e8116356701bd23ae3a23fb9892dd5ea66c8300662ef30563d7104f39: Get "https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release/sha256=c7e8f18e8116356701bd23ae3a23fb9892dd5ea66c8300662ef30563d7104f39/signature-1": context deadline exceeded
I1209 15:50:59.548451       1 verify.go:173] Failed to retrieve signatures for sha256:c7e8f18e8116356701bd23ae3a23fb9892dd5ea66c8300662ef30563d7104f39 (should never happen)
W1209 15:50:59.548458       1 updatepayload.go:100] An image was retrieved from "quay.io/openshift-release-dev/ocp-release@sha256:c7e8f18e8116356701bd23ae3a23fb9892dd5ea66c8300662ef30563d7104f39" that failed verification: The update cannot be verified: context deadline exceeded
W1209 15:50:59.548588       1 updatepayload.go:206] failed to prune jobs: context deadline exceeded
E1209 15:50:59.548712       1 sync_worker.go:348] unable to synchronize image (waiting 2m50.956499648s): Unable to download and prepare the update: context deadline exceeded
I1209 15:50:59.548767       1 event.go:282] Event(v1.ObjectReference{Kind:"ClusterVersion", Namespace:"openshift-cluster-version", Name:"version", UID:"", APIVersion:"config.openshift.io/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'RetrievePayloadFailed' retrieving payload failed version="" image="quay.io/openshift-release-dev/ocp-release@sha256:c7e8f18e8116356701bd23ae3a23fb9892dd5ea66c8300662ef30563d7104f39" failure=Unable to download and prepare the update: context deadline exceeded
I1209 15:51:05.345520       1 leaderelection.go:273] successfully renewed lease openshift-cluster-version/version

Have validated we can pull the images fine with podman on the host, as well as on the nodes of the cluster through the image content source policy. Have also validated that we can pull all the images within the update in the same manner. Have also tried putting the signature in a ConfigMap in the openshift-config-managed namespace and running without the --force flag, but do not have a validated procedure for this.
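For reference, both sigstore URLs in the log follow one fixed layout: <store>/sha256=<hex digest>/signature-<index>. A minimal Go sketch of that construction, using the store URLs taken from the log above (the helper name is illustrative, not the actual sigstore.go code):

package main

import (
	"fmt"
	"strings"
)

// signatureURL is a hypothetical helper mirroring the URL layout seen in the
// CVO log above: <store>/sha256=<hex-digest>/signature-<index>.
func signatureURL(store, digest string, index int) string {
	// The pullspec digest is "sha256:<hex>"; the store path uses "sha256=<hex>".
	name := strings.Replace(digest, ":", "=", 1)
	return fmt.Sprintf("%s/%s/signature-%d", store, name, index)
}

func main() {
	digest := "sha256:c7e8f18e8116356701bd23ae3a23fb9892dd5ea66c8300662ef30563d7104f39"
	for _, store := range []string{
		"https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release",
		"https://storage.googleapis.com/openshift-release/official/signatures/openshift/release",
	} {
		fmt.Println(signatureURL(store, digest, 1))
	}
}

Both endpoints are unreachable from a disconnected cluster, which is why every lookup ends in a timeout.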
that should read 4.5.21+ and 4.6.1+ .. earlier 4.5 upgrades in this manner were okay.
> I1209 15:50:59.548432       1 verify.go:154] error retrieving signature for sha256:c7e8f18e8116356701bd23ae3a23fb9892dd5ea66c8300662ef30563d7104f39: Get "https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release/sha256=c7e8f18e8116356701bd23ae3a23fb9892dd5ea66c8300662ef30563d7104f39/signature-1": context deadline exceeded

Your cluster can't reach https://mirror.openshift.com to verify the signature.
You must have access to mirror.openshift.com to pass this step, or set "force" in ClusterVersion to have it skip the check. Please attach a must-gather.
The fact that your cluster is trying to pull signatures over HTTPS at all suggests you may have fumbled something when you attempted to add the signature ConfigMap to the openshift-config-managed namespace. Please provide steps for how you attempted that, or we can try to reconstruct based on the must-gather Vadim requested in comment 3.
Thanks - yes, reattempting the openshift-config-managed ConfigMap seems to have worked fine as per:

https://docs.openshift.com/container-platform/4.6/updating/updating-restricted-network-cluster.html#updating-restricted-network-image-signature-configmap

# cat <<EOF | oc create -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: release-image-4.6.6
  namespace: openshift-config-managed
  labels:
    release.openshift.io/verification-signatures: ""
binaryData:
  sha256-c7e8f18e8116356701bd23ae3a23fb9892dd5ea66c8300662ef30563d7104f39: owGbwMvMwMEoOU9/4l9n2UDGtYxJSWLxRQW5xZnpukWphboebpUmlY56SZl58fvXTahWSi7KLMlMTsxRslKoVsrMTUxPBbNS8pOzU4t0cxPzMtNSi0t0UzLTgRRQSqk4I9HI1Mwq2TzVIs3QItXC0NDM2NTM3MAwKcXIODHVONHIOC3J0sLSKCXFNDXRzCzZwtjAwMzMKDXN2MDUzDjF3NDAJM3YUqlWR0GppLIAZJ1SYkl+bmayQnJ+XkliZl5qkQLQtXmJJaVFqUpAVZkpqXklmSWVyA4rSk1LLUrNSwZrLyxNrNTLzNfPL0jNK87ITCsBSuekJhan6qaklunnJxfA+FYmemZ6ZroVFmbxZiZKtSAn5BeUZObnQf2fXJQKdEoRyMyg1BQFj8QSBX+gmcEgMxWCgW7KzEtXcCwtycgHhlqlgoGegZ4h0JhOJlFmVgZQeMLDnWNzCv9vdt9TPXOTrDSnc2wvXpY1X/6H/UrLipMBFXU3s7i31tyP/8+h7PhRcUk2z1ePBWf782+rrGRPZ5g9b7l8XkX7hD2/Tq66+Erix9WW/woHup/9DGV/+rbxbBPb17+/vsypOeeaEnutcMEzBu6j/yXn5c0+Iu+dsyXj80eH2tNMGb+t426tKtBLStLsXmy/5YGq1cUvH7qP/5ptkx2WcGP95I3yYlOecGhPWWtxkGG19N59ERZ/w53O7Ko69LVI73jP5KCH2t4yKiKPPossvZOwzHDOftXukIVL0hTVBNWNjm75lNi/Kj7BYhLH/f47j7gqrn3jrv5y8crc57GSEa8WTe/6wfvFS3u5yEnDB4Gu2W+sfiZcVdFvvGmnNI83RPvQMf3f5yd8SzBSYI10mPCvPq6k+bvtnexXRl1WqhcPf1y5SJLxv67kLm+nzdfdu/pD/Qz3RuybWJj48+b9z9zN68o0depFjigWVDz4vaU8a3LtoTS+NcGzolfclHdhm3hJ5YuJfI/87Ycnf5W9cjpjFrr4eb/FmzNHEsP7WF/Efo51nP52vqW3sk/p2TzF8zkZV1fqtu56dzzxdS3XZYb/u2R99Y8JnP8d3yz9acWOiCgnHtarra7GTD+U9wl/YQ33Cnt05ea7P2yK/8OcZl9YICyonbhx1uLSoAMzAoIN5iVVcR2zKCnNk+8wqVtjIPwsYLUsy0XGlvWx8WV3AA==
EOF

# oc get cm -n openshift-config-managed
NAME                                                  DATA   AGE
bound-sa-token-signing-certs                          1      33d
console-public                                        1      33d
csr-controller-ca                                     1      33d
default-ingress-cert                                  1      33d
grafana-dashboard-cluster-total                       1      33d
grafana-dashboard-etcd                                1      33d
grafana-dashboard-k8s-resources-cluster               1      33d
grafana-dashboard-k8s-resources-namespace             1      33d
grafana-dashboard-k8s-resources-node                  1      33d
grafana-dashboard-k8s-resources-pod                   1      33d
grafana-dashboard-k8s-resources-workload              1      33d
grafana-dashboard-k8s-resources-workloads-namespace   1      33d
grafana-dashboard-node-cluster-rsrc-use               1      33d
grafana-dashboard-node-rsrc-use                       1      33d
grafana-dashboard-prometheus                          1      33d
kube-apiserver-aggregator-client-ca                   1      33d
kube-apiserver-client-ca                              1      33d
kube-apiserver-server-ca                              1      33d
kubelet-bootstrap-kubeconfig                          1      33d
kubelet-serving-ca                                    1      33d
monitoring-shared-config                              4      33d
oauth-openshift                                       1      33d
ocp-upgrade-4.6.6                                     0      22h
release-image-4.6.6                                   1      135m
release-verification                                  3      33d
sa-token-signing-certs                                2      33d
service-ca                                            1      33d
signatures-managed                                    0      33d
trusted-ca-bundle                                     1      33d

You can see the binary data count is 0 in the previous ocp-upgrade-4.6.6 attempt. Also, is the name of the ConfigMap significant? It seems, though, that the --force flag is no longer respected as a way to bypass signature validation. Is this by design? Will recommend the ConfigMap method going forward.
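A DATA count of 0 like the earlier ocp-upgrade-4.6.6 attempt means the signature never landed in binaryData. A quick client-go sketch (not part of any documented procedure) to confirm the expected key is present; note the key uses "sha256-<hex>" with a dash, because ConfigMap keys cannot contain ":":

package main

import (
	"context"
	"fmt"
	"strings"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	digest := "sha256:c7e8f18e8116356701bd23ae3a23fb9892dd5ea66c8300662ef30563d7104f39"
	key := strings.Replace(digest, ":", "-", 1) // ConfigMap keys cannot contain ":"

	cm, err := client.CoreV1().ConfigMaps("openshift-config-managed").Get(
		context.TODO(), "release-image-4.6.6", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	if sig, ok := cm.BinaryData[key]; ok && len(sig) > 0 {
		fmt.Printf("found signature %s (%d bytes)\n", key, len(sig))
	} else {
		fmt.Printf("ConfigMap exists but has no signature under %q\n", key)
	}
}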
> ... is the name of the configmap significant?

No, the cluster-version operator just hunts for the release.openshift.io/verification-signatures label in the openshift-config-managed namespace [1].

> it seems though that the --force flag doesn't seem to be respected to bypass signature validation now.

The cluster-version operator should still run all the checks, e.g. attempting to hunt down valid signatures. But when a check fails, forcing should waive the failure and carry on with the update regardless. Maybe we have a bug there around context timeouts...

[1]: https://github.com/openshift/library-go/blob/19c8a18cddcd49ee18b34531a18122f0e3844cfa/pkg/verify/store/configmap/configmap.go#L25-L33
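For illustration, the label-driven hunt described above could look roughly like this with client-go (a sketch of the behavior in the library-go store linked at [1], not the actual code; all names here are hypothetical):

package verify

import (
	"context"
	"strings"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// signaturesFor gathers candidate signatures for a release digest from every
// ConfigMap in openshift-config-managed that carries the verification label.
func signaturesFor(ctx context.Context, client kubernetes.Interface, digest string) ([][]byte, error) {
	cms, err := client.CoreV1().ConfigMaps("openshift-config-managed").List(ctx, metav1.ListOptions{
		// Bare key selector: any ConfigMap that has this label, regardless of value.
		LabelSelector: "release.openshift.io/verification-signatures",
	})
	if err != nil {
		return nil, err
	}
	key := strings.Replace(digest, ":", "-", 1) // e.g. sha256-c7e8...
	var sigs [][]byte
	for _, cm := range cms.Items {
		if sig, ok := cm.BinaryData[key]; ok {
			sigs = append(sigs, sig)
		}
	}
	return sigs, nil
}

Which is consistent with the answer above: the ConfigMap name is irrelevant; only the label and the binaryData key matter.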
I also hit the same issue here when upgrading a disconnected cluster from 4.6.9 to 4.7.0-0.nightly-2020-12-21-131655 with the --force option.
Added upgrade-blocker keyword as it may block upgrade regression testing.
We're asking the following questions to evaluate whether or not this bug warrants blocking an upgrade edge from either the previous X.Y or X.Y.Z. The ultimate goal is to avoid delivering an update which introduces new risk or reduces cluster functionality in any way. Sample answers are provided to give more context and the UpgradeBlocker flag has been added to this bug. It will be removed if the assessment indicates that this should not block upgrade edges. The expectation is that the assignee answers these questions.

Who is impacted? If we have to block upgrade edges based on this issue, which edges would need blocking?
  example: Customers upgrading from 4.y.Z to 4.y+1.z running on GCP with thousands of namespaces, approximately 5% of the subscribed fleet
  example: All customers upgrading from 4.y.z to 4.y+1.z fail approximately 10% of the time

What is the impact? Is it serious enough to warrant blocking edges?
  example: Up to 2 minute disruption in edge routing
  example: Up to 90 seconds of API downtime
  example: etcd loses quorum and you have to restore from backup

How involved is remediation (even moderately serious impacts might be acceptable if they are easy to mitigate)?
  example: Issue resolves itself after five minutes
  example: Admin uses oc to fix things
  example: Admin must SSH to hosts, restore from backups, or other non-standard admin activities

Is this a regression (if all previous versions were also vulnerable, updating to the new, vulnerable version does not increase exposure)?
  example: No, it’s always been like this we just never noticed
  example: Yes, from 4.y.z to 4.y+1.z Or 4.y.z to 4.y.z+1
As per the documentation, we suggest the signature ConfigMap route for updating air-gapped clusters. So removing the upgrade-blocker keyword and setting the severity to high.
As the documented steps work fine, reducing the severity to medium and the priority to low.
For our internal testing, especially with unsigned release images, a forced upgrade of an air-gapped cluster is still often required, so I think it is better to fix this ASAP.
Believe the context timeout is short-circuiting any further processing of the update that would normally occur with "force" true. Perhaps a child context is needed at [1] that can be handled locally there, depending on the value of force.

[1]: https://github.com/openshift/cluster-version-operator/blob/1e51a0e4750ca110d4659f33bce210a3de6844b9/pkg/cvo/updatepayload.go#L91
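A rough sketch of the child-context idea (all names and the 2-minute budget are hypothetical; the real code is at [1]): give verification its own deadline, so that when signature lookups time out, the parent context used for the rest of the update preparation is still alive and a forced update can waive the failure and continue.

package payload

import (
	"context"
	"fmt"
	"time"

	"k8s.io/klog/v2"
)

// Verifier is a stand-in for the CVO's release verifier; hypothetical.
type Verifier interface {
	Verify(ctx context.Context, digest string) error
}

// retrievePayload sketches the proposal: verification runs under a bounded
// child context, so a signature-store timeout no longer burns the whole sync
// deadline, and a forced update can waive the failure and carry on.
func retrievePayload(ctx context.Context, v Verifier, digest string, force bool,
	download func(context.Context, string) error) error {
	// Hypothetical verification budget; the parent ctx keeps its own deadline.
	verifyCtx, cancel := context.WithTimeout(ctx, 2*time.Minute)
	defer cancel()

	if err := v.Verify(verifyCtx, digest); err != nil {
		if !force {
			return fmt.Errorf("the update cannot be verified: %w", err)
		}
		// Forced: waive the failure (including timeouts) and keep going on
		// the still-live parent context.
		klog.Warningf("release %s failed verification, continuing anyway because of --force: %v", digest, err)
	}
	return download(ctx, digest)
}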
@lmohanty Does Johnny's input above answer your questions from when you set the NeedInfo flag?
Jack's proposal in comment 14 makes sense to me. Picking the timeout for a child context sounds fiddly, but we could also pass down two Context arguments if we feel too jumpy making a local decision about how much time is on a single Context that got passed in. Probably worth working out the chain down from wherever is setting the current timeout before we pick where to set the child timeout.
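The two-context alternative would push that decision up to the caller instead of picking a timeout locally; roughly (again hypothetical names, reusing the Verifier interface from the earlier sketch):

package payload

import "context"

// retrievePayloadTwoCtx sketches the variant above: the caller hands in both
// the long-lived sync context and a separate, shorter verification context,
// so nothing here has to guess how much time remains on either.
func retrievePayloadTwoCtx(syncCtx, verifyCtx context.Context, v Verifier,
	digest string, force bool, download func(context.Context, string) error) error {
	if err := v.Verify(verifyCtx, digest); err != nil && !force {
		return err
	}
	// With force, a verification failure (including a verifyCtx timeout) is
	// waived, and the download proceeds on the untouched sync context.
	return download(syncCtx, digest)
}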
Retest this bug with 4.7.0-0.nightly-2021-01-19-095812, still fail.

01-20 12:22:45 Command: oc adm upgrade --allow-explicit-upgrade --to-image registry.ci.openshift.org/ocp/release@sha256:ac57098ad18ed07977b54b90be79dc44f34eb03e42e0be2a95963a316bcde315 --force
01-20 12:22:45 warning: The requested upgrade image is not one of the available updates. You have used --allow-explicit-upgrade to the update to proceed anyway
01-20 12:22:45 warning: --force overrides cluster verification of your supplied release image and waives any update precondition failures.
01-20 12:22:45 Updating to release image registry.ci.openshift.org/ocp/release@sha256:ac57098ad18ed07977b54b90be79dc44f34eb03e42e0be2a95963a316bcde315
01-20 12:27:45 Status: Working towards registry.ci.openshift.org/ocp/release@sha256:ac57098ad18ed07977b54b90be79dc44f34eb03e42e0be2a95963a316bcde315: downloading update Progress: True Available: True
01-20 12:32:46 Status: Working towards registry.ci.openshift.org/ocp/release@sha256:ac57098ad18ed07977b54b90be79dc44f34eb03e42e0be2a95963a316bcde315: downloading update Progress: True Available: True
01-20 12:37:46 Status: Unable to apply registry.ci.openshift.org/ocp/release@sha256:ac57098ad18ed07977b54b90be79dc44f34eb03e42e0be2a95963a316bcde315: could not download the update Progress: True Available: True
01-20 12:42:47 Status: Working towards registry.ci.openshift.org/ocp/release@sha256:ac57098ad18ed07977b54b90be79dc44f34eb03e42e0be2a95963a316bcde315: downloading update Progress: True Available: True
01-20 12:47:48 Status: Unable to apply registry.ci.openshift.org/ocp/release@sha256:ac57098ad18ed07977b54b90be79dc44f34eb03e42e0be2a95963a316bcde315: could not download the update Progress: True Available: True
01-20 12:52:49 Status: Working towards registry.ci.openshift.org/ocp/release@sha256:ac57098ad18ed07977b54b90be79dc44f34eb03e42e0be2a95963a316bcde315: downloading update Progress: True Available: True
01-20 12:57:49 Status: Working towards registry.ci.openshift.org/ocp/release@sha256:ac57098ad18ed07977b54b90be79dc44f34eb03e42e0be2a95963a316bcde315: downloading update Progress: True Available: True
01-20 13:02:50 Status: Working towards registry.ci.openshift.org/ocp/release@sha256:ac57098ad18ed07977b54b90be79dc44f34eb03e42e0be2a95963a316bcde315: downloading update Progress: True Available: True
01-20 13:07:50 Status: Working towards registry.ci.openshift.org/ocp/release@sha256:ac57098ad18ed07977b54b90be79dc44f34eb03e42e0be2a95963a316bcde315: downloading update Progress: True Available: True
01-20 13:12:51 Status: Unable to apply registry.ci.openshift.org/ocp/release@sha256:ac57098ad18ed07977b54b90be79dc44f34eb03e42e0be2a95963a316bcde315: could not download the update Progress: True Available: True
01-20 13:17:51 Status: Working towards registry.ci.openshift.org/ocp/release@sha256:ac57098ad18ed07977b54b90be79dc44f34eb03e42e0be2a95963a316bcde315: downloading update Progress: True Available: True
01-20 13:22:52 Status: Unable to apply registry.ci.openshift.org/ocp/release@sha256:ac57098ad18ed07977b54b90be79dc44f34eb03e42e0be2a95963a316bcde315: could not download the update Progress: True Available: True
01-20 13:27:54 Status: Working towards registry.ci.openshift.org/ocp/release@sha256:ac57098ad18ed07977b54b90be79dc44f34eb03e42e0be2a95963a316bcde315: downloading update Progress: True Available: True
01-20 13:32:54 Status: Working towards registry.ci.openshift.org/ocp/release@sha256:ac57098ad18ed07977b54b90be79dc44f34eb03e42e0be2a95963a316bcde315: downloading update Progress: True Available: True
> Retest this bug with 4.7.0-0.nightly-2021-01-19-095812, still fail.

Looks like you were trying to update from an unspecified release to 4.7.0-0.nightly-2021-01-19-095812:

$ oc adm release info registry.ci.openshift.org/ocp/release@sha256:ac57098ad18ed07977b54b90be79dc44f34eb03e42e0be2a95963a316bcde315 | head -n1
Name:      4.7.0-0.nightly-2021-01-19-095812

But the bug fix needs to be in the outgoing release to matter, because it's the outgoing release that's trying to verify the desired target. Can you install 4.7.0-0.nightly-2021-01-19-095812 and then try to update out to some other release (doesn't really matter what the target is, as long as the target is accepted and a CVO is launched to start attempting to apply it).
I was upgrading from 4.6.9 to 4.7.0-0.nightly-2021-01-19-095812, so the outgoing release is 4.6.9. Per your statement, once a 4.7 nightly build has this fixed, we need to backport the fix to 4.6?
This bug targets 4.7, so we should be able to verify it as it stands with 4.7.0-0.nightly-2021-01-19-095812 -> whatever. Once this bug is VERIFIED, we can clone it back to 4.6.z to fix 4.6.(fixed) -> whatever. And then we may want to keep backporting to 4.5 and earlier, although 4.4 might go end-of-life before we get back that far [1].

Looking at the pkg/cvo/updatepayload.go history, this is not a regression. Although it's possible that 4.5 -> whatever, etc. are not vulnerable for some other reason. Would be good to check.

[1]: https://access.redhat.com/support/policy/updates/openshift#dates
Reproduced this bug upgrading from 4.7.0-0.nightly-2021-01-13-124141 to 4.7.0-0.nightly-2021-01-19-095812. Verified this bug upgrading from 4.7.0-0.nightly-2021-01-18-053817 to 4.7.0-0.nightly-2021-01-19-095812.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633