Description of problem:

It seems that when installing or upgrading to OCP 4.6.9 (and possibly other releases), the 'verified' flag in the ClusterVersion's history shows false:

  history:
  - completionTime: '2021-01-06T14:45:13Z'
    image: repo.ercot.com:443/ocp4/openshift4@sha256:43d5c84169a4b3ff307c29d7374f6d69a707de15e9fa90ad352b432f77c0cead
    startedTime: '2021-01-05T21:57:09Z'
    state: Completed
    verified: false
    version: 4.6.9

How reproducible: always

Steps to Reproduce:
1. Upgrade cluster to 4.6.9

Actual results:
verified = false

Expected results:
verified = true

Additional info:
Attempted to create a 4.6.8 cluster in quick labs and it too showed not verified:

  history:
  - completionTime: "2021-01-12T20:39:40Z"
    image: quay.io/openshift-release-dev/ocp-release@sha256:6ddbf56b7f9776c0498f23a54b65a06b3b846c1012200c5609c4bb716b6bdcdf
    startedTime: "2021-01-12T20:05:46Z"
    state: Completed
    verified: false
    version: 4.6.8
Verification is something an outgoing CVO performs on an incoming update target request. On install, there is no outgoing CVO to perform the verification. Verification for the install must be performed by the user, e.g. by checking signatures on the installer they use to install the cluster, and then relying on transitive trust for the rest of the cluster (e.g. that the signed installer is using a compiled-in, by-digest release pullspec). So 'verified: false' for the original, installed-version history entry is as designed, not a bug.
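For example, a user can confirm that their installer carries a by-digest release pullspec (the thing transitive trust hinges on) by inspecting the `openshift-install version` output. A minimal sketch over a captured output line; the sample line below is illustrative, not from a real run:

```shell
# Sample line as printed by 'openshift-install version' (illustrative value):
line='release image quay.io/openshift-release-dev/ocp-release@sha256:43d5c84169a4b3ff307c29d7374f6d69a707de15e9fa90ad352b432f77c0cead'

# A by-digest (@sha256:...) pullspec pins exact release content; a by-tag
# pullspec would not, so transitive trust only holds for the digest form.
case "${line}" in
  *@sha256:*) echo "by-digest pullspec: ${line##* }" ;;
  *)          echo "WARNING: release not pinned by digest" ;;
esac
```

This runs offline; on a real install host you would feed it the actual `openshift-install version` output after checking the installer's own signature.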
Hello,

OK, the install case makes sense. Customer upgraded from 4.5.21 to 4.6.9 and 'verified' was false. Is there a way to fix this, or a way to determine why it is false from that upgrade?

Thanks,
Neil Girard
> Customer upgraded from 4.5.21 to 4.6.9 and the verified was false. Is there a way to fix this or a way to determine why it is false from that upgrade?

'verified' is set false when the outgoing CVO performs its validation checks (target release was signed, Upgradeable=False conditions were ok, etc.), at least one of those checks failed, but the user set --force to waive the failed guard (more background in bug 1825396). We do not record the specific guard(s) which failed and were waived.

If the user used 'oc adm upgrade --force ...', or otherwise set ClusterVersion's spec.desiredUpdate.force [1], then ask them to try again without forcing, and the CVO will explain why it is refusing the requested update.

[1]: https://github.com/openshift/api/blob/790e0a84e3e01dc3f6218e655b0e92b809ae7fd0/config/v1/types_cluster_version.go#L238-L250
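For reference, whether a force was requested can be read straight from spec.desiredUpdate.force. A minimal sketch against a saved ClusterVersion object; the sample JSON here is invented for illustration, and on a live cluster you would pipe `oc get clusterversion version -o json` into the same jq filter:

```shell
# Invented sample of a ClusterVersion spec after a forced update request;
# live equivalent:
#   oc get clusterversion version -o json | jq '.spec.desiredUpdate.force // false'
cat > /tmp/cv.json <<'EOF'
{
  "spec": {
    "desiredUpdate": {
      "force": true,
      "version": "4.6.9"
    }
  }
}
EOF

# Prints 'true' when the user waived the CVO's verification guards,
# 'false' when no force was requested (or no desiredUpdate is set).
jq '.spec.desiredUpdate.force // false' /tmp/cv.json
```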
Hello,

Customer has stated they did not use --force when upgrading, but somehow the update has still resulted in the flag being false.

Thanks,
Neil Girard
Can you set a tail on the outgoing CVO, trigger the update, and then attach the CVO logs here? Example of tailing the CVO in [1]:

$ POD="$(oc -n openshift-cluster-version get -o jsonpath='{.items[].metadata.name}{"\n"}' pods)"
$ oc -n openshift-cluster-version logs -f "${POD}" > "$(date --utc --iso=s)-${POD}.log"

although you may be able to do something more elegant than POD with:

$ oc -n openshift-cluster-version logs -f --selector k8s-app=cluster-version-operator

That should have the CVO explaining why it is failing to validate the requested update target.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1884334#c16
Hello,

Customer is already at the level they wish to be. Is it possible to trigger upgrading to the same version? I can save the scripts for the next time the customer wishes to upgrade and have them try it at that time.

Thanks,
Neil Girard
> Is it possible to trigger upgrading to the same version?

I installed a 4.5.21 cluster, and then bumped the channel to stable-4.6:

$ oc patch clusterversion version --type json -p '[{"op": "add", "path": "/spec/channel", "value": "stable-4.6"}]'

*->4.6.9 has been blocked [1], but 4.6.8 is still currently recommended:

$ oc adm upgrade
Cluster version is 4.5.21

Updates:

VERSION IMAGE
4.5.22  quay.io/openshift-release-dev/ocp-release@sha256:38d0bcb5443666b93a0c117f41ce5d5d8b3602b411c574f4e164054c43408a01
4.5.23  quay.io/openshift-release-dev/ocp-release@sha256:368970a5052e600b467de7ab2868a04cdd6cdb991eb9dfbf3473a199d6fe4a4e
4.5.24  quay.io/openshift-release-dev/ocp-release@sha256:f3ce0aeebb116bbc7d8982cc347ffc68151c92598dfb0cc45aaf3ce03bb09d11
4.6.8   quay.io/openshift-release-dev/ocp-release@sha256:6ddbf56b7f9776c0498f23a54b65a06b3b846c1012200c5609c4bb716b6bdcdf

Start tailing the CVO in the background:

$ oc -n openshift-cluster-version logs -f --selector k8s-app=cluster-version-operator >cvo.log &

Trigger the update to 4.6.8:

$ oc adm upgrade --to 4.6.8

Wait until it accepts the update and gets into it:

$ oc adm upgrade
info: An upgrade is in progress. Working towards 4.6.8: 11% complete

No updates available. You may force an upgrade to a specific release image, but doing so may not be supported and result in downtime or data loss.

Check the history:

$ oc get -o json clusterversion version | jq -r '.status.history[] | .startedTime + " " + (.completionTime // "-") + " " + .version + " " + (.verified | tostring)'
2021-01-19T21:17:30Z - 4.6.8 false
2021-01-19T20:50:19Z 2021-01-19T21:14:15Z 4.5.21 false

So yeah, I'd have expected that 4.6.8 target to be 'verified: true', but it's not.

[1]: https://github.com/openshift/cincinnati-graph-data/pull/603
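As an aside, that history-summary jq filter can be exercised offline against a canned ClusterVersion status. The sample history below is invented to mimic the output above, but the filter is identical to the one run against the live cluster:

```shell
# Canned ClusterVersion status (invented sample matching the shape of
# 'oc get -o json clusterversion version').
cat > /tmp/clusterversion.json <<'EOF'
{
  "status": {
    "history": [
      {
        "startedTime": "2021-01-19T21:17:30Z",
        "completionTime": null,
        "version": "4.6.8",
        "verified": false
      },
      {
        "startedTime": "2021-01-19T20:50:19Z",
        "completionTime": "2021-01-19T21:14:15Z",
        "version": "4.5.21",
        "verified": false
      }
    ]
  }
}
EOF

# One line per history entry: start, completion (or '-' while Partial),
# version, and the verified flag.
jq -r '.status.history[] | .startedTime + " " + (.completionTime // "-") + " " + .version + " " + (.verified | tostring)' /tmp/clusterversion.json
```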
Checking the pod and verification history for my cluster:

$ oc -n openshift-cluster-version get -o json events | jq -r '.items[] | select(.reason == "SuccessfulCreate" or (.reason | contains("VerifyPayload"))) | .firstTimestamp + " " + (.count | tostring) + " " + .message' | sort
2021-01-19T20:49:53Z 1 Created pod: cluster-version-operator-844c77584d-8frtb
2021-01-19T20:50:26Z 1 Created pod: cluster-version-operator-59bc479b54-7bp7b
2021-01-19T21:17:18Z 1 Created pod: version-4.6.8-c49dj-zn6bm
2021-01-19T21:17:39Z 1 Created pod: cluster-version-operator-86b768cb58-v96lm
2021-01-19T21:17:51Z 1 verifying payload version="4.6.8" image="quay.io/openshift-release-dev/ocp-release@sha256:6ddbf56b7f9776c0498f23a54b65a06b3b846c1012200c5609c4bb716b6bdcdf"
2021-01-19T21:17:57Z 1 Created pod: cluster-version-operator-56ffb56bdc-r9d2s
2021-01-19T21:17:59Z 1 verifying payload version="4.6.8" image="quay.io/openshift-release-dev/ocp-release@sha256:6ddbf56b7f9776c0498f23a54b65a06b3b846c1012200c5609c4bb716b6bdcdf"

Hmm.
I grabbed logs from the currently-running CVO:

$ POD="$(oc -n openshift-cluster-version get -o jsonpath='{.items[].metadata.name}{"\n"}' pods)"
$ echo "${POD}"
cluster-version-operator-56ffb56bdc-r9d2s
$ oc -n openshift-cluster-version logs "${POD}" > "$(date --utc --iso=s)-${POD}.log"

And checking its start time and version:

$ oc -n openshift-cluster-version get -o json pod "${POD}" | jq -r '.metadata.creationTimestamp + " " + .spec.containers[0].image'
2021-01-19T21:17:57Z quay.io/openshift-release-dev/ocp-release@sha256:6ddbf56b7f9776c0498f23a54b65a06b3b846c1012200c5609c4bb716b6bdcdf
$ oc adm release info quay.io/openshift-release-dev/ocp-release@sha256:6ddbf56b7f9776c0498f23a54b65a06b3b846c1012200c5609c4bb716b6bdcdf | head -n1
Name: 4.6.8
$ grep sync_worker '2021-01-19T21:23:51+00:00-cluster-version-operator-56ffb56bdc-r9d2s.log' | grep -v 'Update work is equal to current target; no change required' | grep -A2 'Loading payload'
I0119 21:17:59.916165 1 sync_worker.go:502] Loading payload
I0119 21:18:00.873910 1 sync_worker.go:569] Skipping preconditions for a local operator image payload.
I0119 21:18:00.874021 1 sync_worker.go:602] Payload loaded from quay.io/openshift-release-dev/ocp-release@sha256:6ddbf56b7f9776c0498f23a54b65a06b3b846c1012200c5609c4bb716b6bdcdf with hash BYHEg9BBpfk=

We probably shouldn't be recording a VerifyPayload event if we're about to skip verification because of the local payload. I'll run a fresh update and loop the tail, to see if I can get logs from the intermediate pods.

[1]: https://github.com/openshift/cincinnati-graph-data/pull/603
Created attachment 1748863 [details]
2021-01-19T22:22:42+00:00-cluster-version-operator-59bc479b54-h54zb.log

Ok, take two:

$ oc patch clusterversion version --type json -p '[{"op": "add", "path": "/spec/channel", "value": "stable-4.6"}]'
$ while sleep 1; do POD="$(oc -n openshift-cluster-version get -o jsonpath='{.items[].metadata.name}{"\n"}' pods)" && oc -n openshift-cluster-version logs -f "${POD}" > "$(date --utc --iso=s)-${POD}.log"; done &
$ oc adm upgrade --to 4.6.8

Wait for the CVO to get into the update... Confirm it's progressing:

$ oc adm upgrade
info: An upgrade is in progress. Working towards 4.6.8: 11% complete

No updates available. You may force an upgrade to a specific release image, but doing so may not be supported and result in downtime or data loss.

Kill the background logger:

$ fg
while sleep 1; do POD="$(oc -n openshift-cluster-version get -o jsonpath='{.items[].metadata.name}{"\n"}' pods)" && oc -n openshift-cluster-version logs -f "${POD}" > "$(date --utc --iso=s)-${POD}.log"; done
^C

Confirm 4.6.8 was not verified (this is the bug):

$ oc get -o json clusterversion version | jq -r '.status.history[] | .startedTime + " " + (.completionTime // "-") + " " + .version + " " + (.verified | tostring)'
2021-01-19T22:23:59Z - 4.6.8 false
2021-01-19T21:53:38Z 2021-01-19T22:19:33Z 4.5.21 false

History of pods and verification events:

$ oc -n openshift-cluster-version get -o json events | jq -r '.items[] | select(.reason == "SuccessfulCreate" or (.reason | contains("VerifyPayload"))) | .firstTimestamp + " " + (.count | tostring) + " " + .message' | sort
2021-01-19T21:53:12Z 1 Created pod: cluster-version-operator-844c77584d-47v82
2021-01-19T21:53:45Z 1 Created pod: cluster-version-operator-59bc479b54-h54zb
2021-01-19T22:23:59Z 1 Created pod: version-4.6.8-hdlrb-hj59t
2021-01-19T22:24:17Z 1 Created pod: cluster-version-operator-86b768cb58-26cv6
2021-01-19T22:24:20Z 1 verifying payload version="4.6.8" image="quay.io/openshift-release-dev/ocp-release@sha256:6ddbf56b7f9776c0498f23a54b65a06b3b846c1012200c5609c4bb716b6bdcdf"
2021-01-19T22:24:23Z 1 Created pod: cluster-version-operator-8546dd559-6zxmp
2021-01-19T22:24:25Z 1 verifying payload version="4.6.8" image="quay.io/openshift-release-dev/ocp-release@sha256:6ddbf56b7f9776c0498f23a54b65a06b3b846c1012200c5609c4bb716b6bdcdf"

Gathered logs:

$ head -n1 2021-01-19T*.log
==> 2021-01-19T22:22:42+00:00-cluster-version-operator-59bc479b54-h54zb.log <==
I0119 22:15:04.457399 1 start.go:21] ClusterVersionOperator v4.5.0-202011132127.p0-dirty

==> 2021-01-19T22:24:17+00:00-cluster-version-operator-86b768cb58-26cv6.log <==
I0119 22:24:19.133039 1 start.go:21] ClusterVersionOperator 4.6.0-202012050130.p0-39a4256

==> 2021-01-19T22:24:23+00:00-cluster-version-operator-8546dd559-6zxmp.log <==
I0119 22:24:24.592206 1 start.go:21] ClusterVersionOperator 4.6.0-202012050130.p0-39a4256

Not sure why we don't have logs for cluster-version-operator-844c77584d-47v82... Ah, never got scheduled:

$ oc -n openshift-cluster-version get -o json events | jq -r '.items[] | select(.involvedObject.name == "cluster-version-operator-844c77584d-47v82") | .firstTimestamp + " " + (.count | tostring) + " " + .reason + ": " + .message' | sort
2021-01-19T21:53:12Z 2 FailedScheduling: no nodes available to schedule pods
2021-01-19T21:53:45Z 1 FailedScheduling: skip schedule deleting pod: openshift-cluster-version/cluster-version-operator-844c77584d-47v82

That's weird, but seems orthogonal.
Log entries discussing verification:

$ grep sync_worker 2021-01-19T*.log | grep -v 'Update work is equal to current target; no change required' | grep -A2 'Loading payload'
2021-01-19T22:22:42+00:00-cluster-version-operator-59bc479b54-h54zb.log:I0119 22:19:03.311946 1 sync_worker.go:477] Loading payload
2021-01-19T22:22:42+00:00-cluster-version-operator-59bc479b54-h54zb.log:I0119 22:19:03.806839 1 sync_worker.go:518] Skipping preconditions for a local operator image payload.
2021-01-19T22:22:42+00:00-cluster-version-operator-59bc479b54-h54zb.log:I0119 22:19:03.806867 1 sync_worker.go:547] Payload loaded from registry.ci.openshift.org/ocp/release@sha256:7684a3daa92827b313a682b3615f2a348a941f8557b5b509b736226f40e01e07 with hash YkD2j0aj5ug=
--
2021-01-19T22:22:42+00:00-cluster-version-operator-59bc479b54-h54zb.log:I0119 22:23:59.787724 1 sync_worker.go:477] Loading payload
2021-01-19T22:22:42+00:00-cluster-version-operator-59bc479b54-h54zb.log:I0119 22:24:15.636222 1 sync_worker.go:547] Payload loaded from quay.io/openshift-release-dev/ocp-release@sha256:6ddbf56b7f9776c0498f23a54b65a06b3b846c1012200c5609c4bb716b6bdcdf with hash BYHEg9BBpfk=
2021-01-19T22:22:42+00:00-cluster-version-operator-59bc479b54-h54zb.log:I0119 22:24:15.649806 1 sync_worker.go:639] Precreated resource clusteroperator "etcd" (68 of 617)
--
2021-01-19T22:24:17+00:00-cluster-version-operator-86b768cb58-26cv6.log:I0119 22:24:20.182443 1 sync_worker.go:502] Loading payload
2021-01-19T22:24:17+00:00-cluster-version-operator-86b768cb58-26cv6.log:I0119 22:24:20.768077 1 sync_worker.go:569] Skipping preconditions for a local operator image payload.
2021-01-19T22:24:17+00:00-cluster-version-operator-86b768cb58-26cv6.log:I0119 22:24:20.768136 1 sync_worker.go:602] Payload loaded from quay.io/openshift-release-dev/ocp-release@sha256:6ddbf56b7f9776c0498f23a54b65a06b3b846c1012200c5609c4bb716b6bdcdf with hash BYHEg9BBpfk=
--
2021-01-19T22:24:23+00:00-cluster-version-operator-8546dd559-6zxmp.log:I0119 22:24:25.249971 1 sync_worker.go:502] Loading payload
2021-01-19T22:24:23+00:00-cluster-version-operator-8546dd559-6zxmp.log:I0119 22:24:25.839554 1 sync_worker.go:569] Skipping preconditions for a local operator image payload.
2021-01-19T22:24:23+00:00-cluster-version-operator-8546dd559-6zxmp.log:I0119 22:24:25.839637 1 sync_worker.go:602] Payload loaded from quay.io/openshift-release-dev/ocp-release@sha256:6ddbf56b7f9776c0498f23a54b65a06b3b846c1012200c5609c4bb716b6bdcdf with hash BYHEg9BBpfk=

So you can see cluster-version-operator-59bc479b54-h54zb verifying itself with the local-image out at 22:19:03 and then loading the 4.6.8 target at 22:23:59 without being all that chatty about any verification. The subsequent two CVO pods are from 4.6.8, so they both load while skipping verification with the local-image out.

I'm attaching 2021-01-19T22:22:42+00:00-cluster-version-operator-59bc479b54-h54zb.log in case folks want to poke around there, while I go off and re-read the 4.5 verification code to see what sort of logging I should expect to find...
The VerifyPayload* events are new in 4.6 [1], so that's why we don't see them in 4.5. In 4.5.21, verified-ness comes from RetrievePayload [2]. Looks like we don't log anything on successful signature validation in RetrievePayload [3], but I don't see anything about "that failed verification" in the logs either. You can see a bit of the signature verification going on though:

$ grep -1 'store.go' '2021-01-19T22:22:42+00:00-cluster-version-operator-59bc479b54-h54zb.log'
I0119 22:23:59.787724 1 sync_worker.go:477] Loading payload
I0119 22:23:59.787746 1 store.go:74] use cached most recent signature config maps
I0119 22:23:59.797665 1 store.go:65] remember most recent signature config maps: signatures-managed
I0119 22:23:59.797699 1 store.go:116] searching for sha256-6ddbf56b7f9776c0498f23a54b65a06b3b846c1012200c5609c4bb716b6bdcdf in signature config map signatures-managed
I0119 22:23:59.918826 1 cvo.go:500] Finished syncing available updates "openshift-cluster-version/version" (131.142709ms)

And while pkg/verify/verify.go includes logging for some failure modes, it has none for successfully verifying the signature [4,5]. A later precondition check [6] is also in the logs, where it passes because there is no Upgradeable condition on the ClusterVersion object [7]:

$ grep -i precondition '2021-01-19T22:22:42+00:00-cluster-version-operator-59bc479b54-h54zb.log'
I0119 22:19:03.806839 1 sync_worker.go:518] Skipping preconditions for a local operator image payload.
I0119 22:24:15.636190 1 upgradeable.go:51] Precondition ClusterVersionUpgradeable passed: no Upgradeable condition on ClusterVersion.

So timeline:

* 22:23:59: "Loading payload" logged.
* 22:23:59: store.go logs, as the CVO hunts for a signature.
* 22:23:59: startedTime history entry with 'verified: false'. I expected us to log these history merges, but I guess they're V(5) [8] and the bump to --v=5 is new in 4.6 [9].
* 22:24:15: Precondition ClusterVersionUpgradeable passed.
* 22:24:15: "Payload loaded from" logged.

Still digging...

[1]: https://github.com/openshift/cluster-version-operator/pull/411
[2]: https://github.com/openshift/cluster-version-operator/blob/0a34ac3632dd869e9975fc6e9ebdcb95ef9374b8/pkg/cvo/sync_worker.go#L485-L507
[3]: https://github.com/openshift/cluster-version-operator/blob/0a34ac3632dd869e9975fc6e9ebdcb95ef9374b8/pkg/cvo/updatepayload.go#L91-L104
[4]: https://github.com/openshift/cluster-version-operator/blob/0a34ac3632dd869e9975fc6e9ebdcb95ef9374b8/pkg/verify/verify.go#L303-L306
[5]: https://github.com/openshift/cluster-version-operator/blob/0a34ac3632dd869e9975fc6e9ebdcb95ef9374b8/pkg/verify/verify.go#L331-L349
[6]: https://github.com/openshift/cluster-version-operator/blob/0a34ac3632dd869e9975fc6e9ebdcb95ef9374b8/pkg/cvo/sync_worker.go#L528
[7]: https://github.com/openshift/cluster-version-operator/blob/0a34ac3632dd869e9975fc6e9ebdcb95ef9374b8/pkg/payload/precondition/clusterversion/upgradeable.go#L51-L52
[8]: https://github.com/openshift/cluster-version-operator/blob/0a34ac3632dd869e9975fc6e9ebdcb95ef9374b8/pkg/cvo/status.go#L87
[9]: https://github.com/openshift/cluster-version-operator/pull/448
Aha [1]:

$ grep 'unable to perform final sync' '2021-01-19T22:22:42+00:00-cluster-version-operator-59bc479b54-h54zb.log'
E0119 22:24:15.924216 1 cvo.go:344] unable to perform final sync: context canceled

So we're probably performing the status/history merge correctly, but then exiting before we successfully flush the state into the in-cluster object. The bug would have been introduced to 4.5.z with [2], bug 1872906, which went out with 4.5.15 [3]. The solution is to give that final flush a bit of extra time, similar to bug 1906100.

[1]: https://github.com/openshift/cluster-version-operator/blob/0a34ac3632dd869e9975fc6e9ebdcb95ef9374b8/pkg/cvo/cvo.go#L344
[2]: https://github.com/openshift/cluster-version-operator/pull/446
[3]: https://bugzilla.redhat.com/show_bug.cgi?id=1872906#c10
Thanks for pushing this, Neil :). I've pushed up a PR for master/4.7 with what I expect to be a fix. Once that is verified for 4.7, we'll backport to 4.6, and then to 4.5.

Filling out a formal impact statement:

Who is impacted? If we have to block upgrade edges based on this issue, which edges would need blocking?

* 4.5.15's bug 1872906 and 4.6.0's bug 1843505 broke the outgoing ClusterVersion status sync. Clusters updating out of those releases, regardless of which release they are updating to, will be impacted by this bug. To avoid the bug, we could theoretically block all edges into impacted releases, but that's an awful lot of releases, and as discussed below, the impact isn't particularly terrible.

What is the impact? Is it serious enough to warrant blocking edges?

* Late-breaking changes to ClusterVersion status may not be pushed into the cluster. Because it takes some time to pull down and verify the release image, and because the incoming CVO knows what version it's been asked to run, the version name and release image are unlikely to be corrupted, but 'verified' might be reported as 'false' when in reality the incoming release was successfully verified (the outgoing CVO just exited without attempting to sync that 'verified: true' out to the cluster). This corrupted data is unfortunate, but has no known in-cluster consumers, and the main "was the target signed?" question can be answered later by manually looking up the signature for the target release image. That is probably limited enough to not be worth blocking edges.

How involved is remediation (even moderately serious impacts might be acceptable if they are easy to mitigate)?

* Updating to fixed releases will avoid the problem for future updates. There's no repairing updates out of impacted releases short of manually forcing 'verified' values, and that's probably not something we want to recommend.
Is this a regression (if all previous versions were also vulnerable, updating to the new, vulnerable version does not increase exposure)? * Yes, from 4.5.14 (and earlier) into 4.5.15, 4.6, and later.
Updating the reproduced status from QE:

Cluster upgrades 4.5.21 -> 4.6.8 -> 4.6.12 -> 4.7.0-fc.3. The issue is reproduced when upgrading from 4.6.8 to 4.6.12.

  history:
  - completionTime: null
    image: quay.io/openshift-release-dev/ocp-release@sha256:ce05fd235f0e59ea0114c73ef25abf838b9c3af103d9cb66aa7ba10eeacd8537
    startedTime: "2021-01-22T07:48:23Z"
    state: Partial
    verified: false
    version: 4.7.0-fc.3
  - completionTime: "2021-01-22T07:37:38Z"
    image: quay.io/openshift-release-dev/ocp-release@sha256:5c3618ab914eb66267b7c552a9b51c3018c3a8f8acf08ce1ff7ae4bfdd3a82bd
    startedTime: "2021-01-22T06:40:58Z"
    state: Completed
    verified: false
    version: 4.6.12     -- issue reproduced
  - completionTime: "2021-01-22T02:52:13Z"
    image: quay.io/openshift-release-dev/ocp-release@sha256:6ddbf56b7f9776c0498f23a54b65a06b3b846c1012200c5609c4bb716b6bdcdf
    startedTime: "2021-01-22T01:47:41Z"
    state: Completed
    verified: true      -- different from comment #7
    version: 4.6.8
  - completionTime: "2021-01-21T11:37:56Z"
    image: quay.io/openshift-release-dev/ocp-release@sha256:7684a3daa92827b313a682b3615f2a348a941f8557b5b509b736226f40e01e07
    startedTime: "2021-01-21T11:02:11Z"
    state: Completed
    verified: false     -- expected based on comment #1
    version: 4.5.21

Cluster upgrades 4.5.14 -> 4.5.15 -> 4.5.24. The issue is reproduced when upgrading from 4.5.15.

  history:
  - completionTime: null
    image: quay.io/openshift-release-dev/ocp-release@sha256:f3ce0aeebb116bbc7d8982cc347ffc68151c92598dfb0cc45aaf3ce03bb09d11
    startedTime: "2021-01-22T08:02:38Z"
    state: Partial
    verified: false
    version: 4.5.24
  - completionTime: "2021-01-22T06:42:53Z"
    image: quay.io/openshift-release-dev/ocp-release@sha256:1df294ebe5b84f0eeceaa85b2162862c390143f5e84cda5acc22cc4529273c4c
    startedTime: "2021-01-22T06:08:49Z"
    state: Completed
    verified: true
    version: 4.5.15
  - completionTime: "2021-01-22T05:29:55Z"
    image: quay.io/openshift-release-dev/ocp-release@sha256:95cfe9273aecb9a0070176210477491c347f8e69e41759063642edf8bb8aceb6
    startedTime: "2021-01-22T05:02:22Z"
    state: Completed
    verified: false
    version: 4.5.14
Reproduced it when upgrading from 4.7.0-fc.1 to 4.7.0-fc.2:

  history:
  - completionTime: null
    image: quay.io/openshift-release-dev/ocp-release@sha256:2f00e3016ca5678e51e9d79d4d3ac5a2926e0c09a8e75df19ea983b9cd6c5d05
    startedTime: "2021-01-25T05:56:51Z"
    state: Partial
    verified: false
    version: 4.7.0-fc.2
  - completionTime: "2021-01-25T03:28:13Z"
    image: quay.io/openshift-release-dev/ocp-release@sha256:bf3a3f8bb1690ba1fa33e0a75437f377dde00a3dd19fe0398573f6bed48cfe04
    startedTime: "2021-01-25T02:57:19Z"
    state: Completed
    verified: false
    version: 4.7.0-fc.1
  observedGeneration: 3
I'm trying to verify it by upgrading from 4.7.0-0.nightly-2021-02-05-161444 to 4.7.0-0.nightly-2021-02-05-221250.

Install a cluster using 4.7.0-0.nightly-2021-02-05-161444 and then update the Cincinnati server:

# oc patch clusterversion/version --patch '{"spec":{"upstream":"https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/graph"}}' --type=merge
# oc adm upgrade
Cluster version is 4.7.0-0.nightly-2021-02-05-161444

Updates:

VERSION IMAGE
4.7.0-0.nightly-2021-02-05-221250 registry.ci.openshift.org/ocp/release@sha256:4ae3e3c8aed290eae504af397c0fe4f141c7d30ed7276da0cedf9d271d1f8de4

Upgrading...

# oc adm upgrade --to=4.7.0-0.nightly-2021-02-05-221250

Checking clusterversion status...

  conditions:
  - lastTransitionTime: "2021-02-07T06:17:02Z"
    message: 'The update cannot be verified: unable to locate a valid signature for one or more sources'
    reason: ImageVerificationFailed
    status: "True"
    type: Failing
  - lastTransitionTime: "2021-02-07T06:13:21Z"
    message: 'Unable to apply 4.7.0-0.nightly-2021-02-05-221250: the image may not be safe to use'
    reason: ImageVerificationFailed
    status: "True"
    type: Progressing
  - lastTransitionTime: "2021-02-07T05:55:10Z"
    status: "True"
    type: RetrievedUpdates
  history:
  - completionTime: null
    image: registry.ci.openshift.org/ocp/release@sha256:4ae3e3c8aed290eae504af397c0fe4f141c7d30ed7276da0cedf9d271d1f8de4
    startedTime: "2021-02-07T06:13:21Z"
    state: Partial
    verified: false
    version: 4.7.0-0.nightly-2021-02-05-221250

It looks like the target nightly build cannot be verified. Is there any way to get it to work, or do I have to wait for the next fc build?
CI-registry nightlies like the one you used are apparently not signed:

$ curl -sI https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release/sha256=4ae3e3c8aed290eae504af397c0fe4f141c7d30ed7276da0cedf9d271d1f8de4/signature-1 | head -n1
HTTP/1.1 404 Not Found

But nightlies which are promoted to quay.io and indexed on the mirrors [1] are signed:

$ curl -s https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp-dev-preview/4.7.0-0.nightly-2021-02-04-031352/release.txt | grep Pull
Pull From: quay.io/openshift-release-dev/ocp-release-nightly@sha256:f024d23bac16c1243025048bfaa27c064be7cba74717d1deafbd60f4c1bc01c4

$ curl -s https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release/sha256=f024d23bac16c1243025048bfaa27c064be7cba74717d1deafbd60f4c1bc01c4/signature-1 | gpg --decrypt
{"critical": {"image": {"docker-manifest-digest": "sha256:f024d23bac16c1243025048bfaa27c064be7cba74717d1deafbd60f4c1bc01c4"}, "type": "atomic container signature", "identity": {"docker-reference": "quay.io/openshift-release-dev/ocp-release-nightly:4.7.0-0.nightly-2021-02-04-031352-x86_64"}}, "optional": {"creator": "Red Hat OpenShift Signing Authority 0.0.1"}}
gpg: Signature made Wed Feb  3 23:35:08 2021 PST
gpg:                using RSA key 938A80CAF21541EB
gpg: Can't check signature: No public key

0x938A80CAF21541EB is the beta-2 key [2].

[1]: https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp-dev-preview/
[2]: https://access.redhat.com/security/team/key
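For reference, the mapping from a by-digest pullspec to its mirror signature URL is mechanical. A small sketch (pure string manipulation, so it runs without network or cluster access; the pullspec is the 4.6.8 one from this bug):

```shell
# Build the mirror signature URL for a by-digest release pullspec.
# Published signatures live under:
#   .../signatures/openshift/release/sha256=<digest>/signature-1
pullspec='quay.io/openshift-release-dev/ocp-release@sha256:6ddbf56b7f9776c0498f23a54b65a06b3b846c1012200c5609c4bb716b6bdcdf'

# Strip everything up to and including '@sha256:' to isolate the digest.
digest="${pullspec##*@sha256:}"
url="https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release/sha256=${digest}/signature-1"
echo "${url}"

# HEAD-ing this URL tells you whether a signature was published:
# a 404 (as with the CI nightly above) means unsigned, 200 means signed.
```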
Trevor, thanks a lot. The latest 4.7 build on the mirror server does not have the fix, so I'll get back to this when the next 4.7 builds are available.
Verified with upgrading from 4.7.0-0.nightly-2021-02-06-022227 to 4.7.0-rc.0.

Steps to verify it:
1) Install a cluster with 4.7.0-0.nightly-2021-02-06-022227
2) Create a dummy Cincinnati server with 4.7.0-0.nightly-2021-02-06-022227 and 4.7.0-rc.0
3) Upgrade the cluster
4) Check that the verified state is true

# oc get clusterversion -oyaml
apiVersion: v1
items:
- apiVersion: config.openshift.io/v1
  kind: ClusterVersion
  metadata:
    creationTimestamp: "2021-02-10T06:56:41Z"
    generation: 3
    managedFields:
    - apiVersion: config.openshift.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          .: {}
          f:clusterID: {}
      manager: cluster-bootstrap
      operation: Update
      time: "2021-02-10T06:56:41Z"
    - apiVersion: config.openshift.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          f:channel: {}
          f:upstream: {}
      manager: kubectl-edit
      operation: Update
      time: "2021-02-10T07:53:52Z"
    - apiVersion: config.openshift.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          f:desiredUpdate:
            .: {}
            f:force: {}
            f:image: {}
            f:version: {}
      manager: oc
      operation: Update
      time: "2021-02-10T08:02:12Z"
    - apiVersion: config.openshift.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          .: {}
          f:availableUpdates: {}
          f:conditions: {}
          f:desired:
            .: {}
            f:channels: {}
            f:image: {}
            f:url: {}
            f:version: {}
          f:history: {}
          f:observedGeneration: {}
          f:versionHash: {}
      manager: cluster-version-operator
      operation: Update
      time: "2021-02-10T08:02:35Z"
    name: version
    resourceVersion: "44932"
    selfLink: /apis/config.openshift.io/v1/clusterversions/version
    uid: a03003cc-1cd4-44e4-bcca-940158ecff9c
  spec:
    channel: nightly-4.7
    clusterID: 2d39a2e8-56b6-4cf1-8f04-a4ffc9262e83
    desiredUpdate:
      force: false
      image: quay.io/openshift-release-dev/ocp-release@sha256:497fa748b89619aba312a926a0be0ad155c4b894ca3e2824a00167421e3441b0
      version: 4.7.0-rc.0
    upstream: https://raw.githubusercontent.com/shellyyang1989/upgrade-cincy/master/cincy4.json
  status:
    availableUpdates: null
    conditions:
    - lastTransitionTime: "2021-02-10T07:24:27Z"
      message: Done applying 4.7.0-0.nightly-2021-02-06-022227
      status: "True"
      type: Available
    - lastTransitionTime: "2021-02-10T07:24:27Z"
      status: "False"
      type: Failing
    - lastTransitionTime: "2021-02-10T08:02:20Z"
      message: 'Working towards 4.7.0-rc.0: 70 of 668 done (10% complete)'
      status: "True"
      type: Progressing
    - lastTransitionTime: "2021-02-10T08:00:08Z"
      status: "True"
      type: RetrievedUpdates
    desired:
      channels:
      - nightly-4.7
      image: quay.io/openshift-release-dev/ocp-release@sha256:497fa748b89619aba312a926a0be0ad155c4b894ca3e2824a00167421e3441b0
      url: https://access.redhat.com/errata/RHBA-2020:5633
      version: 4.7.0-rc.0
    history:
    - completionTime: null
      image: quay.io/openshift-release-dev/ocp-release@sha256:497fa748b89619aba312a926a0be0ad155c4b894ca3e2824a00167421e3441b0
      startedTime: "2021-02-10T08:02:20Z"
      state: Partial
      verified: true      <----------- It's as expected
      version: 4.7.0-rc.0
    - completionTime: "2021-02-10T07:24:27Z"
      image: registry.ci.openshift.org/ocp/release@sha256:ee7e4d58dd6e001d4f621e5cbf893d203910c2d3c2dce88b8ac65538eb1f1411
      startedTime: "2021-02-10T06:56:41Z"
      state: Completed
      verified: false
      version: 4.7.0-0.nightly-2021-02-06-022227
    observedGeneration: 3
    versionHash: Q_5mmldRekY=
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

The verified state is changed to true. Moving it to verified.
Thanks, guys, for taking a long look at this. It was definitely troubling. I'm excited to see the fix!
For reference, in case you need to manually verify the release's signature, the following steps can be performed:

## Steps to verify cluster version release signature

1. Get the SHA for the running version: `oc get clusterversion -o jsonpath='{.items[].status.history[0].image}'`
2. Copy the SHA value given after `sha256:` to use in step 3.
3. Get and verify the signature used to sign the release that your cluster shows it is running.
    1. On a RHEL 7 or 8 host, import the Red Hat public keys into your gpg keyring:
       `gpg --import /etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-beta`
       `gpg --import /etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release`
    2. Get and verify the signature:
       `curl -s https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release/sha256=[SHA_VALUE]/signature-1 | gpg --decrypt`
4. Confirm that the SHA in the `.critical.image["docker-manifest-digest"]` value matches the SHA for your running version from step 1.
5. Confirm that the RSA key used to verify the signature in step 3 is one of the product signing keys listed on https://access.redhat.com/security/team/key.
6. You can also verify that the Primary key fingerprint matches the corresponding entry on https://access.redhat.com/security/team/key.

## Example

1. Get the SHA for the running version:

```sh
$ oc get clusterversion -o jsonpath='{.items[].status.history[0].image}'
quay.io/openshift-release-dev/ocp-release@sha256:c8746322036ee50c9b015404f6bcae3003f537b128641bd29ad9ab4048c7b274
```

2. Copy `c8746322036ee50c9b015404f6bcae3003f537b128641bd29ad9ab4048c7b274`
3. Get and verify the signature used to sign the release that your cluster shows it is running.
```sh
$ curl -s https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release/sha256=c8746322036ee50c9b015404f6bcae3003f537b128641bd29ad9ab4048c7b274/signature-1 | gpg --decrypt
{"critical": {"image": {"docker-manifest-digest": "sha256:c8746322036ee50c9b015404f6bcae3003f537b128641bd29ad9ab4048c7b274"}, "type": "atomic container signature", "identity": {"docker-reference": "quay.io/openshift-release-dev/ocp-release-nightly:4.7.0-fc.5-x86_64"}}, "optional": {"creator": "Red Hat OpenShift Signing Authority 0.0.1"}}
gpg: Signature made Mon 01 Feb 2021 02:20:39 AM EST
gpg:                using RSA key 938A80CAF21541EB
gpg: Good signature from "Red Hat, Inc. (beta key 2) <security>" [unknown]
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: B08B 659E E86A F623 BC90 E8DB 938A 80CA F215 41EB
```

4. Confirm that the SHA from your cluster matches the SHA in the decrypted signature:

```
{"critical": {"image": {"docker-manifest-digest": "sha256:**c8746322036ee50c9b015404f6bcae3003f537b128641bd29ad9ab4048c7b274**"}
==
quay.io/openshift-release-dev/ocp-release@sha256:**c8746322036ee50c9b015404f6bcae3003f537b128641bd29ad9ab4048c7b274**
```

5. Confirm that the RSA key used to verify the signature is a Red Hat product signing key. On https://access.redhat.com/security/team/key, RSA key `938A80CAF21541EB` is listed as `4096R/938a80caf21541eb (2009-02-24): Red Hat, Inc. (beta key 2) <security>`.
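Step 4's comparison can also be scripted rather than eyeballed. A minimal sketch comparing the `docker-manifest-digest` from the decrypted signature against the cluster's reported image (both values are the ones from the example above, inlined here so the check runs offline):

```sh
# Image reported by the cluster (step 1).
cluster_image='quay.io/openshift-release-dev/ocp-release@sha256:c8746322036ee50c9b015404f6bcae3003f537b128641bd29ad9ab4048c7b274'

# JSON body of the decrypted signature (step 3), trimmed to the field we need.
sig_json='{"critical": {"image": {"docker-manifest-digest": "sha256:c8746322036ee50c9b015404f6bcae3003f537b128641bd29ad9ab4048c7b274"}}}'
sig_digest="$(printf '%s' "${sig_json}" | jq -r '.critical.image["docker-manifest-digest"]')"

# Strip the pullspec down to its digest and compare.
if [ "sha256:${cluster_image##*@sha256:}" = "${sig_digest}" ]; then
  echo "digest match"
else
  echo "digest MISMATCH"
fi
```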
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633