On 4.3.10 -> 4.3.13, starting a verified update (with force false) and then updating to set force true will cause the cluster-version operator to bypass any blocking preconditions (appropriate) but not update the history to set verified false (bug). Getting stuck on a precondition (like bug 1822752): $ oc get -o json clusterversion version | jq -r .status.desired.version 4.3.10 $ oc get -o json clusteroperators | jq -r '.items[] | .upgradeable = ([.status.conditions[] | select(.type == "Upgradeable")][0]) | select(.upgradeable.status == "False") | .upgradeable.lastTransitionTime + " " + .metadata.name + " " + .upgradeable.reason' | sort ...no output... $ oc patch scc privileged --type json -p '[{"op": "add", "path": "/users/-", "value": "kubeadmin"}]' $ oc get -o json clusteroperators | jq -r '.items[] | .upgradeable = ([.status.conditions[] | select(.type == "Upgradeable")][0]) | select(.upgradeable.status == "False") | .upgradeable.lastTransitionTime + " " + .metadata.name + " " + .upgradeable.reason' | sort 2020-04-17T21:16:13Z kube-apiserver DefaultSecurityContextConstraints_Mutated $ oc patch clusterversion version --type json -p '[{"op": "add", "path": "/spec/channel", "value": "candidate-4.3"}]' $ oc adm upgrade --to 4.3.13 Updating to 4.3.13 $ oc get -o json clusterversion version | jq -r '.status.conditions[] | .lastTransitionTime + " " + .type + " " + .status + " " + .message' | sort 2020-04-17T20:54:15Z RetrievedUpdates True 2020-04-17T21:11:52Z Available True Done applying 4.3.10 2020-04-17T21:17:47Z Progressing True Unable to apply 4.3.13: it may not be safe to apply this update 2020-04-17T21:17:47Z Upgradeable False Cluster operator kube-apiserver cannot be upgraded: DefaultSecurityContextConstraintsUpgradeable: Default SecurityContextConstraints object(s) have mutated [privileged] 2020-04-17T21:18:07Z Failing True Precondition "ClusterVersionUpgradeable" failed because of "DefaultSecurityContextConstraints_Mutated": Cluster operator kube-apiserver cannot be upgraded: DefaultSecurityContextConstraintsUpgradeable: Default SecurityContextConstraints object(s) have mutated [privileged] Patching force to waive the failures: $ oc patch clusterversion version --type json -p '[{"op": "add", "path": "/spec/desiredUpdate/force", "value": true}]' clusterversion.config.openshift.io/version patched Update moves forward: $ oc get -o json clusterversion version | jq -r '.status.conditions[] | .lastTransitionTime + " " + .type + " " + .status + " " + .message' | sort 2020-04-17T20:54:15Z RetrievedUpdates True 2020-04-17T21:11:52Z Available True Done applying 4.3.10 2020-04-17T21:17:47Z Progressing True Unable to apply 4.3.13: the update could not be applied 2020-04-17T21:17:47Z Upgradeable False Cluster operator kube-apiserver cannot be upgraded: DefaultSecurityContextConstraintsUpgradeable: Default SecurityContextConstraints object(s) have mutated [privileged] 2020-04-17T21:20:14Z Failing True Could not update deployment "openshift-cluster-version/cluster-version-operator" (5 of 498) $ oc get --watch clusterversion version NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.3.10 True True 3m42s Working towards 4.3.13: 13% complete version 4.3.10 True True 4m42s Working towards 4.3.13: 16% complete version 4.3.10 True True 6m12s Working towards 4.3.13: 23% complete ... But history still shows verified true: $ oc get -o json clusterversion version | jq -r '.status.history[] | .startedTime + " " + .completionTime + " " + .version + " " + .state + " " + (.verified | tostring)' 2020-04-17T21:17:47Z 4.3.13 Partial true 2020-04-17T20:54:15Z 2020-04-17T21:11:52Z 4.3.10 Completed false We need to set verified=false if any precondition failures were waived.
Moving this to 4.6.0 as we have higher priority bugs for 4.5.
Also setting sev/prio to low as it is does not impact the functionality and relatively a corner case.
> We need to set verified=false if any precondition failures were waived. > https://github.com/openshift/api/blob/de5b010b2b386867fc61c4f718080755f9707748/config/v1/types_cluster_version.go#L180-L184 ``` // verified indicates whether the provided update was properly verified // before it was installed. If this is false the cluster may not be trusted. // +kubebuilder:validation:Required // +required Verified bool `json:"verified"` ``` Why would be set verified false if integrity checks i.e. correctly signed passes for a release image even though the preconditions like not-upradeable are true ??
I think the verified field is in place only for Signature checking and not if other aspect of viability of the upgrade have passed/failed.
The fact that there's ambiguity around whether non-signature checks are counted sounds like at least "we need to clarify the API godocs" to me. I would like to have a record of whether these preconditions are waived. As long as the trusted keys are RH-managed, we can already recover signed-ness in post-mortems from history[].Image digests, because we can look for a matching signature in our store. But it's harder to reconstruct the state of Upgradeable conditions or other preconditions that may have been waved. I'd be even happier if the API was a string summarizing overridden conditions (signatures, preconditions, etc.), but... schema growth is so much work :p. So: a. I'm in the minority, Abhinav's position that verified is just about signatures is the consensus, and this bug should be an API bug about clarifying the godocs. b. My initial understanding that all upgrade verification, including preconditions and eventually preflights, feeds verified is the consensus. This bug should be about clarifying API godocs and adjusting CVO behavior to match. c. Other options? Thoughts?
(In reply to W. Trevor King from comment #6) ... > a. I'm in the minority, Abhinav's position that verified is just about > signatures is the consensus, and this bug should be an API bug about > clarifying the godocs. > b. My initial understanding that all upgrade verification, including > preconditions and eventually preflights, feeds verified is the consensus. > This bug should be about clarifying API godocs and adjusting CVO behavior to > match. > c. Other options? > > Thoughts? What is Abinav's "position" and your "initial understanding" based on? I'm really asking what's the requirement and from where does it come? Asked differently, do we know why the Verified field was originally created?
Verified landed in the API via [1,2] to track forcing. At the time, the only check around to be forced was signature verification. Now we also have preconditions. In the future, we will have preflights. The API godocs which establish the semantics do not sufficiently distinguish between the "signatures only" and "all pre-update condition checks" interpretation, and Ahbinav is already here. I guess we could loop in Clayton too and see if he has an opinion? They were the only two involved in [2]. [1]: https://github.com/openshift/api/commit/ab4ff93d200a6c3e9d2fcdf0a161a6ada9dfd716 [2]: https://github.com/openshift/api/pull/293
Adding UpcomingSprint keyword. The bug needs additional work which will not be completed by the end of this Sprint.
Based on the provided history and the current api doc's use of the term "verified", I think we should use this bug to clarify the godocs (Trevor's option a). If we feel that use of --force should be extended beyond just signature verification (option b), work that through a RFE Jira to determine implementation, e.g. whether we add another flag or simply expand scope of --force, etc.
My options in comment 6 were just about the 'verified' semantics. We can have a whole separate discussion about the --force semantics. oc seems a bit inconsistent around it today, with [1] saying it can waive verification errors or precondition failures, [2] just talking about signature verification (and including an "If ..." non-sentence), and [3] also just talking about verification. [1]: https://github.com/openshift/oc/blob/c33851e73e8e2ed66e00da81fa30cee493ac76e2/pkg/cli/admin/upgrade/upgrade.go#L196 [2]: https://github.com/openshift/oc/blob/c33851e73e8e2ed66e00da81fa30cee493ac76e2/pkg/cli/admin/upgrade/upgrade.go#L57-L61 [3]: https://github.com/openshift/oc/blob/c33851e73e8e2ed66e00da81fa30cee493ac76e2/pkg/cli/admin/upgrade/upgrade.go#L83
(In reply to W. Trevor King from comment #11) > My options in comment 6 were just about the 'verified' semantics. Exactly and hence my first and main point/opinion that this bug should be an API bug about clarifying the godocs. Beyond that I'm just saying that further consideration of the semantics of 'verified' and/or the "--force" option seem more appropriate as a Jira.
> Exactly and hence my first and main point/opinion that this bug should be an API bug about clarifying the godocs. +1 Verified only used for image signature verification. > Beyond that I'm just saying that further consideration of the semantics of 'verified' and/or the "--force" option seem more appropriate as a Jira. I do not see the benefit mixing Verified with --force.
Adding UpcomingSprint keyword. Bug resolution still being discussed.
We discussed this a bit in the 2020-07-30 arch call. Didn't actually spend much time on 'verified', but 'force' got kicked around a bit. Erica raised concerns about force covering both inability to find a valid signature (which could be "restricted-network, and admins haven't bothered to copy the associate signature into a local ConfigMap", not so bad, or "admin accidentally pointing cluster at a malicious image pullspec", which is very bad) and failing preconditions (e.g. operator A claims Upgradeable=False for a minor bump, but admins are confident they aren't actually impacted). Abhinav pointed out that the goal was to keep the API and CVO implementation simple, and split intelligent checks and scoped overrides out into clients. Colin pointed out that expecting client-side validation via 'oc' might lead to surprises if/when folks write their own code to drive the less-specific ClusterVersion API. I pointed out, sort of a corollary to Colin's point, that if we rely on intelligent client-side verification, we'd need to re-implement it in all clients (e.g. both 'oc' and the web console), or accept that some clients (like the web-console today) will expose a less-rich interface than others (like 'oc' today). On the 'verified' front, we just summarized the existing discussion from the bug, mostly as given in comment 6. So, lots of discussion, but still no clear consensus around intended semantics for either 'force' or 'verified', as far as I can tell.
Still working on a consensus opinion for the intended semantics.
Comment 16 is still current.
Clearing the needinfo as I believe this has been answered already.
I've opened [1] to catch the API docs up with the current reality, which will hopefully unblock the "still no clear consensus" from comment 15. [1]: https://github.com/openshift/api/pull/885
api#885 just landed, including: Verified does not cover upgradeable checks that depend on the cluster state at the time when the update target was accepted. which brings those docs into line with the long-standing CVO implementation.