We recently had a customer who triggered an upgrade from 4.1.27 to 4.3 while the intermediate 4.2 versions were still in a Partial state. We have asked the customer for CVO details to better understand the procedure they followed, but we might need to implement a way to either stop the upgrade if the customer makes a mistake, or block the upgrade if the customer changes the channel in the console to a version the upgrade does not support, as in this case.
From the version object:

    "history": [
      {
        "state": "Partial",
        "startedTime": "2020-02-13T08:15:27Z",
        "completionTime": null,
        "version": "4.3.0",
        "image": "quay.io/openshift-release-dev/ocp-release@sha256:3a516480dfd68e0f87f702b4d7bdd6f6a0acfdac5cd2e9767b838ceede34d70d",
        "verified": true
      },
      {
        "state": "Partial",
        "startedTime": "2020-01-29T16:03:01Z",
        "completionTime": "2020-02-13T08:15:27Z",
        "version": "4.2.16",
        "image": "quay.io/openshift-release-dev/ocp-release@sha256:e5a6e348721c38a78d9299284fbb5c60fb340135a86b674b038500bf190ad514",
        "verified": true
      },
      {
        "state": "Partial",
        "startedTime": "2020-01-13T13:05:10Z",
        "completionTime": "2020-01-29T16:03:01Z",
        "version": "4.2.13",
        "image": "quay.io/openshift-release-dev/ocp-release@sha256:782b41750f3284f3c8ee2c1f8cb896896da074e362cf8a472846356d1617752d",
        "verified": true
      },
      {
        "state": "Partial",
        "startedTime": "2019-12-11T12:38:42Z",
        "completionTime": "2020-01-13T13:05:10Z",
        "version": "4.2.10",
        "image": "quay.io/openshift-release-dev/ocp-release@sha256:dc2e38fb00085d6b7f722475f8b7b758a0cb3a02ba42d9acf8a8298a6d510d9c",
        "verified": true
      },
Users are allowed to `FORCE` updates and the CVO is expected to move forward, because otherwise users can get stuck. oc adm upgrade prevents upgrades when there's already one in flight, but again users can force past it.
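For reference, a minimal sketch of how a forced update is expressed on the ClusterVersion spec, using the Update type from github.com/openshift/api/config/v1 (the helper function and the placeholder version/image are illustrative, not existing code):

    package updates

    import (
        configv1 "github.com/openshift/api/config/v1"
    )

    // forceUpdate is a hypothetical helper that stamps a forced desired update
    // onto the ClusterVersion spec. Force waives guards such as signature
    // verification and precondition checks, which is why a forced retarget can
    // bypass the in-flight-upgrade protection discussed here.
    func forceUpdate(cv *configv1.ClusterVersion, version, image string) {
        cv.Spec.DesiredUpdate = &configv1.Update{
            Version: version,
            Image:   image,
            Force:   true,
        }
    }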
Please get back to us with exactly the previous upgrade actions that were taken on this cluster. At first glance it appears that they have gone against our recommendations and applied updates not found in the graph.
> oc adm upgrade prevents upgrades when there's already one in flight... Linking the source for this [1], in case other folks are wondering where it is ;). [1]: https://github.com/openshift/oc/blob/5d7a12f03389b03b651f963cb5ee8ddfa9cff559/pkg/cli/admin/upgrade/upgrade.go#L295-L300
I don't actually see a CVO-side precondition for this. I'd expect one in [1], but the only ClusterVersion precondition we have now is around the Upgradeable [2]. Do we need to grow a CVO-side precondition for "and we're not currently Progressing"? Do we have one that I'm just missing? Is there some reason why we want to guard against this client-side but not guard against it in the CVO? [1]: https://github.com/openshift/cluster-version-operator/tree/2afd105d0291006f940022b048e927ab3778ebf6/pkg/payload/precondition/clusterversion [2]: https://github.com/openshift/cluster-version-operator/blob/2afd105d0291006f940022b048e927ab3778ebf6/pkg/payload/precondition/clusterversion/upgradeable.go#L30-L33
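For illustration, here is a minimal sketch of the check such a precondition could perform; the function shape is hypothetical and does not follow the CVO's actual precondition interface, only the ClusterVersion types come from github.com/openshift/api/config/v1:

    package preconditions

    import (
        "fmt"

        configv1 "github.com/openshift/api/config/v1"
    )

    // checkNotProgressing refuses a newly requested target while the
    // ClusterVersion reports Progressing=True towards a different payload.
    func checkNotProgressing(cv *configv1.ClusterVersion, desiredVersion string) error {
        for _, c := range cv.Status.Conditions {
            if c.Type == configv1.OperatorProgressing && c.Status == configv1.ConditionTrue &&
                cv.Status.Desired.Version != desiredVersion {
                return fmt.Errorf("cluster is still progressing towards %s (%s); refusing to retarget to %s",
                    cv.Status.Desired.Version, c.Message, desiredVersion)
            }
        }
        return nil
    }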
+1 to a CVO-side precondition for not upgrading when the current status is "progressing". That's the fix we should do as part of this bug.
Moving to high. Allowing folks to retarget to a 4.3 release when they are only partially through a 4.1 -> 4.2 update is really risky.
Any precondition check has to be overridable by the user, so as long as the precondition allows override we're ok doing this.
As per the discussion on the PR, this does not seem to be as important as we initially thought, hence moving this to 4.6.0.
Reducing the severity of the bug, as we have not seen this issue reproduced much.
Not critical for 4.6, hence moving to 4.7.
The right way to do this without breaking the API would be to set Upgradeable=False if the upgrade is not supported. Going to do this for this bug and see whether folks like this approach or not.
In this case we will only set Upgradeable=False for y-stream upgrades (i.e. between minor versions).
Only setting Upgradeable=False during minor bumps would guard against extreme 4.(y-1) with 4.(y+1) version skew. But I'd also be ok setting Upgradeable=False during all updates. It would be easier to code that way, and minor version bumps are exciting enough that I'm ok forcing folks to consolidate their cluster on a well-defined jumping-off point before attempting them.
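A minimal sketch of that approach, assuming a helper that stamps the condition onto ClusterVersion status while an update is in flight (the helper name and the Reason/Message strings are illustrative, not the merged implementation):

    package cvo

    import (
        "time"

        configv1 "github.com/openshift/api/config/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    )

    // setUpgradeableFalseWhileUpdating publishes Upgradeable=False so that
    // further minor-version retargets are inhibited until the in-flight
    // update completes (or is forced past).
    func setUpgradeableFalseWhileUpdating(cv *configv1.ClusterVersion) {
        cond := configv1.ClusterOperatorStatusCondition{
            Type:               configv1.OperatorUpgradeable,
            Status:             configv1.ConditionFalse,
            Reason:             "UpdateInProgress",
            Message:            "An update is already in progress; finish or abort it before requesting another minor-version update.",
            LastTransitionTime: metav1.NewTime(time.Now()),
        }
        // Replace an existing Upgradeable condition or append a new one.
        for i, c := range cv.Status.Conditions {
            if c.Type == configv1.OperatorUpgradeable {
                cv.Status.Conditions[i] = cond
                return
            }
        }
        cv.Status.Conditions = append(cv.Status.Conditions, cond)
    }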
Lala is working on this, but no PR yet.
PR is up, master is open for 4.7. Just needs review.
Clayton is not convinced [1]. We'll keep hunting for a consensus fix next sprint. [1]: https://github.com/openshift/cluster-version-operator/pull/460#pullrequestreview-503280048
Tried to reproduce it with the following steps (the result was unexpected):

1. Install OCP v4.5.20.

2. Patch upstream and channel in the ClusterVersion for the 1st upgrade:

    # ./oc get clusterversion -o json | jq .items[].spec
    {
      "channel": "stable-4.6",
      "clusterID": "a2fcbdf7-a8a4-4685-b0c8-2dc328203478",
      "upstream": "https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/graph"
    }

3. Upgrade the cluster from v4.5.20 to v4.6.5:

    # ./oc adm upgrade --to 4.6.5
    Updating to 4.6.5
    # ./oc adm upgrade
    info: An upgrade is in progress. Working towards 4.6.5: 11% complete

4. Patch channel in the ClusterVersion for the 2nd upgrade while the 1st upgrade is ongoing:

    # ./oc get clusterversion -o json | jq .items[].spec
    {
      "channel": "stable-4.7",
      "clusterID": "a2fcbdf7-a8a4-4685-b0c8-2dc328203478",
      "desiredUpdate": {
        "force": false,
        "image": "registry.svc.ci.openshift.org/ocp/release@sha256:b8154e802c17dae57d1cfb0580e6a79544712cea0f77e01ae6171854f75975ea",
        "version": "4.6.5"
      },
      "upstream": "https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/graph"
    }

5. Try to upgrade the cluster to v4.7 through the CLI without --force; it fails (expected):

    # ./oc adm upgrade --to 4.7.0-0.nightly-2020-11-26-042221
    error: already upgrading.
      Reason:
      Message: Working towards 4.6.5: 11% complete
    If you want to upgrade anyway, use --allow-upgrade-with-warnings.
    # ./oc get clusterversion
    NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.5.20    True        True          110s    Working towards 4.6.5: 11% complete

6. Try to upgrade the cluster to v4.7 through the web console; it succeeds (unexpected and reproduced):

    # ./oc get clusterversion
    NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.5.20    True        True          13m     Working towards 4.7.0-0.nightly-2020-11-25-114114: 15% complete
    # ./oc get clusterversion -o json | jq .items[].status.history
    [
      {
        "completionTime": null,
        "image": "registry.svc.ci.openshift.org/ocp/release@sha256:bf37e13af0e254d0b744b62ace0dcf5560230374d7877a8fde16cf9134ec7862",
        "startedTime": "2020-11-26T09:22:49Z",
        "state": "Partial",
        "verified": false,
        "version": "4.7.0-0.nightly-2020-11-25-114114"
      },
      {
        "completionTime": "2020-11-26T09:22:49Z",
        "image": "registry.svc.ci.openshift.org/ocp/release@sha256:b8154e802c17dae57d1cfb0580e6a79544712cea0f77e01ae6171854f75975ea",
        "startedTime": "2020-11-26T09:19:00Z",
        "state": "Partial",
        "verified": false,
        "version": "4.6.5"
      },
      {
        "completionTime": "2020-11-26T09:02:15Z",
        "image": "quay.io/openshift-release-dev/ocp-release@sha256:78b878986d2d0af6037d637aa63e7b6f80fc8f17d0f0d5b077ac6aca83f792a0",
        "startedTime": "2020-11-26T08:24:11Z",
        "state": "Completed",
        "verified": false,
        "version": "4.5.20"
      }
    ]
(In reply to liujia from comment #20)
> 6. Try to upgrade the cluster to v4.7 through the web console; it succeeds (unexpected and reproduced)

Right. I don't think we want to rely on client-side guards (like oc has today) for this. I'd rather have the CVO itself say "sorry, I'm in the middle of 4.y->4.(y+1), so I'm not going to pick up your requested 4.(y+2) target". We could just hold it in ClusterVersion.spec while finishing out the 4.(y+1) target and then pick it up. And folks could force if they wanted to waive the CVO-side guard. But I would like a CVO-side guard of some sort to close out this bug.
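A rough sketch of that guard, assuming the CVO compares the minor version of the in-flight target against the newly requested one; the names and the hand-rolled parsing are illustrative only (a real implementation would use a semver library):

    package cvo

    import (
        "fmt"
        "strconv"
        "strings"
    )

    // minorOf extracts the minor component from a "4.y.z..." version string.
    func minorOf(version string) (int, error) {
        parts := strings.SplitN(version, ".", 3)
        if len(parts) < 2 {
            return 0, fmt.Errorf("cannot parse version %q", version)
        }
        return strconv.Atoi(parts[1])
    }

    // rejectMinorRetargetWhileUpdating refuses a requested target whose minor
    // version differs from the one still being applied, unless forced.
    func rejectMinorRetargetWhileUpdating(inFlight, requested string, force bool) error {
        if force || inFlight == requested {
            return nil
        }
        current, err := minorOf(inFlight)
        if err != nil {
            return err
        }
        next, err := minorOf(requested)
        if err != nil {
            return err
        }
        if current != next {
            return fmt.Errorf("an update to %s is still in progress; refusing to retarget to %s", inFlight, requested)
        }
        return nil
    }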
I am planning to close https://bugzilla.redhat.com/show_bug.cgi?id=1802553 as this does not seem to be an issue impacting clusters, and putting a guard for Y+2 does not seem critical to me at this point in time. Also, we did not reach any agreement with Clayton around how we should fix this.
@Trevor pointed me to https://github.com/openshift/enhancements/blob/master/enhancements/update/eus-upgrades-mvp.md#ota---inhibit-minor-version-upgrades-when-an-upgrade-is-in-progress, so re-opening this bug.
*** Bug 1947566 has been marked as a duplicate of this bug. ***
*** Bug 2069480 has been marked as a duplicate of this bug. ***
*** Bug 2083988 has been marked as a duplicate of this bug. ***
Moving the bug to an enhancement request [1].

[1]: https://issues.redhat.com/browse/OTA-861