Spun off from bug 2018356, where a cluster went from a completed 4.7.34, partway to 4.8.17, and then was retargeted to 4.8.18. The kube-apiserver made it across to 4.8.17 during the first leg and set Upgradeable=False. The cluster-version operator noticed the 4.8.18 retarget and, running its prechecks, used the most-recently-completed 4.7.34 as the base for its "is this a minor update?" checks [1]. And yes, 4.8.18 is a minor bump above 4.7.34.

A smarter CVO would have a status.versions check over in [2] that said something like: kube-apiserver's status.versions[name=operator] has leveled on 4.8.17. That's a minor above my most recently completed version of 4.7.34, so I really don't care what kube-apiserver has to say about Upgradeable. It can't stop me from aiming for a new 4.8.z, and I'm not going to include its Upgradeable in ClusterVersion's Upgradeable.

That's still a bit off, because the one status.versions name we document is 'operator', and we require that to level only after the whole component has rolled out the new version [3]. So operators like kube-apiserver can't just say "I'm a 4.8 operator, and I don't like how the cluster looks vs. what I know is coming in 4.9". They have to say "I'm a 4.8 operator, and my status.versions[name=operator] is also 4.8.z, so I can have opinions about whether my component can go to 4.9". If they have opinions while an older minor is still in status.versions[name=operator], the CVO will misattribute the Upgradeable opinion to the older minor, and might block patch-bumping retargets like we saw in bug 2018356.

Setting severity to medium because, while blocking a patch-level retarget is bad, not too many folks are doing patch-level retargets from the middle of a minor-bumping update.

Also in the retarget + Upgradeable space is bug 1802553, but that's about the CVO injecting a new Upgradeable condition on its own, while this bug is about aggregating ClusterOperator Upgradeable conditions, so I don't actually think there will be any code overlap between these two bugs.

[1]: https://github.com/openshift/cluster-version-operator/blob/c20e4d8a6cd8fe7f9cee5e05dc232f11c5b09ca8/pkg/payload/precondition/clusterversion/upgradeable.go#L114-L120
[2]: https://github.com/openshift/cluster-version-operator/blob/c20e4d8a6cd8fe7f9cee5e05dc232f11c5b09ca8/pkg/cvo/upgradeable.go#L179-L180
[3]: https://github.com/openshift/api/blob/c4970133b5ba3147da37ffd7cd8a18dd9bf052e0/config/v1/types_cluster_operator.go#L48
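For concreteness, a rough Go sketch of the sort of guard [2] could grow, not actual CVO code. It assumes the openshift/api config/v1 types and blang/semver (both already vendored by the CVO), and the helper name operatorPastCompletedMinor is made up for illustration:

package upgradeable

import (
	"fmt"

	"github.com/blang/semver/v4"
	configv1 "github.com/openshift/api/config/v1"
)

// operatorPastCompletedMinor (hypothetical helper) reports whether a
// ClusterOperator's leveled status.versions[name=operator] is already a
// minor (or major) ahead of the cluster's most recently completed version.
// If it is, its Upgradeable=False cannot block a patch-level retarget within
// that newer minor, so the CVO could skip it when aggregating ClusterVersion's
// Upgradeable.
func operatorPastCompletedMinor(co *configv1.ClusterOperator, completed string) (bool, error) {
	completedVersion, err := semver.Parse(completed)
	if err != nil {
		return false, fmt.Errorf("parsing completed version %q: %w", completed, err)
	}
	for _, v := range co.Status.Versions {
		if v.Name != "operator" {
			continue
		}
		operatorVersion, err := semver.Parse(v.Version)
		if err != nil {
			return false, fmt.Errorf("parsing %s operator version %q: %w", co.Name, v.Version, err)
		}
		if operatorVersion.Major != completedVersion.Major {
			return operatorVersion.Major > completedVersion.Major, nil
		}
		return operatorVersion.Minor > completedVersion.Minor, nil
	}
	// No leveled "operator" version: keep today's conservative behavior and
	// let the Upgradeable condition count toward ClusterVersion.
	return false, nil
}

Again, that only helps if the component's status.versions[name=operator] has actually leveled on the newer minor, which is exactly the fiddly part described above.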
Poking around a bit more:

$ oc get -o jsonpath='{.status.desired.version}{"\n"}' clusterversion version
4.7.35
$ oc adm upgrade channel candidate-4.8  # requires oc from 4.9+
$ cat <<EOF >co.yaml
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  name: testing
spec: {}
EOF
$ oc apply -f co.yaml
$ oc proxy &
$ curl -k -XPATCH -H "Accept: application/json" -H "Content-Type: application/json-patch+json" 'http://127.0.0.1:8001/apis/config.openshift.io/v1/clusteroperators/testing/status' -d '[{"op": "add", "path": "/status", "value": {"conditions": [{"lastTransitionTime": "2021-06-01T01:01:01Z", "type": "Upgradeable", "status": "False", "reason": "Testing", "message": "The whatsits are broken."}]}}]'
$ killall oc
$ wait
$ oc wait --for=condition=Upgradeable=False clusterversion/version
$ oc adm upgrade --to 4.8.18
$ oc get -o json clusterversion version | jq '.status | {desired, history}'
{
  "desired": {
    "channels": [
      "candidate-4.8",
      "candidate-4.9"
    ],
    "image": "quay.io/openshift-release-dev/ocp-release@sha256:321aae3d3748c589bc2011062cee9fd14e106f258807dc2d84ced3f7461160ea",
    "url": "https://access.redhat.com/errata/RHBA-2021:4020",
    "version": "4.8.18"
  },
  "history": [
    {
      "completionTime": null,
      "image": "quay.io/openshift-release-dev/ocp-release@sha256:321aae3d3748c589bc2011062cee9fd14e106f258807dc2d84ced3f7461160ea",
      "startedTime": "2021-10-29T18:02:53Z",
      "state": "Partial",
      "verified": true,
      "version": "4.8.18"
    },
    {
      "completionTime": "2021-10-29T17:09:20Z",
      "image": "registry.build01.ci.openshift.org/ci-ln-1si4qlk/release@sha256:fc8ceaa410c3903f249a071cb3bf4a5bc1523fc16d7cdaf0e0c3384bf08ec622",
      "startedTime": "2021-10-29T16:45:08Z",
      "state": "Completed",
      "verified": false,
      "version": "4.7.35"
    }
  ]
}

Instead of adjusting our ClusterOperator consumption, we could adjust our ClusterVersion handling to only set desired after we'd accepted the update (instead of setting it while we were still considering preconditions). Then we could use that desired value as our jumping-off point in the Upgradeable precondition, instead of the most-recently-completed version. For example, in the 4.7.34 (completed) -> 4.8.17 (partial) -> 4.8.18 case, it would treat 4.8.17 as the jumping-off point for the 4.8.18 retarget. That may or may not match the versions of the feeding ClusterOperators, but really, the point would be "this is accepting a patch retarget of an already-accepted 4.y minor". And we'd want bug 1802553 to protect us from retargets adding an additional minor bump on top of an existing partial minor bump. That also avoids any need for component operators to get right the fiddly status.versions[name=operator] guards I'd floated in comment 0.
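A minimal sketch of what that baseline selection could look like (Go, openshift/api config/v1 types and blang/semver; the function names retargetBaseline and isMinorBump are made up, and the assumption that status.desired is only populated once an update has been accepted is exactly the behavior change proposed above, not how the CVO works today):

package upgradeable

import (
	"github.com/blang/semver/v4"
	configv1 "github.com/openshift/api/config/v1"
)

// retargetBaseline returns the version to treat as the jumping-off point for
// a retarget: the already-accepted desired version when present, otherwise
// the most recently completed history entry.
func retargetBaseline(cv *configv1.ClusterVersion) string {
	if cv.Status.Desired.Version != "" {
		return cv.Status.Desired.Version
	}
	for _, entry := range cv.Status.History {
		if entry.State == configv1.CompletedUpdate {
			return entry.Version
		}
	}
	return ""
}

// isMinorBump reports whether target is a minor (or major) bump above that
// baseline, which is the question the Upgradeable precondition cares about.
func isMinorBump(cv *configv1.ClusterVersion, target string) (bool, error) {
	baseline, err := semver.Parse(retargetBaseline(cv))
	if err != nil {
		return false, err
	}
	desired, err := semver.Parse(target)
	if err != nil {
		return false, err
	}
	return desired.Major != baseline.Major || desired.Minor != baseline.Minor, nil
}

Falling back to the most recently completed history entry would keep today's behavior for clusters that have never been retargeted mid-update.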
Reducing the severity of this bug, as it has been around for some time and we have not heard many complaints about the issue.
OpenShift has moved to Jira for its defect tracking! This bug can now be found in the OCPBUGS project in Jira. https://issues.redhat.com/browse/OCPBUGS-9013