Spun off from bug 2018356, where a cluster went from a completed 4.7.34, partway to 4.8.17, and then was retargeted to 4.8.18. The kube-apiserver made it across to 4.8.17 during the first leg and set Upgradeable=False. The cluster-version operator noticed the 4.8.18 retarget and, running its prechecks, used the most-recently-completed 4.7.34 as the base for its "is this a minor update?" checks [1]. And yes, 4.8.18 is a minor bump above 4.7.34.

A smarter CVO would have a status.versions check over in [2] that said something like: kube-apiserver's status.versions[name=operator] has leveled on 4.8.17. That's a minor above my most recently completed version of 4.7.34, so I really don't care what kube-apiserver has to say about Upgradeable. It can't stop me from aiming for a new 4.8.z, and I'm not going to include its Upgradeable in ClusterVersion's Upgradeable.

That's still a bit off, because the one status.versions name we document is 'operator', and we require that to level only after the whole component has rolled out the new version [3]. So operators like kube-apiserver can't just say "I'm a 4.8 operator, and I don't like how the cluster looks vs. what I know is coming in 4.9". They have to say "I'm a 4.8 operator, and my status.versions[name=operator] is also 4.8.z, so I can have opinions about whether my component can go to 4.9". If they have opinions while an older minor is still in status.versions[name=operator], the CVO will misattribute the Upgradeable opinion to the older minor, and might block patch-bumping retargets like we saw in bug 2018356.

Setting severity to medium because, while blocking a patch-level retarget is bad, not too many folks are doing patch-level retargets from the middle of a minor-bumping update.

Also in the retarget + Upgradeable space is bug 1802553, but that's about the CVO injecting a new Upgradeable condition on its own, while this bug is about aggregating ClusterOperator Upgradeable conditions, so I don't actually think there will be any code overlap between these two bugs.

[1]: https://github.com/openshift/cluster-version-operator/blob/c20e4d8a6cd8fe7f9cee5e05dc232f11c5b09ca8/pkg/payload/precondition/clusterversion/upgradeable.go#L114-L120
[2]: https://github.com/openshift/cluster-version-operator/blob/c20e4d8a6cd8fe7f9cee5e05dc232f11c5b09ca8/pkg/cvo/upgradeable.go#L179-L180
[3]: https://github.com/openshift/api/blob/c4970133b5ba3147da37ffd7cd8a18dd9bf052e0/config/v1/types_cluster_operator.go#L48
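For concreteness, a rough Go sketch of the sort of guard [2] could grow, not actual CVO code. It assumes the openshift/api config/v1 types and blang/semver (both already vendored by the CVO), and the helper name operatorPastCompletedMinor is made up for illustration:

package upgradeable

import (
	"fmt"

	"github.com/blang/semver/v4"
	configv1 "github.com/openshift/api/config/v1"
)

// operatorPastCompletedMinor (hypothetical helper) reports whether a
// ClusterOperator's leveled status.versions[name=operator] is already a
// minor (or major) ahead of the cluster's most recently completed version.
// If it is, its Upgradeable=False cannot block a patch-level retarget within
// that newer minor, so the CVO could skip it when aggregating ClusterVersion's
// Upgradeable.
func operatorPastCompletedMinor(co *configv1.ClusterOperator, completed string) (bool, error) {
	completedVersion, err := semver.Parse(completed)
	if err != nil {
		return false, fmt.Errorf("parsing completed version %q: %w", completed, err)
	}
	for _, v := range co.Status.Versions {
		if v.Name != "operator" {
			continue
		}
		operatorVersion, err := semver.Parse(v.Version)
		if err != nil {
			return false, fmt.Errorf("parsing %s operator version %q: %w", co.Name, v.Version, err)
		}
		if operatorVersion.Major != completedVersion.Major {
			return operatorVersion.Major > completedVersion.Major, nil
		}
		return operatorVersion.Minor > completedVersion.Minor, nil
	}
	// No leveled "operator" version: keep today's conservative behavior and
	// let the Upgradeable condition count toward ClusterVersion.
	return false, nil
}

Again, that only helps if the component's status.versions[name=operator] has actually leveled on the newer minor, which is exactly the fiddly part described above.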
Poking around a bit more:

$ oc get -o jsonpath='{.status.desired.version}{"\n"}' clusterversion version
4.7.35
$ oc adm upgrade channel candidate-4.8  # requires oc from 4.9+
$ cat <<EOF >co.yaml
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  name: testing
spec: {}
EOF
$ oc apply -f co.yaml
$ oc proxy &
$ curl -k -XPATCH -H "Accept: application/json" -H "Content-Type: application/json-patch+json" 'http://127.0.0.1:8001/apis/config.openshift.io/v1/clusteroperators/testing/status' -d '[{"op": "add", "path": "/status", "value": {"conditions": [{"lastTransitionTime": "2021-06-01T01:01:01Z", "type": "Upgradeable", "status": "False", "reason": "Testing", "message": "The whatsits are broken."}]}}]'
$ killall oc
$ wait
$ oc wait --for=condition=Upgradeable=False clusterversion/version
$ oc adm upgrade --to 4.8.18
$ oc get -o json clusterversion version | jq '.status | {desired, history}'
{
  "desired": {
    "channels": [
      "candidate-4.8",
      "candidate-4.9"
    ],
    "image": "quay.io/openshift-release-dev/ocp-release@sha256:321aae3d3748c589bc2011062cee9fd14e106f258807dc2d84ced3f7461160ea",
    "url": "https://access.redhat.com/errata/RHBA-2021:4020",
    "version": "4.8.18"
  },
  "history": [
    {
      "completionTime": null,
      "image": "quay.io/openshift-release-dev/ocp-release@sha256:321aae3d3748c589bc2011062cee9fd14e106f258807dc2d84ced3f7461160ea",
      "startedTime": "2021-10-29T18:02:53Z",
      "state": "Partial",
      "verified": true,
      "version": "4.8.18"
    },
    {
      "completionTime": "2021-10-29T17:09:20Z",
      "image": "registry.build01.ci.openshift.org/ci-ln-1si4qlk/release@sha256:fc8ceaa410c3903f249a071cb3bf4a5bc1523fc16d7cdaf0e0c3384bf08ec622",
      "startedTime": "2021-10-29T16:45:08Z",
      "state": "Completed",
      "verified": false,
      "version": "4.7.35"
    }
  ]
}

Instead of adjusting our ClusterOperator consumption, we could adjust our ClusterVersion handling to only set desired after we'd accepted the update (instead of setting it while we were still considering preconditions). Then we could use that desired value as our jumping-off point in the Upgradeable precondition, instead of the most-recently-completed version. For example, in the 4.7.34 (completed) -> 4.8.17 (partial) -> 4.8.18 case, it would treat 4.8.17 as the jumping-off point for the 4.8.18 retarget. That may or may not match the versions of the feeding ClusterOperators, but really, the point would be "this is accepting a patch retarget of an already-accepted 4.y minor". And we'd want bug 1802553 to protect us from retargets adding an additional minor bump on top of an existing partial minor bump. That also avoids any need for component operators to get right the fiddly status.versions[name=operator] guards I'd floated in comment 0.
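A minimal sketch of what that baseline selection could look like (Go, openshift/api config/v1 types and blang/semver; the function names retargetBaseline and isMinorBump are made up, and the assumption that status.desired is only populated once an update has been accepted is exactly the behavior change proposed above, not how the CVO works today):

package upgradeable

import (
	"github.com/blang/semver/v4"
	configv1 "github.com/openshift/api/config/v1"
)

// retargetBaseline returns the version to treat as the jumping-off point for
// a retarget: the already-accepted desired version when present, otherwise
// the most recently completed history entry.
func retargetBaseline(cv *configv1.ClusterVersion) string {
	if cv.Status.Desired.Version != "" {
		return cv.Status.Desired.Version
	}
	for _, entry := range cv.Status.History {
		if entry.State == configv1.CompletedUpdate {
			return entry.Version
		}
	}
	return ""
}

// isMinorBump reports whether target is a minor (or major) bump above that
// baseline, which is the question the Upgradeable precondition cares about.
func isMinorBump(cv *configv1.ClusterVersion, target string) (bool, error) {
	baseline, err := semver.Parse(retargetBaseline(cv))
	if err != nil {
		return false, err
	}
	desired, err := semver.Parse(target)
	if err != nil {
		return false, err
	}
	return desired.Major != baseline.Major || desired.Minor != baseline.Minor, nil
}

Falling back to the most recently completed history entry would keep today's behavior for clusters that have never been retargeted mid-update.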
Reducing the severity of this bug, as it has been around for some time and we have not heard many complaints about the issue.
OpenShift has moved to Jira for its defect tracking! This bug can now be found in the OCPBUGS project in Jira. https://issues.redhat.com/browse/OCPBUGS-9013