Bug 2018356 - Improve the kubelet version skew status condition message [NEEDINFO]
Summary: Improve the kubelet version skew status condition message
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-apiserver
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: Luis Sanchez
QA Contact: Ke Wang
URL:
Whiteboard: LifecycleStale
Depends On:
Blocks:
Reported: 2021-10-29 01:56 UTC by Lalatendu Mohanty
Modified: 2022-08-18 14:26 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-18 14:26:03 UTC
Target Upstream Version:
Embargoed:
mfojtik: needinfo?



Description Lalatendu Mohanty 2021-10-29 01:56:12 UTC
Description of problem:

We set the Upgradeable=False and Failing=True status conditions on the cluster-version operator (CVO) if the cluster violates the kubelet version skew policy for upgrades. But the messages in those status conditions are confusing and hard to understand.

Here are example status conditions from the CVO:

{
  "type": "Upgradeable",
  "status": "False",
  "lastTransitionTime": "2021-10-28T11:44:59Z",
  "reason": "KubeletMinorVersion_KubeletMinorVersionUnsupportedNextUpgrade",
  "message": "Cluster operator kube-apiserver should not be upgraded between minor versions: KubeletMinorVersionUpgradeable: Kubelet minor versions on 6 nodes will not be supported in the next OpenShift minor version upgrade."
}

{
  "type": "Failing",
  "status": "True",
  "lastTransitionTime": "2021-10-28T22:44:03Z",
  "reason": "UpgradePreconditionCheckFailed",
  "message": "Precondition \"ClusterVersionUpgradeable\" failed because of \"KubeletMinorVersion_KubeletMinorVersionUnsupportedNextUpgrade\": Cluster operator kube-apiserver should not be upgraded between minor versions: KubeletMinorVersionUpgradeable: Kubelet minor versions on 6 nodes will not be supported in the next OpenShift minor version upgrade."
}

Version-Release number of selected component (if applicable):

4.8

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Upgrade history of the cluster

  {
    "state": "Partial",
    "startedTime": "2021-10-28T12:03:03Z",
    "completionTime": null,
    "version": "4.8.18",
    "image": "quay.io/openshift-release-dev/ocp-release@sha256:321aae3d3748c589bc2011062cee9fd14e106f258807dc2d84ced3f7461160ea",
    "verified": true
  },
  {
    "state": "Partial",
    "startedTime": "2021-10-28T11:36:15Z",
    "completionTime": "2021-10-28T12:03:03Z",
    "version": "4.8.17",
    "image": "quay.io/openshift-release-dev/ocp-release@sha256:1935b6c8277e351550bd7bfcc4d5df7c4ba0f7a90165c022e2ffbe789b15574a",
    "verified": true
  },
  {
    "state": "Completed",
    "startedTime": "2021-10-28T07:26:36Z",
    "completionTime": "2021-10-28T08:50:32Z",
    "version": "4.7.34",
    "image": "quay.io/openshift-release-dev/ocp-release@sha256:e121118406de30f9a92168808979d9363c1a576401c399bf6f528fb47c48b16c",
    "verified": true
  },

Comment 1 W. Trevor King 2021-10-29 02:58:47 UTC
Backing kube-apiserver code for this is from [1].  I'm not actually too worried about that part of the message, although it's possible we could improve it a bit by adjusting the multiple-skewed-nodes rollups like [2] to pass along the current versions when there aren't too many of them.  For this cluster, something like:

  Kubelet minor version (v1.20.0+bbbc079) on 6 nodes will not be supported in the next OpenShift minor version upgrade.
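A minimal sketch of that rollup idea, for illustration only. This is not the operator's code (which is Go, see [2]); the function name `rollup_message` and the `max_listed` cap are hypothetical. It lists the distinct kubelet versions when there are only a few, and otherwise falls back to the current wording:

```python
def rollup_message(kubelet_versions, max_listed=3):
    """Build a condition message for nodes with a skewed kubelet.

    kubelet_versions: one version string per skewed node.
    max_listed: hypothetical cap on how many distinct versions to spell out.
    """
    count = len(kubelet_versions)
    distinct = sorted(set(kubelet_versions))
    version_noun = "version" if len(distinct) == 1 else "versions"
    node_noun = "node" if count == 1 else "nodes"
    # Only include the versions when the list stays short enough to read.
    detail = f" ({', '.join(distinct)})" if len(distinct) <= max_listed else ""
    return (
        f"Kubelet minor {version_noun}{detail} on {count} {node_noun} "
        "will not be supported in the next OpenShift minor version upgrade."
    )


# The cluster from this bug: six nodes, all on the same kubelet build.
message = rollup_message(["v1.20.0+bbbc079"] * 6)
print(message)
```

With many distinct versions the detail is dropped, so the message degrades to the existing "Kubelet minor versions on N nodes ..." form.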

The two confounding issues are:

* The CVO should use "should not be upgraded to 4.9" instead of "should not be upgraded between minor versions" [3].
* The CVO is getting confused by a mid-update retarget.  The kube-apiserver made it to 4.8.17 during the first (partial) update, but the retarget to 4.8.18 happened before the machine-config operator had been asked to leave 4.7.  So the 4.8 kube-apiserver is trying to keep the cluster from going to 4.9 and getting a 1.22 kube-apiserver that doesn't like the 1.20 nodes (because of Scott's odd/even support request [4]).  But the CVO is looking for its last completed version [5], and assuming (falsely) that the Upgradeable=False on the kube-apiserver's ClusterOperator object is a 4.7 kube-apiserver trying to keep the cluster-operator off 4.8.  We'll need a new bug for this angle.
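To make the second point concrete, here is a rough sketch of the mismatch using the version history from the description. It is not the CVO's code (that lives at [5] and is Go); the helper names are hypothetical, and it assumes the history is ordered newest first, as in the report:

```python
# Trimmed copy of the ClusterVersion history from the description (newest first).
history = [
    {"state": "Partial", "version": "4.8.18"},
    {"state": "Partial", "version": "4.8.17"},
    {"state": "Completed", "version": "4.7.34"},
]


def minor_of(version):
    """Return the 'major.minor' prefix of a release version string."""
    major, minor = version.split(".")[:2]
    return f"{major}.{minor}"


def last_completed(entries):
    """Newest entry whose update ran to completion, or None."""
    for entry in entries:
        if entry["state"] == "Completed":
            return entry["version"]
    return None


def newest_target(entries):
    """Newest entry regardless of state, or None."""
    return entries[0]["version"] if entries else None


# What a "last completed version" lookup sees...
assumed_minor = minor_of(last_completed(history))
# ...versus the minor the partially updated operators may already be running.
target_minor = minor_of(newest_target(history))

print(assumed_minor, target_minor)
```

The lookup lands on 4.7 while the kube-apiserver that set Upgradeable=False is already a 4.8 operator, which is the gap between the two readings of the condition.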

So anyhow, there's a lot of CVO-side context there, but the only improvement I see that could sit on the kube-apiserver side is including some rolled-up version information in the ClusterOperator condition message.  Does that sound sufficient to you, Lala?

[1]: https://github.com/openshift/cluster-kube-apiserver-operator/pull/1199
[2]: https://github.com/openshift/cluster-kube-apiserver-operator/pull/1199/files#diff-22001281e3b968448f2558fd87069f7dbe886ce349047d0270433e17ece4372aR179
[3]: https://bugzilla.redhat.com/show_bug.cgi?id=1992680#c14
[4]: https://github.com/openshift/cluster-kube-apiserver-operator/pull/1199/files#diff-22001281e3b968448f2558fd87069f7dbe886ce349047d0270433e17ece4372aR37-R41
[5]: https://github.com/openshift/cluster-version-operator/blob/c20e4d8a6cd8fe7f9cee5e05dc232f11c5b09ca8/pkg/payload/precondition/clusterversion/upgradeable.go#L114-L120

Comment 2 W. Trevor King 2021-10-29 03:20:20 UTC
(In reply to W. Trevor King from comment #1)
> * The CVO is getting confused by a mid-update retarget...

I've filed bug 2018368 to track this aspect.  While working up that bug, I noticed another issue on the ClusterOperator side that will impact kube-apiserver.  You might want to idle this until we get some updates-team consensus around the plan I've floated for bug 2018368, before doing the little dance I floated about limiting Upgradeable on your ClusterOperator based on comparing your current operator-code version with your reported status.versions[name=operator].

Comment 3 Lalatendu Mohanty 2021-10-29 16:17:17 UTC
> Does that sound sufficient to you, Lala?

Yes, and thanks for filing the bug: https://bugzilla.redhat.com/show_bug.cgi?id=2018368

Comment 4 Michal Fojtik 2021-12-20 10:34:04 UTC
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Whiteboard if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.

Comment 5 Constantin Vultur 2022-02-16 14:20:51 UTC
Hit the same message while upgrading from 4.9.9 (stable-4.9) to 4.10.0-fc.2 (candidate-4.10):

Cluster operator kube-apiserver should not be upgraded between minor versions: KubeletMinorVersionUpgradeable: Kubelet minor versions on 5 nodes will not be supported in the next OpenShift minor version upgrade


The cluster did upgrade successfully, without any errors:

# oc get nodes
NAME         STATUS   ROLES    AGE   VERSION
master-0-0   Ready    master   29d   v1.23.0+60f5a1c
master-0-1   Ready    master   29d   v1.23.0+60f5a1c
master-0-2   Ready    master   29d   v1.23.0+60f5a1c
worker-0-0   Ready    worker   29d   v1.23.0+60f5a1c
worker-0-1   Ready    worker   29d   v1.23.0+60f5a1c
# oc get clusterversion
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-fc.2   True        False         76m     Cluster version is 4.10.0-fc.2

Comment 6 Michal Dekan 2022-03-23 10:24:50 UTC
Hit the same message when updating from 4.9.21 (fast-4.9) to 4.10.5 (fast-4.10):
https://console-openshift-console.apps.ocp-example.host.com/settings/cluster/clusteroperators?rowFilter-cluster-operator-status=Cannot+update


kube-apiserver Cannot update	4.10.5	KubeletMinorVersionUpgradeable: Kubelet minor versions on 8 nodes will not be supported in the next OpenShift minor version upgrade.
machine-config Cannot update	4.10.5	One or more machine config pools are updating, please see `oc get mcp` for further details

oc get nodes
NAME                             STATUS   ROLES    AGE    VERSION
control-013.example.host.com     Ready    master   190d   v1.23.3+e419edf
control-014.example.host.com     Ready    master   190d   v1.23.3+e419edf
control-015.example.host.com     Ready    master   190d   v1.23.3+e419edf
control-016.example.host.com     Ready    infra    110d   v1.23.3+e419edf
control-017.example.host.com     Ready    infra    110d   v1.23.3+e419edf
control-018.example.host.com     Ready    infra    110d   v1.23.3+e419edf
worker-cnv-001.example.host.com  Ready    worker   190d   v1.23.3+e419edf
worker-cnv-002.example.host.com  Ready    worker   190d   v1.22.3+fdba464
worker-cnv-003.example.host.com  Ready    worker   190d   v1.23.3+e419edf
worker-cnv-004.example.host.com  Ready    worker   190d   v1.23.3+e419edf
worker-cnv-005.example.host.com  Ready    worker   190d   v1.23.3+e419edf
worker-cnv-006.example.host.com  Ready    worker   190d   v1.23.3+e419edf
worker-cnv-007.example.host.com  Ready    worker   190d   v1.22.3+fdba464
worker-cnv-008.example.host.com  Ready    worker   190d   v1.23.3+e419edf
worker-cnv-009.example.host.com  Ready    worker   190d   v1.23.3+e419edf
worker-cnv-010.example.host.com  Ready    worker   190d   v1.22.3+fdba464
worker-cnv-011.example.host.com  Ready    worker   190d   v1.22.3+fdba464
worker-cnv-012.example.host.com  Ready    worker   190d   v1.23.3+e419edf
worker-cnv-013.example.host.com  Ready    worker   190d   v1.22.3+fdba464
worker-cnv-014.example.host.com  Ready    worker   190d   v1.22.3+fdba464
worker-cnv-015.example.host.com  Ready    worker   190d   v1.22.3+fdba464
worker-cnv-016.example.host.com  Ready    worker   190d   v1.23.3+e419edf
worker-cnv-017.example.host.com  Ready    worker   190d   v1.23.3+e419edf
worker-cnv-018.example.host.com  Ready    worker   109d   v1.23.3+e419edf
worker-cnv-019.example.host.com  Ready    worker   34d    v1.23.3+e419edf
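For anyone wanting to see which kubelet minors a node listing like this contains, a small sketch that tallies them from the text output. The function name is hypothetical, and it assumes the default `oc get nodes` column layout shown here, with a header row first and VERSION as the last column:

```python
from collections import Counter


def kubelet_minor_counts(oc_get_nodes_output):
    """Count nodes per kubelet 'major.minor' from `oc get nodes` text output."""
    counts = Counter()
    lines = oc_get_nodes_output.strip().splitlines()
    for line in lines[1:]:  # skip the NAME/STATUS/... header row
        fields = line.split()
        if not fields:
            continue
        version = fields[-1]  # e.g. v1.22.3+fdba464
        major, minor = version.lstrip("v").split(".")[:2]
        counts[f"{major}.{minor}"] += 1
    return counts


# A few rows copied from the listing above.
sample = """\
NAME                             STATUS   ROLES    AGE    VERSION
control-013.example.host.com     Ready    master   190d   v1.23.3+e419edf
worker-cnv-001.example.host.com  Ready    worker   190d   v1.23.3+e419edf
worker-cnv-002.example.host.com  Ready    worker   190d   v1.22.3+fdba464
"""

counts = kubelet_minor_counts(sample)
for minor, nodes in sorted(counts.items()):
    print(minor, nodes)
```

Nodes still on the older minor are typically the ones whose machine config pool has not finished rolling out, which matches the machine-config "Cannot update" row above.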

Comment 10 Michal Fojtik 2022-08-18 14:26:03 UTC
Dear reporter, 

As part of the migration of all OpenShift bugs to Red Hat Jira, we are evaluating all bugs, which will result in some stale issues, or those without high or urgent priority, being closed. If you believe this bug still requires engineering resolution, we kindly ask you to follow this link[1] and continue working with us in Jira by recreating the issue and providing the necessary information. Also, please provide the link to the original Bugzilla in the description.

To create an issue, follow this link:

[1] https://issues.redhat.com/secure/CreateIssueDetails!init.jspa?pid=12332330&issuetype=1&priority=10300&components=12367637

