Bug 1986453 - EUS Control loop to check for API server and node versions skew
Summary: EUS Control loop to check for API server and node versions skew
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.9.0
Assignee: Qi Wang
QA Contact: MinLi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-07-27 15:09 UTC by Qi Wang
Modified: 2021-10-18 17:42 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-10-18 17:42:18 UTC
Target Upstream Version:
Embargoed:




Links
System ID | Private | Priority | Status | Summary | Last Updated
Github openshift machine-config-operator pull 2658 | 0 | None | open | [OCPNODE-595] Check for API server and node versions skew | 2021-07-27 15:09:12 UTC
Red Hat Issue Tracker OCPNODE-595 | 0 | Unprioritized | In Progress | Create a EUS Control loop to check for API server and node versions skew | 2021-07-27 15:15:37 UTC
Red Hat Product Errata RHSA-2021:3759 | 0 | None | None | None | 2021-10-18 17:42:28 UTC

Description Qi Wang 2021-07-27 15:09:13 UTC
Description of problem:

Create an EUS control loop to check for version skew between the API server and the nodes. If it finds pools that exceed the n-2 skew, it emits an event with a message warning the user to upgrade the nodes to a version supported by kube-apiserver.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 MinLi 2021-08-16 09:36:15 UTC
Hi, Qi
Can you tell me how to pause the worker MachineConfigPool before upgrade?

Comment 3 Qi Wang 2021-08-16 17:22:50 UTC
(In reply to MinLi from comment #2)
> Hi, Qi
> Can you tell me how to pause the worker MachineConfigPool before upgrade?

Run `oc edit machineconfigpool/worker` and set `paused: true` under `spec`, or run `oc patch mcp/worker --type merge --patch '{"spec":{"paused":true}}'`.

Comment 4 MinLi 2021-08-18 09:41:42 UTC
Hi, Qi

I tested this bug with the following steps, but did not find an event warning the user to upgrade the node to a version supported by kube-apiserver.
Please help check:

1. Create a 4.6 nightly build cluster.
2. Run $ oc edit mcp worker and set "paused: true".
3. Upgrade the cluster to 4.7.24; the upgrade succeeds.
4. Upgrade the cluster to 4.8.5; the upgrade succeeds.
5. Check the descriptions of many ClusterOperators, such as config-operator ($ oc describe co config-operator), kube-apiserver, kube-controller-manager, machine-api, and machine-config; none of them shows a message like "kubelet version skew".
6. Run $ oc get events -A | grep "an unsupported kubelet version skew"; the output is empty.

When I then set "paused: false" for the worker mcp, the kubelet version rolls out to match the API server.

Comment 5 Qi Wang 2021-08-18 14:49:15 UTC
(In reply to MinLi from comment #4)
> Hi, Qi
> 
> I tested this bug with the following steps, but did not find an event
> warning the user to upgrade the node to a version supported by
> kube-apiserver.
> Please help check:
> 
> 1. Create a 4.6 nightly build cluster.
> 2. Run $ oc edit mcp worker and set "paused: true".
> 3. Upgrade the cluster to 4.7.24; the upgrade succeeds.
> 4. Upgrade the cluster to 4.8.5; the upgrade succeeds.
> 5. Check the descriptions of many ClusterOperators, such as config-operator
> ($ oc describe co config-operator), kube-apiserver, kube-controller-manager,
> machine-api, and machine-config; none of them shows a message like "kubelet
> version skew".
> 6. Run $ oc get events -A | grep "an unsupported kubelet version skew"; the
> output is empty.
> 
> When I then set "paused: false" for the worker mcp, the kubelet version
> rolls out to match the API server.

Sorry, I forgot to mention that the wording differs depending on whether the version difference is exactly 2 or greater than 2.

If you pause the pool at 4.6 and upgrade to 4.8, the version difference is 2 (kubelet 1.19 and kube-apiserver 1.21; I haven't checked), and the status is KubeletSkewPresent, like:

$ oc describe ClusterOperator
Spec:
Status:
 .....
 Message:               "Current kubelet version 1.19 will not be supported by newer kube-apiserver. Please upgrade the kubelet first if plan to upgrade the kube-apiserver.
    Reason:                KubeletSkewPresent

You can upgrade to 4.9 to see KubeletSkewUnsupported, if the kube-apiserver is 1.22.
On the current 4.8.5 cluster, can you check whether the above KubeletSkewPresent appears in the ClusterOperator, and then upgrade to 4.9 to see whether the KubeletSkewUnsupported warning appears? I think the kube-apiserver version in 4.9 has been upgraded to 1.22 now.
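
For illustration only, the two-tier behaviour described in this comment can be sketched as below. This is not the machine-config-operator code. It assumes the check is active on the cluster, that both components share the same major version, and that versions look like "1.19" or "v1.22.0". The function names and the "NoSkewWarning" label are hypothetical; only the two Reason strings come from this thread.

```python
import re


def minor_version(version: str) -> int:
    """Return the minor number from strings such as '1.19' or 'v1.22.0'."""
    match = re.match(r"v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return int(match.group(2))


def classify_skew(kubelet: str, apiserver: str) -> str:
    """Map the kubelet/kube-apiserver minor-version gap to a Reason string.

    Assumption: both versions share the same major version, so only the
    minor numbers are compared.
    """
    skew = minor_version(apiserver) - minor_version(kubelet)
    if skew > 2:
        # Beyond the n-2 window: the kubelet is already unsupported.
        return "KubeletSkewUnsupported"
    if skew == 2:
        # At the edge of the window: the next kube-apiserver upgrade breaks it.
        return "KubeletSkewPresent"
    # Hypothetical label for "nothing to report"; not taken from this thread.
    return "NoSkewWarning"


if __name__ == "__main__":
    print(classify_skew("1.19", "1.21"))
    print(classify_skew("1.19", "1.22"))
```

With the versions discussed here, a 1.19 kubelet against a 1.21 kube-apiserver falls in the first warning tier, and against 1.22 it falls in the unsupported tier.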

Comment 6 MinLi 2021-08-23 12:05:49 UTC
With the pool paused at 4.6 and the cluster upgraded to 4.8 (kubelet 1.19, kube-apiserver 1.21), I can't find a relevant message:
[lyman@localhost env]$ oc describe ClusterOperator | grep -i Skew
[lyman@localhost env]$ 


With the pool paused at 4.6 and the cluster upgraded to 4.9 (kubelet 1.19, kube-apiserver 1.22), I find a message like this:
[lyman@localhost env]$ oc describe ClusterOperator | grep -i Skew
    Message:               One or more nodes have an unsupported kubelet version skew. Please see `oc get nodes` for details and upgrade all nodes so that they have a kubelet version of at least 1.20.0.
    Reason:                KubeletSkewUnsupported

@Qi Wang, is this expected? When the version skew is 2, no relevant message is shown.
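
As a rough sketch of how the "at least 1.20.0" floor in the message above relates to a 1.22 kube-apiserver, the snippet below derives the minimum kubelet version as two minor releases behind the API server and lists the nodes that fall below it. This is an assumption based on the n-2 rule in this bug, not the operator's implementation. The function names, node names, and version suffixes are hypothetical.

```python
import re
from typing import Dict, List, Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Parse '1.22', 'v1.22.0' or 'v1.19.0+abc123' into (major, minor)."""
    match = re.match(r"v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def min_supported_kubelet(apiserver: str) -> str:
    """Oldest kubelet version inside the n-2 window of the API server."""
    major, minor = parse_major_minor(apiserver)
    return f"{major}.{max(minor - 2, 0)}.0"


def nodes_below_minimum(node_versions: Dict[str, str], apiserver: str) -> List[str]:
    """Return the names of nodes whose kubelet is older than the floor."""
    floor = parse_major_minor(min_supported_kubelet(apiserver))
    return sorted(
        name
        for name, version in node_versions.items()
        if parse_major_minor(version) < floor
    )


if __name__ == "__main__":
    # Hypothetical node names and version suffixes, for illustration only.
    nodes = {"worker-0": "v1.19.0+abc123", "master-0": "v1.22.0+def456"}
    print(min_supported_kubelet("v1.22.0"))
    print(nodes_below_minimum(nodes, "v1.22.0"))
```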

Comment 7 Qi Wang 2021-08-23 12:55:20 UTC
(In reply to MinLi from comment #6)
> With the pool paused at 4.6 and the cluster upgraded to 4.8 (kubelet 1.19,
> kube-apiserver 1.21), I can't find a relevant message:
> [lyman@localhost env]$ oc describe ClusterOperator | grep -i Skew
> [lyman@localhost env]$ 
> 
> 
> With the pool paused at 4.6 and the cluster upgraded to 4.9 (kubelet 1.19,
> kube-apiserver 1.22), I find a message like this:
> [lyman@localhost env]$ oc describe ClusterOperator | grep -i Skew
>     Message:               One or more nodes have an unsupported kubelet
> version skew. Please see `oc get nodes` for details and upgrade all nodes so
> that they have a kubelet version of at least 1.20.0.
>     Reason:                KubeletSkewUnsupported
> 
> @Qi Wang, is this expected? When the version skew is 2, no relevant
> message is shown.

Yes, that's expected. I just realized that the version skew check only starts from OpenShift 4.9. The version skew is 2, but the cluster is on 4.8, so the versions have not been checked.

Comment 8 MinLi 2021-08-24 03:25:29 UTC
Verified.
Upgrade path: 4.6.0-0.nightly-2021-08-22-084748 -> 4.7.0-0.nightly-2021-08-21-153346 -> 4.8.0-0.nightly-2021-08-22-035234 -> 4.9.0-0.nightly-2021-08-22-070405

Comment 11 errata-xmlrpc 2021-10-18 17:42:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

