Bug 1986453

Summary: EUS Control loop to check for API server and node versions skew
Product: OpenShift Container Platform
Component: Node
Sub component: Kubelet
Reporter: Qi Wang <qiwan>
Assignee: Qi Wang <qiwan>
QA Contact: MinLi <minmli>
CC: aos-bugs
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Version: 4.9
Target Release: 4.9.0
Hardware: Unspecified
OS: Unspecified
Type: Bug
Last Closed: 2021-10-18 17:42:18 UTC

Description Qi Wang 2021-07-27 15:09:13 UTC
Description of problem:

Create an EUS control loop that checks for version skew between the API server and the nodes. If it finds pools whose skew is greater than n-2, it emits an event warning the user to upgrade the nodes to a version supported by kube-apiserver.
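
A minimal shell sketch of the skew classification described above, not the operator's actual implementation. The function names `minor_of` and `classify_kubelet_skew` and the `NoWarning` label are hypothetical. The reason strings KubeletSkewPresent and KubeletSkewUnsupported, and the thresholds (a difference of exactly 2 versus greater than 2), are taken from the comments later in this report.

```shell
# minor_of VERSION
# Print the minor component of a "major.minor[.patch]" version string.
# An optional leading "v" (as in "v1.19.0") is ignored.
minor_of() {
  _v=${1#v}            # drop optional leading "v"
  _v=${_v#*.}          # drop "major."
  printf '%s\n' "${_v%%.*}"   # keep everything up to the next dot
}

# classify_kubelet_skew APISERVER_VERSION KUBELET_VERSION
# Print a reason for the minor-version skew between kube-apiserver and kubelet.
# Assumption: skew > 2 is unsupported, skew == 2 is a warning, anything else
# produces no warning. A kubelet newer than the API server is not modelled.
classify_kubelet_skew() {
  _api_minor=$(minor_of "$1")
  _kubelet_minor=$(minor_of "$2")
  _skew=$((_api_minor - _kubelet_minor))

  if [ "$_skew" -gt 2 ]; then
    echo KubeletSkewUnsupported
  elif [ "$_skew" -eq 2 ]; then
    echo KubeletSkewPresent
  else
    echo NoWarning     # hypothetical label, not a real operator reason
  fi
}

# Example: kubelet three minors behind, then two minors behind.
classify_kubelet_skew 1.22.0 1.19.0
classify_kubelet_skew 1.21.0 1.19.0
```

Comparing only the minor component keeps the sketch short; patch levels and build suffixes are deliberately ignored.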

Comment 2 MinLi 2021-08-16 09:36:15 UTC
Hi, Qi
Can you tell me how to pause the worker MachineConfigPool before upgrade?

Comment 3 Qi Wang 2021-08-16 17:22:50 UTC
(In reply to MinLi from comment #2)

Run `oc edit machineconfigpool/worker` and set `spec.paused: true`, or run `oc patch mcp/worker --type merge --patch '{"spec":{"paused":true}}'`.

Comment 4 MinLi 2021-08-18 09:41:42 UTC
Hi, Qi

I tested this bug with the following steps, but did not find an event warning the user to upgrade the node to a kube-apiserver supported version.
Please help check:

1. Create a 4.6 nightly build cluster.
2. Run $ oc edit mcp worker and set "paused: true".
3. Upgrade the cluster to 4.7.24; the upgrade succeeds.
4. Upgrade the cluster to 4.8.5; the upgrade succeeds.
5. Check the descriptions of several ClusterOperators, such as config-operator ($ oc describe co config-operator), kube-apiserver, kube-controller-manager, machine-api, and machine-config; none shows a message like "kubelet version skew".
6. Run $ oc get events -A | grep "an unsupported kubelet version skew"; the output is empty.

When I set "paused: false" for the worker mcp, the kubelet version rolls out to match the API server.

Comment 5 Qi Wang 2021-08-18 14:49:15 UTC
(In reply to MinLi from comment #4)

Sorry, I forgot to mention that the wording differs when the version difference is exactly 2 compared with greater than 2.

If you pause the pool at 4.6 and upgrade to 4.8, the version difference is 2 (kubelet 1.19 and kube-apiserver 1.21; I haven't checked), and the status is KubeletSkewPresent, like:

$ oc describe ClusterOperator
Spec:
Status:
    .....
    Message:               Current kubelet version 1.19 will not be supported by newer kube-apiserver. Please upgrade the kubelet first if plan to upgrade the kube-apiserver.
    Reason:                KubeletSkewPresent

You can upgrade to 4.9 to see KubeletSkewUnsupported, if the kube-apiserver is 1.22.
On the current 4.8.5 cluster, can you check whether the KubeletSkewPresent status above appears in the ClusterOperator, and then upgrade to 4.9 to see whether the KubeletSkewUnsupported warning appears? I think the kube-apiserver in 4.9 has been upgraded to 1.22 now.

Comment 6 MinLi 2021-08-23 12:05:49 UTC
With the pool paused at 4.6 and the cluster upgraded to 4.8 (kubelet 1.19, kube-apiserver 1.21), I can't find a relevant message:
[lyman@localhost env]$ oc describe ClusterOperator | grep -i Skew
[lyman@localhost env]$ 


With the pool paused at 4.6 and the cluster upgraded to 4.9 (kubelet 1.19, kube-apiserver 1.22), I find a message like this:
[lyman@localhost env]$ oc describe ClusterOperator | grep -i Skew
    Message:               One or more nodes have an unsupported kubelet version skew. Please see `oc get nodes` for details and upgrade all nodes so that they have a kubelet version of at least 1.20.0.
    Reason:                KubeletSkewUnsupported

@Qi Wang, is this expected? When the version skew is 2, no relevant message is shown.

Comment 7 Qi Wang 2021-08-23 12:55:20 UTC
(In reply to MinLi from comment #6)

Yes, that's expected. I just realized that the version skew check only starts from OpenShift 4.9. The version skew is 2, but the cluster is on 4.8, where the version is not checked.
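
To make the explanation above concrete, here is a small shell sketch of the expected outcome for a paused pool. It is an illustration under assumptions, not the operator's code: the function name `expected_skew_reason` and the labels `NotChecked` and `NoWarning` are hypothetical, and it assumes the Kubernetes minor skew equals the OpenShift minor skew for these 4.x releases, which matches the 4.6/1.19, 4.8/1.21, and 4.9/1.22 pairs reported in this thread.

```shell
# expected_skew_reason CLUSTER_OCP_VERSION POOL_OCP_VERSION
# Both arguments are OpenShift versions such as "4.9" or "4.6.0".
# Assumption: the kubelet/kube-apiserver minor skew equals the difference
# between the OpenShift minor versions of the cluster and the paused pool.
expected_skew_reason() {
  _cluster_minor=${1#*.}                   # "4.9.0" -> "9.0"
  _cluster_minor=${_cluster_minor%%.*}     # "9.0"   -> "9"
  _pool_minor=${2#*.}
  _pool_minor=${_pool_minor%%.*}

  # Per the comment above, clusters older than 4.9 do not run the check.
  if [ "$_cluster_minor" -lt 9 ]; then
    echo NotChecked    # hypothetical label
    return 0
  fi

  _skew=$((_cluster_minor - _pool_minor))
  if [ "$_skew" -gt 2 ]; then
    echo KubeletSkewUnsupported
  elif [ "$_skew" -eq 2 ]; then
    echo KubeletSkewPresent
  else
    echo NoWarning     # hypothetical label
  fi
}

# The two cases tested in this report: pool paused at 4.6.
expected_skew_reason 4.8 4.6
expected_skew_reason 4.9 4.6
```

The first call models the 4.8 cluster where no skew message was found; the second models the 4.9 cluster where KubeletSkewUnsupported was reported.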

Comment 8 MinLi 2021-08-24 03:25:29 UTC
Verified.
Upgrade path: 4.6.0-0.nightly-2021-08-22-084748 -> 4.7.0-0.nightly-2021-08-21-153346 -> 4.8.0-0.nightly-2021-08-22-035234 -> 4.9.0-0.nightly-2021-08-22-070405

Comment 11 errata-xmlrpc 2021-10-18 17:42:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759