Bug 1995621 - KubeletConfig not applied to MachineConfigSet
Summary: KubeletConfig not applied to MachineConfigSet
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.7
Hardware: All
OS: All
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.7.z
Assignee: Qi Wang
QA Contact: MinLi
URL:
Whiteboard:
Depends On: 2000958
Blocks:
Reported: 2021-08-19 14:04 UTC by Mangirdas Judeikis
Modified: 2021-10-27 08:23 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-10-27 08:22:59 UTC
Target Upstream Version:
Embargoed:



Links
System ID | Private | Priority | Status | Summary | Last Updated
Github openshift machine-config-operator pull 2759 | 0 | None | open | [release-4.7] Bug 1995621: Backport fix 1 to 1 relation to kubelet configs and machine config pools | 2021-09-30 12:47:46 UTC
Red Hat Product Errata RHBA-2021:3931 | 0 | None | None | None | 2021-10-27 08:23:18 UTC

Description Mangirdas Judeikis 2021-08-19 14:04:59 UTC
Description of problem:

The KubeletConfig controller does not reconcile MachineConfigPool changes.


Version-Release number of selected component (if applicable):

Server Version: 4.7.21


How reproducible:

1. Create a cluster
2. Add label to "worker" MachineConfigPool:
aro.openshift.io/limits: ""

3. Create a custom KubeletConfig:

apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  generation: 1
  labels:
    aro.openshift.io/limits: ""
  name: aro-limits
spec:
    kubeletConfig:
      evictionHard:
        imagefs.available: 15%
        memory.available: 500Mi
        nodefs.available: 10%
        nodefs.inodesFree: 5%
      systemReserved:
        memory: 2000Mi
    machineConfigPoolSelector:
      matchLabels:
        aro.openshift.io/limits: ""


Wait for it to be applied (the nodes rotate).

4. Add the same label to the master MachineConfigPool:
aro.openshift.io/limits: ""

5. A new MachineConfig is not generated.

6. Try updating the KubeletConfig with arbitrary changes: still nothing.

7. If you delete and re-create the KubeletConfig, it is applied.

This is VERY disruptive on a big cluster.
Any change to a MachineConfigPool should be acted on by the KubeletConfig controller.

Expected result:
Edits or updates to either the MachineConfigPool or the KubeletConfig should trigger regeneration of the MachineConfig.
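
The expected behaviour can be sketched roughly as follows. This is only an illustration of matchLabels selection, not the machine-config-operator's actual code. The helper names are hypothetical, and the "99-<pool>-generated-kubelet" naming is assumed from the MachineConfig names observed later in this report.

```python
# Illustrative sketch only; NOT the machine-config-operator implementation.
from typing import Dict, List


def selector_matches(match_labels: Dict[str, str], pool_labels: Dict[str, str]) -> bool:
    """A matchLabels selector matches when every key/value pair is set on the pool."""
    return all(pool_labels.get(key) == value for key, value in match_labels.items())


def expected_kubelet_machineconfigs(
    match_labels: Dict[str, str], pools: Dict[str, Dict[str, str]]
) -> List[str]:
    """Return the MachineConfig names one would expect for the selected pools.

    The "99-<pool>-generated-kubelet" pattern is an assumption based on the
    names seen in this bug report.
    """
    return sorted(
        f"99-{name}-generated-kubelet"
        for name, labels in pools.items()
        if selector_matches(match_labels, labels)
    )


if __name__ == "__main__":
    selector = {"aro.openshift.io/limits": ""}
    pools = {"worker": {"aro.openshift.io/limits": ""}, "master": {}}

    # Steps 2-3: only the worker pool carries the label.
    print(expected_kubelet_machineconfigs(selector, pools))

    # Step 4: the master pool gets the same label; a reconciling controller
    # would now be expected to produce a MachineConfig for master as well.
    pools["master"]["aro.openshift.io/limits"] = ""
    print(expected_kubelet_machineconfigs(selector, pools))
```

The point of the sketch is that the set of selected pools depends on the pool labels as well as on the KubeletConfig, so a label change on a pool alone should be enough to change the result.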

Comment 1 Qi Wang 2021-08-24 21:59:17 UTC
(In reply to Mangirdas Judeikis from comment #0)

What is the status of the nodes after step 6? I tried to reproduce this: after step 5, no new machineconfig was generated, as mentioned above. But after editing the kubeletconfig, a new machineconfig, 99-master-generated-kubelet, was generated on my cluster, and it left one of the master nodes stuck in Ready,SchedulingDisabled status.
What does "still nothing" mean in detail? Is no machineconfig created after step 6?

Comment 2 Qi Wang 2021-08-24 22:17:18 UTC
@mjudeiki Could you also provide the must-gather logs?

Comment 3 Qi Wang 2021-09-03 20:31:22 UTC
This was not completed this sprint. Waiting for responses.

Comment 4 Mangirdas Judeikis 2021-09-08 17:10:50 UTC
I don't have the cluster alive anymore, so I can't provide a must-gather. But from what you wrote, it looks like you partially recreated it, to the point where the issue can be observed.
I suspect the node not being ready is a separate issue, unrelated to the issue above.

Comment 5 Qi Wang 2021-09-12 01:47:16 UTC
Set up a PR so that a new machineconfig is generated after setting the label on the master pool and editing the kubeletconfig.
Verified that the kubeletconfig can be applied to both the master pool and the worker pool.

1. # Apply the kubeletconfig to the worker pool, using the example kubeletconfig from https://github.com/openshift/machine-config-operator/blob/master/examples/kubeletconfig.crd.yaml
$ oc label mcp worker custom-kubelet=small-pods
machineconfigpool.machineconfiguration.openshift.io/worker labeled
$ oc apply -f /home/qiwan/test-crds/kubeletconfig.yml
kubeletconfig.machineconfiguration.openshift.io/set-max-pods created
$ oc get mc
99-worker-generated-kubelet

2. # Label the master pool with the same label as above and edit the kubeletconfig
$ oc label mcp master custom-kubelet=small-pods
machineconfigpool.machineconfiguration.openshift.io/master labeled
$ oc edit kubeletconfig/set-max-pods
kubeletconfig.machineconfiguration.openshift.io/set-max-pods edited
$ oc get mc
99-master-generated-kubelet
99-worker-generated-kubelet

3. Debug into the node and check that /etc/kubernetes/kubelet.conf has been updated.
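
For scripted verification, the `oc get mc` check in steps 1 and 2 could be approximated as below. This is a hedged sketch: the function name is hypothetical, it only inspects the first column of whatever `oc get mc` text it is given, and the "99-<pool>-generated-kubelet" naming is assumed from the output above.

```python
# Illustrative helper only; parses text captured from `oc get mc`.
from typing import List


def missing_kubelet_machineconfigs(oc_get_mc_output: str, pools: List[str]) -> List[str]:
    """Return expected kubelet MachineConfig names absent from the output.

    Only the first whitespace-separated column of each line is used, and any
    "<resource>/" prefix (as printed with `-o name`) is dropped.
    """
    present = set()
    for line in oc_get_mc_output.splitlines():
        fields = line.split()
        if fields:
            present.add(fields[0].rsplit("/", 1)[-1])
    expected = [f"99-{pool}-generated-kubelet" for pool in pools]
    return [name for name in expected if name not in present]


if __name__ == "__main__":
    # Output as seen after step 1 in this comment: only the worker pool is covered.
    after_step_1 = "99-worker-generated-kubelet\n"
    print(missing_kubelet_machineconfigs(after_step_1, ["master", "worker"]))
```

An empty list after step 2 would correspond to the fixed behaviour described in this comment.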

Comment 9 MinLi 2021-10-19 02:58:46 UTC
Verified on 4.7.0-0.nightly-2021-10-15-152957, following the test steps in Comment 5.

Comment 12 errata-xmlrpc 2021-10-27 08:22:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.7.36 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3931

