Bug 2041814

Summary: The KubeletConfigController wrongly processes multiple configs for a pool
Product: OpenShift Container Platform
Reporter: MinLi <minmli>
Component: Node
Assignee: Qi Wang <qiwan>
Node sub component: Kubelet
QA Contact: MinLi <minmli>
Status: CLOSED ERRATA
Docs Contact:
Severity: high
Priority: medium
CC: aos-bugs, dshumake, harpatil, nagrawal
Version: 4.7
Target Milestone: ---
Target Release: 4.11.0
Hardware: All
OS: Linux
Doc Type: No Doc Update
Clone Of:
: 2076355
Last Closed: 2022-08-10 10:42:31 UTC
Type: Bug
Bug Blocks: 2074225

Description MinLi 2022-01-18 10:34:57 UTC
Description of problem:
The KubeletConfigController wrongly processes multiple kubeletconfigs for a pool.
Creating 2 kubeletconfigs generates 3 machine configs.

Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2022-01-17-032357

How reproducible:


Steps to Reproduce:
1. oc label mcp master custom-kubelet=test-pods

2. Create the first kubeletconfig from custom-kubelet-test1.yaml:
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: max-pod
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: test-pods
  kubeletConfig:
    maxPods: 248

3. When the mcp finishes rolling out, create the second kubeletconfig from custom-kubelet-test2.yaml:
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: max-pod-1
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: test-pods
  kubeletConfig:
    maxPods: 222

4. Wait for the mcp to finish rolling out, then check the mc list.

5. Log in to a master node and check /etc/kubernetes/kubelet.conf.

Actual results:
4. Three mc are generated:
99-master-generated-kubelet                        51dc0801ed7d705820f557fcabf04eff023bf568   3.2.0             60m
99-master-generated-kubelet-1                      51dc0801ed7d705820f557fcabf04eff023bf568   3.2.0             46m
99-master-generated-kubelet-2                      51dc0801ed7d705820f557fcabf04eff023bf568   3.2.0             40m

5. It shows: "maxPods": 248

And check the 

Expected results:
4. Two mc are generated:
99-master-generated-kubelet                        51dc0801ed7d705820f557fcabf04eff023bf568   3.2.0             60m
99-master-generated-kubelet-1                      51dc0801ed7d705820f557fcabf04eff023bf568   3.2.0             46m

5. It shows: "maxPods": 222
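
The check in step 5 can be scripted. A minimal sketch in Python, assuming /etc/kubernetes/kubelet.conf is JSON (the quoted "maxPods" key in the results suggests this) and that its content has already been copied off the node. The function name max_pods and the sample text are hypothetical, not taken from this cluster:

```python
import json


def max_pods(kubelet_conf_text):
    """Return the maxPods value from kubelet.conf content, or None if it is unset."""
    return json.loads(kubelet_conf_text).get("maxPods")


# Hypothetical excerpt for illustration; a real kubelet.conf has many more fields.
sample = '{"kind": "KubeletConfiguration", "maxPods": 248}'

expected = 222  # value from the second kubeletconfig in the steps to reproduce
actual = max_pods(sample)
if actual != expected:
    print(f"maxPods mismatch: expected {expected}, found {actual}")
```

With the sample above the mismatch branch is taken, which mirrors the actual result reported for step 5.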

Additional info:
Creating the second kubeletconfig generates both 99-master-generated-kubelet-1 and 99-master-generated-kubelet-2.

$ oc get kubeletconfig 
NAME        AGE
max-pod     60m
max-pod-1   46m

$ oc get mc 
NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                          51dc0801ed7d705820f557fcabf04eff023bf568   3.2.0             25h
00-worker                                          51dc0801ed7d705820f557fcabf04eff023bf568   3.2.0             25h
01-master-container-runtime                        51dc0801ed7d705820f557fcabf04eff023bf568   3.2.0             25h
01-master-kubelet                                  51dc0801ed7d705820f557fcabf04eff023bf568   3.2.0             25h
01-worker-container-runtime                        51dc0801ed7d705820f557fcabf04eff023bf568   3.2.0             25h
01-worker-kubelet                                  51dc0801ed7d705820f557fcabf04eff023bf568   3.2.0             25h
99-master-generated-kubelet                        51dc0801ed7d705820f557fcabf04eff023bf568   3.2.0             60m
99-master-generated-kubelet-1                      51dc0801ed7d705820f557fcabf04eff023bf568   3.2.0             46m
99-master-generated-kubelet-2                      51dc0801ed7d705820f557fcabf04eff023bf568   3.2.0             40m
99-master-generated-registries                     51dc0801ed7d705820f557fcabf04eff023bf568   3.2.0             25h
99-master-ssh                                                                                 3.2.0             25h
99-worker-generated-containerruntime               51dc0801ed7d705820f557fcabf04eff023bf568   3.2.0             24h
99-worker-generated-registries                     51dc0801ed7d705820f557fcabf04eff023bf568   3.2.0             25h
99-worker-ssh                                                                                 3.2.0             25h
rendered-master-1375669cb657f1ef6a401299550a831c   51dc0801ed7d705820f557fcabf04eff023bf568   3.2.0             46m
rendered-master-5743264e708b5842f9919abc9a534dd1   51dc0801ed7d705820f557fcabf04eff023bf568   3.2.0             60m
rendered-master-9fffa2ab6b0a15367479712c1723353f   51dc0801ed7d705820f557fcabf04eff023bf568   3.2.0             25h
rendered-worker-1e9c22b7915857c4449d7ee5b90e4028   51dc0801ed7d705820f557fcabf04eff023bf568   3.2.0             24h
rendered-worker-3eb300f46cc18c53795739e5171012c4   51dc0801ed7d705820f557fcabf04eff023bf568   3.2.0             25h
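
A rough way to spot the extra mc in a listing like the one above is to count the 99-<pool>-generated-kubelet entries per pool and compare that with the number of kubeletconfigs. The sketch below is an illustration only. The regular expression and the function name are assumptions based on the names seen in this report, not on the controller code:

```python
import re
from collections import Counter

# Matches names such as 99-master-generated-kubelet or 99-master-generated-kubelet-2.
GENERATED_KUBELET = re.compile(
    r"^99-(?P<pool>[a-z0-9-]+?)-generated-kubelet(?:-\d+)?$"
)


def count_generated_kubelet_mcs(oc_get_mc_output):
    """Count generated kubelet MachineConfigs per pool in `oc get mc` text output."""
    counts = Counter()
    for line in oc_get_mc_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        match = GENERATED_KUBELET.match(fields[0])
        if match:
            counts[match.group("pool")] += 1
    return counts


# A few lines copied from the listing in this report.
listing = """\
NAME                             GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
01-master-kubelet                51dc0801ed7d705820f557fcabf04eff023bf568   3.2.0             25h
99-master-generated-kubelet      51dc0801ed7d705820f557fcabf04eff023bf568   3.2.0             60m
99-master-generated-kubelet-1    51dc0801ed7d705820f557fcabf04eff023bf568   3.2.0             46m
99-master-generated-kubelet-2    51dc0801ed7d705820f557fcabf04eff023bf568   3.2.0             40m
99-master-generated-registries   51dc0801ed7d705820f557fcabf04eff023bf568   3.2.0             25h
"""

counts = count_generated_kubelet_mcs(listing)
kubeletconfigs_for_master = 2  # max-pod and max-pod-1
if counts["master"] != kubeletconfigs_for_master:
    print(f"master: {counts['master']} generated kubelet mc for "
          f"{kubeletconfigs_for_master} kubeletconfigs")
```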

Comment 1 MinLi 2022-01-18 10:36:38 UTC
$ oc logs -f machine-config-controller-6d5cf7dbc9-w4lng -n openshift-machine-config-operator
I0118 08:43:23.870051       1 start.go:50] Version: v4.7.0-202201082234.p0.g51dc080.assembly.stream-dirty (51dc0801ed7d705820f557fcabf04eff023bf568)
I0118 08:43:23.875080       1 leaderelection.go:243] attempting to acquire leader lease openshift-machine-config-operator/machine-config-controller...
I0118 08:45:19.709842       1 leaderelection.go:253] successfully acquired lease openshift-machine-config-operator/machine-config-controller
E0118 08:45:19.769731       1 template_controller.go:121] couldn't get ControllerConfig on secret callback &errors.StatusError{ErrStatus:v1.Status{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ListMeta:v1.ListMeta{SelfLink:"", ResourceVersion:"", Continue:"", RemainingItemCount:(*int64)(nil)}, Status:"Failure", Message:"controllerconfig.machineconfiguration.openshift.io \"machine-config-controller\" not found", Reason:"NotFound", Details:(*v1.StatusDetails)(0xc000a6d200), Code:404}}
I0118 08:45:19.937364       1 container_runtime_config_controller.go:185] Starting MachineConfigController-ContainerRuntimeConfigController
I0118 08:45:19.937333       1 kubelet_config_controller.go:156] Starting MachineConfigController-KubeletConfigController
I0118 08:45:20.140439       1 node_controller.go:152] Starting MachineConfigController-NodeController
I0118 08:45:20.140494       1 template_controller.go:183] Starting MachineConfigController-TemplateController
I0118 08:45:20.140864       1 render_controller.go:124] Starting MachineConfigController-RenderController
I0118 08:45:20.349436       1 kubelet_config_controller.go:575] Applied KubeletConfig max-pod-1 on MachineConfigPool master
I0118 08:45:20.946864       1 kubelet_config_controller.go:575] Applied KubeletConfig max-pod on MachineConfigPool master
I0118 08:47:26.484121       1 node_controller.go:419] Pool master: node minmli01174701-llg89-master-2.c.openshift-qe.internal: Reporting unready: node minmli01174701-llg89-master-2.c.openshift-qe.internal is reporting Unschedulable
I0118 08:47:40.706433       1 node_controller.go:419] Pool master: node minmli01174701-llg89-master-2.c.openshift-qe.internal: Completed update to rendered-master-5743264e708b5842f9919abc9a534dd1
I0118 08:47:40.734308       1 node_controller.go:419] Pool master: node minmli01174701-llg89-master-2.c.openshift-qe.internal: Reporting ready
I0118 08:47:45.706891       1 status.go:90] Pool master: All nodes are updated with rendered-master-5743264e708b5842f9919abc9a534dd1
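
The two "Applied KubeletConfig" lines in the log above show max-pod-1 being applied before max-pod after the controller started. A small sketch for pulling that order out of a saved log, assuming the message format stays as shown here; the function name applied_order is hypothetical:

```python
import re

# Message format as it appears in the machine-config-controller log in this report.
APPLIED = re.compile(r"Applied KubeletConfig (\S+) on MachineConfigPool (\S+)")


def applied_order(log_text):
    """Return (kubeletconfig, pool) pairs in the order the controller logged them."""
    return [match.groups() for match in APPLIED.finditer(log_text)]


# Two lines copied from the log in this comment.
log = """\
I0118 08:45:20.349436       1 kubelet_config_controller.go:575] Applied KubeletConfig max-pod-1 on MachineConfigPool master
I0118 08:45:20.946864       1 kubelet_config_controller.go:575] Applied KubeletConfig max-pod on MachineConfigPool master
"""

for name, pool in applied_order(log):
    print(f"{pool}: {name}")
```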

Comment 3 MinLi 2022-01-20 07:36:14 UTC
This issue also hits 4.8, but it did not reproduce when applied to the worker machine-config-pool. It seems to happen only on the master mcp.

Comment 4 Qi Wang 2022-01-28 21:44:38 UTC
I ran some tests on this and can reproduce it on worker nodes as well,
using the same kubeletconfig YAML files as in the Description:


[qiwan@qiwan ~]$ oc get kubeletconfig
NAME        AGE
max-pod-1   18m
max-pod-2   13m

[qiwan@qiwan ~]$ oc get mc
NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                          c2a8dc8e8731107f70279cfa720c13b499fdca15   3.2.0             137m
00-worker                                          c2a8dc8e8731107f70279cfa720c13b499fdca15   3.2.0             137m
01-master-container-runtime                        c2a8dc8e8731107f70279cfa720c13b499fdca15   3.2.0             137m
01-master-kubelet                                  c2a8dc8e8731107f70279cfa720c13b499fdca15   3.2.0             137m
01-worker-container-runtime                        c2a8dc8e8731107f70279cfa720c13b499fdca15   3.2.0             137m
01-worker-kubelet                                  c2a8dc8e8731107f70279cfa720c13b499fdca15   3.2.0             137m
99-master-generated-registries                     c2a8dc8e8731107f70279cfa720c13b499fdca15   3.2.0             137m
99-master-ssh                                                                                 3.2.0             142m
99-worker-generated-kubelet                        c2a8dc8e8731107f70279cfa720c13b499fdca15   3.2.0             11m
99-worker-generated-kubelet-1                      c2a8dc8e8731107f70279cfa720c13b499fdca15   3.2.0             6m30s
99-worker-generated-kubelet-2                      c2a8dc8e8731107f70279cfa720c13b499fdca15   3.2.0             36s
99-worker-generated-registries                     c2a8dc8e8731107f70279cfa720c13b499fdca15   3.2.0             137m
99-worker-ssh                                                                                 3.2.0             142m
rendered-master-054de7b98995263c96d5dd6a2e6dd69d   c2a8dc8e8731107f70279cfa720c13b499fdca15   3.2.0             11m
rendered-master-c4800fc1f9d8605a0a09c2223c4134d5   3e9f2ca58e00d5dd5a54b18fb5b00c5571b5c8e3   3.2.0             137m
rendered-master-c8f113baf5e0edbfc712c71419ae618b   3dc7c5ad8cd2a46c6cf1d6d68558e83f8fb8f3b0   3.2.0             108m
rendered-worker-17a5327f1131efbefd6e37e7fcf77f0a   3dc7c5ad8cd2a46c6cf1d6d68558e83f8fb8f3b0   3.2.0             17m
rendered-worker-2b1cea99aa958104038153ce242bcd0c   3dc7c5ad8cd2a46c6cf1d6d68558e83f8fb8f3b0   3.2.0             108m
rendered-worker-3cb30ae1c943187b362941403077c309   3e9f2ca58e00d5dd5a54b18fb5b00c5571b5c8e3   3.2.0             117m
rendered-worker-6058784b6b5e4476f0622834848f3832   3dc7c5ad8cd2a46c6cf1d6d68558e83f8fb8f3b0   3.2.0             104m
rendered-worker-7dd4db23a125800347de7b5bb11bdad0   c2a8dc8e8731107f70279cfa720c13b499fdca15   3.2.0             11m
rendered-worker-d5186c9ec9766128a859c5f2a9fa48e0   c2a8dc8e8731107f70279cfa720c13b499fdca15   3.2.0             6m25s
rendered-worker-d98df84416d9db4734259ee802f2a369   c2a8dc8e8731107f70279cfa720c13b499fdca15   3.2.0             10m
rendered-worker-e9bb9fc65a7fb7f320a9eecc7ae340f1   3dc7c5ad8cd2a46c6cf1d6d68558e83f8fb8f3b0   3.2.0             101m
rendered-worker-f53880ebafa7b07cfd8f0543e65c8419   3e9f2ca58e00d5dd5a54b18fb5b00c5571b5c8e3   3.2.0             137m

[qiwan@qiwan ~]$ oc describe kubeletconfig/max-pod-1
Name:         max-pod-1
Namespace:    
Labels:       <none>
Annotations:  machineconfiguration.openshift.io/mc-name-suffix: 2
API Version:  machineconfiguration.openshift.io/v1
Kind:         KubeletConfig
Metadata:
  Creation Timestamp:  2022-01-28T21:04:42Z
  Finalizers:
    99-worker-generated-kubelet
    99-worker-generated-kubelet-2
  Generation:  1
  Managed Fields:
    API Version:  machineconfiguration.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
      f:spec:
        .:
        f:kubeletConfig:
          .:
          f:maxPods:
        f:machineConfigPoolSelector:
          .:
          f:matchLabels:
            .:
            f:custom-kubelet-worker:
    Manager:      kubectl-client-side-apply
    Operation:    Update
    Time:         2022-01-28T21:04:42Z
    API Version:  machineconfiguration.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:conditions:
    Manager:      machine-config-controller
    Operation:    Update
    Subresource:  status
    Time:         2022-01-28T21:04:43Z
    API Version:  machineconfiguration.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:machineconfiguration.openshift.io/mc-name-suffix:
        f:finalizers:
          .:
          v:"99-worker-generated-kubelet":
          v:"99-worker-generated-kubelet-2":
    Manager:         machine-config-controller
    Operation:       Update
    Time:            2022-01-28T21:15:12Z
  Resource Version:  87484
  UID:               80977f7c-3238-4162-b98f-a9c3b415b94c

// After max-pod-2 had rolled out, I waited several minutes and checked the log to see the kubeletconfig resync
[qiwan@qiwan ~]$ oc logs -f machine-config-controller-5f9bd97d4f-c7xft
I0128 21:12:30.444264       1 start.go:50] Version: machine-config-daemon-4.6.0-202006240615.p0-1231-gc2a8dc8e (c2a8dc8e8731107f70279cfa720c13b499fdca15)
I0128 21:12:33.561471       1 leaderelection.go:248] attempting to acquire leader lease openshift-machine-config-operator/machine-config-controller...

I0128 21:15:11.364531       1 leaderelection.go:258] successfully acquired lease openshift-machine-config-operator/machine-config-controller
I0128 21:15:11.402650       1 template_controller.go:137] Re-syncing ControllerConfig due to secret pull-secret change
I0128 21:15:11.469686       1 node_controller.go:152] Starting MachineConfigController-NodeController
I0128 21:15:11.472877       1 kubelet_config_controller.go:169] Starting MachineConfigController-KubeletConfigController
I0128 21:15:11.474188       1 container_runtime_config_controller.go:184] Starting MachineConfigController-ContainerRuntimeConfigController
I0128 21:15:11.474369       1 kubelet_config_controller.go:446] sync kubbeletconfig: key: max-pod-1time; 2022-01-28 21:15:11.474363961 +0000 UTC m=+161.155989663
I0128 21:15:11.474475       1 kubelet_config_controller.go:471] kubeleltc config name: max-pod-1
I0128 21:15:11.478355       1 kubelet_config_controller.go:446] sync kubbeletconfig: key: max-pod-2time; 2022-01-28 21:15:11.478348348 +0000 UTC m=+161.159974064
I0128 21:15:11.478446       1 kubelet_config_controller.go:471] kubeleltc config name: max-pod-2

I0128 21:15:11.575671       1 template_controller.go:238] Starting MachineConfigController-TemplateController
I0128 21:15:11.575951       1 render_controller.go:124] Starting MachineConfigController-RenderController
I0128 21:15:11.673755       1 kubelet_config_controller.go:632] Applied KubeletConfig max-pod-2 on MachineConfigPool worker
I0128 21:15:12.485615       1 kubelet_config_controller.go:632] Applied KubeletConfig max-pod-1 on MachineConfigPool worker
I0128 21:15:16.419086       1 node_controller.go:414] Pool worker: 2 candidate nodes for update, capacity: 1
I0128 21:15:16.419175       1 node_controller.go:414] Pool worker: Setting node ci-ln-l5nkrxk-72292-79kr7-worker-c-m7xhj target to rendered-worker-d5186c9ec9766128a859c5f2a9fa48e0
I0128 21:15:16.453658       1 event.go:285] Event(v1.ObjectReference{Kind:"MachineConfigPool", Namespace:"", Name:"worker", UID:"3a9d3790-cbd8-401d-9443-69f97dc0e619", APIVersion:"machineconfiguration.openshift.io/v1", ResourceVersion:"83282", FieldPath:""}): type: 'Normal' reason: 'SetDesiredConfig' Targeted node ci-ln-l5nkrxk-72292-79kr7-worker-c-m7xhj to config rendered-worker-d5186c9ec9766128a859c5f2a9fa48e0

Comment 7 MinLi 2022-02-28 09:59:54 UTC
verified!

$ oc get clusterversion 
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-02-27-122819   True        False         146m    Cluster version is 4.11.0-0.nightly-2022-02-27-122819

$ oc get mc 
NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                          4e7fe38d0db4a3a542a246b6a9eb97f582b91b07   3.2.0             110m
00-worker                                          4e7fe38d0db4a3a542a246b6a9eb97f582b91b07   3.2.0             110m
01-master-container-runtime                        4e7fe38d0db4a3a542a246b6a9eb97f582b91b07   3.2.0             110m
01-master-kubelet                                  4e7fe38d0db4a3a542a246b6a9eb97f582b91b07   3.2.0             110m
01-worker-container-runtime                        4e7fe38d0db4a3a542a246b6a9eb97f582b91b07   3.2.0             110m
01-worker-kubelet                                  4e7fe38d0db4a3a542a246b6a9eb97f582b91b07   3.2.0             110m
99-master-generated-kubelet                        4e7fe38d0db4a3a542a246b6a9eb97f582b91b07   3.2.0             44m
99-master-generated-kubelet-1                      4e7fe38d0db4a3a542a246b6a9eb97f582b91b07   3.2.0             25m
99-master-generated-registries                     4e7fe38d0db4a3a542a246b6a9eb97f582b91b07   3.2.0             110m
99-master-ssh                                                                                 3.2.0             112m
99-worker-generated-registries                     4e7fe38d0db4a3a542a246b6a9eb97f582b91b07   3.2.0             110m
99-worker-ssh                                                                                 3.2.0             112m
rendered-master-1d1f4e45e5ea50e22a8b7d729af00f03   4e7fe38d0db4a3a542a246b6a9eb97f582b91b07   3.2.0             25m
rendered-master-6c05924f5ef7961a3857db00dee9a1fe   4e7fe38d0db4a3a542a246b6a9eb97f582b91b07   3.2.0             44m
rendered-master-8f3d37cfd041071f334a29fe070778ff   4e7fe38d0db4a3a542a246b6a9eb97f582b91b07   3.2.0             110m
rendered-worker-60525c576deef509b0f0644c942e989e   4e7fe38d0db4a3a542a246b6a9eb97f582b91b07   3.2.0             110m

Comment 8 Qi Wang 2022-04-19 15:00:19 UTC
*** Bug 2069764 has been marked as a duplicate of this bug. ***

Comment 10 errata-xmlrpc 2022-08-10 10:42:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069