Bug 1993922 - The kubeletconfig controller has a wrong assumption regarding the number of kubelet configs
Summary: The kubeletconfig controller has a wrong assumption regarding the number of kubelet configs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.9
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.9.0
Assignee: Ryan Phillips
QA Contact: Sunil Choudhary
URL:
Whiteboard:
Depends On:
Blocks: 2000958
 
Reported: 2021-08-16 12:19 UTC by Artyom
Modified: 2022-05-16 06:43 UTC
CC: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-10-18 17:46:26 UTC
Target Upstream Version:
Embargoed:


Links:
Github: openshift/machine-config-operator pull 2752 (last updated 2021-09-02 18:40:26 UTC)
Red Hat Issue Tracker: RFE-2078 (last updated 2021-08-16 19:05:14 UTC)
Red Hat Product Errata: RHSA-2021:3759 (last updated 2021-10-18 17:46:41 UTC)

Description Artyom 2021-08-16 12:19:09 UTC
Description of problem:
The environment has only 5 kubelet configs, but when I try to create an additional kubeletconfig I get the error:
max number of supported kubelet config (10) has been reached. Please delete old kubelet configs before retrying


Version-Release number of selected component (if applicable):
master

How reproducible:
Always

Steps to Reproduce:
1. Create 10 Kubeletconfigs
2. Delete the kubeletconfig with the lowest suffix (annotation machineconfiguration.openshift.io/mc-name-suffix: 0); see the command sketch after these steps
3. Try to create an additional kubeletconfig
4. Check generated machine configs
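
For reference, a rough command-level sketch of these steps, modeled on the documented KubeletConfig example; the custom-kubelet label, object names, maxPods value, and the placeholder file are illustrative assumptions:

# Label the worker pool so the KubeletConfigs below select it.
oc label machineconfigpool worker custom-kubelet=set-max-pods

# Step 1: create 10 kubeletconfigs.
for i in $(seq 1 10); do
cat <<EOF | oc create -f -
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: set-max-pods-$i
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: set-max-pods
  kubeletConfig:
    maxPods: 500
EOF
done

# Steps 2-4: delete the kubeletconfig carrying the lowest mc-name-suffix
# annotation, try to create one more, and check the generated machine configs.
oc delete kubeletconfig <kubeletconfig-with-lowest-suffix>
oc create -f additional-kubeletconfig.yaml
oc get mc | grep generated-kubelet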

Actual results:
Got the error: "max number of supported kubelet config (10) has been reached. Please delete old kubelet configs before retrying"

Expected results:
The controller should succeed in creating a machine config.

Additional info:
The problem is that the code does not account for the possibility that some of the kubelet configs may have been deleted.

Relevant code:
// If we are here, this means that a new kubelet config was created, so we have to calculate the suffix value for its MC name
suffixNum := 0
// Go through the list of kubelet config objects created and get the max suffix value currently created
for _, item := range kcList.Items {
	val, ok := item.GetAnnotations()[ctrlcommon.MCNameSuffixAnnotationKey]
	if ok {
		// Convert the suffix value to int so we can look through the list and grab the max suffix created so far
		intVal, err := strconv.Atoi(val)
		if err != nil {
			return "", fmt.Errorf("error converting %s to int: %v", val, err)
		}
		if intVal > suffixNum {
			suffixNum = intVal
		}
	}
}
// The max suffix value that we can go till with this logic is 9 - this means that a user can create up to 10 different kubelet config CRs.
// However, if there is a kc-1 mapping to mc-1 and kc-2 mapping to mc-2 and the user deletes kc-1, it will delete mc-1 but
// then if the user creates a kc-new it will map to mc-3. This is what we want as the latest kubelet config created should be higher in priority
// so that those changes can be rolled out to the nodes. But users will have to be mindful of how many kubelet config CRs they create. Don't think
// anyone should ever have the need to create 10 when they can simply update an existing kubelet config unless it is to apply to another pool.
if suffixNum+1 > 9 {
	return "", fmt.Errorf("max number of supported kubelet config (10) has been reached. Please delete old kubelet configs before retrying")
}
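
For illustration: after step 2 of the reproduction, only nine kubeletconfig objects remain, but the highest surviving suffix annotation is still 9, so the suffixNum+1 > 9 check above still fires even though a slot has been freed. The annotation the loop reads (machineconfiguration.openshift.io/mc-name-suffix) can be listed with a jsonpath query along these lines (a sketch; the output formatting is illustrative):

# List each kubeletconfig together with its mc-name-suffix annotation.
oc get kubeletconfig -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.machineconfiguration\.openshift\.io/mc-name-suffix}{"\n"}{end}'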

Comment 1 Dave Cain 2021-08-16 13:23:12 UTC
Are we not able to have more than (6) individual, unique MCPs each with a PAO (Performance Add On Operator) profile?

I am able to trigger this bug as well in my environment by adding (2) additional MCPs / PAO profiles. I had (4) before this. Output from before creating the new MCPs/PAO profiles:

$ oc get mc | grep kube
01-master-kubelet                                              29813c845a4a3ee8e6856713c585aca834e0bf1e   3.2.0             18d
01-worker-kubelet                                              29813c845a4a3ee8e6856713c585aca834e0bf1e   3.2.0             18d
99-ran-du-fec2-smci00-generated-kubelet-7                      29813c845a4a3ee8e6856713c585aca834e0bf1e   3.2.0             4d9h
99-ran-du-fec3-dell03-generated-kubelet-8                      29813c845a4a3ee8e6856713c585aca834e0bf1e   3.2.0             3d20h
99-ran-du-ldc1-smci01-generated-kubelet-5                      29813c845a4a3ee8e6856713c585aca834e0bf1e   3.2.0             6d11h
99-ran-du-ldc1-smci02-generated-kubelet-6                      29813c845a4a3ee8e6856713c585aca834e0bf1e   3.2.0             4d15h

$ oc get kubeletconfig
NAME                                      AGE
performance-ran-du-fec2-smci00-profile0   4d9h
performance-ran-du-fec3-dell03-profile0   3d20h
performance-ran-du-ldc1-smci01-profile0   6d11h
performance-ran-du-ldc1-smci02-profile0   4d15h

After creating (1) additional MCP / PAO profile, called "ran-du-fec4-dell10", things look OK:

$ oc get mc | grep kubelet
01-master-kubelet                                              29813c845a4a3ee8e6856713c585aca834e0bf1e   3.2.0             18d
01-worker-kubelet                                              29813c845a4a3ee8e6856713c585aca834e0bf1e   3.2.0             18d
99-ran-du-fec2-smci00-generated-kubelet-7                      29813c845a4a3ee8e6856713c585aca834e0bf1e   3.2.0             4d10h
99-ran-du-fec3-dell03-generated-kubelet-8                      29813c845a4a3ee8e6856713c585aca834e0bf1e   3.2.0             3d22h
99-ran-du-fec4-dell10-generated-kubelet-9                      29813c845a4a3ee8e6856713c585aca834e0bf1e   3.2.0             3s
99-ran-du-ldc1-smci01-generated-kubelet-5                      29813c845a4a3ee8e6856713c585aca834e0bf1e   3.2.0             6d12h
99-ran-du-ldc1-smci02-generated-kubelet-6                      29813c845a4a3ee8e6856713c585aca834e0bf1e   3.2.0             4d16h

$ oc get kubeletconfig
NAME                                      AGE
performance-ran-du-fec2-smci00-profile0   4d11h
performance-ran-du-fec3-dell03-profile0   3d22h
performance-ran-du-fec4-dell10-profile0   5s
performance-ran-du-ldc1-smci01-profile0   6d13h
performance-ran-du-ldc1-smci02-profile0   4d16h

After creating (1) more MCP / PAO profile, called "ran-du-fec5-dell11", I notice there is no new generated MC (no fec5):
$ oc get mc | grep kubelet
01-master-kubelet                                              29813c845a4a3ee8e6856713c585aca834e0bf1e   3.2.0             18d
01-worker-kubelet                                              29813c845a4a3ee8e6856713c585aca834e0bf1e   3.2.0             18d
99-ran-du-fec2-smci00-generated-kubelet-7                      29813c845a4a3ee8e6856713c585aca834e0bf1e   3.2.0             4d10h
99-ran-du-fec3-dell03-generated-kubelet-8                      29813c845a4a3ee8e6856713c585aca834e0bf1e   3.2.0             3d22h
99-ran-du-fec4-dell10-generated-kubelet-9                      29813c845a4a3ee8e6856713c585aca834e0bf1e   3.2.0             3m9s
99-ran-du-ldc1-smci01-generated-kubelet-5                      29813c845a4a3ee8e6856713c585aca834e0bf1e   3.2.0             6d12h
99-ran-du-ldc1-smci02-generated-kubelet-6                      29813c845a4a3ee8e6856713c585aca834e0bf1e   3.2.0             4d16h

There is a new kubeletconfig:

$ oc get kubeletconfig
NAME                                      AGE
performance-ran-du-fec2-smci00-profile0   4d11h
performance-ran-du-fec3-dell03-profile0   3d22h
performance-ran-du-fec4-dell10-profile0   55s
performance-ran-du-fec5-dell11-profile0   4s
performance-ran-du-ldc1-smci01-profile0   6d13h
performance-ran-du-ldc1-smci02-profile0   4d16h

However, the new PAO exhibits the problem described in the summary:
$ oc describe performanceprofiles.performance.openshift.io ran-du-fec5-dell11-profile0
Name:         ran-du-fec5-dell11-profile0
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  performance.openshift.io/v2
Kind:         PerformanceProfile
Metadata:
  Creation Timestamp:  2021-08-16T13:19:25Z
  Finalizers:
    foreground-deletion
  Generation:  1
  Managed Fields:
    API Version:  performance.openshift.io/v2
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        .:
        f:cpu:
          .:
          f:isolated:
          f:reserved:
        f:hugepages:
          .:
          f:defaultHugepagesSize:
          f:pages:
        f:net:
          .:
          f:devices:
          f:userLevelNetworking:
        f:nodeSelector:
          .:
          f:node-role.kubernetes.io/ran-du-fec5-dell11:
        f:numa:
          .:
          f:topologyPolicy:
        f:realTimeKernel:
          .:
          f:enabled:
    Manager:      kubectl-create
    Operation:    Update
    Time:         2021-08-16T13:19:25Z
    API Version:  performance.openshift.io/v2
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .:
          v:"foreground-deletion":
      f:status:
        .:
        f:conditions:
        f:runtimeClass:
        f:tuned:
    Manager:         performance-operator
    Operation:       Update
    Time:            2021-08-16T13:19:25Z
  Resource Version:  15845626
  UID:               92c13f6c-797b-405b-8419-d7515d86481d
Spec:
  Cpu:
    Isolated:  3-31,35-63
    Reserved:  0-2,32-34
  Hugepages:
    Default Hugepages Size:  1G
    Pages:
      Count:  16
      Node:   0
      Size:   1G
  Net:
    Devices:
      Interface Name:       ens3f0
      Interface Name:       ens3f1
      Interface Name:       ens4f0
      Interface Name:       ens4f1
      Interface Name:       ens4f2
      Interface Name:       ens4f3
      Interface Name:       ens4f4
      Interface Name:       ens4f5
      Interface Name:       ens4f6
      Interface Name:       ens4f7
      Interface Name:       eno8303np0
      Interface Name:       eno8403np1
      Interface Name:       eno8503np2
      Interface Name:       eno8603np3
    User Level Networking:  true
  Node Selector:
    node-role.kubernetes.io/ran-du-fec5-dell11:  
  Numa:
    Topology Policy:  best-effort
  Real Time Kernel:
    Enabled:  true
Status:
  Conditions:
    Last Heartbeat Time:   2021-08-16T13:19:53Z
    Last Transition Time:  2021-08-16T13:19:53Z
    Status:                False
    Type:                  Available
    Last Heartbeat Time:   2021-08-16T13:19:53Z
    Last Transition Time:  2021-08-16T13:19:53Z
    Status:                False
    Type:                  Upgradeable
    Last Heartbeat Time:   2021-08-16T13:19:53Z
    Last Transition Time:  2021-08-16T13:19:53Z
    Status:                False
    Type:                  Progressing
    Last Heartbeat Time:   2021-08-16T13:19:53Z
    Last Transition Time:  2021-08-16T13:19:53Z
    Message:               could not get kubelet config key: max number of supported kubelet config (10) has been reached. Please delete old kubelet configs before retrying
    Reason:                KubeletConfig failure
    Status:                True
    Type:                  Degraded
  Runtime Class:           performance-ran-du-fec5-dell11-profile0
  Tuned:                   openshift-cluster-node-tuning-operator/openshift-node-performance-ran-du-fec5-dell11-profile0
Events:
  Type    Reason              Age                From                            Message
  ----    ------              ----               ----                            -------
  Normal  Creation succeeded  0s (x13 over 89s)  performance-profile-controller  Succeeded to create all components

Any workarounds?

Comment 2 Dave Cain 2021-08-16 19:05:15 UTC
I attempted a workaround in my environment, as suggested by Artyom (a command-level sketch follows the steps):

1. Scale down the PAO operator replicas to 0
2. Pause all MCPs
3. Delete all kubeletconfigs created by PAO
4. Scale the PAO replicas back up to 1
5. Wait until it has re-created all kubeletconfigs
6. Unpause all MCPs
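
A hedged command-level sketch of those steps; the Performance Addon Operator namespace and deployment names, and the pool and object names, are assumptions that may differ per environment:

# 1. Scale the PAO operator down (namespace/deployment names are assumptions).
oc scale deployment/performance-operator -n openshift-performance-addon-operator --replicas=0

# 2. Pause every affected MachineConfigPool (repeat per pool).
oc patch mcp <pool-name> --type merge -p '{"spec":{"paused":true}}'

# 3. Delete the kubeletconfigs created by PAO.
oc delete kubeletconfig <pao-generated-kubeletconfig>

# 4./5. Scale the operator back up and wait for the kubeletconfigs to be re-created.
oc scale deployment/performance-operator -n openshift-performance-addon-operator --replicas=1
oc get kubeletconfig -w

# 6. Unpause the pools.
oc patch mcp <pool-name> --type merge -p '{"spec":{"paused":false}}'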

This reset the suffix numbering to 0, which unblocked me: 
$ oc get mc | grep kubelet
01-master-kubelet                                              29813c845a4a3ee8e6856713c585aca834e0bf1e   3.2.0             18d
01-worker-kubelet                                              29813c845a4a3ee8e6856713c585aca834e0bf1e   3.2.0             18d
99-ran-du-fec2-smci00-generated-kubelet-3                      29813c845a4a3ee8e6856713c585aca834e0bf1e   3.2.0             5m9s
99-ran-du-fec3-dell03-generated-kubelet                        29813c845a4a3ee8e6856713c585aca834e0bf1e   3.2.0             5m12s
99-ran-du-ldc1-smci01-generated-kubelet-1                      29813c845a4a3ee8e6856713c585aca834e0bf1e   3.2.0             5m12s
99-ran-du-ldc1-smci02-generated-kubelet-2                      29813c845a4a3ee8e6856713c585aca834e0bf1e   3.2.0             5m10s

Still, there will be environments with close to 500 different MCPs/PAO profiles given different hardware types and configurations. This should really be scalable beyond the current limit on a per-cluster basis.

Comment 5 Urvashi Mohnani 2021-08-30 21:11:28 UTC
Hi,

When you create a kubelet config, a corresponding machine config (MC) is created for it and is applied to the nodes matching the specified pool. MCs are applied in alphanumeric order, so if I have mc-1 and mc-2, mc-2 will take priority and you will only see the configuration from mc-2 applied to your nodes. That is why the number of kubeletconfigs is limited: the max suffix the corresponding MC can go to is "-9". We have documented that when you hit the limit of 10 and need to create more kubeletconfigs, you need to delete your kubeletconfigs in reverse order, i.e. from most recent to oldest. So you would delete the kubeletconfig that created the "kubelet-mc-9" MC first, and so forth. This is how MCs and kubeletconfigs are designed, and the same is true for containerruntimeconfigs.

You can find the documentation on this here https://docs.openshift.com/container-platform/4.7/post_installation_configuration/machine-configuration-tasks.html#create-a-kubeletconfig-crd-to-edit-kubelet-parameters_post-install-machine-configuration-tasks.

Needing to have 500 different configurations for one cluster seems like a very rare case. Since kubeletconfig has a limit of 10, a possible workaround could be to create separate MachineConfigs that directly change the kubelet.conf files on the nodes.
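
For illustration only, such a MachineConfig could look roughly like the sketch below; the name, role label, file path, and encoded contents are assumptions, and replacing the node's kubelet configuration this way bypasses the KubeletConfig controller, so it should be validated carefully before use:

cat <<'EOF' | oc create -f -
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-custom-kubelet
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - path: /etc/kubernetes/kubelet.conf
          mode: 0644
          overwrite: true
          contents:
            source: data:text/plain;charset=utf-8;base64,<base64-encoded-kubelet-configuration>
EOF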

Comment 6 Dave Cain 2021-08-31 01:08:20 UTC
(In reply to Urvashi Mohnani from comment #5)
> When you create a kubelet config, a corresponding machine config (MC) is
> created for it and is applied to the nodes matching the specified pool. MCs
> are applied in alphanumeric order, so if I have mc-1 and mc-2, mc-2 will
> take priority and you will only see the configuration from mc-2 applied to
> your nodes. That is why, the number of kubeletconfigs is limited as the max
> suffix the corresponding MC can go to is "-9". We have documented that when
> you hit the limit of 10 and need to create more kubeletconfigs, you need to
> delete your kubeletconfigs in reverse order, i.e from most recent to oldest.
> So you will delete the kubeletconfig that created the "kubelet-mc-9" MC and
> so forth. This is how MCs and kubeletconfigs are designed and the same is
> true for containerruntimeconfigs.

Does the Performance Add On Operator know how to handle this?

Comment 7 Artyom 2021-08-31 09:09:36 UTC
Not really. We can do it, but IMHO it should be supported by the KubeletConfig controller; I do not see a reason for such a limitation. Additionally, it could limit to 10 per MCP rather than 10 across all of them.

Comment 10 Sunil Choudhary 2021-09-06 16:10:17 UTC
Checked on 4.9.0-0.nightly-2021-09-06-055314 and created 10 kubeletconfigs.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-09-06-055314   True        False         5h6m    Cluster version is 4.9.0-0.nightly-2021-09-06-055314

$ oc get kubeletconfig
NAME              AGE
set-max-pods-1    4h56m
set-max-pods-10   14m
set-max-pods-2    4h49m
set-max-pods-3    4h41m
set-max-pods-4    4h35m
set-max-pods-5    4h27m
set-max-pods-6    138m
set-max-pods-7    122m
set-max-pods-8    85m
set-max-pods-9    30m

$ oc get mc
NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                          2ec816a4aa741821e664fa512ab02f465926c0ab   3.2.0             5h28m
00-worker                                          2ec816a4aa741821e664fa512ab02f465926c0ab   3.2.0             5h28m
01-master-container-runtime                        2ec816a4aa741821e664fa512ab02f465926c0ab   3.2.0             5h28m
01-master-kubelet                                  2ec816a4aa741821e664fa512ab02f465926c0ab   3.2.0             5h28m
01-worker-container-runtime                        2ec816a4aa741821e664fa512ab02f465926c0ab   3.2.0             5h28m
01-worker-kubelet                                  2ec816a4aa741821e664fa512ab02f465926c0ab   3.2.0             5h28m
99-master-generated-registries                     2ec816a4aa741821e664fa512ab02f465926c0ab   3.2.0             5h28m
99-master-ssh                                                                                 3.2.0             5h31m
99-worker-generated-kubelet                        2ec816a4aa741821e664fa512ab02f465926c0ab   3.2.0             4h56m
99-worker-generated-kubelet-1                      2ec816a4aa741821e664fa512ab02f465926c0ab   3.2.0             4h49m
99-worker-generated-kubelet-2                      2ec816a4aa741821e664fa512ab02f465926c0ab   3.2.0             4h41m
99-worker-generated-kubelet-3                      2ec816a4aa741821e664fa512ab02f465926c0ab   3.2.0             4h35m
99-worker-generated-kubelet-4                      2ec816a4aa741821e664fa512ab02f465926c0ab   3.2.0             4h27m
99-worker-generated-kubelet-5                      2ec816a4aa741821e664fa512ab02f465926c0ab   3.2.0             138m
99-worker-generated-kubelet-6                      2ec816a4aa741821e664fa512ab02f465926c0ab   3.2.0             122m
99-worker-generated-kubelet-7                      2ec816a4aa741821e664fa512ab02f465926c0ab   3.2.0             85m
99-worker-generated-kubelet-8                      2ec816a4aa741821e664fa512ab02f465926c0ab   3.2.0             30m
99-worker-generated-kubelet-9                      2ec816a4aa741821e664fa512ab02f465926c0ab   3.2.0             14m
99-worker-generated-registries                     2ec816a4aa741821e664fa512ab02f465926c0ab   3.2.0             5h28m
99-worker-ssh                                                                                 3.2.0             5h31m
rendered-master-2e7cd6479c24109f2e0f5d021c69d103   2ec816a4aa741821e664fa512ab02f465926c0ab   3.2.0             5h28m
rendered-worker-0fc3dd7ff9b7d0204f320d9d303a7c9f   2ec816a4aa741821e664fa512ab02f465926c0ab   3.2.0             4h35m
rendered-worker-48fd4be36648379fbb4eb1665b9cbf00   2ec816a4aa741821e664fa512ab02f465926c0ab   3.2.0             4h56m
rendered-worker-4cf6bbd9199d7589b055ddee88619a8e   2ec816a4aa741821e664fa512ab02f465926c0ab   3.2.0             30m
rendered-worker-5e90bda2302d7ef2077d3d4d4833cd72   2ec816a4aa741821e664fa512ab02f465926c0ab   3.2.0             4h27m
rendered-worker-80b39c8e9894054014bbd4d036b12d73   2ec816a4aa741821e664fa512ab02f465926c0ab   3.2.0             85m
rendered-worker-86113f9fcec81bf6fac7785e8df98168   2ec816a4aa741821e664fa512ab02f465926c0ab   3.2.0             14m
rendered-worker-88d901113a41e0c32c0514c7af3964f6   2ec816a4aa741821e664fa512ab02f465926c0ab   3.2.0             122m
rendered-worker-95f00be90ae402a404a50286ad371f9b   2ec816a4aa741821e664fa512ab02f465926c0ab   3.2.0             138m
rendered-worker-980cc08b06917c8c82c33254181f0092   2ec816a4aa741821e664fa512ab02f465926c0ab   3.2.0             5h28m
rendered-worker-ac354bf54e90d30b86457b82a5e5bed3   2ec816a4aa741821e664fa512ab02f465926c0ab   3.2.0             4h41m
rendered-worker-d7a533c322c051fcb4066552be840661   2ec816a4aa741821e664fa512ab02f465926c0ab   3.2.0             4h49m

$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-2e7cd6479c24109f2e0f5d021c69d103   True      False      False      3              3                   3                     0                      5h30m
worker   rendered-worker-86113f9fcec81bf6fac7785e8df98168   True      False      False      3              3                   3                     0                      5h30m

$ oc get nodes
NAME                                         STATUS   ROLES    AGE     VERSION
ip-10-0-156-0.us-east-2.compute.internal     Ready    master   5h31m   v1.22.0-rc.0+75ee307
ip-10-0-157-43.us-east-2.compute.internal    Ready    worker   5h24m   v1.22.0-rc.0+75ee307
ip-10-0-160-231.us-east-2.compute.internal   Ready    master   5h30m   v1.22.0-rc.0+75ee307
ip-10-0-189-205.us-east-2.compute.internal   Ready    worker   5h24m   v1.22.0-rc.0+75ee307
ip-10-0-197-226.us-east-2.compute.internal   Ready    worker   5h24m   v1.22.0-rc.0+75ee307
ip-10-0-201-4.us-east-2.compute.internal     Ready    master   5h30m   v1.22.0-rc.0+75ee307

Comment 13 errata-xmlrpc 2021-10-18 17:46:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

