Description of problem:
Original BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1829116

Currently the KubeletConfigController operates as follows. When the controller receives an add/update/delete operation:
- if add/update, re-generate the template confs, merge the added/updated conf into them (only the latest one), and create/update the corresponding MC for the pool
- if delete, delete the whole MC for the pool

This means it only allows 1 kubeletconfig per pool. It also means that an update to an existing kubeletconfig that modifies a pool breaks the logic, which is an upgrade blocker (see the original BZ above). We should instead make kubeletconfigs re-sync the overall state on every event, much like we do for MachineConfigs, so that all kubeletconfigs belonging to a pool are synced and the MachineConfig is updated.

Version-Release number of selected component (if applicable): 4.5

How reproducible: Always

Steps to Reproduce:
1. apply a kubeletconfig targeting workers (kc1)
2. apply a new kubeletconfig targeting workers (kc2)
3. check configs. kc1 is silently deleted
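The reproducer above can be sketched with two minimal KubeletConfig CRs. This is an illustration only: the names (kc1, kc2), the pool label, and the field values are assumptions, not taken from the report.

```yaml
# kc1 -- hypothetical first kubeletconfig targeting the worker pool
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: kc1
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: enabled   # the worker MCP must carry this label
  kubeletConfig:
    maxPods: 221
---
# kc2 -- hypothetical second kubeletconfig targeting the same pool;
# with the buggy behaviour, applying this silently drops kc1
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: kc2
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: enabled
  kubeletConfig:
    systemReserved:
      cpu: 500m
```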
Workarounds/fixes are possible (see the other bug, https://bugzilla.redhat.com/show_bug.cgi?id=1829116) and this is not a regression, so we will push this to 4.6, as it is quite a big behaviour change.
*** Bug 1829116 has been marked as a duplicate of this bug. ***
Not fixed on version 4.7.0-0.nightly-2021-02-02-223803.

I created 2 kubelet config CRs, cr-1 and cr-2, but cr-2 didn't generate a matching MC. After creating cr-2, the MCP didn't roll out, because there was no new matching MC. And when I deleted cr-2, the kubelet config rolled back to the default values, not to the cr-1 values.

cr-1:
  kind: KubeletConfig
  metadata:
    creationTimestamp: "2021-02-04T07:44:07Z"
    finalizers:
    - 99-worker-generated-kubelet
    generation: 1
    managedFields: ...
  spec:
    kubeletConfig:
      maxPods: 221
    machineConfigPoolSelector:
      matchLabels:
        custom-kubelet-max: max-pods

cr-2:
  kind: KubeletConfig
  metadata:
    creationTimestamp: "2021-02-04T07:59:20Z"
    finalizers:
    - 99-worker-generated-kubelet   # this line should be like 99-worker-generated-kubelet-1
    generation: 1
    ...
  spec:
    kubeletConfig:
      cpuManagerPolicy: static
      cpuManagerReconcilePeriod: 5s
    machineConfigPoolSelector:
      matchLabels:
        custom-kubelet-cpu: set-CPUManager-worker

Btw, I also found that although I only set maxPods: 221 in cr-1, the kubelet config file on the node does add other kubelet items, such as imageMinimumGCAge: 0s.

Default kubelet config:
sh-4.4# cat /etc/kubernetes/kubelet.conf
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  x509:
    clientCAFile: /etc/kubernetes/kubelet-ca.crt
  anonymous:
    enabled: false
cgroupDriver: systemd
cgroupRoot: /
clusterDNS:
- 172.30.0.10
clusterDomain: cluster.local
containerLogMaxSize: 50Mi
maxPods: 250
kubeAPIQPS: 50
kubeAPIBurst: 100
rotateCertificates: true
serializeImagePulls: false
staticPodPath: /etc/kubernetes/manifests
systemCgroups: /system.slice
systemReserved:
  cpu: 500m
  memory: 1Gi
  ephemeral-storage: 1Gi
featureGates:
  APIPriorityAndFairness: true
  LegacyNodeRoleBehavior: false
  # Will be removed in future openshift/api update https://github.com/openshift/api/commit/c8c8f6d0f4a8ac4ff4ad7d1a84b27e1aa7ebf9b4
  RemoveSelfLink: false
  NodeDisruptionExclusion: true
  RotateKubeletServerCertificate: true
  SCTPSupport: true
  ServiceNodeExclusion: true
  SupportPodPidsLimit: true
serverTLSBootstrap: true

cr-1 kubelet config:
sh-4.4# cat /etc/kubernetes/kubelet.conf
{
  "kind": "KubeletConfiguration",
  "apiVersion": "kubelet.config.k8s.io/v1beta1",
  "staticPodPath": "/etc/kubernetes/manifests",
  "syncFrequency": "0s",
  "fileCheckFrequency": "0s",
  "httpCheckFrequency": "0s",
  "rotateCertificates": true,
  "serverTLSBootstrap": true,
  "authentication": {
    "x509": {
      "clientCAFile": "/etc/kubernetes/kubelet-ca.crt"
    },
    "webhook": {
      "cacheTTL": "0s"
    },
    "anonymous": {
      "enabled": false
    }
  },
  "authorization": {
    "webhook": {
      "cacheAuthorizedTTL": "0s",
      "cacheUnauthorizedTTL": "0s"
    }
  },
  "clusterDomain": "cluster.local",
  "clusterDNS": [
    "172.30.0.10"
  ],
  // the following 5 items are added by the kubelet automatically, not by cr-1
  "streamingConnectionIdleTimeout": "0s",
  "nodeStatusUpdateFrequency": "0s",
  "nodeStatusReportFrequency": "0s",
  "imageMinimumGCAge": "0s",
  "volumeStatsAggPeriod": "0s",
  "systemCgroups": "/system.slice",
  "cgroupRoot": "/",
  "cgroupDriver": "systemd",
  "cpuManagerReconcilePeriod": "0s",
  "runtimeRequestTimeout": "0s",
  "maxPods": 221,
  "kubeAPIQPS": 50,
  "kubeAPIBurst": 100,
  "serializeImagePulls": false,
  "evictionPressureTransitionPeriod": "0s",
  "featureGates": {
    "APIPriorityAndFairness": true,
    "LegacyNodeRoleBehavior": false,
    "NodeDisruptionExclusion": true,
    "RemoveSelfLink": false,
    "RotateKubeletServerCertificate": true,
    "SCTPSupport": true,
    "ServiceNodeExclusion": true,
    "SupportPodPidsLimit": true
  },
  "containerLogMaxSize": "50Mi",
  "systemReserved": {
    "cpu": "500m",
    "ephemeral-storage": "1Gi",
    "memory": "1Gi"
  },
  "logging": {},
  "shutdownGracePeriod": "0s",
  "shutdownGracePeriodCriticalPods": "0s"
}
Tested on version 4.8.0-0.nightly-2021-02-21-102854. This time cr-1 and cr-2 both generated matching MCs, but when cr-2's configuration took effect, it dropped the configuration from cr-1. For example, in cr-1 I set maxPods: 221, but when cr-2 took effect, it returned to the default value, maxPods: 250.

$ oc get mc
NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                          f25e3a80653c97a9152e2c77f43ddaf82edcd3ad   3.2.0             101m
00-worker                                          f25e3a80653c97a9152e2c77f43ddaf82edcd3ad   3.2.0             101m
01-master-container-runtime                        f25e3a80653c97a9152e2c77f43ddaf82edcd3ad   3.2.0             100m
01-master-kubelet                                  f25e3a80653c97a9152e2c77f43ddaf82edcd3ad   3.2.0             100m
01-worker-container-runtime                        f25e3a80653c97a9152e2c77f43ddaf82edcd3ad   3.2.0             100m
01-worker-kubelet                                  f25e3a80653c97a9152e2c77f43ddaf82edcd3ad   3.2.0             100m
99-master-generated-registries                     f25e3a80653c97a9152e2c77f43ddaf82edcd3ad   3.2.0             100m
99-master-ssh                                                                                 3.2.0             108m
99-worker-generated-kubelet                        f25e3a80653c97a9152e2c77f43ddaf82edcd3ad   3.2.0             13m
99-worker-generated-kubelet-1                      f25e3a80653c97a9152e2c77f43ddaf82edcd3ad   3.2.0             51s
99-worker-generated-registries                     f25e3a80653c97a9152e2c77f43ddaf82edcd3ad   3.2.0             100m
99-worker-ssh                                                                                 3.2.0             108m
rendered-master-6b3451a109d2f390a0b077f0643347bf   f25e3a80653c97a9152e2c77f43ddaf82edcd3ad   3.2.0             100m
rendered-master-904e26c7b99f0b8ef4c0e457aed11756   f25e3a80653c97a9152e2c77f43ddaf82edcd3ad   3.2.0             82m
rendered-worker-1149f4a834ab2869ced2bdd1ef6af042   f25e3a80653c97a9152e2c77f43ddaf82edcd3ad   3.2.0             82m
rendered-worker-7e52c22196cf66e34b2466f2865387f3   f25e3a80653c97a9152e2c77f43ddaf82edcd3ad   3.2.0             100m
rendered-worker-bc8148ef5ae48c56c4767aa866a3194e   f25e3a80653c97a9152e2c77f43ddaf82edcd3ad   3.2.0             13m
rendered-worker-e5f0ab15eca6725000ec3edb2d3f0500   f25e3a80653c97a9152e2c77f43ddaf82edcd3ad   3.2.0             46s

cr-1:
  spec:
    kubeletConfig:
      maxPods: 221
    machineConfigPoolSelector:
      matchLabels:
        custom-kubelet-max: max-pods

cr-2:
  spec:
    kubeletConfig:
      cpuManagerPolicy: static
      cpuManagerReconcilePeriod: 5s
    machineConfigPoolSelector:
      matchLabels:
        custom-kubelet-cpu: set-CPUManager-worker

sh-4.4# chroot /host
sh-4.4# cat /etc/kubernetes/kubelet.conf
{
  "kind": "KubeletConfiguration",
  ...
  "cpuManagerPolicy": "static",
  "cpuManagerReconcilePeriod": "5s",
  "runtimeRequestTimeout": "0s",
  "maxPods": 250,
  ...
Hi Min,

That is the expected functionality: only the changes from cr-2 will be applied, and any cr-1 changes will be discarded. If the user wants to keep the cr-1 changes, they should specify maxPods again in cr-2. We pick the most recent kubelet config changes; we don't combine all the existing ones. Moving back to on_qa.
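A sketch of what this comment suggests in CR form: if the cr-1 setting should survive, cr-2 must restate it. The field values are taken from the CRs shown earlier in this bug; the combined CR itself is a hypothetical illustration, not from the report.

```yaml
# Hypothetical cr-2 that preserves the cr-1 setting by restating it
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: cr-2
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet-cpu: set-CPUManager-worker
  kubeletConfig:
    cpuManagerPolicy: static
    cpuManagerReconcilePeriod: 5s
    maxPods: 221   # restated from cr-1 so it is not discarded
```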
(In reply to Urvashi Mohnani from comment #12)
> Hi Min,
>
> That is the expected functionality. Only the changes from cr-2 will be
> applied and any cr-1 changes will be discarded. If the user wants to
> maintain the cr-1 changes, then they should specify maxPods again in cr-2.
> We pick the most recent kubelet config changes, we don't combine all the
> existing ones. Moving back to on_qa.

Hi Urvashi Mohnani,

Via PR https://github.com/openshift/machine-config-operator/pull/2366, I found it's another thing:

- How to verify it
Start a cluster and create multiple kubelet config CRs, then delete them and see that the changes on the node roll back to the previous kubelet config CR and not directly to the defaults. Also, each kubelet config CR will have its own MC, and the suffix of the MC name will be stored as an annotation in the kubelet config CR.
Hi Urvashi Mohnani,

Perhaps I misunderstood your meaning; what you said in comment 12 doesn't contradict the content of PR 2366, so please ignore comment 13.

However, when I created 3 kubeletconfigs (cr-1, cr-2, cr-3), cr-3 didn't generate any new MC. Instead, cr-3 matched the MC that cr-2 had generated. For example:

kubeletconfig   mc
cr-1            99-worker-generated-kubelet
cr-2            99-worker-generated-kubelet-1
cr-3            99-worker-generated-kubelet-1

Only after waiting some time (about 20 minutes) after the creation of cr-3 did its MC (99-worker-generated-kubelet-1) render a new MC that the worker MCP could roll out to. In the normal case, it should render the new MC at once. So I think there is something wrong with the MCO, and I will attach the MCC log.
Created attachment 1759041 [details] mcc log
Verified on version 4.8.0-0.nightly-2021-03-16-221720.

I created 3 kubeletconfig CRs, and when a kubeletconfig CR is deleted, the changes on the node roll back to the previous kubeletconfig CR.

$ oc get mc
NAME                             GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                        3f19bb8218c7319c9c0362a2e6071575c1bf8c84   3.2.0             5h39m
00-worker                        3f19bb8218c7319c9c0362a2e6071575c1bf8c84   3.2.0             5h39m
01-master-container-runtime      3f19bb8218c7319c9c0362a2e6071575c1bf8c84   3.2.0             5h39m
01-master-kubelet                3f19bb8218c7319c9c0362a2e6071575c1bf8c84   3.2.0             5h39m
01-worker-container-runtime      3f19bb8218c7319c9c0362a2e6071575c1bf8c84   3.2.0             5h39m
01-worker-kubelet                3f19bb8218c7319c9c0362a2e6071575c1bf8c84   3.2.0             5h39m
99-master-generated-registries   3f19bb8218c7319c9c0362a2e6071575c1bf8c84   3.2.0             5h39m
99-master-ssh                                                               3.2.0             5h47m
99-worker-generated-kubelet      3f19bb8218c7319c9c0362a2e6071575c1bf8c84   3.2.0             44m
99-worker-generated-kubelet-1    3f19bb8218c7319c9c0362a2e6071575c1bf8c84   3.2.0             19m
99-worker-generated-kubelet-2    3f19bb8218c7319c9c0362a2e6071575c1bf8c84   3.2.0             48s
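For reference, the generated MC names in this thread appear to follow a simple suffixing pattern: the first kubeletconfig for a pool gets the bare name, and later ones get a numeric suffix. A minimal sketch of that apparent scheme (an illustration only, not MCO source code; the function name is made up):

```python
# Sketch of the apparent naming scheme for per-pool generated kubelet MCs:
# the first kubeletconfig for a pool gets the bare name, later ones get a
# numeric suffix (-1, -2, ...). Illustration only, not MCO source.
def generated_kubelet_mc_name(pool: str, index: int) -> str:
    base = f"99-{pool}-generated-kubelet"
    return base if index == 0 else f"{base}-{index}"

for i in range(3):
    print(generated_kubelet_mc_name("worker", i))
# 99-worker-generated-kubelet
# 99-worker-generated-kubelet-1
# 99-worker-generated-kubelet-2
```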
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438