Bug 1840881
Summary: The KubeletConfigController cannot process multiple confs for a pool / pool changes
Product: OpenShift Container Platform
Reporter: Yu Qi Zhang <jerzhang>
Component: Node
Node sub component: Kubelet
Assignee: Urvashi Mohnani <umohnani>
QA Contact: MinLi <minmli>
Status: CLOSED ERRATA
Severity: high
Priority: high
CC: amurdaca, aos-bugs, jokerman, jrouth, minmli, nagrawal, tsweeney, umohnani
Version: 4.5
Target Release: 4.8.0
Hardware: Unspecified
OS: Unspecified
Doc Type: Known Issue
Doc Text: This was already documented for 4.7, I believe. Most likely nothing needs to be done here.
Last Closed: 2021-07-27 22:32:23 UTC
Type: Bug
Bug Blocks: 1929257
Description
Yu Qi Zhang
2020-05-27 19:12:28 UTC
Workarounds/fixes are possible (see the other bug, https://bugzilla.redhat.com/show_bug.cgi?id=1829116) and this is not a regression, so we will be pushing this to 4.6, as it is quite a big behaviour change.

*** Bug 1829116 has been marked as a duplicate of this bug. ***

Not fixed on version 4.7.0-0.nightly-2021-02-02-223803.

I created 2 kubelet config CRs, cr-1 and cr-2, but cr-2 did not generate a matching MC. After creating cr-2, the MCP did not roll out, because there was no new matching MC. And when I deleted cr-2, the kubelet config rolled back to the default values, not to the cr-1 values.

cr-1:

    kind: KubeletConfig
    metadata:
      creationTimestamp: "2021-02-04T07:44:07Z"
      finalizers:
      - 99-worker-generated-kubelet
      generation: 1
      managedFields: ...
    spec:
      kubeletConfig:
        maxPods: 221
      machineConfigPoolSelector:
        matchLabels:
          custom-kubelet-max: max-pods

cr-2:

    kind: KubeletConfig
    metadata:
      creationTimestamp: "2021-02-04T07:59:20Z"
      finalizers:
      - 99-worker-generated-kubelet   # this line should be like 99-worker-generated-kubelet-1
      generation: 1
      ...
    spec:
      kubeletConfig:
        cpuManagerPolicy: static
        cpuManagerReconcilePeriod: 5s
      machineConfigPoolSelector:
        matchLabels:
          custom-kubelet-cpu: set-CPUManager-worker

Btw, I also found that although I only set maxPods: 221 in cr-1, the kubelet config file on the node does add other kubelet items, such as imageMinimumGCAge: 0s.

Default kubelet config:

    sh-4.4# cat /etc/kubernetes/kubelet.conf
    kind: KubeletConfiguration
    apiVersion: kubelet.config.k8s.io/v1beta1
    authentication:
      x509:
        clientCAFile: /etc/kubernetes/kubelet-ca.crt
      anonymous:
        enabled: false
    cgroupDriver: systemd
    cgroupRoot: /
    clusterDNS:
    - 172.30.0.10
    clusterDomain: cluster.local
    containerLogMaxSize: 50Mi
    maxPods: 250
    kubeAPIQPS: 50
    kubeAPIBurst: 100
    rotateCertificates: true
    serializeImagePulls: false
    staticPodPath: /etc/kubernetes/manifests
    systemCgroups: /system.slice
    systemReserved:
      cpu: 500m
      memory: 1Gi
      ephemeral-storage: 1Gi
    featureGates:
      APIPriorityAndFairness: true
      LegacyNodeRoleBehavior: false
      # Will be removed in a future openshift/api update
      # https://github.com/openshift/api/commit/c8c8f6d0f4a8ac4ff4ad7d1a84b27e1aa7ebf9b4
      RemoveSelfLink: false
      NodeDisruptionExclusion: true
      RotateKubeletServerCertificate: true
      SCTPSupport: true
      ServiceNodeExclusion: true
      SupportPodPidsLimit: true
    serverTLSBootstrap: true

cr-1 kubelet config:

    sh-4.4# cat /etc/kubernetes/kubelet.conf
    {
      "kind": "KubeletConfiguration",
      "apiVersion": "kubelet.config.k8s.io/v1beta1",
      "staticPodPath": "/etc/kubernetes/manifests",
      "syncFrequency": "0s",
      "fileCheckFrequency": "0s",
      "httpCheckFrequency": "0s",
      "rotateCertificates": true,
      "serverTLSBootstrap": true,
      "authentication": {
        "x509": { "clientCAFile": "/etc/kubernetes/kubelet-ca.crt" },
        "webhook": { "cacheTTL": "0s" },
        "anonymous": { "enabled": false }
      },
      "authorization": {
        "webhook": { "cacheAuthorizedTTL": "0s", "cacheUnauthorizedTTL": "0s" }
      },
      "clusterDomain": "cluster.local",
      "clusterDNS": [ "172.30.0.10" ],
      // the following 5 items are added by the kubelet automatically, not by cr-1
      "streamingConnectionIdleTimeout": "0s",
      "nodeStatusUpdateFrequency": "0s",
      "nodeStatusReportFrequency": "0s",
      "imageMinimumGCAge": "0s",
      "volumeStatsAggPeriod": "0s",
      "systemCgroups": "/system.slice",
      "cgroupRoot": "/",
      "cgroupDriver": "systemd",
      "cpuManagerReconcilePeriod": "0s",
      "runtimeRequestTimeout": "0s",
      "maxPods": 221,
      "kubeAPIQPS": 50,
      "kubeAPIBurst": 100,
      "serializeImagePulls": false,
      "evictionPressureTransitionPeriod": "0s",
      "featureGates": {
        "APIPriorityAndFairness": true,
        "LegacyNodeRoleBehavior": false,
        "NodeDisruptionExclusion": true,
        "RemoveSelfLink": false,
        "RotateKubeletServerCertificate": true,
        "SCTPSupport": true,
        "ServiceNodeExclusion": true,
        "SupportPodPidsLimit": true
      },
      "containerLogMaxSize": "50Mi",
      "systemReserved": {
        "cpu": "500m",
        "ephemeral-storage": "1Gi",
        "memory": "1Gi"
      },
      "logging": {},
      "shutdownGracePeriod": "0s",
      "shutdownGracePeriodCriticalPods": "0s"
    }

Tested on version 4.8.0-0.nightly-2021-02-21-102854. This time cr-1 and cr-2 both generated matching MCs, but when cr-2's configuration took effect, it deleted the configuration from cr-1. For example, in cr-1 I set maxPods: 221, but when cr-2 took effect, it returned to the default value, maxPods: 250.
    $ oc get mc
    NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
    00-master                                          f25e3a80653c97a9152e2c77f43ddaf82edcd3ad   3.2.0             101m
    00-worker                                          f25e3a80653c97a9152e2c77f43ddaf82edcd3ad   3.2.0             101m
    01-master-container-runtime                        f25e3a80653c97a9152e2c77f43ddaf82edcd3ad   3.2.0             100m
    01-master-kubelet                                  f25e3a80653c97a9152e2c77f43ddaf82edcd3ad   3.2.0             100m
    01-worker-container-runtime                        f25e3a80653c97a9152e2c77f43ddaf82edcd3ad   3.2.0             100m
    01-worker-kubelet                                  f25e3a80653c97a9152e2c77f43ddaf82edcd3ad   3.2.0             100m
    99-master-generated-registries                     f25e3a80653c97a9152e2c77f43ddaf82edcd3ad   3.2.0             100m
    99-master-ssh                                                                                 3.2.0             108m
    99-worker-generated-kubelet                        f25e3a80653c97a9152e2c77f43ddaf82edcd3ad   3.2.0             13m
    99-worker-generated-kubelet-1                      f25e3a80653c97a9152e2c77f43ddaf82edcd3ad   3.2.0             51s
    99-worker-generated-registries                     f25e3a80653c97a9152e2c77f43ddaf82edcd3ad   3.2.0             100m
    99-worker-ssh                                                                                 3.2.0             108m
    rendered-master-6b3451a109d2f390a0b077f0643347bf   f25e3a80653c97a9152e2c77f43ddaf82edcd3ad   3.2.0             100m
    rendered-master-904e26c7b99f0b8ef4c0e457aed11756   f25e3a80653c97a9152e2c77f43ddaf82edcd3ad   3.2.0             82m
    rendered-worker-1149f4a834ab2869ced2bdd1ef6af042   f25e3a80653c97a9152e2c77f43ddaf82edcd3ad   3.2.0             82m
    rendered-worker-7e52c22196cf66e34b2466f2865387f3   f25e3a80653c97a9152e2c77f43ddaf82edcd3ad   3.2.0             100m
    rendered-worker-bc8148ef5ae48c56c4767aa866a3194e   f25e3a80653c97a9152e2c77f43ddaf82edcd3ad   3.2.0             13m
    rendered-worker-e5f0ab15eca6725000ec3edb2d3f0500   f25e3a80653c97a9152e2c77f43ddaf82edcd3ad   3.2.0             46s

cr-1:

    spec:
      kubeletConfig:
        maxPods: 221
      machineConfigPoolSelector:
        matchLabels:
          custom-kubelet-max: max-pods

cr-2:

    spec:
      kubeletConfig:
        cpuManagerPolicy: static
        cpuManagerReconcilePeriod: 5s
      machineConfigPoolSelector:
        matchLabels:
          custom-kubelet-cpu: set-CPUManager-worker

On the node:

    sh-4.4# chroot /host
    sh-4.4# cat /etc/kubernetes/kubelet.conf
    {
      "kind": "KubeletConfiguration",
      ...
      "cpuManagerPolicy": "static",
      "cpuManagerReconcilePeriod": "5s",
      "runtimeRequestTimeout": "0s",
      "maxPods": 250,
      ...

Hi Min,

That is the expected functionality. Only the changes from cr-2 will be applied, and any cr-1 changes will be discarded. If the user wants to keep the cr-1 changes, they should specify maxPods again in cr-2. We pick the most recent kubelet config changes; we don't combine all the existing ones. Moving back to on_qa.

(In reply to Urvashi Mohnani from comment #12)
> That is the expected functionality. Only the changes from cr-2 will be
> applied and any cr-1 changes will be discarded. If the user wants to
> maintain the cr-1 changes, then they should specify maxPods again in cr-2.
> We pick the most recent kubelet config changes, we don't combine all the
> existing ones. Moving back to on_qa.

Hi Urvashi Mohnani,

Via PR https://github.com/openshift/machine-config-operator/pull/2366, I found it's another thing:

- How to verify it: Start a cluster and create multiple kubelet config CRs, then delete them and see that the changes on the node roll back to the previous kubelet config CR and not directly to the defaults. Also, each kubelet config CR will have its own MC, and the suffix of the MC name will be stored as an annotation in the kubelet config CR.

Hi Urvashi Mohnani,

Perhaps I misunderstood your meaning; what you said in comment 12 doesn't contradict the content of PR 2366. Please ignore comment 13.

But when I created 3 kubeletconfig CRs, cr-1, cr-2 and cr-3, cr-3 didn't generate any new MC; instead, cr-3 matched the MC that cr-2 had generated. For example:

    kubeletconfig   mc
    cr-1            99-worker-generated-kubelet
    cr-2            99-worker-generated-kubelet-1
    cr-3            99-worker-generated-kubelet-1

After waiting some time (about 20 minutes) following the creation of cr-3, its MC (99-worker-generated-kubelet-1) rendered a new MC which the worker MCP could roll out. Yet in the normal case it should render the new MC at once. So I think there is something wrong with the MCO, and I will attach the MCC log.

Created attachment 1759041 [details]
mcc log
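The "most recent kubelet config wins" semantics discussed in comment 12 can be sketched as a small model (this is an illustrative approximation, not the MCO's actual code; the dictionaries stand in for the rendered defaults and the CRs' `kubeletConfig` fields):

```python
# Hypothetical model of the KubeletConfigController's "latest CR wins"
# behaviour: the effective config is the defaults overlaid with ONLY the
# most recently created KubeletConfig -- earlier CRs' fields are dropped.

DEFAULTS = {"maxPods": 250, "containerLogMaxSize": "50Mi"}

def effective_config(defaults, crs):
    """crs is ordered oldest -> newest; only the newest CR is applied."""
    config = dict(defaults)
    if crs:
        config.update(crs[-1])  # most recent kubelet config only
    return config

cr_1 = {"maxPods": 221}
cr_2 = {"cpuManagerPolicy": "static", "cpuManagerReconcilePeriod": "5s"}

# With cr-2 in effect, maxPods falls back to the default 250 (cr-1 is dropped):
print(effective_config(DEFAULTS, [cr_1, cr_2])["maxPods"])  # 250

# Deleting cr-2 rolls the node back to cr-1's values, not the defaults:
print(effective_config(DEFAULTS, [cr_1])["maxPods"])  # 221
```

This mirrors both observations in the thread: applying cr-2 reset maxPods to 250, and deleting the newest CR rolls back to the previous CR rather than directly to the defaults.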
Verified on version 4.8.0-0.nightly-2021-03-16-221720.

I created 3 kubeletconfig CRs, and when deleting a kubeletconfig CR, the changes on the node roll back to the previous kubelet config CR.

    $ oc get mc
    NAME                             GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
    00-master                        3f19bb8218c7319c9c0362a2e6071575c1bf8c84   3.2.0             5h39m
    00-worker                        3f19bb8218c7319c9c0362a2e6071575c1bf8c84   3.2.0             5h39m
    01-master-container-runtime      3f19bb8218c7319c9c0362a2e6071575c1bf8c84   3.2.0             5h39m
    01-master-kubelet                3f19bb8218c7319c9c0362a2e6071575c1bf8c84   3.2.0             5h39m
    01-worker-container-runtime      3f19bb8218c7319c9c0362a2e6071575c1bf8c84   3.2.0             5h39m
    01-worker-kubelet                3f19bb8218c7319c9c0362a2e6071575c1bf8c84   3.2.0             5h39m
    99-master-generated-registries   3f19bb8218c7319c9c0362a2e6071575c1bf8c84   3.2.0             5h39m
    99-master-ssh                                                               3.2.0             5h47m
    99-worker-generated-kubelet      3f19bb8218c7319c9c0362a2e6071575c1bf8c84   3.2.0             44m
    99-worker-generated-kubelet-1    3f19bb8218c7319c9c0362a2e6071575c1bf8c84   3.2.0             19m
    99-worker-generated-kubelet-2    3f19bb8218c7319c9c0362a2e6071575c1bf8c84   3.2.0             48s

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438
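The generated MC names in the verification output above (99-worker-generated-kubelet, -1, -2) follow a per-pool suffix scheme: the first KubeletConfig for a pool gets no suffix and each subsequent one gets an incrementing index. A hypothetical sketch of that naming (not the MCO's actual function):

```python
def generated_mc_name(pool: str, index: int) -> str:
    """First KubeletConfig for a pool gets the bare name; later ones get -1, -2, ..."""
    base = f"99-{pool}-generated-kubelet"
    return base if index == 0 else f"{base}-{index}"

# Matches the three MCs seen in the verification output:
print([generated_mc_name("worker", i) for i in range(3)])
# ['99-worker-generated-kubelet', '99-worker-generated-kubelet-1', '99-worker-generated-kubelet-2']
```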