Description of problem:

KubeletConfig CR cannot be applied to a custom Machine Config Pool. After issuing 'oc create -f <KubeletConfig>', the MCP does not trigger a rolling update for its nodes.

How reproducible: Always

ENV:
~~~
# oc version
Client Version: 4.6.27
Server Version: 4.6.27
Kubernetes Version: v1.19.0+d856161
~~~

Steps to Reproduce:

1) Create a custom MCP, then add a worker node to the custom MCP.
~~~
$ oc get node
NAME                                       STATUS   ROLES           AGE     VERSION
...
worker-1.ocp4628.lab.pnq2.cee.redhat.com   Ready    custom,worker   2d20h   v1.19.0+a5a0987

$ oc get mcp --show-labels
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE     LABELS
custom   rendered-custom-f8e9b01721f86dd0b440dfe5eddf65ee   True      False      False      1              1                   1                     0                      13h     custom-kubelet=large-pods
master   rendered-master-e7b7f9790445eed70c72e85248bdaa8e   True      False      False      3              3                   3                     0                      2d21h   machineconfiguration.openshift.io/mco-built-in=,operator.machineconfiguration.openshift.io/required-for-upgrade=,pools.operator.machineconfiguration.openshift.io/master=
worker   rendered-worker-f8e9b01721f86dd0b440dfe5eddf65ee   True      False      False      1              1                   1                     0                      2d21h   machineconfiguration.openshift.io/mco-built-in=,pools.operator.machineconfiguration.openshift.io/worker=,worker-kubelet=small-pods
~~~

2) Apply a KubeletConfig CR targeting the worker MCP; the nodes in the worker MCP then perform a rolling update to apply the change.
~~~
$ cat small-pod.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: small-pods
spec:
  machineConfigPoolSelector:
    matchLabels:
      worker-kubelet: small-pods
  kubeletConfig:
    systemReserved:
      cpu: 100m

$ oc create -f small-pod.yaml   <--- After this, the worker nodes in the worker MCP performed a rolling update to apply the change.
~~~

3) However, if a KubeletConfig CR is created to apply a change to the custom MCP, the custom MCP doesn't trigger a rolling update.
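For context, a custom MCP like the `custom` pool in step 1 is typically defined as follows. This is a sketch, not the reporter's actual manifest (which is not included in the report); the selector values are illustrative:

~~~
# Hypothetical definition of the "custom" MCP from step 1.
# Note the machineConfigSelector matches both the worker and custom
# roles, so the pool inherits all worker machineconfigs in addition
# to its own -- which is what later causes the two generated kubelet
# machineconfigs to conflict.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: custom
  labels:
    custom-kubelet: large-pods
spec:
  machineConfigSelector:
    matchExpressions:
      - key: machineconfiguration.openshift.io/role
        operator: In
        values: [worker, custom]
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/custom: ""
~~~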
~~~
$ cat custom-memory.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: custom-memory
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: large-pods
  kubeletConfig:
    systemReserved:
      memory: 2Gi

$ oc create -f custom-memory.yaml   <--- did not trigger a rolling update.
~~~

Actual results:

The KubeletConfig CR for the worker MCP triggers a rolling update, but a second KubeletConfig CR targeting the custom MCP cannot override the configuration of a node that belongs to both the worker and custom MCPs, because it does not trigger a rolling update after the worker MCP update.

Expected results:

The custom MCP should trigger a rolling update after the KubeletConfig CR is applied.

Additional info:

Questions:
1. Is this behavior by design?
2. Is there a workaround to override/replace the old KubeletConfig corresponding to the worker MCP with a new KubeletConfig CR corresponding to the custom MCP?
I can reproduce the bug when the MCP is named `custom`. According to the discussion at https://coreos.slack.com/archives/C999USB0D/p1622650272328800?thread_ts=1622649259.327200&cid=C999USB0D, after the first KubeletConfig is applied, custom-memory.yaml does not roll out because the machineconfig 99-worker-generated-kubelet has higher alphanumeric order than 99-custom-generated-kubelet. I was able to apply custom-memory.yaml when the custom MCP was named something like zz-custom; that way, the machineconfig 99-zz-custom-generated-kubelet has the higher alphanumeric order. The behavior in the description matches the current design. I would like the MCO team to take a look.
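The rename workaround described above can be sketched as follows (the pool name and selector values are illustrative, not from the reporter's cluster):

~~~
# Workaround sketch: name the custom pool so that its generated
# machineconfig (99-zz-custom-generated-kubelet) sorts after
# 99-worker-generated-kubelet alphanumerically and therefore wins
# the conflict.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: zz-custom
  labels:
    custom-kubelet: large-pods
spec:
  machineConfigSelector:
    matchExpressions:
      - key: machineconfiguration.openshift.io/role
        operator: In
        values: [worker, zz-custom]
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/zz-custom: ""
~~~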
So the foundational issue here is that custom pools always inherit the worker pool's machine configuration. If you want one kubeletconfig for the base worker pool and a different one for a custom pool, the custom pool's machineconfig selector makes it pick up both kubeletconfig-generated machineconfigs, and those conflict (in which case the alphanumeric ordering of machineconfigs applies). The MCO is working as intended in this case. If custom-pool kubeletconfigs need to take priority, that would be a design decision in the kubeletconfigcontroller (and containerruntimecontroller). Today the "correct" way to do this is to not have any worker-pool kubeletconfigs, but rather to have multiple custom pools with different kubeletconfigs. As an example, if you have 3 nodes and want a custom kubeletconfig on 1 of them, this would be:

worker1 - roles: worker, custom1
worker2 - roles: worker, custom2
worker3 - roles: worker, custom2

and have custom1/custom2 kubeletconfigs targeting them respectively.
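The recommended layout above could be sketched as one KubeletConfig per custom pool, each selecting its pool by label. This is a sketch under the assumption that the custom1/custom2 MCPs carry a `custom-kubelet` label; the names, labels, and reserved values are illustrative:

~~~
# One KubeletConfig per custom pool; no KubeletConfig targets the
# worker pool itself, so the generated machineconfigs never conflict.
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: custom1-kubelet
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: custom1   # label assumed to be set on the custom1 MCP
  kubeletConfig:
    systemReserved:
      memory: 2Gi
---
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: custom2-kubelet
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: custom2   # label assumed to be set on the custom2 MCP
  kubeletConfig:
    systemReserved:
      cpu: 100m
~~~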