Bug 1788061
| Summary: | KubeletConfig controller should respect all feature gates resources | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Artyom <alukiano> |
| Component: | Node | Assignee: | Ryan Phillips <rphillips> |
| Status: | CLOSED NOTABUG | QA Contact: | Sunil Choudhary <schoudha> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.4 | CC: | aos-bugs, augol, jokerman, rphillips, scuppett, william.caban |
| Target Milestone: | --- | | |
| Target Release: | 4.4.0 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-02-27 15:29:38 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1771572 | | |
The feature gate CR name should be 'cluster'. Please retry with this name. That is exactly the reason why I opened this bug: I can see that https://github.com/openshift/machine-config-operator/blob/04cd2198cae247fabcd3154669618d74f124f27f/pkg/controller/kubelet-config/kubelet_config_features.go#L47 respects all feature gate resources, so why can the kubelet controller not do the same? I was expecting the controller to check all feature gate resources and merge them into the rendered machine config (see the sketch after the list below). An additional question: under the machine-config-pool I can see a number of kubelet resources:
```
Source:
  API Version:  machineconfiguration.openshift.io/v1
  Kind:         MachineConfig
  Name:         01-worker-kubelet
  API Version:  machineconfiguration.openshift.io/v1
  Kind:         MachineConfig
  Name:         98-performance-ci-820350b6-4c52-41ec-9d9a-0a88b34e97ef-kubelet
  API Version:  machineconfiguration.openshift.io/v1
  Kind:         MachineConfig
  Name:         98-worker-247dbe02-ec39-4ea9-b514-e258aecf0729-kubelet
  API Version:  machineconfiguration.openshift.io/v1
  Kind:         MachineConfig
  Name:         99-performance-ci-820350b6-4c52-41ec-9d9a-0a88b34e97ef-kubelet
```
I expected the machine-config-pool to render a machine config that merges all of these configs, but it does not look that way.
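Coming back to the feature-gate point above, here is the promised sketch: a minimal illustration of how a controller could union the gates from every FeatureGate resource instead of reading only the 'cluster' object. The `FeatureGate` struct and `featuresForSet` helper are simplified stand-ins for illustration, not the MCO's actual API.

```go
package main

import "fmt"

// FeatureGate mirrors just enough of config.openshift.io/v1 FeatureGate
// for this sketch.
type FeatureGate struct {
	Name       string
	FeatureSet string
}

// featuresForSet is a hypothetical stand-in for the mapping from a named
// FeatureSet to the concrete gates it enables.
func featuresForSet(set string) []string {
	switch set {
	case "LatencySensitive":
		return []string{"TopologyManager"}
	default:
		return nil
	}
}

// aggregateFeatureGates unions the gates enabled by every FeatureGate
// resource instead of consulting only the 'cluster' singleton.
func aggregateFeatureGates(gates []FeatureGate) map[string]bool {
	enabled := map[string]bool{}
	for _, fg := range gates {
		for _, name := range featuresForSet(fg.FeatureSet) {
			enabled[name] = true
		}
	}
	return enabled
}

func main() {
	gates := []FeatureGate{
		{Name: "cluster", FeatureSet: ""},
		{Name: "latency-sensetive", FeatureSet: "LatencySensitive"},
	}
	fmt.Println(aggregateFeatureGates(gates)) // map[TopologyManager:true]
}
```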
From what I can see, the merge method at https://github.com/openshift/machine-config-operator/blob/04cd2198cae247fabcd3154669618d74f124f27f/pkg/apis/machineconfiguration.openshift.io/v1/helpers.go#L17 does not really merge anything; it just appends the ignition configs, so in our case the last ignition config that contains /etc/kubernetes/kubelet.conf overrides the whole configuration.
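To make the failure mode concrete, here is a small self-contained Go illustration (simplified types, not the MCO's actual helpers) of why appending ignition configs yields last-writer-wins for a file like /etc/kubernetes/kubelet.conf:

```go
package main

import "fmt"

// IgnFile is a minimal stand-in for an Ignition storage.files entry.
type IgnFile struct {
	Path     string
	Contents string
}

// appendConfigs mimics the helpers.go behavior described above: it just
// concatenates the file lists; entries targeting the same path are kept
// side by side rather than merged.
func appendConfigs(configs ...[]IgnFile) []IgnFile {
	var out []IgnFile
	for _, c := range configs {
		out = append(out, c...)
	}
	return out
}

// effectiveFiles models what lands on disk: the last entry for a path wins.
func effectiveFiles(files []IgnFile) map[string]string {
	final := map[string]string{}
	for _, f := range files {
		final[f.Path] = f.Contents
	}
	return final
}

func main() {
	withGate := []IgnFile{{Path: "/etc/kubernetes/kubelet.conf", Contents: `{"featureGates":{"TopologyManager":true}}`}}
	withoutGate := []IgnFile{{Path: "/etc/kubernetes/kubelet.conf", Contents: `{"featureGates":{}}`}}
	merged := appendConfigs(withGate, withoutGate)
	// The kubelet.conf that carried the TopologyManager gate is silently lost.
	fmt.Println(effectiveFiles(merged)["/etc/kubernetes/kubelet.conf"])
}
```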
The same applies to the `cluster` feature gate: if we first create the KubeletConfig resource and only afterwards update the cluster feature gate resource, we hit the same problem, because the KubeletConfig will have been generated without the TopologyManager feature gate, and the kubelet config sync method is not entered when a feature gate resource changes.
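If the controller were meant to react to feature-gate edits, one would expect informer wiring along these lines. This is a sketch only: the `controller` struct, `enqueueAllKubeletConfigs`, and the informer plumbing are hypothetical, not the MCO's actual code.

```go
package kubeletconfig

import "k8s.io/client-go/tools/cache"

// controller is a stripped-down stand-in for the kubelet-config controller;
// the real MCO controller uses a rate-limited workqueue.
type controller struct {
	queue chan string
}

// enqueueAllKubeletConfigs is a hypothetical method that re-queues every
// KubeletConfig so the rendered MachineConfigs pick up the current gates.
func (c *controller) enqueueAllKubeletConfigs() {
	c.queue <- "resync-all-kubeletconfigs"
}

// registerFeatureGateHandler wires any FeatureGate add/update/delete event
// to a full KubeletConfig re-sync. The informer is assumed to come from an
// openshift/client-go informer factory for config.openshift.io/v1.
func registerFeatureGateHandler(inf cache.SharedIndexInformer, c *controller) {
	inf.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc:    func(obj interface{}) { c.enqueueAllKubeletConfigs() },
		UpdateFunc: func(oldObj, newObj interface{}) { c.enqueueAllKubeletConfigs() },
		DeleteFunc: func(obj interface{}) { c.enqueueAllKubeletConfigs() },
	})
}
```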
The config objects are singletons, so the 'cluster' KubeletConfig is the only supported config. Closing since configs are singletons. The needinfo request[s] on this closed bug have been removed, as they have been unresolved for 1000 days.
Description of problem:
Under my cluster I have two feature gate resources:

```
# oc describe featuregate
Name:         cluster
Namespace:
Labels:       <none>
Annotations:  release.openshift.io/create-only: true
API Version:  config.openshift.io/v1
Kind:         FeatureGate
Metadata:
  Creation Timestamp:  2020-01-05T15:00:37Z
  Generation:          1
  Resource Version:    1608
  Self Link:           /apis/config.openshift.io/v1/featuregates/cluster
  UID:                 f8b5d7e0-30ea-4aa4-ba28-037477430806
Spec:
Events:  <none>

Name:         latency-sensetive
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         FeatureGate
Metadata:
  Creation Timestamp:  2020-01-06T08:42:29Z
  Generation:          1
  Owner References:
    API Version:           performance.openshift.io/v1alpha1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  PerformanceProfile
    Name:                  ci
    UID:                   91e3712a-0018-4c46-b4bc-60214443c62d
  Resource Version:  334393
  Self Link:         /apis/config.openshift.io/v1/featuregates/latency-sensetive
  UID:               c3046163-4d2b-4d68-a1c0-5ae05d6615d4
Spec:
  Feature Set:  LatencySensitive
Events:         <none>
```

and one kubelet config:

```
# oc describe kubeletconfig
Name:         performance-ci
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  machineconfiguration.openshift.io/v1
Kind:         KubeletConfig
Metadata:
  Creation Timestamp:  2020-01-06T08:42:29Z
  Finalizers:
    99-performance-ci-819adb4c-a908-45b9-a6f9-bbfd651320a9-kubelet
  Generation:  1
  Owner References:
    API Version:           performance.openshift.io/v1alpha1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  PerformanceProfile
    Name:                  ci
    UID:                   91e3712a-0018-4c46-b4bc-60214443c62d
  Resource Version:  338475
  Self Link:         /apis/machineconfiguration.openshift.io/v1/kubeletconfigs/performance-ci
  UID:               2ddb3962-db6f-482a-9a1b-4bd51dadc714
Spec:
  Kubelet Config:
    Authentication:
      Anonymous:
      Webhook:
        Cache TTL:  0s
      x509:
    Authorization:
      Webhook:
        Cache Authorized TTL:    0s
        Cache Unauthorized TTL:  0s
    Cpu Manager Policy:                   static
    Cpu Manager Reconcile Period:         5s
    Eviction Pressure Transition Period:  0s
    File Check Frequency:                 0s
    Http Check Frequency:                 0s
    Image Minimum GC Age:                 0s
    Kube Reserved:
      Cpu:     1000m
      Memory:  500Mi
    Node Status Report Frequency:       0s
    Node Status Update Frequency:       0s
    Reserved System CP Us:              0-1
    Runtime Request Timeout:            0s
    Streaming Connection Idle Timeout:  0s
    Sync Frequency:                     0s
    System Reserved:
      Cpu:     1000m
      Memory:  500Mi
    Topology Manager Policy:  best-effort
    Volume Stats Agg Period:  0s
  Machine Config Pool Selector:
    Match Labels:
      machineconfigpool.openshift.io/role:  performance-ci
Status:
  Conditions:
    Last Transition Time:  2020-01-06T08:42:30Z
    Message:               Success
    Status:                True
    Type:                  Success
    Last Transition Time:  2020-01-06T08:48:35Z
    Message:               Success
    Status:                True
    Type:                  Success
Events:  <none>
```

The kubelet config was created after the new feature gate resource, so the kubelet_config_controller rendered the machine config with the kubelet topology manager parameters but without the corresponding TopologyManager feature gate:
```
# oc describe mcp performance-ci
...
Configuration:
  Name:  rendered-performance-ci-0779df28fd2847676ff858646dd7bf03
  Source:
    API Version:  machineconfiguration.openshift.io/v1
    Kind:         MachineConfig
    Name:         00-worker
    API Version:  machineconfiguration.openshift.io/v1
    Kind:         MachineConfig
    Name:         00-worker-chronyd-custom
    API Version:  machineconfiguration.openshift.io/v1
    Kind:         MachineConfig
    Name:         01-worker-container-runtime
    API Version:  machineconfiguration.openshift.io/v1
    Kind:         MachineConfig
    Name:         01-worker-kubelet
    API Version:  machineconfiguration.openshift.io/v1
    Kind:         MachineConfig
    Name:         98-performance-ci-819adb4c-a908-45b9-a6f9-bbfd651320a9-kubelet
    API Version:  machineconfiguration.openshift.io/v1
    Kind:         MachineConfig
    Name:         98-worker-6a886427-d155-4c29-9eef-f9f9c56274e2-kubelet
    API Version:  machineconfiguration.openshift.io/v1
    Kind:         MachineConfig
    Name:         99-performance-ci-819adb4c-a908-45b9-a6f9-bbfd651320a9-kubelet
    API Version:  machineconfiguration.openshift.io/v1
    Kind:         MachineConfig
    Name:         99-worker-6a886427-d155-4c29-9eef-f9f9c56274e2-registries
    API Version:  machineconfiguration.openshift.io/v1
    Kind:         MachineConfig
    Name:         99-worker-ssh
    API Version:  machineconfiguration.openshift.io/v1
    Kind:         MachineConfig
    Name:         performance-ci
...
```

Note that the 98-worker kubelet machine config includes the TopologyManager feature gate, while the 99-performance-ci one does not:

```
# oc describe mc 98-worker-6a886427-d155-4c29-9eef-f9f9c56274e2-kubelet
...
Source: data:text/plain,...SupportPodPidsLimit%22%3Atrue%2C%22TopologyManager%22%3Atrue%7D%2C%22containerLogMaxSize ...

# oc describe mc 99-performance-ci-819adb4c-a908-45b9-a6f9-bbfd651320a9-kubelet
...
data:text/plain,...SupportPodPidsLimit%22%3Atrue%7D%2C%22containerLogMaxSize ...
```

Version-Release number of selected component (if applicable):

```
# oc version
Client Version: 4.4.0-0.ci-2020-01-05-092524
Server Version: 4.4.0-0.ci-2020-01-05-092524
Kubernetes Version: v1.17.0
```

How reproducible:
Always

Steps to Reproduce:
1. Create an additional FeatureGate that enables the topology manager:

```yaml
apiVersion: config.openshift.io/v1
kind: FeatureGate
metadata:
  name: latency-sensetive
spec:
  featureSet: LatencySensitive
```

2. Wait for the update of the master and worker MCPs.
3. Label the worker MCP with "machineconfigpool.openshift.io/role: worker".
4. Create a KubeletConfig:

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: performance
spec:
  kubeletConfig:
    cpuManagerPolicy: static
    cpuManagerReconcilePeriod: 5s
    topologyManagerPolicy: best-effort
  machineConfigPoolSelector:
    matchLabels:
      machineconfigpool.openshift.io/role: worker
```

5. Wait for the update of the worker MCP.

Actual results:
The MCP never reaches the updated condition, because the node fails to start the kubelet:

```
06 09:47:36 worker-0 hyperkube[4972]: F0106 09:47:36.520874    4972 server.go:215] invalid configuration: TopologyManager best-effort requires feature gate TopologyManager
```

Expected results:
The worker MCP reaches the updated status and all worker nodes come up.

Additional info:
Workaround (not verified): re-creating the feature gate will probably trigger an update of the configuration under the MCP.
The problematic part of the code: https://github.com/openshift/machine-config-operator/blob/04cd2198cae247fabcd3154669618d74f124f27f/pkg/controller/kubelet-config/kubelet_config_controller.go#L409
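As a debugging aid, the URL-encoded Contents.Source payload shown in the `oc describe mc` output above can be decoded to check which feature gates the rendered kubelet.conf actually carries. A minimal Go helper using only the standard library; the payload below is a shortened illustrative stand-in, not the full config from this cluster:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

func main() {
	// Shortened illustrative payload; paste the full Contents.Source value
	// from 'oc describe mc <name>' when checking a real cluster.
	src := "data:text/plain,%7B%22featureGates%22%3A%7B%22TopologyManager%22%3Atrue%7D%7D"

	payload := strings.TrimPrefix(src, "data:text/plain,")
	decoded, err := url.PathUnescape(payload)
	if err != nil {
		panic(err)
	}
	fmt.Println(decoded) // {"featureGates":{"TopologyManager":true}}
	fmt.Println("TopologyManager enabled:",
		strings.Contains(decoded, `"TopologyManager":true`))
}
```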