Bug 2041814
Summary: | The KubeletConfigController wrongly process multiple confs for a pool
---|---
Product: | OpenShift Container Platform
Component: | Node
Node sub component: | Kubelet
Status: | CLOSED ERRATA
Severity: | high
Priority: | medium
Version: | 4.7
Target Release: | 4.11.0
Hardware: | All
OS: | Linux
Reporter: | MinLi <minmli>
Assignee: | Qi Wang <qiwan>
QA Contact: | MinLi <minmli>
CC: | aos-bugs, dshumake, harpatil, nagrawal
Doc Type: | No Doc Update
Type: | Bug
Last Closed: | 2022-08-10 10:42:31 UTC
: | 2076355
Bug Blocks: | 2074225
Description
MinLi
2022-01-18 10:34:57 UTC
$ oc logs -f machine-config-controller-6d5cf7dbc9-w4lng -n openshift-machine-config-operator
I0118 08:43:23.870051 1 start.go:50] Version: v4.7.0-202201082234.p0.g51dc080.assembly.stream-dirty (51dc0801ed7d705820f557fcabf04eff023bf568)
I0118 08:43:23.875080 1 leaderelection.go:243] attempting to acquire leader lease openshift-machine-config-operator/machine-config-controller...
I0118 08:45:19.709842 1 leaderelection.go:253] successfully acquired lease openshift-machine-config-operator/machine-config-controller
E0118 08:45:19.769731 1 template_controller.go:121] couldn't get ControllerConfig on secret callback &errors.StatusError{ErrStatus:v1.Status{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ListMeta:v1.ListMeta{SelfLink:"", ResourceVersion:"", Continue:"", RemainingItemCount:(*int64)(nil)}, Status:"Failure", Message:"controllerconfig.machineconfiguration.openshift.io \"machine-config-controller\" not found", Reason:"NotFound", Details:(*v1.StatusDetails)(0xc000a6d200), Code:404}}
I0118 08:45:19.937364 1 container_runtime_config_controller.go:185] Starting MachineConfigController-ContainerRuntimeConfigController
I0118 08:45:19.937333 1 kubelet_config_controller.go:156] Starting MachineConfigController-KubeletConfigController
I0118 08:45:20.140439 1 node_controller.go:152] Starting MachineConfigController-NodeController
I0118 08:45:20.140494 1 template_controller.go:183] Starting MachineConfigController-TemplateController
I0118 08:45:20.140864 1 render_controller.go:124] Starting MachineConfigController-RenderController
I0118 08:45:20.349436 1 kubelet_config_controller.go:575] Applied KubeletConfig max-pod-1 on MachineConfigPool master
I0118 08:45:20.946864 1 kubelet_config_controller.go:575] Applied KubeletConfig max-pod on MachineConfigPool master
I0118 08:47:26.484121 1 node_controller.go:419] Pool master: node minmli01174701-llg89-master-2.c.openshift-qe.internal: Reporting unready: node minmli01174701-llg89-master-2.c.openshift-qe.internal is reporting Unschedulable
I0118 08:47:40.706433 1 node_controller.go:419] Pool master: node minmli01174701-llg89-master-2.c.openshift-qe.internal: Completed update to rendered-master-5743264e708b5842f9919abc9a534dd1
I0118 08:47:40.734308 1 node_controller.go:419] Pool master: node minmli01174701-llg89-master-2.c.openshift-qe.internal: Reporting ready
I0118 08:47:45.706891 1 status.go:90] Pool master: All nodes are updated with rendered-master-5743264e708b5842f9919abc9a534dd1

This issue also hits 4.8, yet it did not reproduce when the configs were applied to the worker machine-config-pool. It seems to happen only on the master MCP.

I ran some tests on this and can also reproduce it on worker nodes, using the same kubeletconfig YAML files as in the #Description:

[qiwan@qiwan ~]$ oc get kubeletconfig
NAME        AGE
max-pod-1   18m
max-pod-2   13m

[qiwan@qiwan ~]$ oc get mc
NAME                                              GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                         c2a8dc8e8731107f70279cfa720c13b499fdca15   3.2.0             137m
00-worker                                         c2a8dc8e8731107f70279cfa720c13b499fdca15   3.2.0             137m
01-master-container-runtime                       c2a8dc8e8731107f70279cfa720c13b499fdca15   3.2.0             137m
01-master-kubelet                                 c2a8dc8e8731107f70279cfa720c13b499fdca15   3.2.0             137m
01-worker-container-runtime                       c2a8dc8e8731107f70279cfa720c13b499fdca15   3.2.0             137m
01-worker-kubelet                                 c2a8dc8e8731107f70279cfa720c13b499fdca15   3.2.0             137m
99-master-generated-registries                    c2a8dc8e8731107f70279cfa720c13b499fdca15   3.2.0             137m
99-master-ssh                                                                                3.2.0             142m
99-worker-generated-kubelet                       c2a8dc8e8731107f70279cfa720c13b499fdca15   3.2.0             11m
99-worker-generated-kubelet-1                     c2a8dc8e8731107f70279cfa720c13b499fdca15   3.2.0             6m30s
99-worker-generated-kubelet-2                     c2a8dc8e8731107f70279cfa720c13b499fdca15   3.2.0             36s
99-worker-generated-registries                    c2a8dc8e8731107f70279cfa720c13b499fdca15   3.2.0             137m
99-worker-ssh                                                                                3.2.0             142m
rendered-master-054de7b98995263c96d5dd6a2e6dd69d  c2a8dc8e8731107f70279cfa720c13b499fdca15   3.2.0             11m
rendered-master-c4800fc1f9d8605a0a09c2223c4134d5  3e9f2ca58e00d5dd5a54b18fb5b00c5571b5c8e3   3.2.0             137m
rendered-master-c8f113baf5e0edbfc712c71419ae618b  3dc7c5ad8cd2a46c6cf1d6d68558e83f8fb8f3b0   3.2.0             108m
rendered-worker-17a5327f1131efbefd6e37e7fcf77f0a  3dc7c5ad8cd2a46c6cf1d6d68558e83f8fb8f3b0   3.2.0             17m
rendered-worker-2b1cea99aa958104038153ce242bcd0c  3dc7c5ad8cd2a46c6cf1d6d68558e83f8fb8f3b0   3.2.0             108m
rendered-worker-3cb30ae1c943187b362941403077c309  3e9f2ca58e00d5dd5a54b18fb5b00c5571b5c8e3   3.2.0             117m
rendered-worker-6058784b6b5e4476f0622834848f3832  3dc7c5ad8cd2a46c6cf1d6d68558e83f8fb8f3b0   3.2.0             104m
rendered-worker-7dd4db23a125800347de7b5bb11bdad0  c2a8dc8e8731107f70279cfa720c13b499fdca15   3.2.0             11m
rendered-worker-d5186c9ec9766128a859c5f2a9fa48e0  c2a8dc8e8731107f70279cfa720c13b499fdca15   3.2.0             6m25s
rendered-worker-d98df84416d9db4734259ee802f2a369  c2a8dc8e8731107f70279cfa720c13b499fdca15   3.2.0             10m
rendered-worker-e9bb9fc65a7fb7f320a9eecc7ae340f1  3dc7c5ad8cd2a46c6cf1d6d68558e83f8fb8f3b0   3.2.0             101m
rendered-worker-f53880ebafa7b07cfd8f0543e65c8419  3e9f2ca58e00d5dd5a54b18fb5b00c5571b5c8e3   3.2.0             137m

[qiwan@qiwan ~]$ oc describe kubeletconfig/max-pod-1
Name:         max-pod-1
Namespace:
Labels:       <none>
Annotations:  machineconfiguration.openshift.io/mc-name-suffix: 2
API Version:  machineconfiguration.openshift.io/v1
Kind:         KubeletConfig
Metadata:
  Creation Timestamp:  2022-01-28T21:04:42Z
  Finalizers:
    99-worker-generated-kubelet
    99-worker-generated-kubelet-2
  Generation:  1
  Managed Fields:
    API Version:  machineconfiguration.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
      f:spec:
        .:
        f:kubeletConfig:
          .:
          f:maxPods:
        f:machineConfigPoolSelector:
          .:
          f:matchLabels:
            .:
            f:custom-kubelet-worker:
    Manager:      kubectl-client-side-apply
    Operation:    Update
    Time:         2022-01-28T21:04:42Z
    API Version:  machineconfiguration.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:conditions:
    Manager:      machine-config-controller
    Operation:    Update
    Subresource:  status
    Time:         2022-01-28T21:04:43Z
    API Version:  machineconfiguration.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:machineconfiguration.openshift.io/mc-name-suffix:
        f:finalizers:
          .:
          v:"99-worker-generated-kubelet":
          v:"99-worker-generated-kubelet-2":
    Manager:         machine-config-controller
    Operation:       Update
    Time:            2022-01-28T21:15:12Z
  Resource Version:  87484
  UID:               80977f7c-3238-4162-b98f-a9c3b415b94c

// After max-pod-2 had been rolled out, I waited several minutes and checked the log to see the kubeletconfig resync

[qiwan@qiwan ~]$ oc logs -f machine-config-controller-5f9bd97d4f-c7xft
I0128 21:12:30.444264 1 start.go:50] Version: machine-config-daemon-4.6.0-202006240615.p0-1231-gc2a8dc8e (c2a8dc8e8731107f70279cfa720c13b499fdca15)
I0128 21:12:33.561471 1 leaderelection.go:248] attempting to acquire leader lease openshift-machine-config-operator/machine-config-controller...
I0128 21:15:11.364531 1 leaderelection.go:258] successfully acquired lease openshift-machine-config-operator/machine-config-controller
I0128 21:15:11.402650 1 template_controller.go:137] Re-syncing ControllerConfig due to secret pull-secret change
I0128 21:15:11.469686 1 node_controller.go:152] Starting MachineConfigController-NodeController
I0128 21:15:11.472877 1 kubelet_config_controller.go:169] Starting MachineConfigController-KubeletConfigController
I0128 21:15:11.474188 1 container_runtime_config_controller.go:184] Starting MachineConfigController-ContainerRuntimeConfigController
I0128 21:15:11.474369 1 kubelet_config_controller.go:446] sync kubbeletconfig: key: max-pod-1time; 2022-01-28 21:15:11.474363961 +0000 UTC m=+161.155989663
I0128 21:15:11.474475 1 kubelet_config_controller.go:471] kubeleltc config name: max-pod-1
I0128 21:15:11.478355 1 kubelet_config_controller.go:446] sync kubbeletconfig: key: max-pod-2time; 2022-01-28 21:15:11.478348348 +0000 UTC m=+161.159974064
I0128 21:15:11.478446 1 kubelet_config_controller.go:471] kubeleltc config name: max-pod-2
I0128 21:15:11.575671 1 template_controller.go:238] Starting MachineConfigController-TemplateController
I0128 21:15:11.575951 1 render_controller.go:124] Starting MachineConfigController-RenderController
I0128 21:15:11.673755 1 kubelet_config_controller.go:632] Applied KubeletConfig max-pod-2 on MachineConfigPool worker
I0128 21:15:12.485615 1 kubelet_config_controller.go:632] Applied KubeletConfig max-pod-1 on MachineConfigPool worker
I0128 21:15:16.419086 1 node_controller.go:414] Pool worker: 2 candidate nodes for update, capacity: 1
I0128 21:15:16.419175 1 node_controller.go:414] Pool worker: Setting node ci-ln-l5nkrxk-72292-79kr7-worker-c-m7xhj target to rendered-worker-d5186c9ec9766128a859c5f2a9fa48e0
I0128 21:15:16.453658 1 event.go:285] Event(v1.ObjectReference{Kind:"MachineConfigPool", Namespace:"", Name:"worker", UID:"3a9d3790-cbd8-401d-9443-69f97dc0e619", APIVersion:"machineconfiguration.openshift.io/v1", ResourceVersion:"83282", FieldPath:""}): type: 'Normal' reason: 'SetDesiredConfig' Targeted node ci-ln-l5nkrxk-72292-79kr7-worker-c-m7xhj to config rendered-worker-d5186c9ec9766128a859c5f2a9fa48e0

Verified!
$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-02-27-122819   True        False         146m    Cluster version is 4.11.0-0.nightly-2022-02-27-122819

$ oc get mc
NAME                                              GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                         4e7fe38d0db4a3a542a246b6a9eb97f582b91b07   3.2.0             110m
00-worker                                         4e7fe38d0db4a3a542a246b6a9eb97f582b91b07   3.2.0             110m
01-master-container-runtime                       4e7fe38d0db4a3a542a246b6a9eb97f582b91b07   3.2.0             110m
01-master-kubelet                                 4e7fe38d0db4a3a542a246b6a9eb97f582b91b07   3.2.0             110m
01-worker-container-runtime                       4e7fe38d0db4a3a542a246b6a9eb97f582b91b07   3.2.0             110m
01-worker-kubelet                                 4e7fe38d0db4a3a542a246b6a9eb97f582b91b07   3.2.0             110m
99-master-generated-kubelet                       4e7fe38d0db4a3a542a246b6a9eb97f582b91b07   3.2.0             44m
99-master-generated-kubelet-1                     4e7fe38d0db4a3a542a246b6a9eb97f582b91b07   3.2.0             25m
99-master-generated-registries                    4e7fe38d0db4a3a542a246b6a9eb97f582b91b07   3.2.0             110m
99-master-ssh                                                                                3.2.0             112m
99-worker-generated-registries                    4e7fe38d0db4a3a542a246b6a9eb97f582b91b07   3.2.0             110m
99-worker-ssh                                                                                3.2.0             112m
rendered-master-1d1f4e45e5ea50e22a8b7d729af00f03  4e7fe38d0db4a3a542a246b6a9eb97f582b91b07   3.2.0             25m
rendered-master-6c05924f5ef7961a3857db00dee9a1fe  4e7fe38d0db4a3a542a246b6a9eb97f582b91b07   3.2.0             44m
rendered-master-8f3d37cfd041071f334a29fe070778ff  4e7fe38d0db4a3a542a246b6a9eb97f582b91b07   3.2.0             110m
rendered-worker-60525c576deef509b0f0644c942e989e  4e7fe38d0db4a3a542a246b6a9eb97f582b91b07   3.2.0             110m

*** Bug 2069764 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069
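
Note on the resync behaviour reported in the worker-pool reproduction: after the controller restart, max-pod-1 carried two finalizers (99-worker-generated-kubelet and 99-worker-generated-kubelet-2), which suggests that a new suffix was allocated on resync instead of the existing name being reused. The Go sketch below illustrates one way a controller could keep the generated name stable, by reusing a name already recorded on the KubeletConfig before allocating a new suffix. It is not the machine-config-operator code and not necessarily the shipped fix; `baseName`, `pickName`, and their parameters are hypothetical, and the name recorded for max-pod-2 is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// baseName returns the unsuffixed generated name for a pool, following the
// "99-<pool>-generated-kubelet" pattern visible in the `oc get mc` output.
func baseName(pool string) string {
	return "99-" + pool + "-generated-kubelet"
}

// pickName is a hypothetical helper, not the controller's real logic.
// recorded holds names already stored on the KubeletConfig (for example its
// finalizers, oldest first); inUse holds the MachineConfig names that
// already exist in the cluster.
func pickName(pool string, recorded []string, inUse map[string]bool) string {
	base := baseName(pool)

	// Reuse a previously recorded name first, so that a resync after a
	// controller restart never allocates a fresh suffix.
	for _, name := range recorded {
		if name == base || strings.HasPrefix(name, base+"-") {
			return name
		}
	}

	// First sync for this KubeletConfig: take the lowest free name.
	if !inUse[base] {
		return base
	}
	for i := 1; ; i++ {
		candidate := fmt.Sprintf("%s-%d", base, i)
		if !inUse[candidate] {
			return candidate
		}
	}
}

func main() {
	inUse := map[string]bool{
		"99-worker-generated-kubelet":   true,
		"99-worker-generated-kubelet-1": true,
	}

	// Resync in the order seen in the log: max-pod-2 first, then max-pod-1.
	// The recorded name for max-pod-2 is assumed for illustration.
	fmt.Println(pickName("worker", []string{"99-worker-generated-kubelet-1"}, inUse))
	fmt.Println(pickName("worker", []string{"99-worker-generated-kubelet"}, inUse))

	// A genuinely new KubeletConfig still receives the next free suffix.
	fmt.Println(pickName("worker", nil, inUse))
}
```

With this ordering of checks, the processing order of max-pod-1 and max-pod-2 does not affect the result, because each config looks at its own recorded name before consulting the set of names in use.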