Description of problem:
The MCC pod restarts continuously.

Version-Release number of selected component (if applicable):

How reproducible:
Customer specific; possibly the "spec" section of the kubelet config is missing.

Steps to Reproduce:
NA

Actual results:
The MCC pod shows many restarts:

NAME                                 READY   STATUS    RESTARTS    AGE   IP          NODE
machine-config-controller-xxxx-xxx   0/1     Running   1521 6      5d    172.x.x.x   example-master-0

Expected results:
The MCC pod should come up.

Additional info:
A similar bug exists: https://bugzilla.redhat.com/show_bug.cgi?id=1886636
Could you please provide more information? At the very least:
1. version/environments/known customizations of the cluster
2. must-gather of the cluster
3. MCC pods status or logs
4. MCO clusteroperator status
Hi Harshal Patil, I understand this case panics because the KubeletConfig lacks the "spec.kubeletConfig" section. The validation in [1] does cover the kubeletconfig.spec.kubeletConfig field, but it was only fixed in 4.7; we need to backport it to 4.6 if needed.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2021-07-07-181104   True        False         22h     Cluster version is 4.6.0-0.nightly-2021-07-07-181104

$ oc explain kubeletconfig.spec.kubeletConfig
KIND:     KubeletConfig
VERSION:  machineconfiguration.openshift.io/v1

DESCRIPTION:
     <empty>

[1] https://github.com/openshift/machine-config-operator/issues/2357
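For context, the panic comes from the controller dereferencing the missing spec.kubeletConfig field without a guard. A minimal sketch of the kind of validation that prevents it, using simplified stand-in types (this is illustrative only, not the actual MCO code; upstream the field is a *runtime.RawExtension):

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified stand-ins for the MCO types: users can omit
// spec.kubeletConfig entirely from their YAML, leaving the pointer nil.
type KubeletConfigSpec struct {
	KubeletConfig *string // stand-in for *runtime.RawExtension
}

type KubeletConfig struct {
	Spec KubeletConfigSpec
}

// validateUserKubeletConfig sketches the missing guard: reject a
// KubeletConfig whose spec.kubeletConfig is absent instead of
// dereferencing the nil pointer and crashing the controller.
func validateUserKubeletConfig(cfg *KubeletConfig) error {
	if cfg.Spec.KubeletConfig == nil {
		return errors.New("KubeletConfig: required field spec.kubeletConfig is missing")
	}
	return nil
}

func main() {
	bad := &KubeletConfig{} // spec.kubeletConfig omitted, as in the customer's CR
	if err := validateUserKubeletConfig(bad); err != nil {
		// The controller can degrade the object instead of panicking.
		fmt.Println("rejected:", err)
	}
}
```

With a check like this, a malformed CR produces a clear validation error rather than a CrashLoopBackOff on the MCC pod.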
there are duplicate descriptions:

[root@qe-preserve-minmlimerrn-1 ~]# oc explain kubeletconfig.spec.kubeletConfig --recursive=true
KIND:     KubeletConfig
VERSION:  machineconfiguration.openshift.io/v1

DESCRIPTION:
     The fields of the kubelet configuration are defined in kubernetes
     upstream. Please refer to the types defined in the version/commit used by
     OpenShift of the upstream kubernetes. It's important to note that, since
     the fields of the kubelet configuration are directly fetched from upstream
     the validation of those values is handled directly by the kubelet. Please
     refer to the upstream version of the relevant kubernetes for the valid
     values of these fields. Invalid values of the kubelet configuration fields
     may render cluster nodes unusable. The fields of the kubelet configuration
     are defined in kubernetes upstream. Please refer to the types defined in
     the version/commit used by OpenShift of the upstream kubernetes. It's
     important to note that, since the fields of the kubelet configuration are
     directly fetched from upstream the validation of those values is handled
     directly by the kubelet. Please refer to the upstream version of the
     relevant kubernetes for the valid values of these fields. Invalid values
     of the kubelet configuration fields may render cluster nodes unusable.

[root@qe-preserve-minmlimerrn-1 ~]# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2021-07-15-035804   True        False         35m     Cluster version is 4.6.0-0.nightly-2021-07-15-035804
And @Qi Wang, do you know how I can verify the doc fix? Can you provide a doc URL? I can't reach it from the pull request for "docs/KubeletConfigDesign.md".
Hi @minmli, which doc fix do you mean, the OpenShift doc like https://docs.openshift.com/container-platform/4.7/welcome/index.html? Was the original BZ fixed with the doc?
(In reply to Qi Wang from comment #11)
> Hi @minmli, which doc fix do you mean, the OpenShift doc like
> https://docs.openshift.com/container-platform/4.7/welcome/index.html? Was
> the original BZ fixed with the doc?

I got it. The doc fix is not in the official OpenShift doc, but in the MCO doc: https://github.com/harche/machine-config-operator/blob/8136a89ada3e3cd86c4140398a057384e3fde364/docs/KubeletConfigDesign.md
https://github.com/openshift/machine-config-operator/pull/2699 might be a fix for this BZ; it also keeps the same implementation as the current upstream. However, I didn't find a way to replicate this BZ, and I haven't yet figured out why the description is duplicated as shown in comment 9.
Reproduced the issue on version 4.6.0-0.nightly-2021-08-16-005317.

$ cat custom-kubelet-fail.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  annotations: {}
  name: custom-kubelet-test
spec:
  machineConfigPoolSelector:
    maxPods: 244
    imageMinimumGCAge: 1m
    imageGCHighThresholdPercent: 40
    imageGCLowThresholdPercent: 30
    matchLabels:
      custom-kubelet: test-pods

$ oc label mcp worker custom-kubelet=test-pods
$ oc create -f custom-kubelet-fail.yaml

Wait several minutes, then check the MCC log:

$ oc get pod -n openshift-machine-config-operator
NAME                                        READY   STATUS             RESTARTS   AGE
machine-config-controller-7698f5c54-5nc5v   0/1     CrashLoopBackOff   2          153m
machine-config-daemon-9f85w                 2/2     Running            0          153m
machine-config-daemon-9t6pv                 2/2     Running            0          151m

$ oc logs -f machine-config-controller-7698f5c54-5nc5v -n openshift-machine-config-operator
I0816 10:06:59.966353       1 start.go:50] Version: v4.6.0-202108140028.p0.git.c55adc4-dirty (c55adc48b86a0d0d70d6fcbbcdeaa8094734817a)
I0816 10:06:59.968185       1 leaderelection.go:243] attempting to acquire leader lease openshift-machine-config-operator/machine-config-controller...
I0816 10:08:55.625916       1 leaderelection.go:253] successfully acquired lease openshift-machine-config-operator/machine-config-controller
I0816 10:08:55.735799       1 node_controller.go:152] Starting MachineConfigController-NodeController
I0816 10:08:55.736731       1 container_runtime_config_controller.go:189] Starting MachineConfigController-ContainerRuntimeConfigController
I0816 10:08:55.737073       1 render_controller.go:124] Starting MachineConfigController-RenderController
I0816 10:08:55.741443       1 kubelet_config_controller.go:161] Starting MachineConfigController-KubeletConfigController
I0816 10:08:55.741819       1 template_controller.go:183] Starting MachineConfigController-TemplateController
E0816 10:08:55.818038       1 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 258 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x1818500, 0x2702b00)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0xa6
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x89
panic(0x1818500, 0x2702b00)
	/usr/lib/golang/src/runtime/panic.go:969 +0x1b9
github.com/openshift/machine-config-operator/pkg/controller/kubelet-config.(*Controller).syncKubeletConfig(0xc000108270, 0xc0004fef20, 0x13, 0x0, 0x0)
	/go/src/github.com/openshift/machine-config-operator/pkg/controller/kubelet-config/kubelet_config_controller.go:465 +0xc6b
github.com/openshift/machine-config-operator/pkg/controller/kubelet-config.(*Controller).processNextWorkItem(0xc000108270, 0x203000)
	/go/src/github.com/openshift/machine-config-operator/pkg/controller/kubelet-config/kubelet_config_controller.go:278 +0xcf
github.com/openshift/machine-config-operator/pkg/controller/kubelet-config.(*Controller).worker(0xc000108270)
	/go/src/github.com/openshift/machine-config-operator/pkg/controller/kubelet-config/kubelet_config_controller.go:267 +0x2b
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc0007a4040)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x5f
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc0007a4040, 0x1c32480, 0xc00061e870, 0x1, 0xc0000ca600)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0xad
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0007a4040, 0x3b9aca00, 0x0, 0x1, 0xc0000ca600)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x98
k8s.io/apimachinery/pkg/util/wait.Until(0xc0007a4040, 0x3b9aca00, 0xc0000ca600)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90 +0x4d
created by github.com/openshift/machine-config-operator/pkg/controller/kubelet-config.(*Controller).Run
	/go/src/github.com/openshift/machine-config-operator/pkg/controller/kubelet-config/kubelet_config_controller.go:165 +0x23e
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x16833eb]

goroutine 258 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:55 +0x10c
panic(0x1818500, 0x2702b00)
	/usr/lib/golang/src/runtime/panic.go:969 +0x1b9
github.com/openshift/machine-config-operator/pkg/controller/kubelet-config.(*Controller).syncKubeletConfig(0xc000108270, 0xc0004fef20, 0x13, 0x0, 0x0)
	/go/src/github.com/openshift/machine-config-operator/pkg/controller/kubelet-config/kubelet_config_controller.go:465 +0xc6b
github.com/openshift/machine-config-operator/pkg/controller/kubelet-config.(*Controller).processNextWorkItem(0xc000108270, 0x203000)
	/go/src/github.com/openshift/machine-config-operator/pkg/controller/kubelet-config/kubelet_config_controller.go:278 +0xcf
github.com/openshift/machine-config-operator/pkg/controller/kubelet-config.(*Controller).worker(0xc000108270)
	/go/src/github.com/openshift/machine-config-operator/pkg/controller/kubelet-config/kubelet_config_controller.go:267 +0x2b
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc0007a4040)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x5f
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc0007a4040, 0x1c32480, 0xc00061e870, 0x1, 0xc0000ca600)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0xad
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0007a4040, 0x3b9aca00, 0x0, 0x1, 0xc0000ca600)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x98
k8s.io/apimachinery/pkg/util/wait.Until(0xc0007a4040, 0x3b9aca00, 0xc0000ca600)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90 +0x4d
created by github.com/openshift/machine-config-operator/pkg/controller/kubelet-config.(*Controller).Run
	/go/src/github.com/openshift/machine-config-operator/pkg/controller/kubelet-config/kubelet_config_controller.go:165 +0x23e
Test passes on a cluster launched by cluster-bot with "launch openshift/machine-config-operator#2719 aws": there is no panic!
The bug doesn't move to VERIFIED automatically under the pre-merge verification process, so setting it to VERIFIED directly.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6.44 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:3395
*** Bug 2069764 has been marked as a duplicate of this bug. ***