Bug 1975626
| Summary: | Observed a panic: (runtime error: invalid memory address or nil pointer dereference) in MCC | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Amit Kesarkar <akesarka> |
| Component: | Node | Assignee: | Qi Wang <qiwan> |
| Node sub component: | Kubelet | QA Contact: | MinLi <minmli> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | | |
| Priority: | unspecified | CC: | andrew.demarsh, aos-bugs, dshumake, harpatil, jerzhang, oarribas, qiwan, rkshirsa |
| Version: | 4.6 | | |
| Target Milestone: | --- | | |
| Target Release: | 4.6.z | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-09-09 01:52:52 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1903290 | | |
| Bug Blocks: | 2076645 | | |
Description
Amit Kesarkar
2021-06-24 05:30:44 UTC
Could you please provide more information? At the very least:
1. Version, environment, and known customizations of the cluster
2. must-gather of the cluster
3. MCC pod status or logs
4. MCO clusteroperator status

Hi, Harshal Patil,
I understand that this case panics because the KubeletConfig lacks the "spec.kubeletConfig" section.
However, the validation in [1] applies to the kubeletconfig.spec.kubeletConfig field, and it was only fixed as of 4.7.
We need to backport it to 4.6 if needed.
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.6.0-0.nightly-2021-07-07-181104 True False 22h Cluster version is 4.6.0-0.nightly-2021-07-07-181104
$ oc explain kubeletconfig.spec.kubeletConfig
KIND: KubeletConfig
VERSION: machineconfiguration.openshift.io/v1
DESCRIPTION:
<empty>
[1] https://github.com/openshift/machine-config-operator/issues/2357
There are duplicate descriptions:
[root@qe-preserve-minmlimerrn-1 ~]# oc explain kubeletconfig.spec.kubeletConfig --recursive=true
KIND: KubeletConfig
VERSION: machineconfiguration.openshift.io/v1
DESCRIPTION:
The fields of the kubelet configuration are defined in kubernetes upstream.
Please refer to the types defined in the version/commit used by OpenShift
of the upstream kubernetes. It's important to note that, since the fields
of the kubelet configuration are directly fetched from upstream the
validation of those values is handled directly by the kubelet. Please refer
to the upstream version of the relevant kubernetes for the valid values of
these fields. Invalid values of the kubelet configuration fields may render
cluster nodes unusable.
The fields of the kubelet configuration are defined in kubernetes upstream.
Please refer to the types defined in the version/commit used by OpenShift
of the upstream kubernetes. It's important to note that, since the fields
of the kubelet configuration are directly fetched from upstream the
validation of those values is handled directly by the kubelet. Please refer
to the upstream version of the relevant kubernetes for the valid values of
these fields. Invalid values of the kubelet configuration fields may render
cluster nodes unusable.
[root@qe-preserve-minmlimerrn-1 ~]#
[root@qe-preserve-minmlimerrn-1 ~]# oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.6.0-0.nightly-2021-07-15-035804 True False 35m Cluster version is 4.6.0-0.nightly-2021-07-15-035804
@Qi Wang, do you know how I can verify the doc fix? Can you provide a doc URL? I can't reach it through the pull request file "docs/KubeletConfigDesign.md".

Hi, @minmli, which doc fix do you mean, the OpenShift docs such as https://docs.openshift.com/container-platform/4.7/welcome/index.html? Did the original BZ fix include the doc?

(In reply to Qi Wang from comment #11)
> Hi, @minmli, which doc fix, the openshift doc like
> https://docs.openshift.com/container-platform/4.7/welcome/index.html? Did
> the original BZ fix with the doc?

I got it: the doc fix is not in the official OpenShift docs but in the MCO docs: https://github.com/harche/machine-config-operator/blob/8136a89ada3e3cd86c4140398a057384e3fde364/docs/KubeletConfigDesign.md

https://github.com/openshift/machine-config-operator/pull/2699 might be a fix for this BZ. It also keeps the same implementation as the current upstream, but I didn't find a way to replicate this BZ. I haven't yet figured out why the description is duplicated as shown in Comment 9.

Reproduced the issue on version: 4.6.0-0.nightly-2021-08-16-005317
$ cat custom-kubelet-fail.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  annotations: {}
  name: custom-kubelet-test
spec:
  machineConfigPoolSelector:
    maxPods: 244
    imageMinimumGCAge: 1m
    imageGCHighThresholdPercent: 40
    imageGCLowThresholdPercent: 30
    matchLabels:
      custom-kubelet: test-pods
$ oc label mcp worker custom-kubelet=test-pods
$ oc create -f custom-kubelet-fail.yaml
Wait several minutes, then check the MCC log:
$ oc get pod -n openshift-machine-config-operator
NAME READY STATUS RESTARTS AGE
machine-config-controller-7698f5c54-5nc5v 0/1 CrashLoopBackOff 2 153m
machine-config-daemon-9f85w 2/2 Running 0 153m
machine-config-daemon-9t6pv 2/2 Running 0 151m
$ oc logs -f machine-config-controller-7698f5c54-5nc5v -n openshift-machine-config-operator
I0816 10:06:59.966353 1 start.go:50] Version: v4.6.0-202108140028.p0.git.c55adc4-dirty (c55adc48b86a0d0d70d6fcbbcdeaa8094734817a)
I0816 10:06:59.968185 1 leaderelection.go:243] attempting to acquire leader lease openshift-machine-config-operator/machine-config-controller...
I0816 10:08:55.625916 1 leaderelection.go:253] successfully acquired lease openshift-machine-config-operator/machine-config-controller
I0816 10:08:55.735799 1 node_controller.go:152] Starting MachineConfigController-NodeController
I0816 10:08:55.736731 1 container_runtime_config_controller.go:189] Starting MachineConfigController-ContainerRuntimeConfigController
I0816 10:08:55.737073 1 render_controller.go:124] Starting MachineConfigController-RenderController
I0816 10:08:55.741443 1 kubelet_config_controller.go:161] Starting MachineConfigController-KubeletConfigController
I0816 10:08:55.741819 1 template_controller.go:183] Starting MachineConfigController-TemplateController
E0816 10:08:55.818038 1 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 258 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x1818500, 0x2702b00)
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0xa6
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x89
panic(0x1818500, 0x2702b00)
/usr/lib/golang/src/runtime/panic.go:969 +0x1b9
github.com/openshift/machine-config-operator/pkg/controller/kubelet-config.(*Controller).syncKubeletConfig(0xc000108270, 0xc0004fef20, 0x13, 0x0, 0x0)
/go/src/github.com/openshift/machine-config-operator/pkg/controller/kubelet-config/kubelet_config_controller.go:465 +0xc6b
github.com/openshift/machine-config-operator/pkg/controller/kubelet-config.(*Controller).processNextWorkItem(0xc000108270, 0x203000)
/go/src/github.com/openshift/machine-config-operator/pkg/controller/kubelet-config/kubelet_config_controller.go:278 +0xcf
github.com/openshift/machine-config-operator/pkg/controller/kubelet-config.(*Controller).worker(0xc000108270)
/go/src/github.com/openshift/machine-config-operator/pkg/controller/kubelet-config/kubelet_config_controller.go:267 +0x2b
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc0007a4040)
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x5f
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc0007a4040, 0x1c32480, 0xc00061e870, 0x1, 0xc0000ca600)
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0xad
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0007a4040, 0x3b9aca00, 0x0, 0x1, 0xc0000ca600)
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x98
k8s.io/apimachinery/pkg/util/wait.Until(0xc0007a4040, 0x3b9aca00, 0xc0000ca600)
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90 +0x4d
created by github.com/openshift/machine-config-operator/pkg/controller/kubelet-config.(*Controller).Run
/go/src/github.com/openshift/machine-config-operator/pkg/controller/kubelet-config/kubelet_config_controller.go:165 +0x23e
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x16833eb]
goroutine 258 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:55 +0x10c
panic(0x1818500, 0x2702b00)
/usr/lib/golang/src/runtime/panic.go:969 +0x1b9
github.com/openshift/machine-config-operator/pkg/controller/kubelet-config.(*Controller).syncKubeletConfig(0xc000108270, 0xc0004fef20, 0x13, 0x0, 0x0)
/go/src/github.com/openshift/machine-config-operator/pkg/controller/kubelet-config/kubelet_config_controller.go:465 +0xc6b
github.com/openshift/machine-config-operator/pkg/controller/kubelet-config.(*Controller).processNextWorkItem(0xc000108270, 0x203000)
/go/src/github.com/openshift/machine-config-operator/pkg/controller/kubelet-config/kubelet_config_controller.go:278 +0xcf
github.com/openshift/machine-config-operator/pkg/controller/kubelet-config.(*Controller).worker(0xc000108270)
/go/src/github.com/openshift/machine-config-operator/pkg/controller/kubelet-config/kubelet_config_controller.go:267 +0x2b
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc0007a4040)
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x5f
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc0007a4040, 0x1c32480, 0xc00061e870, 0x1, 0xc0000ca600)
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0xad
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0007a4040, 0x3b9aca00, 0x0, 0x1, 0xc0000ca600)
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x98
k8s.io/apimachinery/pkg/util/wait.Until(0xc0007a4040, 0x3b9aca00, 0xc0000ca600)
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90 +0x4d
created by github.com/openshift/machine-config-operator/pkg/controller/kubelet-config.(*Controller).Run
/go/src/github.com/openshift/machine-config-operator/pkg/controller/kubelet-config/kubelet_config_controller.go:165 +0x23e
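
The log above shows the panic twice: once as an "Observed a panic" error line with a stack trace, and once more as a fatal "panic: ... [recovered]" with a second trace. This is consistent with a deferred crash handler that logs the panic and then re-panics. The Go sketch below illustrates that pattern under stated assumptions: `handleCrash`, `syncOnce`, and `config` are hypothetical names and do not reproduce the apimachinery or MCO code.

```go
package main

import "fmt"

// config is a hypothetical stand-in for the object the sync loop dereferences.
type config struct {
	maxPods int
}

// handleCrash logs a recovered panic and, when reallyCrash is true,
// re-panics so the process still terminates. It must be the deferred
// function itself, because recover only works when called directly from it.
func handleCrash(reallyCrash bool) {
	if r := recover(); r != nil {
		fmt.Printf("Observed a panic: %v\n", r)
		if reallyCrash {
			panic(r)
		}
	}
}

// syncOnce dereferences c; a nil c triggers a runtime nil pointer panic.
// The named result keeps its zero value when the panic is recovered.
func syncOnce(c *config, reallyCrash bool) (pods int) {
	defer handleCrash(reallyCrash)
	return c.maxPods
}

func main() {
	// Valid object: no panic.
	fmt.Println("pods:", syncOnce(&config{maxPods: 244}, false))

	// Nil object with reallyCrash disabled: the panic is logged and swallowed.
	fmt.Println("pods after recovered panic:", syncOnce(nil, false))

	// With reallyCrash enabled, the same call would log the panic and then
	// crash the process, which matches the repeated trace in the log above:
	// syncOnce(nil, true)
}
```

Logging before re-panicking keeps the failure visible in the controller log while still letting the pod restart, which is why the pod ends up in CrashLoopBackOff rather than silently continuing.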
The test passed on a cluster launched by cluster-bot: launch openshift/machine-config-operator#2719 aws. There is no panic. The bug does not move to VERIFIED automatically through the pre-merge verification process, so I set it to VERIFIED directly.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6.44 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2021:3395

*** Bug 2069764 has been marked as a duplicate of this bug. ***