Bug 1975626 - Observed a panic: (runtime error: invalid memory address or nil pointer dereference) in MCC
Summary: Observed a panic: (runtime error: invalid memory address or nil pointer deref...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.6.z
Assignee: Qi Wang
QA Contact: MinLi
URL:
Whiteboard:
Depends On: 1903290
Blocks: 2076645
Reported: 2021-06-24 05:30 UTC by Amit Kesarkar
Modified: 2022-04-19 15:33 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-09-09 01:52:52 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift machine-config-operator pull 2672 0 None open Bug 1975626: KubeletConfig validation warning in CRD and Docs 2021-08-23 16:18:04 UTC
Github openshift machine-config-operator pull 2699 0 None None None 2021-07-31 15:44:43 UTC
Github openshift machine-config-operator pull 2719 0 None None None 2021-08-18 15:05:08 UTC
Red Hat Product Errata RHBA-2021:3395 0 None None None 2021-09-09 01:53:14 UTC

Description Amit Kesarkar 2021-06-24 05:30:44 UTC
Description of problem: The MCC pod restarts continuously.


Version-Release number of selected component (if applicable):


How reproducible:
Customer-specific; the "spec" section of the kubelet config may be missing.

Steps to Reproduce:
NA

Actual results:

The MCC pod shows many restarts:

NAME                                        READY  STATUS   RESTARTS  AGE   IP             NODE
machine-config-controller-xxxx-xxx  0/1    Running  1521    6 5d    172.x.x.x    example-master-0

Expected results:
The MCC pod should come up

Additional info:

A similar bug exists: https://bugzilla.redhat.com/show_bug.cgi?id=1886636

Comment 1 Yu Qi Zhang 2021-06-24 16:38:59 UTC
Could you please provide more information? At the very least:

1. version/environments/known customizations of the cluster
2. must-gather of the cluster
3. MCC pods status or logs
4. MCO clusteroperator status

Comment 7 MinLi 2021-07-09 08:52:55 UTC
Hi, Harshal Patil

I understand that this panic occurs because the KubeletConfig lacks the "spec.kubeletConfig" section.
However, the validation in [1] covers the kubeletconfig.spec.kubeletConfig field, and it has only been fixed since 4.7.
We need to backport it to 4.6 if needed.

$ oc get clusterversion 
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2021-07-07-181104   True        False         22h     Cluster version is 4.6.0-0.nightly-2021-07-07-181104


$ oc explain kubeletconfig.spec.kubeletConfig
KIND:     KubeletConfig
VERSION:  machineconfiguration.openshift.io/v1

DESCRIPTION:
     <empty>


[1] https://github.com/openshift/machine-config-operator/issues/2357
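
The failure mode described in this comment can be sketched in Go as follows. This is only a minimal illustration, not the MCO code or the fix that was merged: the type names, the rawKubeletConfig helper, and the error value are hypothetical stand-ins, and the only assumption taken from the report is that the spec.kubeletConfig section is optional and ends up as a nil pointer when omitted.

```go
package main

import (
	"errors"
	"fmt"
)

// RawExtension is a hypothetical stand-in for the raw kubelet configuration payload.
type RawExtension struct {
	Raw []byte
}

// KubeletConfigSpec models only the field relevant to this bug.
type KubeletConfigSpec struct {
	KubeletConfig *RawExtension // nil when spec.kubeletConfig is omitted
}

// KubeletConfig is a simplified stand-in for the custom resource.
type KubeletConfig struct {
	Name string
	Spec KubeletConfigSpec
}

// errMissingKubeletConfig is a hypothetical error used for illustration.
var errMissingKubeletConfig = errors.New("spec.kubeletConfig is not set")

// rawKubeletConfig returns the raw payload, or an error instead of
// dereferencing a nil pointer when the section is missing.
func rawKubeletConfig(cfg *KubeletConfig) ([]byte, error) {
	if cfg == nil || cfg.Spec.KubeletConfig == nil {
		return nil, errMissingKubeletConfig
	}
	return cfg.Spec.KubeletConfig.Raw, nil
}

func main() {
	// A resource without spec.kubeletConfig is rejected with an error.
	bad := &KubeletConfig{Name: "custom-kubelet-test"}
	if _, err := rawKubeletConfig(bad); err != nil {
		fmt.Printf("skipping %s: %v\n", bad.Name, err)
	}

	// A resource with the section present yields its raw payload.
	good := &KubeletConfig{
		Name: "with-kubelet-config",
		Spec: KubeletConfigSpec{KubeletConfig: &RawExtension{Raw: []byte(`{"maxPods":244}`)}},
	}
	raw, err := rawKubeletConfig(good)
	fmt.Println(string(raw), err)
}
```

Returning an error lets the caller report the invalid resource and move on to the next work item instead of crashing the whole controller process.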

Comment 9 MinLi 2021-07-15 08:58:11 UTC
There are duplicate descriptions:

[root@qe-preserve-minmlimerrn-1 ~]# oc explain kubeletconfig.spec.kubeletConfig --recursive=true
KIND:     KubeletConfig
VERSION:  machineconfiguration.openshift.io/v1

DESCRIPTION:
     The fields of the kubelet configuration are defined in kubernetes upstream.
     Please refer to the types defined in the version/commit used by OpenShift
     of the upstream kubernetes. It's important to note that, since the fields
     of the kubelet configuration are directly fetched from upstream the
     validation of those values is handled directly by the kubelet. Please refer
     to the upstream version of the relevant kubernetes for the valid values of
     these fields. Invalid values of the kubelet configuration fields may render
     cluster nodes unusable.

     The fields of the kubelet configuration are defined in kubernetes upstream.
     Please refer to the types defined in the version/commit used by OpenShift
     of the upstream kubernetes. It's important to note that, since the fields
     of the kubelet configuration are directly fetched from upstream the
     validation of those values is handled directly by the kubelet. Please refer
     to the upstream version of the relevant kubernetes for the valid values of
     these fields. Invalid values of the kubelet configuration fields may render
     cluster nodes unusable.
[root@qe-preserve-minmlimerrn-1 ~]# 

[root@qe-preserve-minmlimerrn-1 ~]# oc get clusterversion 
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2021-07-15-035804   True        False         35m     Cluster version is 4.6.0-0.nightly-2021-07-15-035804

Comment 10 MinLi 2021-07-15 09:02:23 UTC
And @Qi Wang,

Do you know how I can verify the doc fix? Can you provide a doc URL? I can't reach it through the pull request's "docs/KubeletConfigDesign.md".

Comment 11 Qi Wang 2021-07-16 16:29:48 UTC
Hi @minmli, which doc fix do you mean: the OpenShift docs, such as https://docs.openshift.com/container-platform/4.7/welcome/index.html? Did the original BZ fix include a doc change?

Comment 12 MinLi 2021-07-22 09:17:16 UTC
(In reply to Qi Wang from comment #11)
> Hi @minmli, which doc fix do you mean: the OpenShift docs, such as
> https://docs.openshift.com/container-platform/4.7/welcome/index.html? Did
> the original BZ fix include a doc change?

I got it: the doc fix is not in the official OpenShift docs but in the MCO docs: https://github.com/harche/machine-config-operator/blob/8136a89ada3e3cd86c4140398a057384e3fde364/docs/KubeletConfigDesign.md

Comment 13 Qi Wang 2021-08-02 15:16:59 UTC
https://github.com/openshift/machine-config-operator/pull/2699 might be a fix for this BZ. It also keeps the same implementation as the current upstream, but I haven't found a way to reproduce this BZ.

I haven't figured out yet why the description is duplicated as shown in comment 9.

Comment 15 MinLi 2021-08-16 10:17:30 UTC
Reproduced the issue on version 4.6.0-0.nightly-2021-08-16-005317:

$ cat custom-kubelet-fail.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  annotations: {}
  name: custom-kubelet-test
spec:
  machineConfigPoolSelector:
    maxPods: 244
    imageMinimumGCAge: 1m
    imageGCHighThresholdPercent: 40
    imageGCLowThresholdPercent: 30
    matchLabels:
      custom-kubelet: test-pods

$ oc label mcp worker custom-kubelet=test-pods
$ oc create -f custom-kubelet-fail.yaml
Wait several minutes, then check the MCC log:
$ oc get pod -n openshift-machine-config-operator
NAME                                        READY   STATUS             RESTARTS   AGE
machine-config-controller-7698f5c54-5nc5v   0/1     CrashLoopBackOff   2          153m
machine-config-daemon-9f85w                 2/2     Running            0          153m
machine-config-daemon-9t6pv                 2/2     Running            0          151m

$ oc logs -f machine-config-controller-7698f5c54-5nc5v -n openshift-machine-config-operator
I0816 10:06:59.966353       1 start.go:50] Version: v4.6.0-202108140028.p0.git.c55adc4-dirty (c55adc48b86a0d0d70d6fcbbcdeaa8094734817a)
I0816 10:06:59.968185       1 leaderelection.go:243] attempting to acquire leader lease  openshift-machine-config-operator/machine-config-controller...
I0816 10:08:55.625916       1 leaderelection.go:253] successfully acquired lease openshift-machine-config-operator/machine-config-controller
I0816 10:08:55.735799       1 node_controller.go:152] Starting MachineConfigController-NodeController
I0816 10:08:55.736731       1 container_runtime_config_controller.go:189] Starting MachineConfigController-ContainerRuntimeConfigController
I0816 10:08:55.737073       1 render_controller.go:124] Starting MachineConfigController-RenderController
I0816 10:08:55.741443       1 kubelet_config_controller.go:161] Starting MachineConfigController-KubeletConfigController
I0816 10:08:55.741819       1 template_controller.go:183] Starting MachineConfigController-TemplateController
E0816 10:08:55.818038       1 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 258 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x1818500, 0x2702b00)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0xa6
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x89
panic(0x1818500, 0x2702b00)
	/usr/lib/golang/src/runtime/panic.go:969 +0x1b9
github.com/openshift/machine-config-operator/pkg/controller/kubelet-config.(*Controller).syncKubeletConfig(0xc000108270, 0xc0004fef20, 0x13, 0x0, 0x0)
	/go/src/github.com/openshift/machine-config-operator/pkg/controller/kubelet-config/kubelet_config_controller.go:465 +0xc6b
github.com/openshift/machine-config-operator/pkg/controller/kubelet-config.(*Controller).processNextWorkItem(0xc000108270, 0x203000)
	/go/src/github.com/openshift/machine-config-operator/pkg/controller/kubelet-config/kubelet_config_controller.go:278 +0xcf
github.com/openshift/machine-config-operator/pkg/controller/kubelet-config.(*Controller).worker(0xc000108270)
	/go/src/github.com/openshift/machine-config-operator/pkg/controller/kubelet-config/kubelet_config_controller.go:267 +0x2b
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc0007a4040)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x5f
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc0007a4040, 0x1c32480, 0xc00061e870, 0x1, 0xc0000ca600)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0xad
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0007a4040, 0x3b9aca00, 0x0, 0x1, 0xc0000ca600)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x98
k8s.io/apimachinery/pkg/util/wait.Until(0xc0007a4040, 0x3b9aca00, 0xc0000ca600)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90 +0x4d
created by github.com/openshift/machine-config-operator/pkg/controller/kubelet-config.(*Controller).Run
	/go/src/github.com/openshift/machine-config-operator/pkg/controller/kubelet-config/kubelet_config_controller.go:165 +0x23e
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x16833eb]

goroutine 258 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:55 +0x10c
panic(0x1818500, 0x2702b00)
	/usr/lib/golang/src/runtime/panic.go:969 +0x1b9
github.com/openshift/machine-config-operator/pkg/controller/kubelet-config.(*Controller).syncKubeletConfig(0xc000108270, 0xc0004fef20, 0x13, 0x0, 0x0)
	/go/src/github.com/openshift/machine-config-operator/pkg/controller/kubelet-config/kubelet_config_controller.go:465 +0xc6b
github.com/openshift/machine-config-operator/pkg/controller/kubelet-config.(*Controller).processNextWorkItem(0xc000108270, 0x203000)
	/go/src/github.com/openshift/machine-config-operator/pkg/controller/kubelet-config/kubelet_config_controller.go:278 +0xcf
github.com/openshift/machine-config-operator/pkg/controller/kubelet-config.(*Controller).worker(0xc000108270)
	/go/src/github.com/openshift/machine-config-operator/pkg/controller/kubelet-config/kubelet_config_controller.go:267 +0x2b
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc0007a4040)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x5f
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc0007a4040, 0x1c32480, 0xc00061e870, 0x1, 0xc0000ca600)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0xad
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0007a4040, 0x3b9aca00, 0x0, 0x1, 0xc0000ca600)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x98
k8s.io/apimachinery/pkg/util/wait.Until(0xc0007a4040, 0x3b9aca00, 0xc0000ca600)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90 +0x4d
created by github.com/openshift/machine-config-operator/pkg/controller/kubelet-config.(*Controller).Run
	/go/src/github.com/openshift/machine-config-operator/pkg/controller/kubelet-config/kubelet_config_controller.go:165 +0x23e
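
The trace above logs the panic once ("Observed a panic") and then prints it again marked "[recovered]" before the process exits, which is consistent with a deferred handler that recovers, logs, and re-panics. A minimal Go sketch of that pattern follows. It is a simplification under stated assumptions, not the k8s.io/apimachinery implementation: handleCrash, the reallyCrash switch, worker, runOnce, and rawConfig are hypothetical names used only for illustration.

```go
package main

import "fmt"

// logFunc is a hypothetical logging hook so the example can capture output.
type logFunc func(format string, args ...interface{})

// handleCrash recovers a panic, logs it, and re-panics when reallyCrash is set.
// It must be invoked directly via defer for recover to take effect.
func handleCrash(reallyCrash bool, logf logFunc) {
	if r := recover(); r != nil {
		logf("Observed a panic: %v", r)
		if reallyCrash {
			panic(r) // the process still dies, so the pod restarts
		}
	}
}

// rawConfig is a hypothetical stand-in for an optional config section.
type rawConfig struct{ Raw []byte }

// worker dereferences cfg without a nil check, as a sync function might.
func worker(cfg *rawConfig, reallyCrash bool, logf logFunc) int {
	defer handleCrash(reallyCrash, logf)
	return len(cfg.Raw) // panics when cfg is nil
}

// runOnce runs the worker with a nil config and reports whether the panic
// escaped the crash handler, along with the captured log lines.
func runOnce(reallyCrash bool) (crashed bool, logs []string) {
	logf := func(format string, args ...interface{}) {
		logs = append(logs, fmt.Sprintf(format, args...))
	}
	defer func() {
		if r := recover(); r != nil {
			crashed = true
		}
	}()
	worker(nil, reallyCrash, logf)
	return false, logs
}

func main() {
	for _, reallyCrash := range []bool{false, true} {
		crashed, logs := runOnce(reallyCrash)
		fmt.Printf("reallyCrash=%v crashed=%v logged=%d\n", reallyCrash, crashed, len(logs))
	}
}
```

In both modes the panic is logged once; only with re-panicking enabled does it escape the handler, which would explain why a single invalid resource keeps the controller in CrashLoopBackOff.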

Comment 16 MinLi 2021-08-26 10:16:54 UTC
Test passed on a cluster launched by cluster-bot: launch openshift/machine-config-operator#2719 aws

There is no panic.

Comment 20 MinLi 2021-09-02 03:30:29 UTC
The bug didn't move to VERIFIED automatically through the pre-merge verification process, so I set it to VERIFIED directly.

Comment 22 errata-xmlrpc 2021-09-09 01:52:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.44 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3395

Comment 23 Qi Wang 2022-04-19 15:33:04 UTC
*** Bug 2069764 has been marked as a duplicate of this bug. ***

