test: Undiagnosed panic detected in pod is failing frequently in CI, see search results:

https://search.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job&search=Undiagnosed+panic+detected+in+pod

Example 4.6.0-0.ci-2020-09-01-180917 -> 4.6.0-0.ci-2020-09-02-112251 job [1] loops on:

pods/openshift-kube-apiserver_kube-apiserver-ci-op-x44nqxlf-2f611-bdgq5-master-1_kube-apiserver.log.gz:E0902 19:31:10.506239 17 runtime.go:76] Observed a panic: runtime error: invalid memory address or nil pointer dereference

Seems common:

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job&search=Undiagnosed+panic+detected+in+pod' | grep 'failures match'
...
release-openshift-okd-installer-e2e-aws-4.6 - 67 runs, 97% failed, 52% of failures match
release-openshift-okd-installer-e2e-aws-upgrade - 87 runs, 67% failed, 22% of failures match
release-openshift-origin-installer-e2e-aws-4.6 - 4 runs, 50% failed, 50% of failures match
release-openshift-origin-installer-e2e-aws-compact-4.6 - 4 runs, 75% failed, 67% of failures match
release-openshift-origin-installer-e2e-aws-disruptive-4.6 - 4 runs, 100% failed, 50% of failures match
release-openshift-origin-installer-e2e-aws-sdn-multitenant-4.6 - 4 runs, 50% failed, 50% of failures match
release-openshift-origin-installer-e2e-aws-serial-4.6 - 107 runs, 47% failed, 118% of failures match
release-openshift-origin-installer-e2e-aws-shared-vpc-4.6 - 7 runs, 43% failed, 133% of failures match
release-openshift-origin-installer-e2e-aws-upgrade-4.5-stable-to-4.6-ci - 79 runs, 25% failed, 90% of failures match
release-openshift-origin-installer-e2e-aws-upgrade - 671 runs, 31% failed, 18% of failures match
release-openshift-origin-installer-e2e-aws-upgrade-rollback-4.6 - 8 runs, 38% failed, 33% of failures match
release-openshift-origin-installer-e2e-azure-4.6 - 27 runs, 85% failed, 9% of failures match
release-openshift-origin-installer-e2e-azure-shared-vpc-4.6 - 7 runs, 86% failed, 17% of failures match
release-openshift-origin-installer-e2e-azure-upgrade-4.5-stable-to-4.6-ci - 28 runs, 96% failed, 7% of failures match
release-openshift-origin-installer-e2e-azure-upgrade-4.6 - 28 runs, 96% failed, 7% of failures match
release-openshift-origin-installer-e2e-gcp-4.6 - 93 runs, 53% failed, 69% of failures match
release-openshift-origin-installer-e2e-gcp-4.7 - 11 runs, 100% failed, 36% of failures match
release-openshift-origin-installer-e2e-gcp-compact-4.6 - 4 runs, 75% failed, 33% of failures match
release-openshift-origin-installer-e2e-gcp-shared-vpc-4.6 - 7 runs, 14% failed, 200% of failures match
release-openshift-origin-installer-e2e-gcp-upgrade - 150 runs, 31% failed, 59% of failures match
release-openshift-origin-installer-e2e-gcp-upgrade-4.5-stable-to-4.6-ci - 27 runs, 15% failed, 100% of failures match
release-openshift-origin-installer-e2e-gcp-upgrade-4.6 - 28 runs, 36% failed, 60% of failures match
release-openshift-origin-installer-launch-aws - 153 runs, 52% failed, 5% of failures match
release-openshift-origin-installer-launch-gcp - 527 runs, 55% failed, 11% of failures match

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-upgrade-4.6/1301225743073677312
Apparently a dup of bug 1875038. *** This bug has been marked as a duplicate of bug 1875038 ***
Pulling this back out into its own bug at Michal's request [1].

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1875038#c4
It is not clear to me why https://github.com/openshift/cluster-kube-apiserver-operator/pull/941 is not a fix for the panic that led me to open this bug, so I'm fine if this gets re-closed as a dup ;).
This will be fixed by https://github.com/kubernetes/kubernetes/pull/94589.

The fix you referenced, Trevor, is for the KAS operator: during graceful shutdown, controllers that had not started yet (because they were still waiting for caches to sync) received the context cancellation, which closed the channel that WaitForCacheSync() was using, and that resulted in a panic inside the controller. The fix Lukasz is working on is in the operand and requires a backport from upstream.
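For anyone following along, here is a minimal self-contained Go sketch of the shutdown race described above. All names are hypothetical; it mimics client-go's cache.WaitForCacheSync rather than importing it, and it is not the actual operator code. The stop channel closes during shutdown before the caches sync, the controller ignores the false return value, and dereferencing its never-populated state produces the same "invalid memory address or nil pointer dereference" panic seen in the CI logs.

package main

import (
	"fmt"
	"time"
)

// waitForCacheSync mimics client-go's cache.WaitForCacheSync: it returns
// true once every sync func reports true, or false if stopCh closes first.
func waitForCacheSync(stopCh <-chan struct{}, synced ...func() bool) bool {
	ticker := time.NewTicker(10 * time.Millisecond)
	defer ticker.Stop()
	for {
		select {
		case <-stopCh:
			return false // shutdown arrived before the caches synced
		case <-ticker.C:
			all := true
			for _, s := range synced {
				all = all && s()
			}
			if all {
				return true
			}
		}
	}
}

// controller's lister is only populated after a successful cache sync.
type controller struct {
	lister *[]string
}

func (c *controller) run(stopCh <-chan struct{}) {
	// Bug: the false return value is ignored, so on shutdown we fall
	// through and dereference c.lister while it is still nil. The fix
	// pattern is: if !waitForCacheSync(stopCh, ...) { return }.
	waitForCacheSync(stopCh, func() bool { return false })
	fmt.Println("items:", len(*c.lister)) // nil pointer dereference
}

func main() {
	stopCh := make(chan struct{})
	close(stopCh) // graceful shutdown: the stop channel is already closed
	c := &controller{}
	// Panics: runtime error: invalid memory address or nil pointer dereference
	c.run(stopCh)
}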
*** Bug 1879208 has been marked as a duplicate of this bug. ***
See the search results: https://search.ci.openshift.org/?search=Undiagnosed+panic+detected+in+pod&maxAge=168h&context=2&type=junit&name=&maxMatches=5&maxBytes=20971520&groupBy=job

Matching the keywords 'Observed a panic: runtime error: invalid memory address or nil pointer dereference', there are nine hits in total, involving openshift-apiserver_apiserver and openshift-kube-apiserver_kube-apiserver. Some are related to 4.3/4.4/4.5, others to indeterminate versions; I will keep observing this for a couple of days.
In the past seven days I no longer see the panic in 4.6-related tests; it still exists on 4.3, 4.4, and 4.5. Here are the search results: https://search.ci.openshift.org/?search=kube-apiserver.log.*Observed+a+panic%3A+runtime+error%3A+invalid+memory+address+or+nil+pointer+dereference&maxAge=168h&context=2&type=junit&name=&maxMatches=5&maxBytes=20971520&groupBy=job
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196
Do we have a backport for 4.5 on this issue?