Bug 1875046

Summary: Undiagnosed panic detected in pod: openshift-kube-apiserver_kube-apiserver: runtime.go:76: invalid memory address or nil pointer dereference
Product: OpenShift Container Platform Reporter: W. Trevor King <wking>
Component: kube-apiserver Assignee: Lukasz Szaszkiewicz <lszaszki>
Status: CLOSED ERRATA QA Contact: Ke Wang <kewang>
Severity: high Docs Contact:
Priority: high    
Version: 4.6 CC: alchan, aos-bugs, hgomes, jkaur, lszaszki, mfojtik, xxia
Target Milestone: --- Keywords: Reopened
Target Release: 4.6.0 Flags: alchan: needinfo?, hgomes: needinfo?
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1919966 Environment:
Undiagnosed panic detected in pod
Last Closed: 2020-10-27 16:37:14 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1879208, 1919968    

Description W. Trevor King 2020-09-02 20:32:34 UTC
test:
Undiagnosed panic detected in pod 

is failing frequently in CI, see search results:
https://search.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job&search=Undiagnosed+panic+detected+in+pod

Example 4.6.0-0.ci-2020-09-01-180917 -> 4.6.0-0.ci-2020-09-02-112251 job [1] loops on:

pods/openshift-kube-apiserver_kube-apiserver-ci-op-x44nqxlf-2f611-bdgq5-master-1_kube-apiserver.log.gz:E0902 19:31:10.506239      17 runtime.go:76] Observed a panic: runtime error: invalid memory address or nil pointer dereference

Seems common:

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job&search=Undiagnosed+panic+detected+in+pod' | grep 'failures match'
...
release-openshift-okd-installer-e2e-aws-4.6 - 67 runs, 97% failed, 52% of failures match
release-openshift-okd-installer-e2e-aws-upgrade - 87 runs, 67% failed, 22% of failures match
release-openshift-origin-installer-e2e-aws-4.6 - 4 runs, 50% failed, 50% of failures match
release-openshift-origin-installer-e2e-aws-compact-4.6 - 4 runs, 75% failed, 67% of failures match
release-openshift-origin-installer-e2e-aws-disruptive-4.6 - 4 runs, 100% failed, 50% of failures match
release-openshift-origin-installer-e2e-aws-sdn-multitenant-4.6 - 4 runs, 50% failed, 50% of failures match
release-openshift-origin-installer-e2e-aws-serial-4.6 - 107 runs, 47% failed, 118% of failures match
release-openshift-origin-installer-e2e-aws-shared-vpc-4.6 - 7 runs, 43% failed, 133% of failures match
release-openshift-origin-installer-e2e-aws-upgrade-4.5-stable-to-4.6-ci - 79 runs, 25% failed, 90% of failures match
release-openshift-origin-installer-e2e-aws-upgrade - 671 runs, 31% failed, 18% of failures match
release-openshift-origin-installer-e2e-aws-upgrade-rollback-4.6 - 8 runs, 38% failed, 33% of failures match
release-openshift-origin-installer-e2e-azure-4.6 - 27 runs, 85% failed, 9% of failures match
release-openshift-origin-installer-e2e-azure-shared-vpc-4.6 - 7 runs, 86% failed, 17% of failures match
release-openshift-origin-installer-e2e-azure-upgrade-4.5-stable-to-4.6-ci - 28 runs, 96% failed, 7% of failures match
release-openshift-origin-installer-e2e-azure-upgrade-4.6 - 28 runs, 96% failed, 7% of failures match
release-openshift-origin-installer-e2e-gcp-4.6 - 93 runs, 53% failed, 69% of failures match
release-openshift-origin-installer-e2e-gcp-4.7 - 11 runs, 100% failed, 36% of failures match
release-openshift-origin-installer-e2e-gcp-compact-4.6 - 4 runs, 75% failed, 33% of failures match
release-openshift-origin-installer-e2e-gcp-shared-vpc-4.6 - 7 runs, 14% failed, 200% of failures match
release-openshift-origin-installer-e2e-gcp-upgrade - 150 runs, 31% failed, 59% of failures match
release-openshift-origin-installer-e2e-gcp-upgrade-4.5-stable-to-4.6-ci - 27 runs, 15% failed, 100% of failures match
release-openshift-origin-installer-e2e-gcp-upgrade-4.6 - 28 runs, 36% failed, 60% of failures match
release-openshift-origin-installer-launch-aws - 153 runs, 52% failed, 5% of failures match
release-openshift-origin-installer-launch-gcp - 527 runs, 55% failed, 11% of failures match
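
To get a rough feel for how many runs each line above represents, the three numbers can be multiplied out. The following is a minimal Go sketch, not part of any CI tooling: `jobStats`, `parseLine`, and `estimatedMatches` are hypothetical names, and the line format is assumed from the output above. It also assumes the match percentage is relative to failed runs, as the wording "of failures match" suggests. Values above 100% in the listing mean this can only be an approximation.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// lineRE matches one summary line, e.g.
// "some-job - 671 runs, 31% failed, 18% of failures match".
var lineRE = regexp.MustCompile(`^(\S+) - (\d+) runs, (\d+)% failed, (\d+)% of failures match$`)

// jobStats holds the numbers from one summary line (hypothetical type).
type jobStats struct {
	Name      string
	Runs      int
	FailedPct int
	MatchPct  int
}

// parseLine extracts the job name and the three numbers from a summary line.
// It returns false if the line does not have the assumed format.
func parseLine(line string) (jobStats, bool) {
	m := lineRE.FindStringSubmatch(line)
	if m == nil {
		return jobStats{}, false
	}
	nums := make([]int, 3)
	for i := range nums {
		n, err := strconv.Atoi(m[i+2])
		if err != nil {
			return jobStats{}, false
		}
		nums[i] = n
	}
	return jobStats{Name: m[1], Runs: nums[0], FailedPct: nums[1], MatchPct: nums[2]}, true
}

// estimatedMatches approximates the number of runs that hit the search term:
// runs * failed% * match%.
func (s jobStats) estimatedMatches() float64 {
	return float64(s.Runs) * float64(s.FailedPct) / 100 * float64(s.MatchPct) / 100
}

func main() {
	line := "release-openshift-origin-installer-e2e-aws-upgrade - 671 runs, 31% failed, 18% of failures match"
	if s, ok := parseLine(line); ok {
		fmt.Printf("%s: roughly %.0f matching runs\n", s.Name, s.estimatedMatches())
	}
}
```

The lines with the most runs dominate the absolute count even when their match percentage is low, so the estimate is more useful for ranking jobs than the percentage alone.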


[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-upgrade-4.6/1301225743073677312

Comment 1 W. Trevor King 2020-09-02 20:37:26 UTC
Apparently a dup of bug 1875038.

*** This bug has been marked as a duplicate of bug 1875038 ***

Comment 2 W. Trevor King 2020-09-08 21:53:50 UTC
Pulling this back out into its own bug at Michal's request [1].

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1875038#c4

Comment 3 W. Trevor King 2020-09-08 23:17:25 UTC
It is not clear to me, though, why https://github.com/openshift/cluster-kube-apiserver-operator/pull/941 is not a fix for the panic that led me to open this bug, so I'm fine if this gets re-closed as a dup ;).

Comment 4 Michal Fojtik 2020-09-09 08:19:39 UTC
This will be fixed by https://github.com/kubernetes/kubernetes/pull/94589

Trevor, the fix you referenced is for the KAS operator. During graceful shutdown, controllers that had not started yet because they were still waiting for caches to sync received the context close. That closed the channel WaitForCacheSync() was using, which resulted in a panic inside that controller.

The fix Lukasz is working on is in the operand and requires a backport from upstream.

Comment 8 Venkata Siva Teja Areti 2020-09-15 21:41:18 UTC
*** Bug 1879208 has been marked as a duplicate of this bug. ***

Comment 10 Ke Wang 2020-09-18 11:16:11 UTC
See the search results: https://search.ci.openshift.org/?search=Undiagnosed+panic+detected+in+pod&maxAge=168h&context=2&type=junit&name=&maxMatches=5&maxBytes=20971520&groupBy=job
Matching on the keywords 'Observed a panic: runtime error: invalid memory address or nil pointer dereference', there are nine results in total, involving openshift-apiserver_apiserver and openshift-kube-apiserver_kube-apiserver. Some are related to 4.3/4.4/4.5, others to an indeterminate version. I will keep observing this for a couple of days.

Comment 11 Ke Wang 2020-09-23 02:45:56 UTC
In the past seven days, the panic no longer appeared in 4.6-related tests, but it still exists on 4.3, 4.4 and 4.5. Here are the search results: https://search.ci.openshift.org/?search=kube-apiserver.log.*Observed+a+panic%3A+runtime+error%3A+invalid+memory+address+or+nil+pointer+dereference&maxAge=168h&context=2&type=junit&name=&maxMatches=5&maxBytes=20971520&groupBy=job

Comment 13 errata-xmlrpc 2020-10-27 16:37:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

Comment 14 hgomes 2020-11-19 00:58:04 UTC
Do we have a backport for 4.5 on this issue?