test: Undiagnosed panic detected in pod is failing frequently in CI, see search results:

https://search.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job&search=Undiagnosed+panic+detected+in+pod

Example 4.6.0-0.ci-2020-09-01-180917 -> 4.6.0-0.ci-2020-09-02-112251 job [1] loops on:

pods/openshift-kube-apiserver_kube-apiserver-ci-op-x44nqxlf-2f611-bdgq5-master-1_kube-apiserver.log.gz:E0902 19:31:10.506239 17 runtime.go:76] Observed a panic: runtime error: invalid memory address or nil pointer dereference

Seems common:

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job&search=Undiagnosed+panic+detected+in+pod' | grep 'failures match'
...
release-openshift-okd-installer-e2e-aws-4.6 - 67 runs, 97% failed, 52% of failures match
release-openshift-okd-installer-e2e-aws-upgrade - 87 runs, 67% failed, 22% of failures match
release-openshift-origin-installer-e2e-aws-4.6 - 4 runs, 50% failed, 50% of failures match
release-openshift-origin-installer-e2e-aws-compact-4.6 - 4 runs, 75% failed, 67% of failures match
release-openshift-origin-installer-e2e-aws-disruptive-4.6 - 4 runs, 100% failed, 50% of failures match
release-openshift-origin-installer-e2e-aws-sdn-multitenant-4.6 - 4 runs, 50% failed, 50% of failures match
release-openshift-origin-installer-e2e-aws-serial-4.6 - 107 runs, 47% failed, 118% of failures match
release-openshift-origin-installer-e2e-aws-shared-vpc-4.6 - 7 runs, 43% failed, 133% of failures match
release-openshift-origin-installer-e2e-aws-upgrade-4.5-stable-to-4.6-ci - 79 runs, 25% failed, 90% of failures match
release-openshift-origin-installer-e2e-aws-upgrade - 671 runs, 31% failed, 18% of failures match
release-openshift-origin-installer-e2e-aws-upgrade-rollback-4.6 - 8 runs, 38% failed, 33% of failures match
release-openshift-origin-installer-e2e-azure-4.6 - 27 runs, 85% failed, 9% of failures match
release-openshift-origin-installer-e2e-azure-shared-vpc-4.6 - 7 runs, 86% failed, 17% of failures match
release-openshift-origin-installer-e2e-azure-upgrade-4.5-stable-to-4.6-ci - 28 runs, 96% failed, 7% of failures match
release-openshift-origin-installer-e2e-azure-upgrade-4.6 - 28 runs, 96% failed, 7% of failures match
release-openshift-origin-installer-e2e-gcp-4.6 - 93 runs, 53% failed, 69% of failures match
release-openshift-origin-installer-e2e-gcp-4.7 - 11 runs, 100% failed, 36% of failures match
release-openshift-origin-installer-e2e-gcp-compact-4.6 - 4 runs, 75% failed, 33% of failures match
release-openshift-origin-installer-e2e-gcp-shared-vpc-4.6 - 7 runs, 14% failed, 200% of failures match
release-openshift-origin-installer-e2e-gcp-upgrade - 150 runs, 31% failed, 59% of failures match
release-openshift-origin-installer-e2e-gcp-upgrade-4.5-stable-to-4.6-ci - 27 runs, 15% failed, 100% of failures match
release-openshift-origin-installer-e2e-gcp-upgrade-4.6 - 28 runs, 36% failed, 60% of failures match
release-openshift-origin-installer-launch-aws - 153 runs, 52% failed, 5% of failures match
release-openshift-origin-installer-launch-gcp - 527 runs, 55% failed, 11% of failures match

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-upgrade-4.6/1301225743073677312
Apparently a dup of bug 1875038. *** This bug has been marked as a duplicate of bug 1875038 ***
Pulling this back out into its own bug at Michal's request [1].

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1875038#c4
It is not clear to me why https://github.com/openshift/cluster-kube-apiserver-operator/pull/941 is not a fix for the panic that led me to open this bug, so I'm fine if this gets re-closed as a dup ;).
This will be fixed by https://github.com/kubernetes/kubernetes/pull/94589.

The fix you referenced, Trevor, is for the KAS operator: during graceful shutdown, controllers that had not started yet (because they were still waiting for caches to sync) received the context cancellation, which closed the channel that WaitForCacheSync() was using, and that resulted in a panic inside the controller. The fix Lukasz is working on is in the operand and requires a backport from upstream.
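For anyone following along, here is a minimal self-contained Go sketch of the shutdown race described above. All names are hypothetical; it mimics client-go's cache.WaitForCacheSync rather than importing it, and it is not the actual operator code. The stop channel closes during shutdown before the caches sync, the controller ignores the false return value, and dereferencing its never-populated state produces the same "invalid memory address or nil pointer dereference" panic seen in the CI logs.

package main

import (
	"fmt"
	"time"
)

// waitForCacheSync mimics client-go's cache.WaitForCacheSync: it returns
// true once every sync func reports true, or false if stopCh closes first.
func waitForCacheSync(stopCh <-chan struct{}, synced ...func() bool) bool {
	ticker := time.NewTicker(10 * time.Millisecond)
	defer ticker.Stop()
	for {
		select {
		case <-stopCh:
			return false // shutdown arrived before the caches synced
		case <-ticker.C:
			all := true
			for _, s := range synced {
				all = all && s()
			}
			if all {
				return true
			}
		}
	}
}

// controller's lister is only populated after a successful cache sync.
type controller struct {
	lister *[]string
}

func (c *controller) run(stopCh <-chan struct{}) {
	// Bug: the false return value is ignored, so on shutdown we fall
	// through and dereference c.lister while it is still nil. The fix
	// pattern is: if !waitForCacheSync(stopCh, ...) { return }.
	waitForCacheSync(stopCh, func() bool { return false })
	fmt.Println("items:", len(*c.lister)) // nil pointer dereference
}

func main() {
	stopCh := make(chan struct{})
	close(stopCh) // graceful shutdown: the stop channel is already closed
	c := &controller{}
	// Panics: runtime error: invalid memory address or nil pointer dereference
	c.run(stopCh)
}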
*** Bug 1879208 has been marked as a duplicate of this bug. ***
See the search results: https://search.ci.openshift.org/?search=Undiagnosed+panic+detected+in+pod&maxAge=168h&context=2&type=junit&name=&maxMatches=5&maxBytes=20971520&groupBy=job

Matching the keywords 'Observed a panic: runtime error: invalid memory address or nil pointer dereference', there are nine hits in total, involving openshift-apiserver_apiserver and openshift-kube-apiserver_kube-apiserver. Some are related to 4.3/4.4/4.5, others to indeterminate versions; I will keep observing this for a couple of days.
In the past seven days I no longer see the panic in 4.6-related tests; it still exists on 4.3, 4.4, and 4.5. Here are the search results: https://search.ci.openshift.org/?search=kube-apiserver.log.*Observed+a+panic%3A+runtime+error%3A+invalid+memory+address+or+nil+pointer+dereference&maxAge=168h&context=2&type=junit&name=&maxMatches=5&maxBytes=20971520&groupBy=job
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196
Do we have a backport for 4.5 on this issue?