The "Undiagnosed panic detected in pod" test is failing frequently in CI; see the search results:
Example: the 4.6.0-0.ci-2020-09-01-180917 -> 4.6.0-0.ci-2020-09-02-112251 job loops on:
pods/openshift-kube-apiserver_kube-apiserver-ci-op-x44nqxlf-2f611-bdgq5-master-1_kube-apiserver.log.gz:E0902 19:31:10.506239 17 runtime.go:76] Observed a panic: runtime error: invalid memory address or nil pointer dereference
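For context on where that log line comes from: below is a minimal sketch, assuming only the standard k8s.io/apimachinery utilities (the nil dereference here is a stand-in, not the actual faulting code), of how an "Observed a panic: ..." message gets emitted. Any goroutine deferred with runtime.HandleCrash logs a recovered panic in this format before re-panicking by default.

    package main

    import (
        utilruntime "k8s.io/apimachinery/pkg/util/runtime"
    )

    func main() {
        // HandleCrash recovers the panic, logs it as
        // "Observed a panic: <reason>" (the runtime.go line quoted above),
        // and by default re-panics so the process still crashes.
        defer utilruntime.HandleCrash()

        var p *int
        _ = *p // stand-in nil pointer dereference
    }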
$ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job&search=Undiagnosed+panic+detected+in+pod' | grep 'failures match'
release-openshift-okd-installer-e2e-aws-4.6 - 67 runs, 97% failed, 52% of failures match
release-openshift-okd-installer-e2e-aws-upgrade - 87 runs, 67% failed, 22% of failures match
release-openshift-origin-installer-e2e-aws-4.6 - 4 runs, 50% failed, 50% of failures match
release-openshift-origin-installer-e2e-aws-compact-4.6 - 4 runs, 75% failed, 67% of failures match
release-openshift-origin-installer-e2e-aws-disruptive-4.6 - 4 runs, 100% failed, 50% of failures match
release-openshift-origin-installer-e2e-aws-sdn-multitenant-4.6 - 4 runs, 50% failed, 50% of failures match
release-openshift-origin-installer-e2e-aws-serial-4.6 - 107 runs, 47% failed, 118% of failures match
release-openshift-origin-installer-e2e-aws-shared-vpc-4.6 - 7 runs, 43% failed, 133% of failures match
release-openshift-origin-installer-e2e-aws-upgrade-4.5-stable-to-4.6-ci - 79 runs, 25% failed, 90% of failures match
release-openshift-origin-installer-e2e-aws-upgrade - 671 runs, 31% failed, 18% of failures match
release-openshift-origin-installer-e2e-aws-upgrade-rollback-4.6 - 8 runs, 38% failed, 33% of failures match
release-openshift-origin-installer-e2e-azure-4.6 - 27 runs, 85% failed, 9% of failures match
release-openshift-origin-installer-e2e-azure-shared-vpc-4.6 - 7 runs, 86% failed, 17% of failures match
release-openshift-origin-installer-e2e-azure-upgrade-4.5-stable-to-4.6-ci - 28 runs, 96% failed, 7% of failures match
release-openshift-origin-installer-e2e-azure-upgrade-4.6 - 28 runs, 96% failed, 7% of failures match
release-openshift-origin-installer-e2e-gcp-4.6 - 93 runs, 53% failed, 69% of failures match
release-openshift-origin-installer-e2e-gcp-4.7 - 11 runs, 100% failed, 36% of failures match
release-openshift-origin-installer-e2e-gcp-compact-4.6 - 4 runs, 75% failed, 33% of failures match
release-openshift-origin-installer-e2e-gcp-shared-vpc-4.6 - 7 runs, 14% failed, 200% of failures match
release-openshift-origin-installer-e2e-gcp-upgrade - 150 runs, 31% failed, 59% of failures match
release-openshift-origin-installer-e2e-gcp-upgrade-4.5-stable-to-4.6-ci - 27 runs, 15% failed, 100% of failures match
release-openshift-origin-installer-e2e-gcp-upgrade-4.6 - 28 runs, 36% failed, 60% of failures match
release-openshift-origin-installer-launch-aws - 153 runs, 52% failed, 5% of failures match
release-openshift-origin-installer-launch-gcp - 527 runs, 55% failed, 11% of failures match
Apparently a dup of bug 1875038.
*** This bug has been marked as a duplicate of bug 1875038 ***
Pulling this back out into its own bug at Michal's request.
That said, it is not clear to me why https://github.com/openshift/cluster-kube-apiserver-operator/pull/941 is not a fix for the panic that led me to open this bug, so I'm fine if this gets re-closed as a dup ;).
This will be fixed by https://github.com/kubernetes/kubernetes/pull/94589
The fix you referenced, Trevor, is for the KAS operator: during graceful shutdown, controllers that had not started yet because they were still waiting for caches to sync received the context cancellation, which closed the channel that WaitForCacheSync() was using, and that resulted in a panic inside those controllers.
The fix Lukasz is working on is in the operand and requires a backport from upstream.
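To make that operator-side failure mode concrete, here is a minimal sketch, assuming a hypothetical controller entry point (this is not the actual cluster-kube-apiserver-operator code), of how a stop channel closed during graceful shutdown turns into a panic when the return value of WaitForCacheSync() is ignored, and the guard that avoids it:

    package main

    import (
        "fmt"

        "k8s.io/client-go/tools/cache"
    )

    // runController is a hypothetical controller entry point. During graceful
    // shutdown the operator cancels its context, which closes stopCh; at that
    // point cache.WaitForCacheSync returns false immediately.
    func runController(stopCh <-chan struct{}, hasSynced cache.InformerSynced) error {
        if !cache.WaitForCacheSync(stopCh, hasSynced) {
            // This early return is the guard: a controller that ignores the
            // false return and starts its workers anyway ends up using
            // listers that were never populated, which is the nil-pointer
            // panic described above.
            return fmt.Errorf("caches failed to sync before shutdown")
        }
        // ... start workers that assume synced caches ...
        return nil
    }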
*** Bug 1879208 has been marked as a duplicate of this bug. ***
See the search results: https://search.ci.openshift.org/?search=Undiagnosed+panic+detected+in+pod&maxAge=168h&context=2&type=junit&name=&maxMatches=5&maxBytes=20971520&groupBy=job
Matched keywords 'Observed a panic: runtime error: invalid memory address or nil pointer dereference'; there are nine jobs in total, involving openshift-apiserver_apiserver and openshift-kube-apiserver_kube-apiserver. Some are related to 4.3/4.4/4.5, others to an indeterminate version; I will observe this for a couple of days.
In the past seven days I no longer see the panic in 4.6-related tests; it still exists on 4.3, 4.4, and 4.5. Here are the search results: https://search.ci.openshift.org/?search=kube-apiserver.log.*Observed+a+panic%3A+runtime+error%3A+invalid+memory+address+or+nil+pointer+dereference&maxAge=168h&context=2&type=junit&name=&maxMatches=5&maxBytes=20971520&groupBy=job