1894763 – Undiagnosed panic detected in pod

Summary: Undiagnosed panic detected in pod

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Cloud Compute
Sub Component:
Version:	4.4
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	4.5.z
Assignee:	Joel Speed
QA Contact:	sunzhaohua
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1896725

Reported:	2020-11-05 01:50 UTC by Ben Parees
Modified:	2020-11-24 12:42 UTC (History)
CC List:	0 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	1896725 (view as bug list)
Environment:	Undiagnosed panic detected in pod
Last Closed:	2020-11-24 12:42:10 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2020:5118	0	None	None	None	2020-11-24 12:42:56 UTC

Description Ben Parees 2020-11-05 01:50:10 UTC

test:
Undiagnosed panic detected in pod 

Specifically, it's failing in machine-api-operator (not sure if I got the right BZ component for that operator):

pods/openshift-machine-api_machine-api-operator-67478d85ff-92kzq_machine-api-operator.log.gz:E1105 00:25:23.957881       1 runtime.go:78] Observed a panic: runtime.boundsError{x:3, y:3, signed:true, code:0x0} (runtime error: index out of range [3] with length 3)
pods/openshift-machine-api_machine-api-operator-67478d85ff-92kzq_machine-api-operator_previous.log.gz:E1105 00:25:23.957881       1 runtime.go:78] Observed a panic: runtime.boundsError{x:3, y:3, signed:true, code:0x0} (runtime error: index out of range [3] with length 3)
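For readers unfamiliar with this message: "index out of range [3] with length 3" is what the Go runtime reports when code reads element 3 of a three-element slice. The following is a minimal sketch of that failure mode and of recovering and reporting it. It is not the operator's code; `elementAt` and the sample slice are hypothetical names made up for illustration.

```go
package main

import "fmt"

// elementAt is a hypothetical helper (not from machine-api-operator).
// It indexes the slice without checking bounds, then converts any
// resulting panic into an error, roughly like a crash handler that
// logs "Observed a panic" instead of letting the process die.
func elementAt(items []string, i int) (val string, err error) {
	defer func() {
		if r := recover(); r != nil {
			// r is a runtime error value describing the bounds failure.
			err = fmt.Errorf("observed a panic: %v", r)
		}
	}()
	return items[i], nil
}

func main() {
	items := []string{"a", "b", "c"} // length 3, valid indexes are 0..2

	// Index 3 is one past the end, which triggers the bounds panic.
	if _, err := elementAt(items, 3); err != nil {
		fmt.Println(err)
	}
}
```

Checking `i < len(items)` before indexing avoids the panic entirely; recovering as above only makes it visible.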



This test is failing frequently in CI; see search results:
https://search.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job&search=Undiagnosed+panic+detected+in+pod


Sample job:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-rollback-4.4-to-4.5/1324097015897919488

Comment 1 Joel Speed 2020-11-10 11:02:15 UTC

I'm not entirely sure what is causing the panic, but it is coming from a library that applies deployments for the operator.

This is happening in the 4.4 version of the code and is already fixed (mitigated?) in 4.5, as we replaced the code that's causing the panic with a different implementation [1].

To resolve this, we could backport this fix to 4.4, but I'm uncertain whether it's worth the effort at this stage in the 4.4 cycle. I estimated this as medium severity, which would suggest possibly not fixing it, but if you feel this should be higher I'm open to changing that.

@Ben Parees, would you prefer we increase severity and backport the library substitution? 

[1]: https://github.com/openshift/machine-api-operator/commit/69986ee5dc8737349d3da9dbf33c6a7138be6ea4

Comment 2 Ben Parees 2020-11-10 20:45:36 UTC

I guess I'd say it depends on the risk and difficulty of the backport. It's definitely happening often enough that I think it's worth fixing just to clean up our CI, if the risk and effort are not substantial.

Comment 4 Joel Speed 2020-11-11 11:46:14 UTC

To QE: For 4.5.z, I believe this issue was resolved in https://github.com/openshift/machine-api-operator/commit/69986ee5dc8737349d3da9dbf33c6a7138be6ea4 which was part of https://github.com/openshift/machine-api-operator/pull/536

Since that has already merged and been through QE, I'm moving this BZ to ON_QA so that you can verify that it does fix the issue (no panics from the CI).

Comment 5 Ben Parees 2020-11-11 14:40:05 UTC

> Go via the proper route and create a chain of bugs from 4.7 through to 4.5 which are no-ops and get these verified, not sure how this would work.

That's the proper approach. QE can verify the BZs by using ci-search results to confirm that this particular panic is not being seen in CI jobs for those other releases.

Comment 6 sunzhaohua 2020-11-13 03:13:38 UTC

Verified.
From the ci-search results, we can confirm that this panic is not being seen in CI jobs for 4.5:
https://search.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job&search=Undiagnosed+panic+detected+in+pod

Comment 9 errata-xmlrpc 2020-11-24 12:42:10 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.5.20 bug fix and golang security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5118