Bug 1894763 - Undiagnosed panic detected in pod
Summary: Undiagnosed panic detected in pod
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.5.z
Assignee: Joel Speed
QA Contact: sunzhaohua
URL:
Whiteboard:
Depends On:
Blocks: 1896725
Reported: 2020-11-05 01:50 UTC by Ben Parees
Modified: 2020-11-24 12:42 UTC (History)
CC List: 0 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1896725
Environment:
Undiagnosed panic detected in pod
Last Closed: 2020-11-24 12:42:10 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2020:5118 0 None None None 2020-11-24 12:42:56 UTC

Description Ben Parees 2020-11-05 01:50:10 UTC
Test:
Undiagnosed panic detected in pod

Specifically, it's failing in machine-api-operator
(not sure if I got the right BZ component for that operator):

pods/openshift-machine-api_machine-api-operator-67478d85ff-92kzq_machine-api-operator.log.gz:E1105 00:25:23.957881       1 runtime.go:78] Observed a panic: runtime.boundsError{x:3, y:3, signed:true, code:0x0} (runtime error: index out of range [3] with length 3)
pods/openshift-machine-api_machine-api-operator-67478d85ff-92kzq_machine-api-operator_previous.log.gz:E1105 00:25:23.957881       1 runtime.go:78] Observed a panic: runtime.boundsError{x:3, y:3, signed:true, code:0x0} (runtime error: index out of range [3] with length 3)
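For readers unfamiliar with this error text: the Go runtime raises it when code indexes a slice or array at a position equal to or greater than its length. The snippet below is only a minimal sketch of that failure mode, not the operator's actual code; `elementAt` and the sample slice are hypothetical names made up for illustration, and the location of the faulty index in the 4.4 code is not identified in this report.

```go
package main

import "fmt"

// elementAt is a hypothetical helper (not from machine-api-operator).
// It indexes a slice without checking bounds and converts any resulting
// panic into an error, similar in spirit to a crash handler logging
// "Observed a panic".
func elementAt(items []string, i int) (value string, err error) {
	defer func() {
		if r := recover(); r != nil {
			// r is a runtime.Error here; %v prints its message.
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return items[i], nil
}

func main() {
	parts := []string{"a", "b", "c"} // length 3, valid indexes are 0..2

	// Index 3 is one past the end, so the runtime panics with a bounds error.
	_, err := elementAt(parts, 3)
	fmt.Println(err)
	// prints: recovered: runtime error: index out of range [3] with length 3
}
```

A bounds check such as `if i < len(items)` before indexing is the usual guard against this class of panic.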



This test is failing frequently in CI; see search results:
https://search.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job&search=Undiagnosed+panic+detected+in+pod


Sample job:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-rollback-4.4-to-4.5/1324097015897919488

Comment 1 Joel Speed 2020-11-10 11:02:15 UTC
I'm not entirely sure what is causing the panic, but it is coming from a library that applies deployments for the operator.

This is happening in the 4.4 version of the code and is already fixed (mitigated?) in 4.5, as we replaced the code that's causing the panic with a different implementation [1].

To resolve this, we could backport this fix to 4.4, but I'm uncertain whether it's worth the effort at this stage in the 4.4 cycle. I estimated this as medium severity, which would suggest possibly not fixing it, but if you feel this should be higher I'm open to changing that.

@Ben Parees, would you prefer we increase severity and backport the library substitution? 

[1]: https://github.com/openshift/machine-api-operator/commit/69986ee5dc8737349d3da9dbf33c6a7138be6ea4

Comment 2 Ben Parees 2020-11-10 20:45:36 UTC
I guess I'd say it depends on the risk and difficulty of the backport. It's definitely happening often enough that I think it's worth fixing just to clean up our CI, if the risk and effort are not substantial.

Comment 4 Joel Speed 2020-11-11 11:46:14 UTC
To QE: For 4.5.z I believe this issue was resolved in https://github.com/openshift/machine-api-operator/commit/69986ee5dc8737349d3da9dbf33c6a7138be6ea4 which was part of https://github.com/openshift/machine-api-operator/pull/536

Since that has already merged and been through QE, I'm moving this BZ to ON_QA so that you can verify that it does fix the issue (no panics from the CI)

Comment 5 Ben Parees 2020-11-11 14:40:05 UTC
> Go via the proper route and create a chain of bugs from 4.7 through to 4.5 which are no-ops and get these verified, not sure how this would work.

That's the proper approach. QE can verify the BZs by using ci-search results to confirm that this particular panic is not being seen in CI jobs for those other releases.

Comment 6 sunzhaohua 2020-11-13 03:13:38 UTC
Verified.
From the ci-search results, we can confirm that this panic is not being seen in CI jobs for 4.5:
https://search.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job&search=Undiagnosed+panic+detected+in+pod
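The search link used for verification can be rebuilt from its query parameters, which may help when repeating the check for another release or search phrase. The following is a rough sketch only: `ciSearchURL` is a hypothetical helper, and the parameter names and values are copied from the link above rather than taken from any documented ci-search API.

```go
package main

import (
	"fmt"
	"net/url"
)

// ciSearchURL is a hypothetical helper that rebuilds the ci-search link
// from the parameters visible in the URL quoted in this bug.
func ciSearchURL(search, maxAge string) string {
	v := url.Values{}
	v.Set("maxAge", maxAge)       // look-back window, e.g. "168h"
	v.Set("context", "1")         // lines of context around each match
	v.Set("type", "bug+junit")    // encoded as bug%2Bjunit
	v.Set("name", "")             // job name filter, empty in the original link
	v.Set("maxMatches", "5")
	v.Set("maxBytes", "20971520")
	v.Set("groupBy", "job")
	v.Set("search", search)
	// Encode sorts keys alphabetically, so parameter order differs from
	// the original link, which does not matter for a query string.
	return "https://search.ci.openshift.org/?" + v.Encode()
}

func main() {
	fmt.Println(ciSearchURL("Undiagnosed panic detected in pod", "168h"))
}
```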

Comment 9 errata-xmlrpc 2020-11-24 12:42:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.5.20 bug fix and golang security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5118

