Bug 1894763

Summary: Undiagnosed panic detected in pod
Product: OpenShift Container Platform Reporter: Ben Parees <bparees>
Component: Cloud Compute Assignee: Joel Speed <jspeed>
Cloud Compute sub component: Other Providers QA Contact: sunzhaohua <zhsun>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: medium    
Version: 4.4   
Target Milestone: ---   
Target Release: 4.5.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1896725 Environment:
Undiagnosed panic detected in pod
Last Closed: 2020-11-24 12:42:10 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1896725    

Description Ben Parees 2020-11-05 01:50:10 UTC
Test:
Undiagnosed panic detected in pod

Specifically, it's failing in machine-api-operator
(not sure if I got the right BZ component for that operator):

pods/openshift-machine-api_machine-api-operator-67478d85ff-92kzq_machine-api-operator.log.gz:E1105 00:25:23.957881       1 runtime.go:78] Observed a panic: runtime.boundsError{x:3, y:3, signed:true, code:0x0} (runtime error: index out of range [3] with length 3)
pods/openshift-machine-api_machine-api-operator-67478d85ff-92kzq_machine-api-operator_previous.log.gz:E1105 00:25:23.957881       1 runtime.go:78] Observed a panic: runtime.boundsError{x:3, y:3, signed:true, code:0x0} (runtime error: index out of range [3] with length 3)



This test is failing frequently in CI; see the search results:
https://search.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job&search=Undiagnosed+panic+detected+in+pod


Sample job:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-rollback-4.4-to-4.5/1324097015897919488
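For readers unfamiliar with the panic message in the logs above: "index out of range [3] with length 3" is what the Go runtime reports when code reads element 3 of a three-element slice. The sketch below reproduces that class of error and recovers from it. It is only an illustration: `elementAt` and `safeElementAt` are hypothetical names, and nothing here is taken from machine-api-operator or the library it uses.

```go
package main

import (
	"fmt"
	"runtime"
)

// elementAt is a hypothetical helper, not code from machine-api-operator.
// It indexes the slice without checking bounds first.
func elementAt(items []string, i int) string {
	return items[i]
}

// safeElementAt calls elementAt and converts a Go runtime panic
// (such as an out-of-range index) into an ordinary error.
func safeElementAt(items []string, i int) (value string, err error) {
	defer func() {
		if r := recover(); r != nil {
			re, ok := r.(runtime.Error)
			if !ok {
				panic(r) // not a runtime error: re-raise unchanged
			}
			err = fmt.Errorf("recovered: %w", re)
		}
	}()
	return elementAt(items, i), nil
}

func main() {
	items := []string{"a", "b", "c"}
	_, err := safeElementAt(items, 3) // valid indexes are 0..2
	fmt.Println(err)
	// prints: recovered: runtime error: index out of range [3] with length 3
}
```

Recovering only makes the failure visible as an error; the underlying fix is still to check the index against `len(items)` before using it.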

Comment 1 Joel Speed 2020-11-10 11:02:15 UTC
I'm not entirely sure what is causing the panic, but it is coming from a library that applies deployments for the operator.

This is happening in the 4.4 version of the code and is already fixed (mitigated?) in 4.5, as we replaced the code that's causing the panic with a different implementation [1].

To resolve this, we could backport this fix to 4.4, but I'm uncertain if it's worth the effort at this stage in the 4.4 cycle. I estimated this as medium severity, which would suggest possibly not fixing it, but if you feel it should be higher I'm open to changing that.

@Ben Parees, would you prefer we increase severity and backport the library substitution? 

[1]: https://github.com/openshift/machine-api-operator/commit/69986ee5dc8737349d3da9dbf33c6a7138be6ea4

Comment 2 Ben Parees 2020-11-10 20:45:36 UTC
I guess I'd say it depends on the risk and difficulty of the backport. It's definitely happening often enough that I think it's worth fixing just to clean up our CI, if the risk and effort are not substantial.

Comment 4 Joel Speed 2020-11-11 11:46:14 UTC
To QE: For 4.5.z I believe this issue was resolved in https://github.com/openshift/machine-api-operator/commit/69986ee5dc8737349d3da9dbf33c6a7138be6ea4 which was part of https://github.com/openshift/machine-api-operator/pull/536

Since that has already merged and been through QE, I'm moving this BZ to ON_QA so that you can verify that it does fix the issue (no panics from the CI)

Comment 5 Ben Parees 2020-11-11 14:40:05 UTC
> Go via the proper route and create a chain of bugs from 4.7 through to 4.5 which are no-ops and get these verified, not sure how this would work.

That's the proper approach. QE can verify the BZs by using ci-search results to confirm that this particular panic is not being seen in CI jobs for those other releases.

Comment 6 sunzhaohua 2020-11-13 03:13:38 UTC
Verified.
From the ci-search results we can confirm that this panic is not being seen in CI jobs for 4.5:
https://search.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job&search=Undiagnosed+panic+detected+in+pod
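The search link above can be rebuilt programmatically when the same check has to be repeated for other strings or time windows. The following is a minimal sketch: the parameter names and values are copied from the URL in this report, their exact semantics are assumed rather than documented here, and `ciSearchURL` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"net/url"
)

// ciSearchURL builds a search.ci.openshift.org query for the given text.
// Parameter names mirror the link in this bug report; treat them as
// assumptions about the service rather than a documented API.
func ciSearchURL(search, maxAge string) string {
	q := url.Values{}
	q.Set("search", search)
	q.Set("maxAge", maxAge)
	q.Set("context", "1")
	q.Set("type", "bug+junit")
	q.Set("name", "")
	q.Set("maxMatches", "5")
	q.Set("maxBytes", "20971520")
	q.Set("groupBy", "job")

	u := url.URL{
		Scheme:   "https",
		Host:     "search.ci.openshift.org",
		Path:     "/",
		RawQuery: q.Encode(), // Encode sorts keys and escapes values
	}
	return u.String()
}

func main() {
	fmt.Println(ciSearchURL("Undiagnosed panic detected in pod", "168h"))
}
```

Because `url.Values.Encode` sorts keys alphabetically, the parameters appear in a different order than in the original link, but they carry the same values.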

Comment 9 errata-xmlrpc 2020-11-24 12:42:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.5.20 bug fix and golang security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5118