test: Undiagnosed panic detected in pod is failing frequently in CI, see search results: https://search.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job&search=Undiagnosed+panic+detected+in+pod

Specifically, it's failing in machine-api-operator (not sure if I got the right BZ component for that operator):

pods/openshift-machine-api_machine-api-operator-67478d85ff-92kzq_machine-api-operator.log.gz:E1105 00:25:23.957881 1 runtime.go:78] Observed a panic: runtime.boundsError{x:3, y:3, signed:true, code:0x0} (runtime error: index out of range [3] with length 3)
pods/openshift-machine-api_machine-api-operator-67478d85ff-92kzq_machine-api-operator_previous.log.gz:E1105 00:25:23.957881 1 runtime.go:78] Observed a panic: runtime.boundsError{x:3, y:3, signed:true, code:0x0} (runtime error: index out of range [3] with length 3)

Sample job: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-rollback-4.4-to-4.5/1324097015897919488
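For readers unfamiliar with this class of error, here is a minimal Go sketch of how a `runtime.boundsError` with `x:3, y:3` arises: reading index 3 of a slice whose length is 3. The helper `elementAt` is hypothetical and is not taken from machine-api-operator or the library it uses; it only reproduces the same kind of runtime error and recovers it so the message can be inspected.

```go
package main

import "fmt"

// elementAt is a hypothetical helper (not operator code). It reads items[i]
// and converts an out-of-range panic into an error instead of crashing.
func elementAt(items []string, i int) (val string, err error) {
	defer func() {
		if r := recover(); r != nil {
			// r is a runtime.Error carrying the index and the slice length.
			err = fmt.Errorf("recovered panic: %v", r)
		}
	}()
	return items[i], nil
}

func main() {
	// Three elements, so the valid indexes are 0..2; index 3 is out of range.
	_, err := elementAt([]string{"a", "b", "c"}, 3)
	fmt.Println(err)
}
```

An off-by-one like this typically comes from indexing with a length or a count rather than the last valid index, but the log alone does not say which call site in the operator is responsible.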
I'm not entirely sure what is causing the panic, but it is coming from a library that applies deployments for the operator. This is happening in the 4.4 version of the code and is already fixed (mitigated?) in 4.5, as we replaced the code that's causing the panic with a different implementation [1]. To resolve this, we could backport this fix to 4.4, but I'm uncertain if it's worth the effort at this stage in the 4.4 cycle. I estimated this as medium severity, which would suggest possibly a no fix, but if you feel this should be higher I'm open to changing that. @Ben Parees, would you prefer we increase severity and backport the library substitution? [1]: https://github.com/openshift/machine-api-operator/commit/69986ee5dc8737349d3da9dbf33c6a7138be6ea4
I guess I'd say it depends on the risk and difficulty of the backport. It's definitely happening often enough that I think it's worth fixing just to clean up our CI, if the risk and effort are not substantial.
To QE: For 4.5.z, I believe this issue was resolved in https://github.com/openshift/machine-api-operator/commit/69986ee5dc8737349d3da9dbf33c6a7138be6ea4 which was part of https://github.com/openshift/machine-api-operator/pull/536. Since that has already merged and been through QE, I'm moving this BZ to ON_QA so that you can verify that it does fix the issue (no panics from the CI).
> Go via the proper route and create a chain of bugs from 4.7 through to 4.5 which are no-ops and get these verified, not sure how this would work.

That's the proper approach. QE can verify the BZs by using ci-search results to confirm that this particular panic is not being seen in CI jobs for those other releases.
Verified. From the ci-search results, we can confirm that this panic is not being seen in CI jobs for 4.5: https://search.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job&search=Undiagnosed+panic+detected+in+pod
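For anyone repeating this verification on other releases, the search link above can be rebuilt programmatically. The following is a rough sketch that only assembles the URL from the query parameters visible in the link; the function name `ciSearchURL` is hypothetical, and nothing here is an official ci-search client. Note that `url.Values.Encode` sorts parameters by key, so the order differs from the link above while the query stays equivalent.

```go
package main

import (
	"fmt"
	"net/url"
)

// ciSearchURL is a hypothetical helper that builds a ci-search link using the
// same query parameters that appear in the URL quoted in this bug.
func ciSearchURL(search string) string {
	q := url.Values{}
	q.Set("maxAge", "168h") // look back over the last 7 days
	q.Set("context", "1")
	q.Set("type", "bug+junit") // encoded as bug%2Bjunit
	q.Set("name", "")
	q.Set("maxMatches", "5")
	q.Set("maxBytes", "20971520")
	q.Set("groupBy", "job")
	q.Set("search", search) // spaces are encoded as '+'

	u := url.URL{
		Scheme:   "https",
		Host:     "search.ci.openshift.org",
		Path:     "/",
		RawQuery: q.Encode(),
	}
	return u.String()
}

func main() {
	fmt.Println(ciSearchURL("Undiagnosed panic detected in pod"))
}
```

Passing a narrower search string, such as part of the panic message itself, would be one way to check for this specific panic rather than any undiagnosed panic.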
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.5.20 bug fix and golang security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5118