Bug 1978338 - "Prometheus metrics should be available after an upgrade" is panicking
Summary: "Prometheus metrics should be available after an upgrade" is panicking
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.9.0
Assignee: Filip Petkovski
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-07-01 15:20 UTC by Stephen Benjamin
Modified: 2021-10-18 17:37 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
[sig-instrumentation] Prometheus metrics should be available after an upgrade job=periodic-ci-openshift-release-master-nightly-4.9-upgrade-from-stable-4.8-e2e-metal-ipi-upgrade=all
Last Closed: 2021-10-18 17:37:30 UTC
Target Upstream Version:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift origin pull 26299 | 0 | None | open | Bug 1978338: Skip prometheus upgrade test | 2021-07-05 06:57:22 UTC
Red Hat Product Errata | RHSA-2021:3759 | 0 | None | None | None | 2021-10-18 17:37:51 UTC

Description Stephen Benjamin 2021-07-01 15:20:42 UTC
test:
[sig-instrumentation] Prometheus metrics should be available after an upgrade 

is failing frequently in CI, see search results:
https://search.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job&search=%5C%5Bsig-instrumentation%5C%5D+Prometheus+metrics+should+be+available+after+an+upgrade

This test is panicking on metal-ipi upgrades: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.9-upgrade-from-stable-4.8-e2e-metal-ipi-upgrade/1410386679042150400

However, there is no stack trace to show exactly why; Ginkgo only offers a suggestion for getting one.

Test logs:

Your test failed.
Ginkgo panics to prevent subsequent assertions from running.
Normally Ginkgo rescues this panic so you shouldn't see it.

But, if you make an assertion in a goroutine, Ginkgo can't capture the panic.
To circumvent this, you should call

	defer GinkgoRecover()

at the top of the goroutine that caused this panic.

Jul  1 01:55:55.662: INFO: "[sig-instrumentation] Prometheus metrics should be available after an upgrade": panic: 
Your test failed.
Ginkgo panics to prevent subsequent assertions from running.
Normally Ginkgo rescues this panic so you shouldn't see it.

But, if you make an assertion in a goroutine, Ginkgo can't capture the panic.
To circumvent this, you should call

	defer GinkgoRecover()
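
For reference, the mechanism behind the Ginkgo hint in the log above can be sketched with the standard library alone. This is a minimal illustration, not the code of the failing test: the `failures` type, `recoverInto`, and `runCheck` are hypothetical names that stand in for Ginkgo's own failure handling. The point it shows is that a panic raised in a goroutine is only caught by a `recover()` deferred in that same goroutine, which is why `defer GinkgoRecover()` has to sit at the top of the goroutine that makes the assertion.

```go
package main

import (
	"fmt"
	"sync"
)

// failures collects messages from failed assertions. It is a hypothetical
// stand-in for Ginkgo's internal failure bookkeeping.
type failures struct {
	mu   sync.Mutex
	msgs []string
}

// recoverInto plays the role of GinkgoRecover in this sketch. It must be
// deferred directly in the goroutine whose panic it should capture.
func (f *failures) recoverInto() {
	if r := recover(); r != nil {
		f.mu.Lock()
		defer f.mu.Unlock()
		f.msgs = append(f.msgs, fmt.Sprint(r))
	}
}

// runCheck runs check in a separate goroutine and returns the failure
// messages captured from it, instead of letting the panic crash the process.
func runCheck(check func()) []string {
	f := &failures{}
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		defer f.recoverInto() // analogous to: defer GinkgoRecover()
		check()
	}()
	wg.Wait()
	return f.msgs
}

func main() {
	// A failed assertion is modelled here as a plain panic.
	msgs := runCheck(func() { panic("metrics not available") })
	fmt.Println(msgs)
}
```

Without the deferred recover inside the goroutine, the panic would terminate the whole test binary, which matches the symptom reported here.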

Comment 1 Stephen Benjamin 2021-07-01 15:21:46 UTC
Could you have a look at this? Thanks!

Comment 2 Filip Petkovski 2021-07-01 15:35:04 UTC
Taking a look

Comment 3 Filip Petkovski 2021-07-01 18:09:02 UTC
Looking at the CI logs, the test is only failing on metal-ipi upgrades. This test suite does not run on openshift/origin PRs and cannot be run through cluster bot either. Is there any other way to troubleshoot the problem? Otherwise we might need to completely disable the test.

Comment 5 Junqi Zhao 2021-07-20 07:52:14 UTC
LGTM. The test case will be skipped on a bare-metal cluster if it doesn't have a storage class.
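
The skip condition described in this comment could look roughly like the following. This is only a sketch under assumptions: `shouldSkipUpgradeTest`, the `"BareMetal"` platform string, and passing the storage classes as a plain slice are all hypothetical and are not taken from the linked pull request, which would query the cluster API instead.

```go
package main

import "fmt"

// shouldSkipUpgradeTest reports whether the Prometheus upgrade test should
// be skipped, along with a reason. Hypothetical helper: the platform value
// and the storage class names are assumed to be looked up by the caller.
func shouldSkipUpgradeTest(platform string, storageClasses []string) (bool, string) {
	if platform == "BareMetal" && len(storageClasses) == 0 {
		return true, "bare-metal cluster has no storage class"
	}
	return false, ""
}

func main() {
	skip, reason := shouldSkipUpgradeTest("BareMetal", nil)
	fmt.Println(skip, reason)
}
```

Returning the reason alongside the decision lets the caller pass it to whatever skip helper the test framework provides, so the skipped result explains itself in the CI output.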

Comment 12 errata-xmlrpc 2021-10-18 17:37:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

