1978338 – "Prometheus metrics should be available after an upgrade" is panicking

Summary: "Prometheus metrics should be available after an upgrade" is panicking

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Monitoring
Sub Component:
Version:	4.9
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	4.9.0
Assignee:	Filip Petkovski
QA Contact:	Junqi Zhao
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2021-07-01 15:20 UTC by Stephen Benjamin
Modified:	2021-10-18 17:37 UTC
CC List:	9 users
Fixed In Version:
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:	[sig-instrumentation] Prometheus metrics should be available after an upgrade job=periodic-ci-openshift-release-master-nightly-4.9-upgrade-from-stable-4.8-e2e-metal-ipi-upgrade=all
Last Closed:	2021-10-18 17:37:30 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift origin pull 26299	0	None	open	Bug 1978338: Skip prometheus upgrade test	2021-07-05 06:57:22 UTC
Red Hat Product Errata	RHSA-2021:3759	0	None	None	None	2021-10-18 17:37:51 UTC

Description Stephen Benjamin 2021-07-01 15:20:42 UTC

test:
[sig-instrumentation] Prometheus metrics should be available after an upgrade 

is failing frequently in CI; see search results:
https://search.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job&search=%5C%5Bsig-instrumentation%5C%5D+Prometheus+metrics+should+be+available+after+an+upgrade

This test is panicking on metal-ipi upgrades: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.9-upgrade-from-stable-4.8-e2e-metal-ipi-upgrade/1410386679042150400

However, there's no stack trace showing exactly why it panics; Ginkgo only suggests how to get one.

Test logs:

Your test failed.
Ginkgo panics to prevent subsequent assertions from running.
Normally Ginkgo rescues this panic so you shouldn't see it.

But, if you make an assertion in a goroutine, Ginkgo can't capture the panic.
To circumvent this, you should call

	defer GinkgoRecover()

at the top of the goroutine that caused this panic.

Jul  1 01:55:55.662: INFO: "[sig-instrumentation] Prometheus metrics should be available after an upgrade": panic: 
Your test failed.
Ginkgo panics to prevent subsequent assertions from running.
Normally Ginkgo rescues this panic so you shouldn't see it.

But, if you make an assertion in a goroutine, Ginkgo can't capture the panic.
To circumvent this, you should call

	defer GinkgoRecover()
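
The Ginkgo message above comes down to a general Go rule: `recover()` only stops a panic raised in the same goroutine, so a recover in the spec's own goroutine never sees a panic from a goroutine the spec started, and the unrecovered panic takes down the whole process. The following is a minimal, standard-library-only sketch of that rule. `runInGoroutine` is a hypothetical helper for illustration, not Ginkgo's implementation; Ginkgo's `GinkgoRecover` reports the failure against the running spec rather than returning an error.

```go
package main

import (
	"fmt"
	"sync"
)

// runInGoroutine (hypothetical helper) runs fn in a new goroutine and turns
// any panic it raises into an error. The deferred recover must live inside
// the new goroutine: a recover() in the caller cannot catch this panic.
func runInGoroutine(fn func()) (err error) {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		// Deferred calls run last-in, first-out: the recover below runs
		// first and records err, then wg.Done() releases the caller.
		defer wg.Done()
		defer func() {
			if r := recover(); r != nil {
				err = fmt.Errorf("goroutine panicked: %v", r)
			}
		}()
		fn()
	}()
	wg.Wait() // err is safe to read once Done has been called
	return err
}

func main() {
	err := runInGoroutine(func() {
		panic("assertion failed: metric not found")
	})
	fmt.Println(err) // goroutine panicked: assertion failed: metric not found
}
```

In a Ginkgo spec the equivalent is what the log asks for: make `defer GinkgoRecover()` the first statement of every goroutine that makes assertions, so a failed assertion fails the spec with a proper report instead of surfacing as a bare panic.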

Comment 1 Stephen Benjamin 2021-07-01 15:21:46 UTC

Could you have a look at this? Thanks!

Comment 2 Filip Petkovski 2021-07-01 15:35:04 UTC

Taking a look

Comment 3 Filip Petkovski 2021-07-01 18:09:02 UTC

Looking at the CI logs, the test is only failing on metal-ipi upgrades. This test suite does not run on openshift/origin PRs and cannot be run through cluster bot either. Is there any other way to troubleshoot the problem? Otherwise we might need to completely disable the test.

Comment 5 Junqi Zhao 2021-07-20 07:52:14 UTC

LGTM, the case will be skipped on baremetal clusters that don't have a storage class.
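
The skip rule described in this comment can be sketched as a small predicate. This is a minimal illustration under stated assumptions, not the code from the linked origin pull request: `StorageClass` is a hypothetical stand-in for the Kubernetes object, `shouldSkipUpgradeMetricsTest` is a hypothetical helper, and `"BareMetal"` is assumed to be the platform string the cluster reports.

```go
package main

import "fmt"

// StorageClass is a hypothetical, trimmed-down stand-in for the Kubernetes
// StorageClass object; only its name is modelled here.
type StorageClass struct {
	Name string
}

// shouldSkipUpgradeMetricsTest (hypothetical) encodes the rule from this
// comment: skip on baremetal clusters that have no storage class at all.
// The "BareMetal" platform string is an assumption.
func shouldSkipUpgradeMetricsTest(platform string, classes []StorageClass) bool {
	return platform == "BareMetal" && len(classes) == 0
}

func main() {
	fmt.Println(shouldSkipUpgradeMetricsTest("BareMetal", nil))                                   // true
	fmt.Println(shouldSkipUpgradeMetricsTest("BareMetal", []StorageClass{{Name: "local-storage"}})) // false
	fmt.Println(shouldSkipUpgradeMetricsTest("AWS", nil))                                        // false
}
```

Keeping the decision in a pure function makes the skip condition easy to check without a live cluster; a real test would populate the platform and storage class list from the cluster's API before deciding.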

Comment 12 errata-xmlrpc 2021-10-18 17:37:30 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759
