Bug 1932618 - Alerts during a test run should fail the test job, but were not
Summary: Alerts during a test run should fail the test job, but were not
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Bare Metal Hardware Provisioning
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.8.0
Assignee: Beth White
QA Contact: Amit Ugol
URL:
Whiteboard:
Depends On:
Blocks: 1932619 1932620
Reported: 2021-02-24 19:11 UTC by Clayton Coleman
Modified: 2021-07-27 22:48 UTC

Fixed In Version:
Doc Text:
Clone Of:
Clones: 1932619
Environment:
Last Closed: 2021-07-27 22:48:13 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift origin pull 25923 0 None open Bug 1932618: Don't allow alerts to fire during a test run 2021-02-24 19:11:52 UTC
Red Hat Product Errata RHSA-2021:2438 0 None None None 2021-07-27 22:48:34 UTC

Description Clayton Coleman 2021-02-24 19:11:29 UTC
https://github.com/openshift/cluster-baremetal-operator/pull/110 merged even though the alerts ClusterOperatorBaremetalDown and TargetDown were firing during its test run.

This was the run that passed when it should have failed: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-baremetal-operator/110/pull-ci-openshift-cluster-baremetal-operator-master-e2e-agnostic/1364398641615212544

The query in the alert test "shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured" is subtly wrong.

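For 4.6, 4.7, and 4.8, we can remove the broken filter clause because both KubeAPILatencyHigh and KubePodCrashLooping in the kcm namespace were fixed. In the future we must use "unless X" instead of joining with a "-" because of the way the series match: "-" only produces a result for series that have a matching series on both sides, so a firing alert with no match on the right-hand side is silently dropped from the result.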
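
The difference between "-" and "unless" can be illustrated with a toy model of PromQL one-to-one vector matching. This is only a sketch: the helper functions, the label sets, and the choice of KubePodCrashLooping as the excluded alert are assumptions made for illustration, not the actual query or code from the origin test.

```python
# Toy model of PromQL one-to-one vector matching (illustration only, not Prometheus code).
# A series is identified by its label set; a vector maps label sets to sample values.

def labels(**kv):
    """Build a hashable label set (hypothetical helper)."""
    return frozenset(kv.items())

def minus(left, right):
    """left - right: only label sets present on BOTH sides produce a result."""
    return {k: v - right[k] for k, v in left.items() if k in right}

def unless(left, right):
    """left unless right: keep left elements that have NO match on the right."""
    return {k: v for k, v in left.items() if k not in right}

def names(vector):
    """Sorted alert names in a vector, for readable output."""
    return sorted(dict(k)["alertname"] for k in vector)

# Assumed state: three alerts firing during the run.
firing = {
    labels(alertname="ClusterOperatorBaremetalDown"): 1,
    labels(alertname="TargetDown"): 1,
    labels(alertname="KubePodCrashLooping"): 1,
}
# Assumed exclusion: the one alert the test wants to ignore.
ignored = {labels(alertname="KubePodCrashLooping"): 1}

# Joining with "-" and then filtering for positive values loses every alert
# that has no partner series on the right-hand side.
by_minus = {k: v for k, v in minus(firing, ignored).items() if v > 0}
by_unless = unless(firing, ignored)

print(names(by_minus))   # []
print(names(by_unless))  # ['ClusterOperatorBaremetalDown', 'TargetDown']
```

In this model the "-" join reports nothing, so the test passes while two alerts are firing, whereas "unless" drops only the excluded alert and reports the rest.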

After this fix we will be correctly enforcing "no alerts may fire during a CI test run".

Comment 2 hongyan li 2021-03-02 02:32:20 UTC
The test case passes and ensures that the cluster fires no alerts other than Watchdog and AlertmanagerReceiversNotConfigured; KubeAPILatencyHigh and KubePodCrashLooping are no longer ignored.
Verified with the alert test "shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured" in the run on Feb 25:
https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-baremetal-operator/106/pull-ci-openshift-cluster-baremetal-operator-master-e2e-agnostic/1364987684044410880

Comment 5 errata-xmlrpc 2021-07-27 22:48:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

