Bug 1932618

Summary: Alerts during a test run should fail the test job, but were not
Product: OpenShift Container Platform Reporter: Clayton Coleman <ccoleman>
Component: Bare Metal Hardware ProvisioningAssignee: Beth White <beth.white>
Bare Metal Hardware Provisioning sub component: cluster-baremetal-operator QA Contact: Amit Ugol <augol>
Status: CLOSED ERRATA Docs Contact:
Severity: urgent    
Priority: urgent CC: alegrand, anpicker, aos-bugs, bfournie, erooth, hongyli, kakkoyun, lcosic, pkrupa, surbania, wking
Version: 4.8Keywords: Triaged
Target Milestone: ---   
Target Release: 4.8.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1932619 (view as bug list) Environment:
Last Closed: 2021-07-27 22:48:13 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1932619, 1932620    

Description Clayton Coleman 2021-02-24 19:11:29 UTC
https://github.com/openshift/cluster-baremetal-operator/pull/110 merged containing a failing alert ClusterOperatorBaremetalDown and TargetDown.

This was the passing run https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-baremetal-operator/110/pull-ci-openshift-cluster-baremetal-operator-master-e2e-agnostic/1364398641615212544

The query in the alert test "shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured" is subtly wrong.

For 4.6, 4.7, and 4.8, we can remove the broken filter clause because both KubeAPILatencyHigh and KubePodCrashLooping on kcm namespace were fixed. In the future we must use "unless X" instead of joining with a "-" because of the way the series match.

After this fix we will be correctly enforcing "no alerts may fire during a CI test run".

Comment 2 hongyan li 2021-03-02 02:32:20 UTC
Test case passes and it ensures that cluster fire no other alerts except Watchdog and AlertmanagerReceiversNotConfigured, KubeAPILatencyHigh and KubePodCrashLooping will not be ignored.
alert test "shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured"
In the run on Feb 25
https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-baremetal-operator/106/pull-ci-openshift-cluster-baremetal-operator-master-e2e-agnostic/1364987684044410880

Comment 5 errata-xmlrpc 2021-07-27 22:48:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438