Bug 1872874

Summary: [sig-instrumentation] Prometheus when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Early]
Product: OpenShift Container Platform Reporter: Douglas Smith <dosmith>
Component: Networking Assignee: Ben Bennett <bbennett>
Networking sub component: ovn-kubernetes QA Contact: Anurag saxena <anusaxen>
Status: CLOSED DUPLICATE Docs Contact:
Severity: unspecified    
Priority: unspecified CC: alegrand, anpicker, aos-bugs, erooth, jchaloup, kakkoyun, lcosic, mfojtik, mloibl, pkrupa, surbania, surya, vpickard, xxia
Version: 4.6 Keywords: Reopened
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
[sig-instrumentation] Prometheus when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Early]
Last Closed: 2020-10-12 14:51:31 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Douglas Smith 2020-08-26 19:08:43 UTC
test:
[sig-instrumentation] Prometheus when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Early] 

is failing frequently in CI, see search results:
https://search.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job&search=%5C%5Bsig-instrumentation%5C%5D+Prometheus+when+installed+on+the+cluster+shouldn%27t+report+any+alerts+in+firing+state+apart+from+Watchdog+and+AlertmanagerReceiversNotConfigured+%5C%5BEarly%5C%5D


https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-openstack-4.5/1298666591172431872

-----

[sig-instrumentation] Prometheus when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Early] [Suite:openshift/conformance/parallel] 1m38s
fail [github.com/openshift/origin/test/extended/util/prometheus/helpers.go:174]: Expected
    <map[string]error | len:1>: {
        "ALERTS{alertname!~\"Watchdog|AlertmanagerReceiversNotConfigured|PrometheusRemoteWriteDesiredShards\",alertstate=\"firing\",severity!=\"info\"} >= 1": {
            s: "promQL query: ALERTS{alertname!~\"Watchdog|AlertmanagerReceiversNotConfigured|PrometheusRemoteWriteDesiredShards\",alertstate=\"firing\",severity!=\"info\"} >= 1 had reported incorrect results:\n[{\"metric\":{\"__name__\":\"ALERTS\",\"alertname\":\"KubeAPIDown\",\"alertstate\":\"firing\",\"severity\":\"critical\"},\"value\":[1598466340.548,\"1\"]},{\"metric\":{\"__name__\":\"ALERTS\",\"alertname\":\"KubeControllerManagerDown\",\"alertstate\":\"firing\",\"severity\":\"critical\"},\"value\":[1598466340.548,\"1\"]},{\"metric\":{\"__name__\":\"ALERTS\",\"alertname\":\"KubeSchedulerDown\",\"alertstate\":\"firing\",\"severity\":\"critical\"},\"value\":[1598466340.548,\"1\"]}]",
        },
    }
to be empty
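
The escaped result in the failure message is hard to scan, so a minimal sketch for pulling the alert names out of it may help with triage. It assumes the JSON array has already been copied out of the message and unescaped; the constant `RESULT` and the helper `firing_alerts` are hypothetical names used only for illustration and are not part of the origin test suite.

```python
import json

# Instant-vector result copied from the failure output above, with the
# backslash escaping removed. Only the fields shown in the report are used.
RESULT = """[
  {"metric": {"__name__": "ALERTS", "alertname": "KubeAPIDown",
              "alertstate": "firing", "severity": "critical"},
   "value": [1598466340.548, "1"]},
  {"metric": {"__name__": "ALERTS", "alertname": "KubeControllerManagerDown",
              "alertstate": "firing", "severity": "critical"},
   "value": [1598466340.548, "1"]},
  {"metric": {"__name__": "ALERTS", "alertname": "KubeSchedulerDown",
              "alertstate": "firing", "severity": "critical"},
   "value": [1598466340.548, "1"]}
]"""


def firing_alerts(result_json):
    """Return sorted (alertname, severity) pairs from an instant-vector result."""
    samples = json.loads(result_json)
    return sorted(
        (sample["metric"]["alertname"], sample["metric"]["severity"])
        for sample in samples
    )


if __name__ == "__main__":
    # Print one alert per line so the list can be pasted into a comment.
    for name, severity in firing_alerts(RESULT):
        print(f"{name}\t{severity}")
```

For this run the helper lists three critical alerts: KubeAPIDown, KubeControllerManagerDown and KubeSchedulerDown.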

Comment 1 Sergiusz Urbaniak 2020-08-27 07:43:08 UTC
KubeAPIDown and KubeControllerManagerDown indicate issues with the control plane, hence reassigning to kube-apiserver.

Comment 2 Stefan Schimanski 2020-08-27 10:37:56 UTC
This is not actionable. The query mixes many root causes already tracked elsewhere. Either give an analysis or point to some concrete issue (e.g. by platform, networking stack, component).

Comment 5 Mike Dame 2020-12-04 18:03:51 UTC
*** Bug 1891068 has been marked as a duplicate of this bug. ***