Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1924864

Summary: [sig-instrumentation] Prometheus when installed on the cluster shouldn't have failing rules evaluation
Product: OpenShift Container Platform
Reporter: Russell Teague <rteague>
Component: Monitoring
Assignee: Sergiusz Urbaniak <surbania>
Status: CLOSED DUPLICATE
QA Contact: Junqi Zhao <juzhao>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 4.7
CC: alegrand, anpicker, erooth, kakkoyun, lcosic, pkrupa, surbania
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment: [sig-instrumentation] Prometheus when installed on the cluster shouldn't have failing rules evaluation
Last Closed: 2021-02-04 09:55:49 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Russell Teague 2021-02-03 18:59:55 UTC
The test:

[sig-instrumentation] Prometheus when installed on the cluster shouldn't have failing rules evaluation

is failing frequently in CI; see search results:
https://search.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job&search=%5C%5Bsig-instrumentation%5C%5D+Prometheus+when+installed+on+the+cluster+shouldn%27t+have+failing+rules+evaluation


Example job:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ocp-4.7-e2e-aws-workers-rhel7/1356852562925457408


Job snippet:
Feb  3 07:39:22.609: INFO: Running AfterSuite actions on node 1
fail [github.com/openshift/origin/test/extended/prometheus/prometheus.go:373]: Unexpected error:
    <errors.aggregate | len:1, cap:1>: [
        {
            s: "query failed: increase(prometheus_rule_evaluation_failures_total[8m37s]) >= 1: promQL query: increase(prometheus_rule_evaluation_failures_total[8m37s]) >= 1 had reported incorrect results:\n[{\"metric\":{\"container\":\"prometheus-proxy\",\"endpoint\":\"web\",\"instance\":\"10.130.2.14:9091\",\"job\":\"prometheus-k8s\",\"namespace\":\"openshift-monitoring\",\"pod\":\"prometheus-k8s-1\",\"rule_group\":\"/etc/prometheus/rules/prometheus-k8s-rulefiles-0/openshift-monitoring-prometheus-k8s-rules.yaml;node.rules\",\"service\":\"prometheus-k8s\"},\"value\":[1612337951.305,\"2.154166666666667\"]}]",
        },
    ]
    query failed: increase(prometheus_rule_evaluation_failures_total[8m37s]) >= 1: promQL query: increase(prometheus_rule_evaluation_failures_total[8m37s]) >= 1 had reported incorrect results:
    [{"metric":{"container":"prometheus-proxy","endpoint":"web","instance":"10.130.2.14:9091","job":"prometheus-k8s","namespace":"openshift-monitoring","pod":"prometheus-k8s-1","rule_group":"/etc/prometheus/rules/prometheus-k8s-rulefiles-0/openshift-monitoring-prometheus-k8s-rules.yaml;node.rules","service":"prometheus-k8s"},"value":[1612337951.305,"2.154166666666667"]}]
occurred
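
For reference, the test fails whenever the query increase(prometheus_rule_evaluation_failures_total[8m37s]) >= 1 returns any series, since a non-empty result means at least one rule group had evaluation failures in the window. Below is a minimal sketch (not the actual openshift/origin test code, which is Go) of that pass/fail logic in Python, applied to the result vector copied from the failing job's log:

```python
import json

# Result vector copied (trimmed) from the failing job's log for
# increase(prometheus_rule_evaluation_failures_total[8m37s]) >= 1
sample_result = json.loads(
    '[{"metric":{"container":"prometheus-proxy","pod":"prometheus-k8s-1",'
    '"rule_group":"/etc/prometheus/rules/prometheus-k8s-rulefiles-0/'
    'openshift-monitoring-prometheus-k8s-rules.yaml;node.rules"},'
    '"value":[1612337951.305,"2.154166666666667"]}]'
)

def failing_rule_groups(result):
    """Return rule groups whose failure count increased; any hit fails the test."""
    return [s["metric"]["rule_group"] for s in result if float(s["value"][1]) >= 1]

groups = failing_rule_groups(sample_result)
# Here the node.rules group reported ~2.15 evaluation failures over the window,
# so the list is non-empty and the test reports the aggregate error above.
```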