Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1924864

Summary: [sig-instrumentation] Prometheus when installed on the cluster shouldn't have failing rules evaluation
Product: OpenShift Container Platform
Reporter: Russell Teague <rteague>
Component: Monitoring
Assignee: Sergiusz Urbaniak <surbania>
Status: CLOSED DUPLICATE
QA Contact: Junqi Zhao <juzhao>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 4.7
CC: alegrand, anpicker, erooth, kakkoyun, lcosic, pkrupa, surbania
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment: [sig-instrumentation] Prometheus when installed on the cluster shouldn't have failing rules evaluation
Last Closed: 2021-02-04 09:55:49 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Russell Teague 2021-02-03 18:59:55 UTC
The test:

[sig-instrumentation] Prometheus when installed on the cluster shouldn't have failing rules evaluation

is failing frequently in CI; see search results:
https://search.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job&search=%5C%5Bsig-instrumentation%5C%5D+Prometheus+when+installed+on+the+cluster+shouldn%27t+have+failing+rules+evaluation


Example job:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ocp-4.7-e2e-aws-workers-rhel7/1356852562925457408


Job snippet:
Feb  3 07:39:22.609: INFO: Running AfterSuite actions on node 1
fail [github.com/openshift/origin/test/extended/prometheus/prometheus.go:373]: Unexpected error:
    <errors.aggregate | len:1, cap:1>: [
        {
            s: "query failed: increase(prometheus_rule_evaluation_failures_total[8m37s]) >= 1: promQL query: increase(prometheus_rule_evaluation_failures_total[8m37s]) >= 1 had reported incorrect results:\n[{\"metric\":{\"container\":\"prometheus-proxy\",\"endpoint\":\"web\",\"instance\":\"10.130.2.14:9091\",\"job\":\"prometheus-k8s\",\"namespace\":\"openshift-monitoring\",\"pod\":\"prometheus-k8s-1\",\"rule_group\":\"/etc/prometheus/rules/prometheus-k8s-rulefiles-0/openshift-monitoring-prometheus-k8s-rules.yaml;node.rules\",\"service\":\"prometheus-k8s\"},\"value\":[1612337951.305,\"2.154166666666667\"]}]",
        },
    ]
    query failed: increase(prometheus_rule_evaluation_failures_total[8m37s]) >= 1: promQL query: increase(prometheus_rule_evaluation_failures_total[8m37s]) >= 1 had reported incorrect results:
    [{"metric":{"container":"prometheus-proxy","endpoint":"web","instance":"10.130.2.14:9091","job":"prometheus-k8s","namespace":"openshift-monitoring","pod":"prometheus-k8s-1","rule_group":"/etc/prometheus/rules/prometheus-k8s-rulefiles-0/openshift-monitoring-prometheus-k8s-rules.yaml;node.rules","service":"prometheus-k8s"},"value":[1612337951.305,"2.154166666666667"]}]
occurred
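
For reference, the test fails whenever the query increase(prometheus_rule_evaluation_failures_total[8m37s]) >= 1 returns any series, since a non-empty result means at least one rule group had evaluation failures in the window. Below is a minimal sketch (not the actual openshift/origin test code, which is Go) of that pass/fail logic in Python, applied to the result vector copied from the failing job's log:

```python
import json

# Result vector copied (trimmed) from the failing job's log for
# increase(prometheus_rule_evaluation_failures_total[8m37s]) >= 1
sample_result = json.loads(
    '[{"metric":{"container":"prometheus-proxy","pod":"prometheus-k8s-1",'
    '"rule_group":"/etc/prometheus/rules/prometheus-k8s-rulefiles-0/'
    'openshift-monitoring-prometheus-k8s-rules.yaml;node.rules"},'
    '"value":[1612337951.305,"2.154166666666667"]}]'
)

def failing_rule_groups(result):
    """Return rule groups whose failure count increased; any hit fails the test."""
    return [s["metric"]["rule_group"] for s in result if float(s["value"][1]) >= 1]

groups = failing_rule_groups(sample_result)
# Here the node.rules group reported ~2.15 evaluation failures over the window,
# so the list is non-empty and the test reports the aggregate error above.
```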