Bug 1866469

Summary: PrometheusOperatorListErrors fires for too long
Product: OpenShift Container Platform
Reporter: Lili Cosic <lcosic>
Component: Monitoring
Assignee: Lili Cosic <lcosic>
Status: CLOSED ERRATA
QA Contact: Junqi Zhao <juzhao>
Severity: medium
Priority: medium
Version: 4.6
CC: alegrand, anpicker, erooth, kakkoyun, lcosic, mloibl, pkrupa, surbania
Target Release: 4.6.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2020-10-27 16:25:04 UTC
Type: Bug

Description Lili Cosic 2020-08-05 16:03:14 UTC
Description of problem:
PrometheusOperatorListErrors keeps firing after the list errors are gone. We should reduce the range used in the rate() expressions so that it is shorter than the alert's "for" duration.
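
To illustrate why the range matters, here is a minimal sketch in Python. It is not Prometheus or prometheus-operator code, only a simplified model under stated assumptions: one list operation per minute, every operation failing during a long outage, and the failure ratio computed over a trailing window. The function name, the 60m window used for comparison, and the outage length are hypothetical; the 0.4 threshold matches the alert expression.

```python
def minutes_above_threshold(window_min, threshold=0.4, outage_min=120):
    """Count the minutes after errors stop during which the failure ratio
    over a trailing window of `window_min` minutes is still above `threshold`.

    Simplified model (assumption): one operation per minute, all of them
    failing during minutes 0 .. outage_min-1 and succeeding afterwards.
    """
    if outage_min < window_min:
        raise ValueError("outage_min must be at least window_min in this model")

    horizon = outage_min + window_min
    # 1 = failed operation in that minute, 0 = successful operation.
    failed = [1 if t < outage_min else 0 for t in range(horizon)]

    lingering = 0
    for t in range(outage_min, horizon):
        # Trailing window of the last `window_min` minutes, ending at minute t.
        window = failed[t - window_min + 1 : t + 1]
        ratio = sum(window) / len(window)
        if ratio > threshold:
            lingering += 1
    return lingering


if __name__ == "__main__":
    for window in (10, 60):  # 60 is a hypothetical longer range for comparison
        print(window, minutes_above_threshold(window))
```

In this model the ratio stays above the threshold for roughly 60% of the window length after the errors stop, so a shorter range lets the alert clear sooner.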

Version-Release number of selected component (if applicable):
4.6+

Comment 3 Junqi Zhao 2020-08-13 10:22:15 UTC
Tested with 4.6.0-0.nightly-2020-08-12-155346. The range is 10m for both PrometheusOperatorListErrors and PrometheusOperatorWatchErrors, which is shorter than the 15m "for" duration:
      - alert: PrometheusOperatorListErrors
        annotations:
          message: Errors while performing List operations in controller {{$labels.controller}} in {{$labels.namespace}} namespace.
        expr: |
          (sum by (controller,namespace) (rate(prometheus_operator_list_operations_failed_total{job="prometheus-operator",namespace="openshift-monitoring"}[10m])) / sum by (controller,namespace) (rate(prometheus_operator_list_operations_total{job="prometheus-operator",namespace="openshift-monitoring"}[10m]))) > 0.4
        for: 15m
        labels:
          severity: warning
      - alert: PrometheusOperatorWatchErrors
        annotations:
          message: Errors while performing Watch operations in controller {{$labels.controller}} in {{$labels.namespace}} namespace.
        expr: |
          (sum by (controller,namespace) (rate(prometheus_operator_watch_operations_failed_total{job="prometheus-operator",namespace="openshift-monitoring"}[10m])) / sum by (controller,namespace) (rate(prometheus_operator_watch_operations_total{job="prometheus-operator",namespace="openshift-monitoring"}[10m]))) > 0.4
        for: 15m
        labels:
          severity: warning

Comment 5 errata-xmlrpc 2020-10-27 16:25:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196