2040277 – ThanosRuleNoEvaluationFor10Intervals alert description is wrong

Summary: ThanosRuleNoEvaluationFor10Intervals alert description is wrong

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Monitoring
Sub Component:
Version:	4.10
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	low
Target Milestone:	---
Target Release:	4.11.0
Assignee:	Simon Pasquier
QA Contact:	Junqi Zhao
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2022-01-13 10:58 UTC by Junqi Zhao
Modified:	2022-08-10 10:42 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2022-08-10 10:42:08 UTC
Target Upstream Version:
Embargoed:

Attachments
ThanosRuleNoEvaluationFor10Intervals alert expr result in prometheus (104.75 KB, image/png) 2022-01-13 10:58 UTC, Junqi Zhao

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	thanos-io thanos pull 5105	0	None	Merged	fix: changing description to not include rule groups	2022-03-01 09:47:31 UTC
Red Hat Product Errata	RHSA-2022:5069	0	None	None	None	2022-08-10 10:42:38 UTC

Description Junqi Zhao 2022-01-13 10:58:32 UTC

Created attachment 1850565
ThanosRuleNoEvaluationFor10Intervals alert expr result in prometheus

Description of problem:
Came across the ThanosRuleNoEvaluationFor10Intervals alert. Its description reads:
"Thanos Rule thanos-ruler in openshift-user-workload-monitoring has 1.6G% rule groups that did not evaluate for at least 10x of their expected interval."
Checked the alert definition:
1. {{$value | humanize}}% should be {{$value | humanize}}
2. the expr seems odd; not sure it is right

        - alert: ThanosRuleNoEvaluationFor10Intervals
          annotations:
            description: Thanos Rule {{$labels.job}} in {{$labels.namespace}} has {{$value
              | humanize}}% rule groups that did not evaluate for at least 10x of their
              expected interval.
            summary: Thanos Rule has rule groups that did not evaluate for 10 intervals.
          expr: |
            time() -  max by (namespace, job, instance, group) (prometheus_rule_group_last_evaluation_timestamp_seconds{job="thanos-ruler"})
            >
            10 * max by (namespace, job, instance, group) (prometheus_rule_group_interval_seconds{job="thanos-ruler"})
          for: 5m
          labels:
            severity: info
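
To illustrate why a percent sign does not fit this value, here is a minimal Python sketch of what the expression computes for a single rule group. A PromQL comparison without the `bool` modifier acts as a filter and keeps the left-hand sample value, so `$value` is the number of seconds since the group last evaluated, not a ratio. The names `stale_seconds` and `humanize` are hypothetical helpers for illustration; `humanize` only approximates SI-prefix formatting and is not Prometheus's implementation. The last-evaluation timestamp of 0 used in the example is an assumption: it would be consistent with a value around 1.6G in January 2022, but the report does not confirm it.

```python
from typing import Optional


def stale_seconds(now: float, last_eval: float, interval: float) -> Optional[float]:
    """Mimic `time() - last_eval > 10 * interval` for one rule group.

    Returns the elapsed seconds when the condition holds (this is what the
    alert template sees as $value), or None when the group is filtered out.
    """
    elapsed = now - last_eval
    return elapsed if elapsed > 10 * interval else None


def humanize(value: float) -> str:
    """Rough SI-prefix formatter (illustrative only)."""
    for prefix in ("", "k", "M", "G", "T"):
        if abs(value) < 1000:
            return f"{value:.4g}{prefix}"
        value /= 1000
    return f"{value:.4g}P"


# 2022-01-13 10:58:32 UTC as a Unix timestamp (the report's creation time).
now = 1642071512.0

# Assumed scenario: the group never evaluated, so its timestamp is 0.
never_evaluated = stale_seconds(now, last_eval=0.0, interval=60.0)

# A healthy group that evaluated 30 seconds ago does not match.
healthy = stale_seconds(now, last_eval=now - 30.0, interval=60.0)

if never_evaluated is not None:
    print(humanize(never_evaluated), "seconds since last evaluation")
```

Under that assumption the humanized value carries a "G" prefix and counts seconds, which is why appending "%" and calling it a share of rule groups is misleading.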

Version-Release number of selected component (if applicable):
4.10.0-0.nightly-2022-01-11-065245

How reproducible:
always

Steps to Reproduce:
1. see the description
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 hongyan li 2022-01-13 11:20:21 UTC

I think the alert rule itself has no issue; this is not a bug.

Comment 2 hongyan li 2022-01-13 11:33:51 UTC

The description of the alert does have an issue.

Comment 3 Prashant Balachandran 2022-01-27 11:34:36 UTC

Created upstream PR https://github.com/thanos-io/thanos/pull/5105

Comment 4 Prashant Balachandran 2022-03-01 11:42:06 UTC

The changes have been pulled into CMO (cluster-monitoring-operator) with this PR:
https://github.com/openshift/cluster-monitoring-operator/pull/1556/

Comment 7 Junqi Zhao 2022-06-06 03:09:27 UTC

Tested with 4.11.0-0.nightly-2022-06-04-014713; the ThanosRuleNoEvaluationFor10Intervals definition is updated to:
        - alert: ThanosRuleNoEvaluationFor10Intervals
          annotations:
            description: Thanos Rule {{$labels.job}} in {{$labels.namespace}} has rule groups
              that did not evaluate for at least 10x of their expected interval.
            summary: Thanos Rule has rule groups that did not evaluate for 10 intervals.
          expr: |
            time() -  max by (namespace, job, instance, group) (prometheus_rule_group_last_evaluation_timestamp_seconds{job="thanos-ruler"})
            >
            10 * max by (namespace, job, instance, group) (prometheus_rule_group_interval_seconds{job="thanos-ruler"})
          for: 5m
          labels:
            severity: info
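
A check like this one could also be scripted. The following is a minimal sketch, assuming the description annotations have already been extracted as plain strings; `has_percent_after_value` is a hypothetical helper, not part of any OpenShift or Thanos tooling. It flags a template action that uses `$value` and is immediately followed by a percent sign.

```python
import re

# Matches a template action containing $value that is directly followed by "%".
PERCENT_AFTER_VALUE = re.compile(r"\{\{[^{}]*\$value[^{}]*\}\}\s*%")


def has_percent_after_value(description: str) -> bool:
    # Collapse line wraps first, since the YAML annotation spans several lines.
    return bool(PERCENT_AFTER_VALUE.search(" ".join(description.split())))


old_description = (
    "Thanos Rule {{$labels.job}} in {{$labels.namespace}} has {{$value\n"
    "  | humanize}}% rule groups that did not evaluate for at least 10x of their\n"
    "  expected interval."
)
new_description = (
    "Thanos Rule {{$labels.job}} in {{$labels.namespace}} has rule groups\n"
    "  that did not evaluate for at least 10x of their expected interval."
)

print(has_percent_after_value(old_description))
print(has_percent_after_value(new_description))
```

The old 4.10 description is flagged and the updated 4.11 description is not, matching the manual comparison above.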

Comment 11 errata-xmlrpc 2022-08-10 10:42:08 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069
