Bug 1926598 - Duplicate alert rules are displayed on console because the thanos-querier API returns wrong results
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 4.8.0
Assignee: Sergiusz Urbaniak
QA Contact: hongyan li
URL:
Whiteboard:
Duplicates: 1940882
Depends On:
Blocks: 1944575
Reported: 2021-02-09 07:45 UTC by hongyan li
Modified: 2021-07-27 22:43 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 22:42:29 UTC
Target Upstream Version:
Embargoed:


Attachments
screen shot for console with duplicate alert rule (239.68 KB, image/png)
2021-02-09 07:49 UTC, hongyan li


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift thanos pull 51 | 0 | None | closed | Bug 1926598: pkg/rules: fix deduplication of equal alerts with different labels | 2021-03-30 13:54:29 UTC
Github | thanos-io thanos pull 3960 | 0 | None | closed | pkg/rules: fix deduplication of equal alerts with different labels | 2021-03-30 13:54:32 UTC
Red Hat Product Errata | RHSA-2021:2438 | 0 | None | None | None | 2021-07-27 22:43:31 UTC

Description hongyan li 2021-02-09 07:45:15 UTC
Description of problem:
Duplicate alert rules are displayed on the console.

Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2021-02-08-191932

How reproducible:
always

Steps to Reproduce:
1. Open the console and click Monitoring -> Alerting.
2. Click the Alerting rules tab.
3. Duplicate alert rules are displayed, for example:
ElasticsearchClusterNotHealthy: 4 items displayed, 2 actual
ElasticsearchNodeDiskWatermarkReached: 6 items displayed, 3 actual
etcdHighFsyncDurations: 4 items displayed, 2 actual


Actual results:

# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/rules' | jq '.data.groups[].rules[].name' | sort|grep ElasticsearchClusterNotHealthy
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  191k    0  191k    0     0  5317k      0 --:--:-- --:--:-- --:--:-- 5317k
"ElasticsearchClusterNotHealthy"
"ElasticsearchClusterNotHealthy"

# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/rules' | jq '.data.groups[].rules[].name' | sort|grep ElasticsearchClusterNotHealthy
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  251k    0  251k    0     0  3110k      0 --:--:-- --:--:-- --:--:-- 3110k
"ElasticsearchClusterNotHealthy"
"ElasticsearchClusterNotHealthy"
"ElasticsearchClusterNotHealthy"
"ElasticsearchClusterNotHealthy"


Expected results:
The thanos-querier API should return the same results as the prometheus-k8s API for platform alert rules.

Additional info:
User-workload monitoring was not enabled.
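
The comparison between the two endpoints can also be done for every rule name at once instead of grepping for one alert. The following is a minimal sketch, assuming both /api/v1/rules response bodies have already been saved and parsed as JSON. The function names and the small hand-written sample responses are hypothetical and only mirror the counts reported above.

```python
from collections import Counter


def rule_name_counts(rules_response):
    """Count how often each rule name occurs in a parsed /api/v1/rules body."""
    counts = Counter()
    for group in rules_response.get("data", {}).get("groups", []):
        for rule in group.get("rules", []):
            counts[rule["name"]] += 1
    return counts


def count_mismatches(reference, candidate):
    """Return {name: (reference_count, candidate_count)} where the counts differ."""
    ref = rule_name_counts(reference)
    cand = rule_name_counts(candidate)
    return {
        name: (ref[name], cand[name])
        for name in sorted(set(ref) | set(cand))
        if ref[name] != cand[name]
    }


def _sample_response(names):
    # Hypothetical stand-in for a real response; only the rule names are filled in.
    rules = [{"name": name} for name in names]
    return {"status": "success", "data": {"groups": [{"name": "example", "rules": rules}]}}


# prometheus-k8s reported 2 entries, thanos-querier reported 4.
prometheus = _sample_response(["ElasticsearchClusterNotHealthy"] * 2)
thanos = _sample_response(["ElasticsearchClusterNotHealthy"] * 4)

print(count_mismatches(prometheus, thanos))
```

An empty result would mean both endpoints report every rule name the same number of times.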

Comment 1 hongyan li 2021-02-09 07:49:48 UTC
Created attachment 1755860 [details]
screen shot for console with duplicate alert rule

Comment 2 hongyan li 2021-02-09 08:08:50 UTC
oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/rules' | jq |grep ElasticsearchClusterNotHealthy -A20
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  251k    0  251k    0     0  4344k      0 --:--:-- --:--:-- --:--:-- 4344k
            "name": "ElasticsearchClusterNotHealthy",
            "query": "sum by(cluster) (es_cluster_status == 2)",
            "duration": 420,
            "labels": {
              "prometheus": "openshift-monitoring/k8s",
              "severity": "critical"
            },
            "annotations": {
              "message": "Cluster {{ $labels.cluster }} health status has been RED for at least 7m. Cluster does not accept writes, shards may be missing or master node hasn't been elected yet. For more information refer to https://github.com/openshift/elasticsearch-operator/blob/master/docs/alerts.md#Elasticsearch-Cluster-Health-is-Red",
              "summary": "Cluster health status is RED"
            },
            "alerts": [],
            "health": "ok",
            "evaluationTime": 0.000253482,
            "lastEvaluation": "2021-02-09T08:06:42.681095906Z",
            "type": "alerting"
          },
          {
            "state": "inactive",
            "name": "ElasticsearchClusterNotHealthy",
            "query": "sum by(cluster) (es_cluster_status == 1)",
            "duration": 1200,
            "labels": {
              "prometheus": "openshift-monitoring/k8s",
              "severity": "warning"
            },
            "annotations": {
              "message": "Cluster {{ $labels.cluster }} health status has been YELLOW for at least 20m. Some shard replicas are not allocated. For more information refer to https://github.com/openshift/elasticsearch-operator/blob/master/docs/alerts.md#Elasticsearch-Cluster-Healthy-is-Yellow",
              "summary": "Cluster health status is YELLOW"
            },
            "alerts": [],
            "health": "ok",
            "evaluationTime": 0.000109183,
            "lastEvaluation": "2021-02-09T08:06:42.68135068Z",
            "type": "alerting"
          },
          {
            "state": "inactive",
            "name": "ElasticsearchClusterNotHealthy",
            "query": "sum by(cluster) (es_cluster_status == 2)",
            "duration": 420,
            "labels": {
              "prometheus": "openshift-monitoring/k8s",
              "severity": "critical"
            },
            "annotations": {
              "message": "Cluster {{ $labels.cluster }} health status has been RED for at least 7m. Cluster does not accept writes, shards may be missing or master node hasn't been elected yet. For more information refer to https://github.com/openshift/elasticsearch-operator/blob/master/docs/alerts.md#Elasticsearch-Cluster-Health-is-Red",
              "summary": "Cluster health status is RED"
            },
            "alerts": [],
            "health": "ok",
            "evaluationTime": 0.000212364,
            "lastEvaluation": "2021-02-09T08:06:42.681140147Z",
            "type": "alerting"
          },
          {
            "state": "inactive",
            "name": "ElasticsearchClusterNotHealthy",
            "query": "sum by(cluster) (es_cluster_status == 1)",
            "duration": 1200,
            "labels": {
              "prometheus": "openshift-monitoring/k8s",
              "severity": "warning"
            },
            "annotations": {
              "message": "Cluster {{ $labels.cluster }} health status has been YELLOW for at least 20m. Some shard replicas are not allocated. For more information refer to https://github.com/openshift/elasticsearch-operator/blob/master/docs/alerts.md#Elasticsearch-Cluster-Healthy-is-Yellow",
              "summary": "Cluster health status is YELLOW"
            },
            "alerts": [],
            "health": "ok",
            "evaluationTime": 8.7608e-05,
            "lastEvaluation": "2021-02-09T08:06:42.681353788Z",
            "type": "alerting"
          },
          {
            "state": "inactive",
            "name": "ElasticsearchDiskSpaceRunningLow",
            "query": "sum(predict_linear(es_fs_path_available_bytes[6h], 6 * 3600)) < 0",

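In the output above, the four ElasticsearchClusterNotHealthy entries form two pairs that differ only in evaluationTime and lastEvaluation. A rough way to confirm this on a saved response is to compare rules while ignoring those two fields. This is only a sketch for checking the symptom, not the deduplication logic in Thanos pkg/rules. The helper names are hypothetical, and the sample rules keep just a subset of the fields shown above.

```python
import json

# Fields that change on every evaluation and should not affect rule identity.
VOLATILE_FIELDS = {"evaluationTime", "lastEvaluation"}


def rule_identity(rule):
    """Build a comparable key from all fields except the volatile ones."""
    stable = {key: value for key, value in rule.items() if key not in VOLATILE_FIELDS}
    return json.dumps(stable, sort_keys=True)


def dedupe_rules(rules):
    """Keep the first occurrence of each rule, preserving the original order."""
    seen = set()
    unique = []
    for rule in rules:
        key = rule_identity(rule)
        if key not in seen:
            seen.add(key)
            unique.append(rule)
    return unique


def _rule(query, duration, severity, evaluation_time, last_evaluation):
    # Subset of the fields from the output above; annotations are omitted for brevity.
    return {
        "name": "ElasticsearchClusterNotHealthy",
        "query": query,
        "duration": duration,
        "labels": {"prometheus": "openshift-monitoring/k8s", "severity": severity},
        "evaluationTime": evaluation_time,
        "lastEvaluation": last_evaluation,
    }


RED = "sum by(cluster) (es_cluster_status == 2)"
YELLOW = "sum by(cluster) (es_cluster_status == 1)"

rules = [
    _rule(RED, 420, "critical", 0.000253482, "2021-02-09T08:06:42.681095906Z"),
    _rule(YELLOW, 1200, "warning", 0.000109183, "2021-02-09T08:06:42.68135068Z"),
    _rule(RED, 420, "critical", 0.000212364, "2021-02-09T08:06:42.681140147Z"),
    _rule(YELLOW, 1200, "warning", 8.7608e-05, "2021-02-09T08:06:42.681353788Z"),
]

unique = dedupe_rules(rules)
print(len(rules), "->", len(unique))
```

With the sample data, the four entries collapse to the two rules (critical and warning) that the prometheus-k8s endpoint reports.
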
Comment 3 hongyan li 2021-03-23 09:22:11 UTC
*** Bug 1940882 has been marked as a duplicate of this bug. ***

Comment 9 Junqi Zhao 2021-03-31 03:44:48 UTC
Tested with 4.8.0-0.nightly-2021-03-30-160509: searched the alerting rules under "Monitoring -> Alerting -> Alerting rules"; no duplicate alert rules are displayed now.

Comment 12 errata-xmlrpc 2021-07-27 22:42:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

