Bug 1926598

Summary:

Duplicate alert rules are displayed on the console because the thanos-querier API returns wrong results

Product:

OpenShift Container Platform

Reporter:

hongyan li <hongyli>

Component:

Monitoring

Assignee:

Sergiusz Urbaniak <surbania>

Status:

CLOSED ERRATA

QA Contact:

hongyan li <hongyli>

Severity:

medium

Docs Contact:

Priority:

high

Version:

4.7

CC:

alegrand, anpicker, erooth, juzhao, kakkoyun, lcosic, pkrupa, spasquie

Target Milestone:

---

Target Release:

4.8.0

Hardware:

Unspecified

OS:

Unspecified

Whiteboard:

Fixed In Version:

Doc Type:

No Doc Update

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2021-07-27 22:42:29 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Bug Depends On:

Bug Blocks:

1944575

Attachments:

Description	Flags
screen shot for console with duplicate alert rule	none

Description hongyan li 2021-02-09 07:45:15 UTC

Description of problem:
Duplicate alert rules are displayed on the console.

Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2021-02-08-191932

How reproducible:
always

Steps to Reproduce:
1. Open the console and click Monitoring -> Alerting.
2. Click the Alerting rules tab.
3. Duplicate alert rules are displayed, for example:
ElasticsearchClusterNotHealthy displays 4 items, actual 2
ElasticsearchNodeDiskWatermarkReached displays 6 items, actual 3
etcdHighFsyncDurations displays 4 items, actual 2


Actual results:

# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/rules' | jq '.data.groups[].rules[].name' | sort|grep ElasticsearchClusterNotHealthy
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  191k    0  191k    0     0  5317k      0 --:--:-- --:--:-- --:--:-- 5317k
"ElasticsearchClusterNotHealthy"
"ElasticsearchClusterNotHealthy"

# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/rules' | jq '.data.groups[].rules[].name' | sort|grep ElasticsearchClusterNotHealthy
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  251k    0  251k    0     0  3110k      0 --:--:-- --:--:-- --:--:-- 3110k
"ElasticsearchClusterNotHealthy"
"ElasticsearchClusterNotHealthy"
"ElasticsearchClusterNotHealthy"
"ElasticsearchClusterNotHealthy"

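The comparison above checks one rule name at a time. To compare every rule name across the two responses, a minimal Python sketch follows. It assumes each /api/v1/rules response has been saved as JSON and follows the .data.groups[].rules[].name layout used by the jq filter above. The helper names (make_response, count_rule_names, compare_counts) and the sample data are hypothetical and are not part of any OpenShift, Prometheus, or Thanos tooling.

```python
from collections import Counter


def make_response(names):
    """Build a hypothetical, minimal /api/v1/rules-style response for testing."""
    return {
        "status": "success",
        "data": {"groups": [{"name": "example", "rules": [{"name": n} for n in names]}]},
    }


def count_rule_names(response):
    """Count how often each rule name appears across all groups."""
    counts = Counter()
    for group in response.get("data", {}).get("groups", []):
        for rule in group.get("rules", []):
            counts[rule["name"]] += 1
    return counts


def compare_counts(baseline, candidate):
    """Return {name: (baseline_count, candidate_count)} for names whose counts differ."""
    names = set(baseline) | set(candidate)
    return {
        name: (baseline[name], candidate[name])
        for name in sorted(names)
        if baseline[name] != candidate[name]
    }


if __name__ == "__main__":
    # Sample data mirroring the counts reported above (2 via prometheus-k8s, 4 via thanos-querier).
    prometheus = count_rule_names(make_response(["ElasticsearchClusterNotHealthy"] * 2))
    thanos = count_rule_names(make_response(["ElasticsearchClusterNotHealthy"] * 4))
    print(compare_counts(prometheus, thanos))
    # {'ElasticsearchClusterNotHealthy': (2, 4)}
```

An empty result would mean both endpoints report each rule name the same number of times.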

Expected results:
The thanos-querier API returns the same results as the Prometheus API for platform alert rules.

Additional info:
User-workload monitoring was not enabled.

Comment 1 hongyan li 2021-02-09 07:49:48 UTC

Created attachment 1755860 [details]
screen shot for console with duplicate alert rule

Comment 2 hongyan li 2021-02-09 08:08:50 UTC

oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/rules' | jq |grep ElasticsearchClusterNotHealthy -A20
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  251k    0  251k    0     0  4344k      0 --:--:-- --:--:-- --:--:-- 4344k
            "name": "ElasticsearchClusterNotHealthy",
            "query": "sum by(cluster) (es_cluster_status == 2)",
            "duration": 420,
            "labels": {
              "prometheus": "openshift-monitoring/k8s",
              "severity": "critical"
            },
            "annotations": {
              "message": "Cluster {{ $labels.cluster }} health status has been RED for at least 7m. Cluster does not accept writes, shards may be missing or master node hasn't been elected yet. For more information refer to https://github.com/openshift/elasticsearch-operator/blob/master/docs/alerts.md#Elasticsearch-Cluster-Health-is-Red",
              "summary": "Cluster health status is RED"
            },
            "alerts": [],
            "health": "ok",
            "evaluationTime": 0.000253482,
            "lastEvaluation": "2021-02-09T08:06:42.681095906Z",
            "type": "alerting"
          },
          {
            "state": "inactive",
            "name": "ElasticsearchClusterNotHealthy",
            "query": "sum by(cluster) (es_cluster_status == 1)",
            "duration": 1200,
            "labels": {
              "prometheus": "openshift-monitoring/k8s",
              "severity": "warning"
            },
            "annotations": {
              "message": "Cluster {{ $labels.cluster }} health status has been YELLOW for at least 20m. Some shard replicas are not allocated. For more information refer to https://github.com/openshift/elasticsearch-operator/blob/master/docs/alerts.md#Elasticsearch-Cluster-Healthy-is-Yellow",
              "summary": "Cluster health status is YELLOW"
            },
            "alerts": [],
            "health": "ok",
            "evaluationTime": 0.000109183,
            "lastEvaluation": "2021-02-09T08:06:42.68135068Z",
            "type": "alerting"
          },
          {
            "state": "inactive",
            "name": "ElasticsearchClusterNotHealthy",
            "query": "sum by(cluster) (es_cluster_status == 2)",
            "duration": 420,
            "labels": {
              "prometheus": "openshift-monitoring/k8s",
              "severity": "critical"
            },
            "annotations": {
              "message": "Cluster {{ $labels.cluster }} health status has been RED for at least 7m. Cluster does not accept writes, shards may be missing or master node hasn't been elected yet. For more information refer to https://github.com/openshift/elasticsearch-operator/blob/master/docs/alerts.md#Elasticsearch-Cluster-Health-is-Red",
              "summary": "Cluster health status is RED"
            },
            "alerts": [],
            "health": "ok",
            "evaluationTime": 0.000212364,
            "lastEvaluation": "2021-02-09T08:06:42.681140147Z",
            "type": "alerting"
          },
          {
            "state": "inactive",
            "name": "ElasticsearchClusterNotHealthy",
            "query": "sum by(cluster) (es_cluster_status == 1)",
            "duration": 1200,
            "labels": {
              "prometheus": "openshift-monitoring/k8s",
              "severity": "warning"
            },
            "annotations": {
              "message": "Cluster {{ $labels.cluster }} health status has been YELLOW for at least 20m. Some shard replicas are not allocated. For more information refer to https://github.com/openshift/elasticsearch-operator/blob/master/docs/alerts.md#Elasticsearch-Cluster-Healthy-is-Yellow",
              "summary": "Cluster health status is YELLOW"
            },
            "alerts": [],
            "health": "ok",
            "evaluationTime": 8.7608e-05,
            "lastEvaluation": "2021-02-09T08:06:42.681353788Z",
            "type": "alerting"
          },
          {
            "state": "inactive",
            "name": "ElasticsearchDiskSpaceRunningLow",
            "query": "sum(predict_linear(es_fs_path_available_bytes[6h], 6 * 3600)) < 0",

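In the output above, the repeated entries share the same name, query, duration, and labels, and differ only in evaluationTime and lastEvaluation. A minimal Python sketch for spotting such repeats is shown here. It is a triage aid only and is not the fix that was shipped. The choice of identity fields and the helper names (rule_identity, find_duplicates) are assumptions made for illustration, and the sample rules are abbreviated from the output above.

```python
import json

# Fields assumed to identify a rule; evaluation timestamps are deliberately excluded.
IDENTITY_FIELDS = ("name", "query", "duration", "labels", "type")


def rule_identity(rule):
    """Serialize the identity fields into a stable, hashable key."""
    return json.dumps({field: rule.get(field) for field in IDENTITY_FIELDS}, sort_keys=True)


def find_duplicates(rules):
    """Group rules by identity and return only the groups with more than one entry."""
    groups = {}
    for rule in rules:
        groups.setdefault(rule_identity(rule), []).append(rule)
    return [group for group in groups.values() if len(group) > 1]


def sample_rule(status_value, duration, severity, evaluation_time):
    """Build an abbreviated rule entry shaped like the ones in the output above."""
    return {
        "name": "ElasticsearchClusterNotHealthy",
        "query": f"sum by(cluster) (es_cluster_status == {status_value})",
        "duration": duration,
        "labels": {"prometheus": "openshift-monitoring/k8s", "severity": severity},
        "evaluationTime": evaluation_time,
        "type": "alerting",
    }


if __name__ == "__main__":
    rules = [
        sample_rule(2, 420, "critical", 0.000253482),
        sample_rule(1, 1200, "warning", 0.000109183),
        sample_rule(2, 420, "critical", 0.000212364),
        sample_rule(1, 1200, "warning", 8.7608e-05),
    ]
    for group in find_duplicates(rules):
        print(group[0]["labels"]["severity"], len(group))
    # critical 2
    # warning 2
```

With the four entries shown above, this yields two groups of two, which matches the "display 4 items, actual 2" observation in the description.
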
Comment 3 hongyan li 2021-03-23 09:22:11 UTC

*** Bug 1940882 has been marked as a duplicate of this bug. ***

Comment 9 Junqi Zhao 2021-03-31 03:44:48 UTC

Tested with 4.8.0-0.nightly-2021-03-30-160509: searched for alerting rules under "Monitoring -> Alerting -> Alerting rules"; no duplicate alert rules are displayed now.

Comment 12 errata-xmlrpc 2021-07-27 22:42:29 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438