Bug 1926598

Summary: Duplicate alert rules are displayed on console because thanos-querier API returns wrong results
Product: OpenShift Container Platform Reporter: hongyan li <hongyli>
Component: Monitoring Assignee: Sergiusz Urbaniak <surbania>
Status: CLOSED ERRATA QA Contact: hongyan li <hongyli>
Severity: medium Docs Contact:
Priority: high    
Version: 4.7 CC: alegrand, anpicker, erooth, juzhao, kakkoyun, lcosic, pkrupa, spasquie
Target Milestone: ---   
Target Release: 4.8.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Last Closed: 2021-07-27 22:42:29 UTC Type: Bug
Bug Blocks: 1944575    
Attachments:
screen shot for console with duplicate alert rule (flags: none)

Description hongyan li 2021-02-09 07:45:15 UTC
Description of problem:
Duplicate alert rules are displayed on the console.

Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2021-02-08-191932

How reproducible:
always

Steps to Reproduce:
1. Open the console and click Monitoring -> Alerting.
2. Click the Alerting rules tab.
3. Duplicate alert rules are displayed, for example:
ElasticsearchClusterNotHealthy: 4 items displayed, 2 actual
ElasticsearchNodeDiskWatermarkReached: 6 items displayed, 3 actual
etcdHighFsyncDurations: 4 items displayed, 2 actual


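Actual results:
Querying the Prometheus API returns 2 ElasticsearchClusterNotHealthy rules, but the same query against the thanos-querier API returns 4.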

# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/rules' | jq '.data.groups[].rules[].name' | sort|grep ElasticsearchClusterNotHealthy
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  191k    0  191k    0     0  5317k      0 --:--:-- --:--:-- --:--:-- 5317k
"ElasticsearchClusterNotHealthy"
"ElasticsearchClusterNotHealthy"

# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/rules' | jq '.data.groups[].rules[].name' | sort|grep ElasticsearchClusterNotHealthy
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  251k    0  251k    0     0  3110k      0 --:--:-- --:--:-- --:--:-- 3110k
"ElasticsearchClusterNotHealthy"
"ElasticsearchClusterNotHealthy"
"ElasticsearchClusterNotHealthy"
"ElasticsearchClusterNotHealthy"
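
The comparison above can also be done offline once both responses are saved. The following is a minimal Python sketch, not part of the original report: it assumes the standard `/api/v1/rules` response shape (`data.groups[].rules[].name`, as used by the `jq` filter above), and the helper names and the trimmed sample payloads are hypothetical. How the duplicated rules are spread across groups in the real thanos-querier response is not shown in this report, so the grouping in the sample is only illustrative.

```python
from collections import Counter


def count_rule_names(response: dict) -> Counter:
    """Count how often each rule name appears in a /api/v1/rules response body."""
    return Counter(
        rule["name"]
        for group in response["data"]["groups"]
        for rule in group["rules"]
    )


def extra_rules(reference: dict, candidate: dict) -> dict:
    """Return {name: surplus} for rule names that occur more often in candidate.

    Counter subtraction keeps only positive differences, so names with
    matching counts are dropped from the result.
    """
    return dict(count_rule_names(candidate) - count_rule_names(reference))


# Hypothetical, heavily trimmed payloads that mirror the counts in this report.
rule = {"name": "ElasticsearchClusterNotHealthy"}
prometheus_response = {"data": {"groups": [{"rules": [rule, rule]}]}}
thanos_response = {
    "data": {"groups": [{"rules": [rule, rule]}, {"rules": [rule, rule]}]}
}

surplus = extra_rules(prometheus_response, thanos_response)
print(surplus)
```

With real data, the two dictionaries would come from `json.load` on the saved `curl` output of each endpoint; any non-empty result points at rules that one API reports more often than the other.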


Expected results:
Both APIs return the same results for platform alert rules.

Additional info:
User-workload monitoring was not enabled.

Comment 1 hongyan li 2021-02-09 07:49:48 UTC
Created attachment 1755860
screen shot for console with duplicate alert rule

Comment 2 hongyan li 2021-02-09 08:08:50 UTC
oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/rules' | jq |grep ElasticsearchClusterNotHealthy -A20
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  251k    0  251k    0     0  4344k      0 --:--:-- --:--:-- --:--:-- 4344k
            "name": "ElasticsearchClusterNotHealthy",
            "query": "sum by(cluster) (es_cluster_status == 2)",
            "duration": 420,
            "labels": {
              "prometheus": "openshift-monitoring/k8s",
              "severity": "critical"
            },
            "annotations": {
              "message": "Cluster {{ $labels.cluster }} health status has been RED for at least 7m. Cluster does not accept writes, shards may be missing or master node hasn't been elected yet. For more information refer to https://github.com/openshift/elasticsearch-operator/blob/master/docs/alerts.md#Elasticsearch-Cluster-Health-is-Red",
              "summary": "Cluster health status is RED"
            },
            "alerts": [],
            "health": "ok",
            "evaluationTime": 0.000253482,
            "lastEvaluation": "2021-02-09T08:06:42.681095906Z",
            "type": "alerting"
          },
          {
            "state": "inactive",
            "name": "ElasticsearchClusterNotHealthy",
            "query": "sum by(cluster) (es_cluster_status == 1)",
            "duration": 1200,
            "labels": {
              "prometheus": "openshift-monitoring/k8s",
              "severity": "warning"
            },
            "annotations": {
              "message": "Cluster {{ $labels.cluster }} health status has been YELLOW for at least 20m. Some shard replicas are not allocated. For more information refer to https://github.com/openshift/elasticsearch-operator/blob/master/docs/alerts.md#Elasticsearch-Cluster-Healthy-is-Yellow",
              "summary": "Cluster health status is YELLOW"
            },
            "alerts": [],
            "health": "ok",
            "evaluationTime": 0.000109183,
            "lastEvaluation": "2021-02-09T08:06:42.68135068Z",
            "type": "alerting"
          },
          {
            "state": "inactive",
            "name": "ElasticsearchClusterNotHealthy",
            "query": "sum by(cluster) (es_cluster_status == 2)",
            "duration": 420,
            "labels": {
              "prometheus": "openshift-monitoring/k8s",
              "severity": "critical"
            },
            "annotations": {
              "message": "Cluster {{ $labels.cluster }} health status has been RED for at least 7m. Cluster does not accept writes, shards may be missing or master node hasn't been elected yet. For more information refer to https://github.com/openshift/elasticsearch-operator/blob/master/docs/alerts.md#Elasticsearch-Cluster-Health-is-Red",
              "summary": "Cluster health status is RED"
            },
            "alerts": [],
            "health": "ok",
            "evaluationTime": 0.000212364,
            "lastEvaluation": "2021-02-09T08:06:42.681140147Z",
            "type": "alerting"
          },
          {
            "state": "inactive",
            "name": "ElasticsearchClusterNotHealthy",
            "query": "sum by(cluster) (es_cluster_status == 1)",
            "duration": 1200,
            "labels": {
              "prometheus": "openshift-monitoring/k8s",
              "severity": "warning"
            },
            "annotations": {
              "message": "Cluster {{ $labels.cluster }} health status has been YELLOW for at least 20m. Some shard replicas are not allocated. For more information refer to https://github.com/openshift/elasticsearch-operator/blob/master/docs/alerts.md#Elasticsearch-Cluster-Healthy-is-Yellow",
              "summary": "Cluster health status is YELLOW"
            },
            "alerts": [],
            "health": "ok",
            "evaluationTime": 8.7608e-05,
            "lastEvaluation": "2021-02-09T08:06:42.681353788Z",
            "type": "alerting"
          },
          {
            "state": "inactive",
            "name": "ElasticsearchDiskSpaceRunningLow",
            "query": "sum(predict_linear(es_fs_path_available_bytes[6h], 6 * 3600)) < 0",
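
In the output above, each pair of ElasticsearchClusterNotHealthy entries has the same query, duration, and labels; the copies differ only in `evaluationTime` and `lastEvaluation`. The following Python sketch, which is an illustration and not the fix that was shipped, groups rules by their stable fields to surface such copies. The choice of which fields count as volatile, the helper names, and the trimmed sample records are assumptions; the field values are taken from the output above.

```python
import json
from collections import defaultdict

# Assumption: only these timing fields differ between otherwise identical copies.
VOLATILE_FIELDS = {"evaluationTime", "lastEvaluation"}


def rule_identity(rule: dict) -> str:
    """Serialize a rule without its volatile fields so copies compare equal."""
    stable = {k: v for k, v in rule.items() if k not in VOLATILE_FIELDS}
    return json.dumps(stable, sort_keys=True)


def find_duplicates(rules: list) -> dict:
    """Map each identity that occurs more than once to its list of copies."""
    groups = defaultdict(list)
    for rule in rules:
        groups[rule_identity(rule)].append(rule)
    return {key: copies for key, copies in groups.items() if len(copies) > 1}


# Trimmed records based on the four entries shown in the output above.
rules = [
    {"name": "ElasticsearchClusterNotHealthy",
     "query": "sum by(cluster) (es_cluster_status == 2)", "duration": 420,
     "labels": {"severity": "critical"},
     "evaluationTime": 0.000253482,
     "lastEvaluation": "2021-02-09T08:06:42.681095906Z"},
    {"name": "ElasticsearchClusterNotHealthy",
     "query": "sum by(cluster) (es_cluster_status == 1)", "duration": 1200,
     "labels": {"severity": "warning"},
     "evaluationTime": 0.000109183,
     "lastEvaluation": "2021-02-09T08:06:42.68135068Z"},
    {"name": "ElasticsearchClusterNotHealthy",
     "query": "sum by(cluster) (es_cluster_status == 2)", "duration": 420,
     "labels": {"severity": "critical"},
     "evaluationTime": 0.000212364,
     "lastEvaluation": "2021-02-09T08:06:42.681140147Z"},
    {"name": "ElasticsearchClusterNotHealthy",
     "query": "sum by(cluster) (es_cluster_status == 1)", "duration": 1200,
     "labels": {"severity": "warning"},
     "evaluationTime": 8.7608e-05,
     "lastEvaluation": "2021-02-09T08:06:42.681353788Z"},
]

duplicates = find_duplicates(rules)
for copies in duplicates.values():
    print(copies[0]["labels"]["severity"], len(copies))
```

On this sample the critical and the warning rule are each reported as two copies, which matches the "display 4 items, actual 2" observation in the description.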

Comment 3 hongyan li 2021-03-23 09:22:11 UTC
*** Bug 1940882 has been marked as a duplicate of this bug. ***

Comment 9 Junqi Zhao 2021-03-31 03:44:48 UTC
Tested with 4.8.0-0.nightly-2021-03-30-160509: searched the alerting rules under "Monitoring -> Alerting -> Alerting rules"; no duplicate alert rules are displayed now.

Comment 12 errata-xmlrpc 2021-07-27 22:42:29 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438