1873353 – The deleted PrometheusRule is still in the thanos-ruler UI

Summary: The deleted PrometheusRule is still in the thanos-ruler UI

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Monitoring
Sub Component:
Version:	4.6
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	4.6.0
Assignee:	Sergiusz Urbaniak
QA Contact:	hongyan li
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-08-28 02:19 UTC by hongyan li
Modified:	2020-10-27 16:35 UTC
CC List:	9 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-10-27 16:35:35 UTC
Target Upstream Version:
Embargoed:

Attachments
alert rule on Monitoring-Alerting Rules UI (70.44 KB, image/png) 2020-08-28 02:19 UTC, hongyan li

Links
System	ID	Priority	Status	Summary	Last Updated
Github	openshift cluster-monitoring-operator pull 920	None	closed	Bug 1873353: bump Thanos to v0.15.0	2020-09-23 09:34:56 UTC
Github	openshift thanos pull 35	None	closed	Bug 1873353: bump Thanos to v0.15.0-rc.1	2020-09-23 09:34:58 UTC
Github	thanos-io thanos pull 3095	None	closed	Rule: update manager when all rule files are removed	2020-09-23 09:34:56 UTC
Red Hat Product Errata	RHBA-2020:4196	None	None	None	2020-10-27 16:35:51 UTC

Description hongyan li 2020-08-28 02:19:23 UTC

Created attachment 1712904 [details]
alert rule on Monitoring-Alerting Rules UI


After deletion, the following are still visible:
The deleted PrometheusRule in the Monitoring-Alerting Rules UI
Its alerts in the Monitoring-Alerts UI
The deleted PrometheusRule in the thanos-ruler UI


Description of problem:
The deleted PrometheusRule is still in the thanos-ruler UI

Steps: enable user workload monitoring and create a PrometheusRule as follows:
***********************************************
# oc new-project test3
# oc create -f - << EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: test3.rules
spec:
  groups:
  - name: alerting rules
    rules:
    - alert: Watchdog
      expr: vector(1)
      labels:
        severity: none
      message:
        This is an alert meant to ensure that the entire alerting pipeline is functional.
EOF
***********************************************
The PrometheusRule is present in the rules-configmap-reloader container:
# oc -n openshift-user-workload-monitoring exec -c rules-configmap-reloader thanos-ruler-user-workload-0 -- cat /etc/thanos/rules/thanos-ruler-user-workload-rulefiles-0/test3-test3.rules.yaml
groups:
- name: alerting rules
  rules:
  - alert: Watchdog
    expr: vector(1)
    labels:
      namespace: test3
      severity: none

# oc -n openshift-user-workload-monitoring exec -c rules-configmap-reloader thanos-ruler-user-workload-0 -- ls -al /etc/thanos/rules/thanos-ruler-user-workload-rulefiles-0/test3-test3.rules.yaml
lrwxrwxrwx. 1 root 1000420000 29 Apr  2 12:04 /etc/thanos/rules/thanos-ruler-user-workload-rulefiles-0/test3-test3.rules.yaml -> ..data/test3-test3.rules.yaml

Delete the PrometheusRule by deleting the test3 project:
# oc delete project test3
project.project.openshift.io "test3" deleted

# oc -n test3 get PrometheusRule
No resources found in test3 namespace.

# oc -n openshift-user-workload-monitoring logs -c rules-configmap-reloader thanos-ruler-user-workload-0 
2020/08/28 01:33:11 Watching directory: "/etc/thanos/rules/thanos-ruler-user-workload-rulefiles-0"
2020/08/28 01:33:13 config map updated
2020/08/28 01:33:13 successfully triggered reload
2020/08/28 01:34:35 config map updated
2020/08/28 01:34:35 successfully triggered reload
2020/08/28 01:45:50 config map updated
2020/08/28 01:45:50 successfully triggered reload
2020/08/28 01:53:53 config map updated
2020/08/28 01:53:53 successfully triggered reload


The rule file is also removed from the rules-configmap-reloader container:
# oc -n openshift-user-workload-monitoring exec -c rules-configmap-reloader thanos-ruler-user-workload-0 -- cat /etc/thanos/rules/thanos-ruler-user-workload-rulefiles-0/test3-test3.rules.yaml
cat: /etc/thanos/rules/thanos-ruler-user-workload-rulefiles-0/test3-test3.rules.yaml: No such file or directory
command terminated with exit code 1

More than 20 minutes later, the alert and the alerting rule can still be found in the UI (see the attached screenshot).

# token=`oc sa get-token prometheus-k8s -n openshift-monitoring`
# oc -n openshift-user-workload-monitoring exec -c thanos-ruler thanos-ruler-user-workload-1 -- curl -k -H "Authorization: Bearer $token" https://thanos-ruler.openshift-user-workload-monitoring.svc:9091/api/v1/rules | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   741  100   741    0     0  15052      0 --:--:-- --:--:-- --:--:-- 15122
{
  "status": "success",
  "data": {
    "groups": [
      {
        "name": "alerting rules",
        "file": "",
        "rules": [
          {
            "state": "firing",
            "name": "Watchdog",
            "query": "vector(1)",
            "duration": 0,
            "labels": {
              "namespace": "test3",
              "severity": "none",
              "thanos_ruler_replica": "thanos-ruler-user-workload-0"
            },
            "annotations": {},
            "alerts": [
              {
                "labels": {
                  "alertname": "Watchdog",
                  "namespace": "test3",
                  "severity": "none"
                },
                "annotations": {},
                "state": "firing",
                "activeAt": "2020-08-28T01:45:57.078951644Z",
                "value": "1e+00",
                "partialResponseStrategy": "ABORT"
              }
            ],
            "health": "ok",
            "evaluationTime": 0.0018852,
            "lastEvaluation": "2020-08-28T02:09:12.079090599Z",
            "type": "alerting"
          }
        ],
        "interval": 15,
        "evaluationTime": 0,
        "lastEvaluation": "0001-01-01T00:00:00Z",
        "partial_response_strategy": "WARN",
        "partialResponseStrategy": "ABORT"
      }
    ]
  }
}
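
The check above can be made scriptable. The following is a minimal sketch, not part of the original report: the helper name namespace_in_rules is hypothetical, and it assumes the namespace is a plain Kubernetes name (lowercase letters, digits, '-') so that it can be embedded in a regular expression. It only inspects the JSON text with grep, so it reports whether any label set in the response still carries the namespace, not how many rules do.

```shell
# namespace_in_rules NAMESPACE   (hypothetical helper, not from the report)
# Reads a /api/v1/rules JSON response on stdin.
# Exits 0 if any label set still carries "namespace": "<NAMESPACE>",
# exits 1 otherwise.
# Assumption: NAMESPACE contains only lowercase letters, digits and '-',
# so it is safe to place inside the extended regular expression.
namespace_in_rules() {
  grep -Eq "\"namespace\"[[:space:]]*:[[:space:]]*\"$1\""
}

# Example use with the curl command from this report (left commented out
# because it needs a live cluster and a valid $token):
#   oc -n openshift-user-workload-monitoring exec -c thanos-ruler \
#     thanos-ruler-user-workload-1 -- curl -sk \
#     -H "Authorization: Bearer $token" \
#     https://thanos-ruler.openshift-user-workload-monitoring.svc:9091/api/v1/rules \
#     | namespace_in_rules test3 && echo "rule for test3 is still loaded"
```

The closing quote in the pattern keeps a namespace such as test3 from matching test30.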

Version-Release number of selected component (if applicable):
4.6.0-0.nightly-2020-08-27-005538

How reproducible:
Always

Steps to Reproduce:
1. See the description

Actual results:


Expected results:


Additional info:

Comment 1 hongyan li 2020-08-28 02:25:12 UTC

Similar bug for 4.5
https://bugzilla.redhat.com/show_bug.cgi?id=1820180

Comment 3 hongyan li 2020-08-28 03:21:18 UTC

The alert rule and its alerts remain visible in the UI until a new PrometheusRule is created in a different namespace.

Comment 4 hongyan li 2020-08-28 03:22:07 UTC

After deletion, the alert rule and its alerts remain visible in the UI until a new PrometheusRule is created in a different namespace.

Comment 5 Simon Pasquier 2020-08-28 09:44:09 UTC

I can reproduce with the upstream version of Thanos. I'll file an issue + PR there.

Comment 7 Simon Pasquier 2020-08-31 08:01:47 UTC

The upstream PR has been merged. We now need to wait for Thanos to be bumped in our downstream fork.

Comment 10 hongyan li 2020-09-10 04:16:39 UTC

Tested with payload 4.6.0-0.nightly-2020-09-09-173545.
The deleted PrometheusRule now disappears in less than a minute.
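
A verification like this can be automated with a small polling loop. This is only a sketch under stated assumptions: wait_until_gone is a hypothetical helper, and the check command passed to it (shown as rule_still_listed in the comment) is a placeholder for whatever test reports that the rule is still loaded.

```shell
# wait_until_gone TIMEOUT_SECONDS INTERVAL_SECONDS CHECK_CMD [ARGS...]
# (hypothetical helper, not from the report)
# Re-runs CHECK_CMD until it exits non-zero, meaning "rule no longer present".
# Returns 0 as soon as the check fails, or 1 if the check still succeeds
# once TIMEOUT_SECONDS have elapsed.
wait_until_gone() {
  timeout=$1
  interval=$2
  shift 2
  elapsed=0
  while "$@"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 0
}

# Example (commented out, needs a live cluster); rule_still_listed is a
# placeholder for a command that exits 0 while the deleted rule is served:
#   wait_until_gone 60 5 rule_still_listed test3 \
#     && echo "rule disappeared within a minute" \
#     || echo "rule is still present after 60 seconds"
```

The 60 second timeout mirrors the "less than a minute" observation above; the 5 second interval is an arbitrary choice.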

Comment 12 errata-xmlrpc 2020-10-27 16:35:35 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196
