Bug 1873353 - The deleted PrometheusRule is still in the thanos-ruler UI
Summary: The deleted PrometheusRule is still in the thanos-ruler UI
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.6.0
Assignee: Sergiusz Urbaniak
QA Contact: hongyan li
Reported: 2020-08-28 02:19 UTC by hongyan li
Modified: 2020-10-27 16:35 UTC
CC List: 9 users

Doc Type: If docs needed, set a value
Last Closed: 2020-10-27 16:35:35 UTC


Attachments
alert rule on Monitoring-Alerting Rules UI (70.44 KB, image/png)
2020-08-28 02:19 UTC, hongyan li


Links
GitHub openshift/cluster-monitoring-operator pull 920 (closed): Bug 1873353: bump Thanos to v0.15.0, last updated 2020-09-23 09:34:56 UTC
GitHub openshift/thanos pull 35 (closed): Bug 1873353: bump Thanos to v0.15.0-rc.1, last updated 2020-09-23 09:34:58 UTC
GitHub thanos-io/thanos pull 3095 (closed): Rule: update manager when all rule files are removed, last updated 2020-09-23 09:34:56 UTC
Red Hat Product Errata RHBA-2020:4196, last updated 2020-10-27 16:35:51 UTC

Description hongyan li 2020-08-28 02:19:23 UTC
Created attachment 1712904
alert rule on Monitoring-Alerting Rules UI


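After the PrometheusRule is deleted, the following are still visible:
- the deleted rule on the console's alerting rules page (monitoring/alertrules)
- its alerts on the console's alerts page (monitoring/alerts)
- the deleted rule in the Thanos Ruler UI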


Description of problem:
The deleted PrometheusRule is still in the thanos-ruler UI

Steps: enable user workload monitoring, then run the following to create a PrometheusRule:
***********************************************
# oc new-project test3
# oc create -f - << EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: test3.rules
spec:
  groups:
  - name: alerting rules
    rules:
    - alert: Watchdog
      expr: vector(1)
      labels:
        severity: none
      message:
        This is an alert meant to ensure that the entire alerting pipeline is functional.
EOF
***********************************************
The rule file generated from the PrometheusRule is present in the rules-configmap-reloader container:
# oc -n openshift-user-workload-monitoring exec -c rules-configmap-reloader thanos-ruler-user-workload-0 -- cat /etc/thanos/rules/thanos-ruler-user-workload-rulefiles-0/test3-test3.rules.yaml
groups:
- name: alerting rules
  rules:
  - alert: Watchdog
    expr: vector(1)
    labels:
      namespace: test3
      severity: none

# oc -n openshift-user-workload-monitoring exec -c rules-configmap-reloader thanos-ruler-user-workload-0 -- ls -al /etc/thanos/rules/thanos-ruler-user-workload-rulefiles-0/test3-test3.rules.yaml
lrwxrwxrwx. 1 root 1000420000 29 Apr  2 12:04 /etc/thanos/rules/thanos-ruler-user-workload-rulefiles-0/test3-test3.rules.yaml -> ..data/test3-test3.rules.yaml
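Comparing the PrometheusRule with the rendered file shows two transformations: the file is named `<namespace>-<name>.yaml` (here `test3-test3.rules.yaml`), and a `namespace: test3` label is added to the rule. The Go sketch below reproduces only that observed mapping; `ruleFileName` and `withNamespaceLabel` are hypothetical names, and this is not the operator's actual code.

```go
package main

import "fmt"

// ruleFileName mirrors the naming observed in the transcript:
// <namespace>-<PrometheusRule name>.yaml (inferred from output, not from operator source).
func ruleFileName(namespace, name string) string {
	return fmt.Sprintf("%s-%s.yaml", namespace, name)
}

// withNamespaceLabel returns a copy of the rule labels with the owning
// namespace set, matching the extra "namespace: test3" label in the rendered file.
func withNamespaceLabel(labels map[string]string, namespace string) map[string]string {
	out := make(map[string]string, len(labels)+1)
	for k, v := range labels {
		out[k] = v
	}
	out["namespace"] = namespace
	return out
}

func main() {
	fmt.Println(ruleFileName("test3", "test3.rules"))
	fmt.Println(withNamespaceLabel(map[string]string{"severity": "none"}, "test3"))
}
```

Running it prints `test3-test3.rules.yaml` and `map[namespace:test3 severity:none]`, which matches the file name and labels shown above.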

Delete the test3 project, which also deletes its PrometheusRule:
# oc delete project test3
project.project.openshift.io "test3" deleted

# oc -n test3 get PrometheusRule
No resources found in test3 namespace.

# oc -n openshift-user-workload-monitoring logs -c rules-configmap-reloader thanos-ruler-user-workload-0 
2020/08/28 01:33:11 Watching directory: "/etc/thanos/rules/thanos-ruler-user-workload-rulefiles-0"
2020/08/28 01:33:13 config map updated
2020/08/28 01:33:13 successfully triggered reload
2020/08/28 01:34:35 config map updated
2020/08/28 01:34:35 successfully triggered reload
2020/08/28 01:45:50 config map updated
2020/08/28 01:45:50 successfully triggered reload
2020/08/28 01:53:53 config map updated
2020/08/28 01:53:53 successfully triggered reload


The rule file has also been removed from the rules-configmap-reloader container:
# oc -n openshift-user-workload-monitoring exec -c rules-configmap-reloader thanos-ruler-user-workload-0 -- cat /etc/thanos/rules/thanos-ruler-user-workload-rulefiles-0/test3-test3.rules.yaml
cat: /etc/thanos/rules/thanos-ruler-user-workload-rulefiles-0/test3-test3.rules.yaml: No such file or directory
command terminated with exit code 1

More than 20 minutes later, the related alert and alerting rule can still be found in the UI (see the attached screenshot). The Thanos Ruler rules API also still returns them:

# token=`oc sa get-token prometheus-k8s -n openshift-monitoring`
# oc -n openshift-user-workload-monitoring exec -c thanos-ruler thanos-ruler-user-workload-1 -- curl -k -H "Authorization: Bearer $token" https://thanos-ruler.openshift-user-workload-monitoring.svc:9091/api/v1/rules | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   741  100   741    0     0  15052      0 --:--:-- --:--:-- --:--:-- 15122
{
  "status": "success",
  "data": {
    "groups": [
      {
        "name": "alerting rules",
        "file": "",
        "rules": [
          {
            "state": "firing",
            "name": "Watchdog",
            "query": "vector(1)",
            "duration": 0,
            "labels": {
              "namespace": "test3",
              "severity": "none",
              "thanos_ruler_replica": "thanos-ruler-user-workload-0"
            },
            "annotations": {},
            "alerts": [
              {
                "labels": {
                  "alertname": "Watchdog",
                  "namespace": "test3",
                  "severity": "none"
                },
                "annotations": {},
                "state": "firing",
                "activeAt": "2020-08-28T01:45:57.078951644Z",
                "value": "1e+00",
                "partialResponseStrategy": "ABORT"
              }
            ],
            "health": "ok",
            "evaluationTime": 0.0018852,
            "lastEvaluation": "2020-08-28T02:09:12.079090599Z",
            "type": "alerting"
          }
        ],
        "interval": 15,
        "evaluationTime": 0,
        "lastEvaluation": "0001-01-01T00:00:00Z",
        "partial_response_strategy": "WARN",
        "partialResponseStrategy": "ABORT"
      }
    ]
  }
}
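To check for the stale rule without reading the whole response by eye, it can be filtered by the `namespace` label. A minimal Go sketch, assuming the response shape shown above (only the fields used are modeled); `rulesInNamespace` is a hypothetical helper, not part of any Thanos client library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rulesResponse models only the fields of the /api/v1/rules response used here.
type rulesResponse struct {
	Status string `json:"status"`
	Data   struct {
		Groups []struct {
			Name  string `json:"name"`
			Rules []struct {
				Name   string            `json:"name"`
				Labels map[string]string `json:"labels"`
			} `json:"rules"`
		} `json:"groups"`
	} `json:"data"`
}

// rulesInNamespace returns "group/rule" for every rule whose namespace label equals ns.
func rulesInNamespace(body []byte, ns string) ([]string, error) {
	var resp rulesResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	if resp.Status != "success" {
		return nil, fmt.Errorf("unexpected status %q", resp.Status)
	}
	var found []string
	for _, g := range resp.Data.Groups {
		for _, r := range g.Rules {
			if r.Labels["namespace"] == ns {
				found = append(found, g.Name+"/"+r.Name)
			}
		}
	}
	return found, nil
}

func main() {
	// Trimmed copy of the response above.
	body := []byte(`{"status":"success","data":{"groups":[{"name":"alerting rules","rules":[{"name":"Watchdog","labels":{"namespace":"test3","severity":"none"}}]}]}}`)
	stale, err := rulesInNamespace(body, "test3")
	if err != nil {
		panic(err)
	}
	fmt.Println(stale)
}
```

On the trimmed response it prints `[alerting rules/Watchdog]`; an empty result would mean the deleted namespace's rules are gone.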

Version-Release number of selected component (if applicable):
4.6.0-0.nightly-2020-08-27-005538

How reproducible:
Always

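Steps to Reproduce:
1. See the description

Actual results:
The deleted rule and its Watchdog alert are still shown in the UI and returned by the Thanos Ruler rules API more than 20 minutes after the PrometheusRule was deleted.

Expected results:
The rule and its alerts disappear shortly after the PrometheusRule is deleted.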

Comment 1 hongyan li 2020-08-28 02:25:12 UTC
Similar bug for 4.5
https://bugzilla.redhat.com/show_bug.cgi?id=1820180

Comment 3 hongyan li 2020-08-28 03:21:18 UTC
The alerting rule and its alerts remain in the UI until a new PrometheusRule is created in a different namespace.

Comment 4 hongyan li 2020-08-28 03:22:07 UTC
After deletion, the alerting rule and its alerts remain in the UI until a new PrometheusRule is created in a different namespace.

Comment 5 Simon Pasquier 2020-08-28 09:44:09 UTC
I can reproduce this with the upstream version of Thanos. I'll file an issue and a PR there.

Comment 7 Simon Pasquier 2020-08-31 08:01:47 UTC
The upstream PR has been merged; we now need to wait for Thanos to be bumped in our downstream fork.
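The upstream fix is titled "Rule: update manager when all rule files are removed". Based only on that title and the behaviour reported here, the failure can be modeled as a reload path that skips updating the rule manager when the list of rule files is empty. The Go sketch below is a simplified, hypothetical model (`ruleManager`, `reloadBuggy` and `reloadFixed` are illustrative names), not the actual Thanos code. It is also consistent with Comments 3 and 4: creating a rule in another namespace produces a non-empty file list, which triggers an update that replaces the stale group.

```go
package main

import "fmt"

// ruleManager is a hypothetical stand-in for the ruler's rule manager;
// it only records which rule files are currently loaded.
type ruleManager struct {
	loaded []string
}

// Update replaces the loaded rule files with the given list (copied).
func (m *ruleManager) Update(files []string) {
	m.loaded = append([]string(nil), files...)
}

// reloadBuggy models the reported behaviour: when no rule files remain,
// the manager is never updated, so the old groups keep being evaluated.
func reloadBuggy(m *ruleManager, files []string) {
	if len(files) == 0 {
		return
	}
	m.Update(files)
}

// reloadFixed always passes the current file list, even an empty one,
// so removing the last rule file also drops its groups.
func reloadFixed(m *ruleManager, files []string) {
	m.Update(files)
}

func main() {
	buggy := &ruleManager{}
	reloadBuggy(buggy, []string{"test3-test3.rules.yaml"})
	reloadBuggy(buggy, nil) // the last rule file was deleted
	fmt.Println(buggy.loaded)

	fixed := &ruleManager{}
	reloadFixed(fixed, []string{"test3-test3.rules.yaml"})
	reloadFixed(fixed, nil)
	fmt.Println(fixed.loaded)
}
```

The buggy path still reports `[test3-test3.rules.yaml]` after the file is gone, while the fixed path reports `[]`.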

Comment 10 hongyan li 2020-09-10 04:16:39 UTC
Tested with payload 4.6.0-0.nightly-2020-09-09-173545: the deleted PrometheusRule now disappears in less than a minute.

Comment 12 errata-xmlrpc 2020-10-27 16:35:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

