Bug 1873353 - The deleted PrometheusRule is still in the thanos-ruler UI
Summary: The deleted PrometheusRule is still in the thanos-ruler UI
Keywords:
Status: VERIFIED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.6.0
Assignee: Sergiusz Urbaniak
QA Contact: hongyan li
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-08-28 02:19 UTC by hongyan li
Modified: 2020-09-10 04:16 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:


Attachments
alert rule on Monitoring-Alerting Rules UI (70.44 KB, image/png)
2020-08-28 02:19 UTC, hongyan li


Links
System ID Priority Status Summary Last Updated
Github openshift cluster-monitoring-operator pull 920 None open Bug 1873353: bump Thanos to v0.15.0 2020-09-07 14:36:40 UTC
Github openshift thanos pull 35 None closed Bug 1873353: bump Thanos to v0.15.0-rc.1 2020-09-08 07:09:37 UTC
Github thanos-io thanos pull 3095 None closed Rule: update manager when all rule files are removed 2020-09-08 07:09:37 UTC

Description hongyan li 2020-08-28 02:19:23 UTC
Created attachment 1712904 [details]
alert rule on Monitoring-Alerting Rules UI


After the PrometheusRule is deleted, the following are still visible:
The deleted PrometheusRule in the Monitoring-Alerting Rules UI
Its alerts in the Monitoring-Alerts UI
The deleted PrometheusRule in the thanos-ruler UI


Description of problem:
The deleted PrometheusRule is still in the thanos-ruler UI

Steps: enable user workload monitoring, then create a PrometheusRule as follows:
***********************************************
# oc new-project test3
# oc create -f - << EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: test3.rules
spec:
  groups:
  - name: alerting rules
    rules:
    - alert: Watchdog
      expr: vector(1)
      labels:
        severity: none
      message:
        This is an alert meant to ensure that the entire alerting pipeline is functional.
EOF
***********************************************
The rule file for the PrometheusRule is present in the rules-configmap-reloader container:
# oc -n openshift-user-workload-monitoring exec -c rules-configmap-reloader thanos-ruler-user-workload-0 -- cat /etc/thanos/rules/thanos-ruler-user-workload-rulefiles-0/test3-test3.rules.yaml
groups:
- name: alerting rules
  rules:
  - alert: Watchdog
    expr: vector(1)
    labels:
      namespace: test3
      severity: none

# oc -n openshift-user-workload-monitoring exec -c rules-configmap-reloader thanos-ruler-user-workload-0 -- ls -al /etc/thanos/rules/thanos-ruler-user-workload-rulefiles-0/test3-test3.rules.yaml
lrwxrwxrwx. 1 root 1000420000 29 Apr  2 12:04 /etc/thanos/rules/thanos-ruler-user-workload-rulefiles-0/test3-test3.rules.yaml -> ..data/test3-test3.rules.yaml

Delete the PrometheusRule by deleting the test3 project:
# oc delete project test3
project.project.openshift.io "test3" deleted

# oc -n test3 get PrometheusRule
No resources found in test3 namespace.

# oc -n openshift-user-workload-monitoring logs -c rules-configmap-reloader thanos-ruler-user-workload-0 
2020/08/28 01:33:11 Watching directory: "/etc/thanos/rules/thanos-ruler-user-workload-rulefiles-0"
2020/08/28 01:33:13 config map updated
2020/08/28 01:33:13 successfully triggered reload
2020/08/28 01:34:35 config map updated
2020/08/28 01:34:35 successfully triggered reload
2020/08/28 01:45:50 config map updated
2020/08/28 01:45:50 successfully triggered reload
2020/08/28 01:53:53 config map updated
2020/08/28 01:53:53 successfully triggered reload


The rule file is also removed from the rules-configmap-reloader container:
# oc -n openshift-user-workload-monitoring exec -c rules-configmap-reloader thanos-ruler-user-workload-0 -- cat /etc/thanos/rules/thanos-ruler-user-workload-rulefiles-0/test3-test3.rules.yaml
cat: /etc/thanos/rules/thanos-ruler-user-workload-rulefiles-0/test3-test3.rules.yaml: No such file or directory
command terminated with exit code 1

More than 20 minutes later, the related alert and alerting rule are still shown in the UI; see the attached screenshot.

# token=`oc sa get-token prometheus-k8s -n openshift-monitoring`
# oc -n openshift-user-workload-monitoring exec -c thanos-ruler thanos-ruler-user-workload-1 -- curl -k -H "Authorization: Bearer $token" https://thanos-ruler.openshift-user-workload-monitoring.svc:9091/api/v1/rules | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   741  100   741    0     0  15052      0 --:--:-- --:--:-- --:--:-- 15122
{
  "status": "success",
  "data": {
    "groups": [
      {
        "name": "alerting rules",
        "file": "",
        "rules": [
          {
            "state": "firing",
            "name": "Watchdog",
            "query": "vector(1)",
            "duration": 0,
            "labels": {
              "namespace": "test3",
              "severity": "none",
              "thanos_ruler_replica": "thanos-ruler-user-workload-0"
            },
            "annotations": {},
            "alerts": [
              {
                "labels": {
                  "alertname": "Watchdog",
                  "namespace": "test3",
                  "severity": "none"
                },
                "annotations": {},
                "state": "firing",
                "activeAt": "2020-08-28T01:45:57.078951644Z",
                "value": "1e+00",
                "partialResponseStrategy": "ABORT"
              }
            ],
            "health": "ok",
            "evaluationTime": 0.0018852,
            "lastEvaluation": "2020-08-28T02:09:12.079090599Z",
            "type": "alerting"
          }
        ],
        "interval": 15,
        "evaluationTime": 0,
        "lastEvaluation": "0001-01-01T00:00:00Z",
        "partial_response_strategy": "WARN",
        "partialResponseStrategy": "ABORT"
      }
    ]
  }
}
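
A quick way to confirm the stale state from a response like the one above is to count how often the deleted namespace still appears as a label. The helper below is only a minimal sketch and not part of the original report: the function name count_namespace_labels is made up for illustration, and it uses plain grep instead of jq so it has no extra dependencies. It matches text rather than JSON structure, so treat the result as a rough indicator.

```shell
# count_namespace_labels NAMESPACE
# Hypothetical helper: reads a /api/v1/rules JSON response on stdin and
# prints how many "namespace": "<NAMESPACE>" label entries it contains.
# It is a text match, not a JSON parse; 0 means no entry was found.
count_namespace_labels() {
  ns="$1"
  # -o prints each match on its own line, so wc -l counts matches, not lines.
  grep -Eo "\"namespace\"[[:space:]]*:[[:space:]]*\"${ns}\"" | wc -l | tr -d ' '
}

# Usage (assumes $token is set as in the report; -s only hides curl's progress meter):
#   oc -n openshift-user-workload-monitoring exec -c thanos-ruler thanos-ruler-user-workload-1 -- \
#     curl -sk -H "Authorization: Bearer $token" \
#     https://thanos-ruler.openshift-user-workload-monitoring.svc:9091/api/v1/rules \
#     | count_namespace_labels test3
```

For the response shown above this would print 2 (one rule label and one alert label). Once the rule is really gone, it should print 0.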

Version-Release number of selected component (if applicable):
4.6.0-0.nightly-2020-08-27-005538

How reproducible:
Always

Steps to Reproduce:
1. See the description

Actual results:


Expected results:


Additional info:

Comment 1 hongyan li 2020-08-28 02:25:12 UTC
Similar bug for 4.5
https://bugzilla.redhat.com/show_bug.cgi?id=1820180

Comment 2 hongyan li 2020-08-28 02:28:09 UTC
Similar bug for 4.5
https://bugzilla.redhat.com/show_bug.cgi?id=1820180

Comment 3 hongyan li 2020-08-28 03:21:18 UTC
The alerting rule and its alerts remain in the UI until a new PrometheusRule is created in a different namespace.

Comment 4 hongyan li 2020-08-28 03:22:07 UTC
After deletion, the alerting rule and its alerts remain in the UI until a new PrometheusRule is created in a different namespace.

Comment 5 Simon Pasquier 2020-08-28 09:44:09 UTC
I can reproduce with the upstream version of Thanos. I'll file an issue + PR there.

Comment 7 Simon Pasquier 2020-08-31 08:01:47 UTC
The upstream PR has been merged; we now need to wait for the Thanos bump in our downstream fork.

Comment 10 hongyan li 2020-09-10 04:16:39 UTC
Tested with payload 4.6.0-0.nightly-2020-09-09-173545.
The deleted PrometheusRule now disappears in less than a minute.
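
A check like this can be scripted by polling until the rule is no longer reported. The following is a rough sketch under stated assumptions, not the procedure QA actually used: wait_until_fails is a hypothetical helper, and the command passed to it (for example a wrapper around the rules API query from the description) has to be supplied by the reader and must exit non-zero once the rule is gone.

```shell
# wait_until_fails TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Hypothetical helper: re-runs COMMAND until it exits non-zero, which is
# taken to mean "the rule is no longer found".
# Returns 0 if that happens before TIMEOUT_SECONDS have elapsed, 1 otherwise.
# Both durations must be whole numbers of seconds.
wait_until_fails() {
  timeout_s="$1"
  interval_s="$2"
  shift 2
  elapsed=0
  while "$@" >/dev/null 2>&1; do
    # COMMAND still succeeds, so the rule is still being reported.
    if [ "$elapsed" -ge "$timeout_s" ]; then
      return 1
    fi
    sleep "$interval_s"
    elapsed=$((elapsed + interval_s))
  done
  return 0
}

# Usage, with rule_is_listed standing in for a reader-provided check:
#   wait_until_fails 60 5 rule_is_listed test3 && echo "rule removed within a minute"
```

The elapsed time only counts the sleeps, not the runtime of the check itself, so the timeout is approximate.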

