Created attachment 1712904 [details]
alert rule on Monitoring-Alerting Rules UI

still see the deleted PrometheusRule in monitoring-alertrules
the alerts in monitoring-alerts
the deleted PrometheusRule in thanos-ruler UI

Description of problem:
The deleted PrometheusRule is still in the thanos-ruler UI.

Steps:
Enable UserWorkload and do the following to create a PrometheusRule:
***********************************************
# oc new-project test3
# oc create -f - << EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: test3.rules
spec:
  groups:
  - name: alerting rules
    rules:
    - alert: Watchdog
      expr: vector(1)
      labels:
        severity: none
        message: This is an alert meant to ensure that the entire alerting pipeline is functional.
EOF
***********************************************

The PrometheusRule is in the rules-configmap-reloader container:
# oc -n openshift-user-workload-monitoring exec -c rules-configmap-reloader thanos-ruler-user-workload-0 -- cat /etc/thanos/rules/thanos-ruler-user-workload-rulefiles-0/test3-test3.rules.yaml
groups:
- name: alerting rules
  rules:
  - alert: Watchdog
    expr: vector(1)
    labels:
      namespace: test3
      severity: none

# oc -n openshift-user-workload-monitoring exec -c rules-configmap-reloader thanos-ruler-user-workload-0 -- ls -al /etc/thanos/rules/thanos-ruler-user-workload-rulefiles-0/test3-test3.rules.yaml
lrwxrwxrwx. 1 root 1000420000 29 Apr 2 12:04 /etc/thanos/rules/thanos-ruler-user-workload-rulefiles-0/test3-test3.rules.yaml -> ..data/test3-test3.rules.yaml

Delete the PrometheusRule in test3:
# oc delete project test3
project.project.openshift.io "test3" deleted
# oc -n test3 get PrometheusRule
No resources found in test3 namespace.
# oc -n openshift-user-workload-monitoring logs -c rules-configmap-reloader thanos-ruler-user-workload-0
2020/08/28 01:33:11 Watching directory: "/etc/thanos/rules/thanos-ruler-user-workload-rulefiles-0"
2020/08/28 01:33:13 config map updated
2020/08/28 01:33:13 successfully triggered reload
2020/08/28 01:34:35 config map updated
2020/08/28 01:34:35 successfully triggered reload
2020/08/28 01:45:50 config map updated
2020/08/28 01:45:50 successfully triggered reload
2020/08/28 01:53:53 config map updated
2020/08/28 01:53:53 successfully triggered reload

The PrometheusRule is also removed from the rules-configmap-reloader container:
# oc -n openshift-user-workload-monitoring exec -c rules-configmap-reloader thanos-ruler-user-workload-0 -- cat /etc/thanos/rules/thanos-ruler-user-workload-rulefiles-0/test3-test3.rules.yaml
cat: /etc/thanos/rules/thanos-ruler-user-workload-rulefiles-0/test3-test3.rules.yaml: No such file or directory
command terminated with exit code 1

Checked again after more than 20 minutes: the related alert and alert rule can still be found in the UI, see the picture.
# token=`oc sa get-token prometheus-k8s -n openshift-monitoring`
# oc -n openshift-user-workload-monitoring exec -c thanos-ruler thanos-ruler-user-workload-1 -- curl -k -H "Authorization: Bearer $token" https://thanos-ruler.openshift-user-workload-monitoring.svc:9091/api/v1/rules | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   741  100   741    0     0  15052      0 --:--:-- --:--:-- --:--:-- 15122
{
  "status": "success",
  "data": {
    "groups": [
      {
        "name": "alerting rules",
        "file": "",
        "rules": [
          {
            "state": "firing",
            "name": "Watchdog",
            "query": "vector(1)",
            "duration": 0,
            "labels": {
              "namespace": "test3",
              "severity": "none",
              "thanos_ruler_replica": "thanos-ruler-user-workload-0"
            },
            "annotations": {},
            "alerts": [
              {
                "labels": {
                  "alertname": "Watchdog",
                  "namespace": "test3",
                  "severity": "none"
                },
                "annotations": {},
                "state": "firing",
                "activeAt": "2020-08-28T01:45:57.078951644Z",
                "value": "1e+00",
                "partialResponseStrategy": "ABORT"
              }
            ],
            "health": "ok",
            "evaluationTime": 0.0018852,
            "lastEvaluation": "2020-08-28T02:09:12.079090599Z",
            "type": "alerting"
          }
        ],
        "interval": 15,
        "evaluationTime": 0,
        "lastEvaluation": "0001-01-01T00:00:00Z",
        "partial_response_strategy": "WARN",
        "partialResponseStrategy": "ABORT"
      }
    ]
  }
}

Version-Release number of selected component (if applicable):
4.6.0-0.nightly-2020-08-27-005538

How reproducible:
Always

Steps to Reproduce:
1. See the description
2.
3.

Actual results:

Expected results:

Additional info:
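The check done by eye on the /api/v1/rules response above could be scripted roughly as follows. This is only a sketch: the function name `stale_rules` and the `live_namespaces` argument are hypothetical and not part of any OpenShift or Thanos tooling, and the sample is a trimmed copy of the response in this report. It flags every loaded rule whose `namespace` label points at a namespace that no longer exists.

```python
import json


def stale_rules(rules_response, live_namespaces):
    """Return (group, rule, namespace) tuples for loaded rules whose
    namespace label is not in the set of namespaces that still exist."""
    stale = []
    for group in rules_response.get("data", {}).get("groups", []):
        for rule in group.get("rules", []):
            ns = rule.get("labels", {}).get("namespace")
            # Rules without a namespace label are skipped, not flagged.
            if ns is not None and ns not in live_namespaces:
                stale.append((group["name"], rule["name"], ns))
    return stale


# Trimmed copy of the response from this report (assumption: the real
# response is fetched separately, e.g. with the curl command above).
sample = json.loads("""
{
  "status": "success",
  "data": {
    "groups": [
      {
        "name": "alerting rules",
        "file": "",
        "rules": [
          {
            "state": "firing",
            "name": "Watchdog",
            "query": "vector(1)",
            "labels": {
              "namespace": "test3",
              "severity": "none",
              "thanos_ruler_replica": "thanos-ruler-user-workload-0"
            },
            "type": "alerting"
          }
        ]
      }
    ]
  }
}
""")

# test3 has been deleted, so it is absent from the live set.
print(stale_rules(sample, live_namespaces={"default"}))
# prints [('alerting rules', 'Watchdog', 'test3')]
```

With the bug present, the list is non-empty after the project is deleted; once the rule is really unloaded it should come back empty.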
Similar bug for 4.5: https://bugzilla.redhat.com/show_bug.cgi?id=1820180
After the PrometheusRule is deleted, the alert rule and its alerts remain displayed in the UI until a new PrometheusRule is created in a different namespace.
I can reproduce with the upstream version of Thanos. I'll file an issue + PR there.
The upstream PR has been merged; we now need to wait for the Thanos bump in our downstream fork.
Tested with payload 4.6.0-0.nightly-2020-09-09-173545: the deleted PrometheusRule disappears in less than a minute.
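For repeating this verification, a polling helper along these lines might be useful. It is a minimal sketch under stated assumptions: `wait_until_gone` and `fetch_rule_names` are hypothetical names, and the caller is assumed to supply a function that returns the rule names currently reported by the thanos-ruler rules API. The 60-second default mirrors the "less than a minute" observation and is not a documented guarantee.

```python
import time


def wait_until_gone(fetch_rule_names, rule_name, timeout_s=60.0,
                    interval_s=5.0, sleep=time.sleep, clock=time.monotonic):
    """Poll until rule_name is no longer reported, or the timeout expires.

    fetch_rule_names is a caller-supplied callable returning the names of
    the rules currently loaded. Returns True if the rule disappeared in
    time, False otherwise.
    """
    deadline = clock() + timeout_s
    while True:
        if rule_name not in fetch_rule_names():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Simulated responses: the rule is reported twice, then it is gone.
responses = iter([["Watchdog"], ["Watchdog"], []])
print(wait_until_gone(lambda: next(responses), "Watchdog",
                      sleep=lambda s: None))
# prints True
```

Passing `sleep` and `clock` as parameters keeps the helper testable without real waiting; in a real check the defaults would be left in place.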
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196