Bug 1827530

Summary: no alert/rule on thanos-ruler UI
Product: OpenShift Container Platform
Reporter: Junqi Zhao <juzhao>
Component: Monitoring
Assignee: Simon Pasquier <spasquie>
Status: CLOSED ERRATA
QA Contact: Junqi Zhao <juzhao>
Severity: medium
Priority: medium
Docs Contact:
Version: 4.5
CC: alegrand, anpicker, erooth, kakkoyun, lcosic, mloibl, pkrupa, spasquie, surbania
Target Milestone: ---
Keywords: Regression
Target Release: 4.5.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-07-13 17:30:56 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions: ---
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1819765, 1828702
Attachments:
  no alert/rule on thanos-ruler UI (flags: none)

Description Junqi Zhao 2020-04-24 06:13:51 UTC
Created attachment 1681367 [details]
no alert/rule on thanos-ruler UI

Description of problem:
Enabled techPreviewUserWorkload and created a PrometheusRule in a user namespace, but no alert/rule appears on the thanos-ruler UI.
Steps:
# oc -n openshift-user-workload-monitoring get pod
NAME                                   READY   STATUS    RESTARTS   AGE
prometheus-operator-765866997c-6fn65   2/2     Running   0          52m
prometheus-user-workload-0             5/5     Running   1          52m
prometheus-user-workload-1             5/5     Running   1          52m
thanos-ruler-user-workload-0           3/3     Running   0          51m
thanos-ruler-user-workload-1           3/3     Running   0          51m


# oc new-project test3
# oc create -f - << EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: test3.rules
spec:
  groups:
  - name: alerting rules
    rules:
    - alert: Watchdog
      expr: vector(1)
      labels:
        severity: none
      message:
        This is an alert meant to ensure that the entire alerting pipeline is functional.
EOF
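
Note for reference: a rule-level `message` key is not part of the PrometheusRule rule schema (the valid keys are `alert`, `record`, `expr`, `for`, `labels`, and `annotations`), which is consistent with it being absent from the rendered rule file shown in the reloader container. The alert text would normally be carried as an annotation, e.g.:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: test3.rules
spec:
  groups:
  - name: alerting rules
    rules:
    - alert: Watchdog
      expr: vector(1)
      labels:
        severity: none
      annotations:
        message: This is an alert meant to ensure that the entire alerting pipeline is functional.
```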

The rule can be found in the rules-configmap-reloader container:
# oc -n openshift-user-workload-monitoring exec -c rules-configmap-reloader thanos-ruler-user-workload-0 -- cat /etc/thanos/rules/thanos-ruler-user-workload-rulefiles-0/test3-test3.rules.yaml 
groups:
- name: alerting rules
  rules:
  - alert: Watchdog
    expr: vector(1)
    labels:
      namespace: test3
      severity: none

But no alerts/rules are returned from the API query:
# token=`oc sa get-token prometheus-k8s -n openshift-monitoring`
# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-ruler.openshift-user-workload-monitoring.svc:9091/api/v1/alerts' | jq
{
  "status": "success",
  "data": {
    "alerts": null
  }
}
# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-ruler.openshift-user-workload-monitoring.svc:9091/api/v1/rules' | jq
{
  "status": "success",
  "data": {
    "groups": null
  }
}
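
The empty responses above can be recognized programmatically. A minimal sketch in Python (not from the report; the `rule_groups` helper is illustrative) that mirrors the JSON shapes shown, treating the `null` value returned by the API the same as an empty list:

```python
import json

def rule_groups(resp_text):
    """Return the rule groups from a Thanos Ruler /api/v1/rules response,
    treating JSON null (as seen in this bug) as an empty list."""
    body = json.loads(resp_text)
    return body.get("data", {}).get("groups") or []

# The broken response from the report: "groups" is null, so the UI has
# nothing to render.
broken = '{"status": "success", "data": {"groups": null}}'
print(len(rule_groups(broken)))  # 0
```

The same check applies to the `/api/v1/alerts` response, whose `data.alerts` field is likewise `null` here.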



Version-Release number of selected component (if applicable):
4.5.0-0.nightly-2020-04-23-202137

How reproducible:
Always

Steps to Reproduce:
1. See the description above

Actual results:
no alert/rule on thanos-ruler UI

Expected results:
alert/rule on thanos-ruler UI

Additional info:

Comment 1 Junqi Zhao 2020-04-24 06:21:10 UTC
The same result with the thanos-ruler service account:
# token=`oc sa get-token thanos-ruler -n openshift-user-workload-monitoring`
# oc -n openshift-user-workload-monitoring exec -c rules-configmap-reloader thanos-ruler-user-workload-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos
{
  "status": "success",
  "data": {
    "alerts": null
  }
}
# oc -n openshift-user-workload-monitoring exec -c rules-configmap-reloader thanos-ruler-user-workload-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-ruler.openshift-user-workload-monitoring.svc:9091/api/v1/rules' | jq
{
  "status": "success",
  "data": {
    "groups": null
  }
}

Comment 2 Junqi Zhao 2020-04-24 06:25:48 UTC
(In reply to Junqi Zhao from comment #1)
> the same with thanos-ruler sa
> # oc -n openshift-user-workload-monitoring exec -c rules-configmap-reloader
> thanos-ruler-user-workload-0 -- curl -k -H "Authorization: Bearer $token"
> 'https://thanos
> {
>   "status": "success",
>   "data": {
>     "alerts": null
>   }
should be

# oc -n openshift-user-workload-monitoring exec -c rules-configmap-reloader thanos-ruler-user-workload-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-ruler.openshift-user-workload-monitoring.svc:9091/api/v1/alerts' | jq
{
  "status": "success",
  "data": {
    "alerts": null
  }
}

Comment 7 Junqi Zhao 2020-04-26 02:02:27 UTC
Not sure if the issue is related to Bug 1827489; after Bug 1827489 was fixed, the issue no longer occurs with 4.5.0-0.nightly-2020-04-25-170442. Closing it.
# oc -n openshift-user-workload-monitoring logs thanos-ruler-user-workload-0 -c rules-configmap-reloader
2020/04/26 01:41:17 Watching directory: "/etc/thanos/rules/thanos-ruler-user-workload-rulefiles-0"

# token=`oc sa get-token prometheus-k8s -n openshift-monitoring`
# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-ruler.openshift-user-workload-monitoring.svc:9091/api/v1/alerts' | jq
{
  "status": "success",
  "data": {
    "alerts": [
      {
        "labels": {
          "alertname": "Watchdog",
          "namespace": "load",
          "severity": "none"
        },
        "annotations": {},
        "state": "firing",
        "activeAt": "2020-04-26T01:53:41.308293746Z",
        "value": "1e+00",
        "partial_response_strategy": "ABORT"
      }
    ]
  }
}

Comment 14 Junqi Zhao 2020-05-06 09:16:00 UTC
Tested with 4.5.0-0.nightly-2020-05-05-205255; alerts/rules now appear on the thanos-ruler UI.
# oc -n openshift-user-workload-monitoring logs thanos-ruler-user-workload-0 -c rules-configmap-reloader
2020/05/06 08:23:14 Watching directory: "/etc/thanos/rules/thanos-ruler-user-workload-rulefiles-0"
2020/05/06 09:06:27 config map updated

# token=`oc sa get-token thanos-ruler -n openshift-user-workload-monitoring`
# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-ruler.openshift-user-workload-monitoring.svc:9091/api/v1/alerts' | jq
{
  "status": "success",
  "data": {
    "alerts": [
      {
        "labels": {
          "alertname": "Watchdog",
          "namespace": "test3",
          "severity": "none"
        },
        "annotations": {},
        "state": "firing",
        "activeAt": "2020-05-06T09:06:42.078951644Z",
        "value": "1e+00",
        "partial_response_strategy": "ABORT"
      }
    ]
  }
}

Comment 15 errata-xmlrpc 2020-07-13 17:30:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409