Bug 1969409 - Grafana template variables fail
Summary: Grafana template variables fail
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.7
Hardware: All
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Bartek Plotka
QA Contact: Junqi Zhao
URL:
Whiteboard:
Duplicates: 1969407
Depends On:
Blocks:
 
Reported: 2021-06-08 11:43 UTC by Tobias Derksen
Modified: 2021-06-11 08:38 UTC
CC List: 13 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-06-11 08:38:38 UTC
Target Upstream Version:
Embargoed:
ableisch: needinfo-


Attachments
Grafana Screenshot (87.86 KB, image/png)
2021-06-08 11:43 UTC, Tobias Derksen

Description Tobias Derksen 2021-06-08 11:43:18 UTC
Created attachment 1789363 [details]
Grafana Screenshot

Initial use case:
I want to create dashboards on a custom Grafana instance which is connected to the Thanos Querier of the OCP monitoring stack.

Description of problem:
I deployed a custom Grafana instance and connected it to the Thanos Querier. The cluster has user workload monitoring deployed.
When I try to use template variables in Grafana, an error message is shown (see the attached screenshot) and the log shows that the request returns a 404.
https://grafana.com/docs/grafana/latest/variables/

Accessing the metrics in general works fine.
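
For comparison, plain instant queries through the tenancy port do work. A minimal check, as a sketch following the same pattern as the series request further below (the /api/v1/query endpoint is one of the endpoints the proxy allows):
> curl -k -g -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" "https://thanos-querier.openshift-monitoring.svc:9092/api/v1/query?namespace=$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace)" --data-urlencode 'query=up'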


Version-Release number of selected component (if applicable):
Tested on:
OCP 4.6.31
OCP 4.7.13
OCP 4.8.0-fc.8


Steps to Reproduce:
1. Create new Grafana dashboard
2. Set up template variable with query "label_values(kube_pod_info, pod)"

Another option is to access the API directly:
1. Start a pod with curl installed
2. Try accessing the series endpoint:
> curl -k -g -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" "https://thanos-querier.openshift-monitoring.svc:9092/api/v1/series?namespace=$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace)" --data-urlencode 'match[]=up'


Actual results:
> 404 page not found

Expected results:
See the Prometheus documentation for the expected endpoint results:
https://prometheus.io/docs/prometheus/latest/querying/api/#finding-series-by-label-matchers

You can see the expected result by accessing the endpoint on Thanos directly. Run the following command inside the thanos-query container (pod: thanos-querier):
> curl -k -g -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" "http://localhost:9090/api/v1/series?namespace=$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace)" --data-urlencode 'match[]=up'
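
For reference, a successful response from the series endpoint is JSON of roughly this shape (labels abbreviated here; see the Prometheus documentation linked above for the full format):
> {"status":"success","data":[{"__name__":"up","instance":"...","job":"...","namespace":"..."}]}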


Additional info:
After investigating, I found that one sidecar of the monitoring stack (prom-label-proxy) simply does not implement that particular Prometheus endpoint.
There exists an issue on the upstream Github repository: https://github.com/prometheus-community/prom-label-proxy/issues/37
With the latest version, 0.3.0, released on 2021-04-16, the endpoint has been implemented.
See: https://github.com/prometheus-community/prom-label-proxy/blob/master/CHANGELOG.md

Additionally, on OCP 4.7 and newer, the kube-rbac-proxy denies requests to any endpoint other than /query and /query_range.
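
For example, the label values endpoint can be probed through the tenancy port in the same way as the series endpoint above (a sketch reusing the same serviceaccount token and namespace; on 4.7 and newer this request is rejected as well):
> curl -k -g -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" "https://thanos-querier.openshift-monitoring.svc:9092/api/v1/label/pod/values?namespace=$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace)"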


Attachments:
Datasource definition:
> apiVersion: 1
> datasources:
>   - name: combined
>     type: prometheus
>     access: proxy
>     isDefault: true
>     basicAuth: false
>     editable: false
>     orgId: 1
>     version: 1
>     url: https://thanos-querier.openshift-monitoring.svc:9092
>     jsonData:
>       customQueryParameters: 'namespace=${TARGET_NAMESPACE}'
>       httpHeaderName1: Authorization
>       tlsAuthWithCACert: true
>     secureJsonData:
>       tlsCACert: $__file{/var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt}
>       httpHeaderValue1: Bearer $__file{/var/run/secrets/kubernetes.io/serviceaccount/token}

Grafana Log:
> t=2021-06-08T09:15:21+0000 lvl=dbug msg="Applying default URL parsing for this data source type" logger=datasource type=prometheus url=https://thanos-querier.openshift-monitoring.svc:9092
> t=2021-06-08T09:15:21+0000 lvl=info msg="Request Completed" logger=context userId=1 orgId=1 uname=user method=GET path=/api/datasources/proxy/1/api/v1/series status=404 remote_addr="10.250.2.165, 10.128.8.7" time_ms=19 size=19 referer="https://grafana-monitoring.apps.cluster.domain/d/viQLYNeGk/new-dashboard-copy?editview=templating&orgId=1"

Comment 3 Bartek Plotka 2021-06-09 07:42:31 UTC
Thanks for reporting.

Just for awareness, this is "expected" because we don't enable the label APIs on our tenancy isolation proxy.

The mitigation is to use template functions that don't rely on the `/api/v1/series` endpoint (if I am not wrong, `metrics(metric)` and `query_result(query)`).

We are currently discussing how hard it would be to enable the label values API on this proxy. We haven't seen the need for it until now.

Comment 6 Bartek Plotka 2021-06-09 15:11:21 UTC
What I mean by mentioning "series" is that some of the functions use different APIs behind the scenes. Which API is actually used is "hidden" domain knowledge, unfortunately. I asked Grafana for better documentation (https://github.com/grafana/grafana/issues/35437) on this front.

The mitigation is to use `query_result(query)` and extract the label with a regex. Quite complex, but it might work similarly for now.
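
For example (a sketch, not verified against this cluster), a variable roughly equivalent to `label_values(kube_pod_info, pod)` could be configured as:
> Query: query_result(kube_pod_info)
> Regex: /pod="([^"]*)"/
`query_result()` goes through the /api/v1/query endpoint, which the proxy allows, and the regex capture group extracts the pod label from each returned series string.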

In the meantime, it looks like we are adding an item to our backlog to enable the labels API through the tenancy-isolated endpoint.

Hope that helps!

Comment 7 Sunil Thaha 2021-06-11 01:33:30 UTC
*** Bug 1969407 has been marked as a duplicate of this bug. ***

Comment 8 Simon Pasquier 2021-06-11 08:38:38 UTC
We don't consider this to be a bug, but we've added an item [1] to our backlog to tackle it.

[1] https://issues.redhat.com/browse/MON-1695

