Bug 2059470

Summary: Unable to connect external Grafana with Openshift Monitoring
Product: OpenShift Container Platform Reporter: Arunprasad Rajkumar <arajkuma>
Component: Monitoring Assignee: Arunprasad Rajkumar <arajkuma>
Status: CLOSED ERRATA QA Contact: Junqi Zhao <juzhao>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.10 CC: amuller, anpicker, aos-bugs, erooth, juzhao
Target Milestone: ---   
Target Release: 4.9.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 2059468 Environment:
Last Closed: 2022-03-21 12:30:12 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2059468    
Bug Blocks:    

Description Arunprasad Rajkumar 2022-03-01 06:27:36 UTC
+++ This bug was initially created as a clone of Bug #2059468 +++

Description of problem:

We use the User Workload Monitoring feature to collect metrics from applications running in our namespaces. We are trying to create custom Grafana deployments that show metrics from within the same namespace. Instead of using the cluster-scoped Thanos port 9091 in the openshift-monitoring namespace, which requires a service account with the cluster-scoped cluster-monitoring role, we'd like to use port 9092, which limits the returned metrics to a single namespace.

Grafana dashboards with list variables regularly use "label_values" to look up values for variables (e.g. namespace, instance...). This uses the `/api/v1/series` Prometheus endpoint, which is currently not supported by the namespace-scoped Thanos endpoint (port 9092); HTTP 404 is returned.
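
For illustration, the request behind such a lookup can be sketched as below. This is only a sketch: `series_url` is a hypothetical helper name, and the base URL, metric, and namespace are the placeholder values used in the reproduction commands in this report.

```shell
#!/bin/sh
# Sketch: build the namespace-scoped series URL that a label_values()-style
# lookup ends up requesting. series_url is a hypothetical helper name.
# Arguments: base URL, metric selector, namespace.
# Only the "match[]" key is pre-encoded here; the selector and namespace are
# passed through unchanged, so they must already be URL-safe.
series_url() {
  printf '%s/api/v1/series?match%%5B%%5D=%s&namespace=%s\n' "$1" "$2" "$3"
}

series_url 'https://localhost:9092' 'jvm_memory_used_bytes' 'mynamespace'
# prints: https://localhost:9092/api/v1/series?match%5B%5D=jvm_memory_used_bytes&namespace=mynamespace
```

The `namespace` query parameter is what scopes the request on port 9092; the same path on port 9091 is queried without it.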



Version-Release number of selected component (if applicable):
OCP < 4.10

How reproducible:

Example commands to test:

```

$ oc port-forward -n openshift-monitoring service/thanos-querier 9091:9091 &
$ oc port-forward -n openshift-monitoring service/thanos-querier 9092:9092 &
$ BEARER_CLUSTER="$(oc extract secret/monitor-cluster-token-htp4g --to=- --keys=token)" 
$ BEARER_TOKEN="$(oc extract secret/monitor-namespace-token-k6nbd --to=- --keys=token)" 
$ curl -vk -H "Authorization: Bearer $BEARER_CLUSTER" 'https://localhost:9091/api/v1/query?query=upx' 
-> OK 
$ curl -vk -H "Authorization: Bearer $BEARER_CLUSTER" 'https://localhost:9091/api/v1/series?match%5B%5D=jvm_memory_used_bytes' 
-> OK 
$ curl -vk -H "Authorization: Bearer $BEARER_TOKEN" 'https://localhost:9092/api/v1/query?query=upx&namespace=mynamespace' 
-> OK 
$ curl -vk -H "Authorization: Bearer $BEARER_TOKEN" 'https://localhost:9092/api/v1/series?match%5B%5D=jvm_memory_used_bytes&namespace=mynamespace' 
-> HTTP 404

```
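
To compare the endpoints without reading verbose curl output, the status code alone can be printed. The following is a minimal sketch, assuming the port-forwards above are active; `http_status` is a hypothetical helper name and not part of `oc` or curl.

```shell
# Sketch: print only the HTTP status code of a GET request made with a bearer
# token. http_status is a hypothetical helper name.
# Arguments: bearer token, URL.
http_status() {
  # -s: silent, -k: skip TLS verification (port-forwarded localhost),
  # -o /dev/null: discard the body, -w '%{http_code}': print the status only.
  curl -sk -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer $1" "$2"
}
```

For example, `http_status "$BEARER_TOKEN" 'https://localhost:9092/api/v1/series?match%5B%5D=jvm_memory_used_bytes&namespace=mynamespace'` would be expected to print 404 on an affected version, matching the last command above.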

Steps to Reproduce:
1.
2.
3.

Actual results:

None of the query variable functions listed in [1] work on the GET endpoints.

Expected results:

All the query variable functions listed in [1] should work on the GET endpoints.


Additional info:

The POST method will not work on the endpoints listed in [1].


[1] https://grafana.com/docs/grafana/latest/datasources/prometheus/#query-variable

--- Additional comment from Arunprasad Rajkumar on 2022-03-01 11:56:16 IST ---

This has already been fixed in [1] and [2].

[1] https://github.com/openshift/cluster-monitoring-operator/pull/1519
[2] https://github.com/openshift/cluster-monitoring-operator/pull/1299

Comment 2 Junqi Zhao 2022-03-10 07:04:14 UTC
Tested with openshift/cluster-monitoring-operator/pull/1549, following the same steps as in Comment 1; `/api/v1/series` is now supported by the namespace-scoped Thanos endpoint on port 9092.
# curl -k -H "Authorization: Bearer $token" 'https://localhost:9092/api/v1/series?match%5B%5D=up&namespace=ns1' | jq
{
  "status": "success",
  "data": [
    {
      "__name__": "up",
      "endpoint": "web",
      "instance": "10.128.2.26:8080",
      "job": "prometheus-example-app",
      "namespace": "ns1",
      "pod": "prometheus-example-app-8659789999-khnc8",
      "prometheus": "openshift-user-workload-monitoring/user-workload",
      "service": "prometheus-example-app"
    }
  ]
}

# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://localhost:9092/api/v1/series?namespace=ns1' | jq
{
  "status": "success",
  "data": [
    {
      "__name__": "ALERTS",
      "alertname": "TestAlert",
      "alertstate": "firing",
      "namespace": "ns1",
      "severity": "none"
    },
...

Comment 7 errata-xmlrpc 2022-03-21 12:30:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.9.25 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0861