Bug 2059470 - Unable to connect external Grafana with Openshift Monitoring
Summary: Unable to connect external Grafana with Openshift Monitoring
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.9.z
Assignee: Arunprasad Rajkumar
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On: 2059468
Blocks:
Reported: 2022-03-01 06:27 UTC by Arunprasad Rajkumar
Modified: 2022-03-21 12:30 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2059468
Environment:
Last Closed: 2022-03-21 12:30:12 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-monitoring-operator pull 1549 | 0 | None | open | Back port changes to support external grafana | 2022-03-01 06:29:46 UTC
Red Hat Product Errata | RHBA-2022:0861 | 0 | None | None | None | 2022-03-21 12:30:28 UTC

Description Arunprasad Rajkumar 2022-03-01 06:27:36 UTC
+++ This bug was initially created as a clone of Bug #2059468 +++

Description of problem:

We use the User Workload Monitoring feature to collect metrics from applications running in our namespaces. We are trying to create custom Grafana deployments that show metrics from within the same namespace. Instead of using the cluster-scoped Thanos port 9091 in the openshift-monitoring namespace, which requires a service account with the cluster-scoped cluster-monitoring role, we would like to use port 9092, which limits the returned metrics to a single namespace.

Grafana dashboards with list variables regularly use "label_values" to look up values for variables (e.g. namespace, instance...). This uses the `/api/v1/series` Prometheus endpoint, which is currently not supported by the namespace-scoped Thanos endpoint (port 9092); HTTP 404 is returned.
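
To illustrate, a variable such as `label_values(jvm_memory_used_bytes, instance)` results in a series lookup roughly like the one built below. This is a minimal sketch, not Grafana's actual code: the `series_url` helper and its arguments are hypothetical names, and the `namespace` query parameter is the one used in the reproduction commands in this report.

```shell
#!/bin/sh
# Hypothetical helper: build the /api/v1/series URL for one metric selector,
# scoped to one namespace (as used against port 9092 in this report).
# Usage: series_url BASE_URL SELECTOR NAMESPACE
series_url() {
    base=$1 selector=$2 namespace=$3
    # "match[]" has to appear percent-encoded as "match%5B%5D" in the query string.
    printf '%s/api/v1/series?match%%5B%%5D=%s&namespace=%s\n' \
        "$base" "$selector" "$namespace"
}

series_url https://localhost:9092 jvm_memory_used_bytes mynamespace
```

The helper does not percent-encode the selector itself, so it only suits plain metric names; a selector containing braces or quotes would need encoding as well.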



Version-Release number of selected component (if applicable):
OCP < 4.10

How reproducible:

Example commands to test:

```
# Forward both Thanos Querier ports in the background (LOCAL:REMOTE).
$ oc port-forward -n openshift-monitoring service/thanos-querier 9091:9091 &
$ oc port-forward -n openshift-monitoring service/thanos-querier 9092:9092 &
# Read the bearer tokens for the cluster-scoped and namespace-scoped requests.
$ BEARER_CLUSTER="$(oc extract secret/monitor-cluster-token-htp4g --to=- --keys=token)"
$ BEARER_TOKEN="$(oc extract secret/monitor-namespace-token-k6nbd --to=- --keys=token)"
# Cluster-scoped port 9091: query and series both work.
$ curl -vk -H "Authorization: Bearer $BEARER_CLUSTER" 'https://localhost:9091/api/v1/query?query=upx'
-> OK
$ curl -vk -H "Authorization: Bearer $BEARER_CLUSTER" 'https://localhost:9091/api/v1/series?match%5B%5D=jvm_memory_used_bytes'
-> OK
# Namespace-scoped port 9092: query works, series returns 404.
$ curl -vk -H "Authorization: Bearer $BEARER_TOKEN" 'https://localhost:9092/api/v1/query?query=upx&namespace=mynamespace'
-> OK
$ curl -vk -H "Authorization: Bearer $BEARER_TOKEN" 'https://localhost:9092/api/v1/series?match%5B%5D=jvm_memory_used_bytes&namespace=mynamespace'
-> HTTP 404
```
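
The same checks can be condensed into a small probe that prints only the HTTP status code per endpoint. This is a sketch under the assumptions of the commands in this report (port-forwards to 9091 and 9092 are active, and `BEARER_CLUSTER` and `BEARER_TOKEN` are set); `http_status` and `probe_port` are hypothetical helper names, and the probe queries the `up` metric for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print only the HTTP status code of a GET request.
# Usage: http_status TOKEN URL
http_status() {
    # -s: no progress output, -k: skip TLS verification,
    # -o /dev/null: discard the body, -w: print the status code.
    curl -sk -o /dev/null -w '%{http_code}' \
        -H "Authorization: Bearer $1" "$2"
}

# Hypothetical wrapper: probe the query and series endpoints on one port.
# Usage: probe_port TOKEN PORT [EXTRA_QUERY_STRING]
probe_port() {
    for path in 'query?query=up' 'series?match%5B%5D=up'; do
        url="https://localhost:$2/api/v1/${path}$3"
        printf '%s -> %s\n' "$url" "$(http_status "$1" "$url")"
    done
}
```

Calling `probe_port "$BEARER_CLUSTER" 9091` and then `probe_port "$BEARER_TOKEN" 9092 '&namespace=mynamespace'` prints one line per endpoint; on an affected version, the series line for port 9092 is the one expected to show 404.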


Actual results:

None of the query variable functions listed in [1] work on the GET endpoint.

Expected results:

All the query variable functions listed in [1] should work on the GET endpoint.


Additional info:

The POST method will not work on the endpoints listed in [1].


[1] https://grafana.com/docs/grafana/latest/datasources/prometheus/#query-variable

--- Additional comment from Arunprasad Rajkumar on 2022-03-01 11:56:16 IST ---

This has already been fixed in [1] and [2].

[1] https://github.com/openshift/cluster-monitoring-operator/pull/1519
[2] https://github.com/openshift/cluster-monitoring-operator/pull/1299

Comment 2 Junqi Zhao 2022-03-10 07:04:14 UTC
Tested with openshift/cluster-monitoring-operator/pull/1549, following the same steps as in Comment 1. `/api/v1/series` is now supported by the namespace-scoped Thanos endpoint on port 9092.
# curl -k -H "Authorization: Bearer $token" 'https://localhost:9092/api/v1/series?match%5B%5D=up&namespace=ns1' | jq
{
  "status": "success",
  "data": [
    {
      "__name__": "up",
      "endpoint": "web",
      "instance": "10.128.2.26:8080",
      "job": "prometheus-example-app",
      "namespace": "ns1",
      "pod": "prometheus-example-app-8659789999-khnc8",
      "prometheus": "openshift-user-workload-monitoring/user-workload",
      "service": "prometheus-example-app"
    }
  ]
}

# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://localhost:9092/api/v1/series?namespace=ns1' | jq
{
  "status": "success",
  "data": [
    {
      "__name__": "ALERTS",
      "alertname": "TestAlert",
      "alertstate": "firing",
      "namespace": "ns1",
      "severity": "none"
    },
...

Comment 7 errata-xmlrpc 2022-03-21 12:30:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.9.25 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0861

