Bug 2059468

Summary: Unable to connect external Grafana with Openshift Monitoring
Product: OpenShift Container Platform
Reporter: Arunprasad Rajkumar <arajkuma>
Component: Monitoring
Assignee: Arunprasad Rajkumar <arajkuma>
Status: CLOSED ERRATA
QA Contact: Junqi Zhao <juzhao>
Severity: medium
Docs Contact:
Priority: medium
Version: 4.10
CC: amuller, anpicker, aos-bugs, erooth
Target Milestone: ---
Target Release: 4.10.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 2059470 (view as bug list)
Environment:
Last Closed: 2022-03-16 11:12:06 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 2059470

Description Arunprasad Rajkumar 2022-03-01 06:14:48 UTC
Description of problem:

We use the User Workload Monitoring feature to collect metrics from applications running in our namespaces. We are trying to create a custom Grafana deployment that shows metrics within the same namespace. Instead of using the cluster-scoped Thanos Querier port 9091 in the openshift-monitoring namespace, which requires a service account with a cluster-scoped cluster-monitoring role, we'd like to use port 9092, which limits the returned metrics to a single namespace.

Grafana dashboards with list variables commonly use `label_values` to look up variable values (e.g. namespace, instance). This calls the Prometheus `/api/v1/series` endpoint, which the namespace-scoped Thanos endpoint (port 9092) does not currently support: it returns HTTP 404.
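To make the dependency concrete, here is a sketch of how Grafana's Prometheus query variable functions map to HTTP API endpoints, transcribed from the query variable table in the Grafana documentation [1] as it read around the time of this report. Newer Grafana releases may call different endpoints, and the helper name `grafana_endpoint` is hypothetical.

```shell
# grafana_endpoint EXPR (hypothetical helper): print the Prometheus HTTP API
# path that a Grafana query variable function calls, per the table in [1]
# at the time of this report. Returns 1 for unknown expressions.
grafana_endpoint() {
  case $1 in
    'label_names()')
      echo /api/v1/labels ;;
    label_values\(*,*\))
      # Two-argument form: label values restricted to one metric.
      echo /api/v1/series ;;
    label_values\(*\))
      label=${1#label_values?}   # drop "label_values("
      label=${label%?}           # drop the trailing ")"
      echo "/api/v1/label/$label/values" ;;
    metrics\(*\))
      echo /api/v1/label/__name__/values ;;
    query_result\(*\))
      echo /api/v1/query ;;
    *)
      return 1 ;;
  esac
}
```

In this mapping only the two-argument form, `label_values(metric, label)`, calls `/api/v1/series`, the endpoint that returned HTTP 404 on port 9092.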



Version-Release number of selected component (if applicable):
OCP < 4.10

How reproducible:

Example commands to reproduce:

```
$ oc port-forward -n openshift-monitoring service/thanos-querier 9091 9092 &
$ BEARER_CLUSTER="$(oc extract secret/monitor-cluster-token-htp4g --to=- --keys=token)"
$ BEARER_TOKEN="$(oc extract secret/monitor-namespace-token-k6nbd --to=- --keys=token)"
$ curl -vk -H "Authorization: Bearer $BEARER_CLUSTER" 'https://localhost:9091/api/v1/query?query=up'
-> OK
$ curl -vk -H "Authorization: Bearer $BEARER_CLUSTER" 'https://localhost:9091/api/v1/series?match%5B%5D=jvm_memory_used_bytes'
-> OK
$ curl -vk -H "Authorization: Bearer $BEARER_TOKEN" 'https://localhost:9092/api/v1/query?query=up&namespace=mynamespace'
-> OK
$ curl -vk -H "Authorization: Bearer $BEARER_TOKEN" 'https://localhost:9092/api/v1/series?match%5B%5D=jvm_memory_used_bytes&namespace=mynamespace'
-> HTTP 404
```
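In the `series` requests above, `match%5B%5D` is the percent-encoded form of the `match[]` parameter that the Prometheus series API takes. A minimal sketch for building such a URL for the namespace-scoped port, assuming ASCII input; `urlencode` and `series_url` are hypothetical helper names, not part of `oc`, `curl`, or Grafana.

```shell
# urlencode STRING (hypothetical helper): percent-encode every character
# outside the RFC 3986 unreserved set. Assumes ASCII input.
urlencode() {
  s=$1
  out=''
  while [ -n "$s" ]; do
    rest=${s#?}
    c=${s%"$rest"}    # first character of $s
    s=$rest
    case $c in
      [A-Za-z0-9._~-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;
    esac
  done
  printf '%s' "$out"
}

# series_url BASE SELECTOR NAMESPACE (hypothetical helper): build a
# namespace-scoped /api/v1/series URL like the port 9092 request above.
series_url() {
  printf '%s/api/v1/series?%s=%s&namespace=%s\n' \
    "$1" "$(urlencode 'match[]')" "$(urlencode "$2")" "$(urlencode "$3")"
}

# Example:
# curl -k -H "Authorization: Bearer $BEARER_TOKEN" \
#   "$(series_url https://localhost:9092 jvm_memory_used_bytes mynamespace)"
```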

Actual results:

Not all of the query variable functions listed in [1] work via GET on the namespace-scoped endpoint (port 9092).

Expected results:

All of the query variable functions listed in [1] should work via GET on the namespace-scoped endpoint (port 9092).


Additional info:

The POST method does not work for the endpoints listed in [1].


[1] https://grafana.com/docs/grafana/latest/datasources/prometheus/#query-variable

Comment 1 Arunprasad Rajkumar 2022-03-01 06:26:16 UTC
This has already been fixed in [1] and [2].

[1] https://github.com/openshift/cluster-monitoring-operator/pull/1519
[2] https://github.com/openshift/cluster-monitoring-operator/pull/1299

Comment 2 Junqi Zhao 2022-03-08 08:17:29 UTC
Checked with 4.10.0-0.nightly-2022-03-05-023708: thanos-querier now supports /api/v1/series.
# oc -n openshift-monitoring get deploy thanos-querier -oyaml
...
        - --allow-paths=/api/v1/query,/api/v1/query_range,/api/v1/labels,/api/v1/label/*/values,/api/v1/series
        - --tls-min-version=VersionTLS12
        image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bde2a3cf16e475f514b6a0360a5d7de04a4e0a7f9792f1c3a264f4c347ce19ca
        imagePullPolicy: IfNotPresent
        name: kube-rbac-proxy

# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9092/api/v1/series?match[]=scrape_series_added&namespace=openshift-console' | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   523  100   523    0     0  13075      0 --:--:-- --:--:-- --:--:-- 13075
{
  "status": "success",
  "data": [
    {
      "__name__": "scrape_series_added",
      "container": "console",
      "endpoint": "https",
      "instance": "10.128.0.52:8443",
      "job": "console",
      "namespace": "openshift-console",
      "pod": "console-5fd958c49-jgp59",
      "prometheus": "openshift-monitoring/k8s",
      "service": "console"
    },
    {
      "__name__": "scrape_series_added",
      "container": "console",
      "endpoint": "https",
      "instance": "10.130.0.136:8443",
      "job": "console",
      "namespace": "openshift-console",
      "pod": "console-5fd958c49-5xw5g",
      "prometheus": "openshift-monitoring/k8s",
      "service": "console"
    }
  ]
}
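The fix adds `/api/v1/series` to the `--allow-paths` flag of the `kube-rbac-proxy` container shown in the deployment above. As a rough sketch of how such an allow-list gates requests, the hypothetical `path_allowed` function below checks a request path against that list. It assumes `*` matches exactly one path segment (similar to Go's `path.Match`), ignores other glob syntax, and is not kube-rbac-proxy's actual code. A path that matches no pattern is rejected, which is consistent with the HTTP 404 reported in the description.

```shell
# Allow-list copied from the --allow-paths flag in the deployment above.
ALLOW_PATHS='/api/v1/query,/api/v1/query_range,/api/v1/labels,/api/v1/label/*/values,/api/v1/series'

# path_allowed PATH (hypothetical helper): succeed if PATH matches one of the
# comma-separated patterns in ALLOW_PATHS.
# Assumption: '*' matches one path segment only (no '/').
path_allowed() {
  # Build one anchored extended regex from the list:
  # escape '.', map '*' to '[^/]*', and join the patterns with '|'.
  re=$(printf '%s' "$ALLOW_PATHS" | sed -e 's|\.|\\.|g' -e 's|\*|[^/]*|g' -e 's/,/|/g')
  printf '%s\n' "$1" | grep -Eq "^(${re})\$"
}

# Example: path_allowed /api/v1/series && echo allowed
```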

Comment 5 errata-xmlrpc 2022-03-16 11:12:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.10.4 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0811