Bug 2115826 - User Workload monitoring thanos-querier rewrites "cluster" field with name of openshift-cluster instead of Application's cluster-field name
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: Jayapriya Pai
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-08-05 13:42 UTC by nigsmith
Modified: 2023-03-09 01:27 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-03-09 01:27:02 UTC
Target Upstream Version:
Embargoed:



Description nigsmith 2022-08-05 13:42:41 UTC
Description of problem:

When a metric is queried through the "thanos-querier" API endpoint, its "cluster" label is returned with the name of the OpenShift cluster instead of the value set by the application. The exact same query against the "prometheus" API endpoint returns the application's own value.


Version-Release number of selected component (if applicable):

4.9 


How reproducible:

Customer is able to reproduce at will. 

Steps to Reproduce:

Thanos-querier API endpoint :
$ curl --noproxy "*" -sLk --data-urlencode "query=opensearch_cluster_nodes_number" -H "Authorization: Bearer xxxx" "https://thanos-querier-openshift-monitoring.apps.smals-75.paas.acc.cloud.smals.be/api/v1/query" | jq-win64.exe
...
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "opensearch_cluster_nodes_number",
          "cluster": "smals-75",
          "container": "opensearch",
          "endpoint": "http",
...

Prometheus API endpoint :
$ curl --noproxy "*" -sLk --data-urlencode "query=opensearch_cluster_nodes_number" -H "Authorization: Bearer xxxx" "https://prometheus-k8s-openshift-monitoring.apps.smals-75.paas.acc.cloud.smals.be/api/v1/query" | jq-win64.exe
...
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "opensearch_cluster_nodes_number",
          "cluster": "opensearch-test",
          "container": "opensearch",
          "endpoint": "http",
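
To make the difference between the two responses explicit, the label sets can be compared field by field. The following is only a minimal sketch: `diff_labels` is a hypothetical helper, and the two dictionaries contain just the labels visible in the truncated outputs above, not the full responses.

```python
def diff_labels(thanos_metric, prometheus_metric):
    """Return the labels whose values differ between the two responses.

    Each entry maps a label name to a (thanos_value, prometheus_value) pair.
    A label missing on one side is reported with None for that side.
    """
    names = sorted(set(thanos_metric) | set(prometheus_metric))
    return {
        name: (thanos_metric.get(name), prometheus_metric.get(name))
        for name in names
        if thanos_metric.get(name) != prometheus_metric.get(name)
    }


# Only the labels shown in the (truncated) outputs of this report.
thanos_metric = {
    "__name__": "opensearch_cluster_nodes_number",
    "cluster": "smals-75",
    "container": "opensearch",
    "endpoint": "http",
}
prometheus_metric = {
    "__name__": "opensearch_cluster_nodes_number",
    "cluster": "opensearch-test",
    "container": "opensearch",
    "endpoint": "http",
}

print(diff_labels(thanos_metric, prometheus_metric))
```

With the labels shown here, the only difference reported is the "cluster" label.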


Actual results:


Expected results:


Additional info:

Comment 1 Joao Marcal 2022-08-05 14:14:42 UTC
Can you provide a must-gather? Otherwise it might be difficult for us to understand exactly what is happening.

Comment 2 Joao Marcal 2022-08-05 14:20:55 UTC
Actually, Thanos querier is working as expected because it will always include the external labels from Prometheus. Not sure how much of an annoyance it is for the customer but maybe we can improve our documentation to illustrate better the differences between Thanos and Prometheus APIs.
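
A rough illustration of the behaviour described in this comment, not the actual Thanos code: it assumes that an external label takes precedence when a series already carries a label with the same name, which is consistent with the outputs in the description. `apply_external_labels` is a hypothetical name, and the assumption that "cluster" is configured as an external label with the value "smals-75" is inferred from the report, not confirmed.

```python
def apply_external_labels(series_labels, external_labels):
    """Merge external labels into a series' labels.

    Assumption for this sketch: on a name collision the external label
    wins, so the series' own value for that label is no longer visible.
    """
    merged = dict(series_labels)
    merged.update(external_labels)  # external labels overwrite collisions
    return merged


# Labels as exposed by the application (taken from the Prometheus output).
series_labels = {
    "__name__": "opensearch_cluster_nodes_number",
    "cluster": "opensearch-test",
}
# Assumed external label configuration (inferred from the Thanos output).
external_labels = {"cluster": "smals-75"}

print(apply_external_labels(series_labels, external_labels)["cluster"])
```

Under that assumption, the application's own "cluster" value is replaced in the merged label set, while labels without a collision are left unchanged.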

Comment 3 nigsmith 2022-08-05 14:23:34 UTC
Apologies, the case is now linked and the must-gather is attached to the case.

Comment 4 nigsmith 2022-08-05 14:25:51 UTC
Hello Joao, 

> Actually, Thanos querier is working as expected because it will always include the external labels from Prometheus. Not sure how much of an annoyance it is for the customer but maybe we can improve our documentation to illustrate better the differences between Thanos and Prometheus APIs.

Initially I thought this was the case: a documentation bug rather than a code bug.

Do we have it documented anywhere that this is the expected behaviour?

Thanks

Comment 9 Shiftzilla 2023-03-09 01:27:02 UTC
OpenShift has moved to Jira for its defect tracking! This bug can now be found in the OCPBUGS project in Jira.

https://issues.redhat.com/browse/OCPBUGS-9450

