Bug 2115826

Summary: User Workload monitoring thanos-querier rewrites "cluster" field with name of openshift-cluster instead of Application's cluster-field name
Product: OpenShift Container Platform
Reporter: nigsmith
Component: Monitoring
Assignee: Jayapriya Pai <janantha>
Status: CLOSED DEFERRED
QA Contact: Junqi Zhao <juzhao>
Severity: low
Priority: low
Version: 4.9
CC: anpicker, jmarcal, ssonigra
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2023-03-09 01:27:02 UTC
Type: Bug

Description nigsmith 2022-08-05 13:42:41 UTC
Description of problem:

When a query runs through the "thanos-querier" API endpoint, the "cluster" label in the result is rewritten to the name of the OpenShift cluster. The exact same query run against the "prometheus" API endpoint returns the application's own "cluster" value.


Version-Release number of selected component (if applicable):

4.9 


How reproducible:

The customer can reproduce this at will.

Steps to Reproduce:

thanos-querier API endpoint:
$ curl --noproxy "*" -sLk --data-urlencode "query=opensearch_cluster_nodes_number" -H "Authorization: Bearer xxxx" "https://thanos-querier-openshift-monitoring.apps.smals-75.paas.acc.cloud.smals.be/api/v1/query" | jq-win64.exe
...
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "opensearch_cluster_nodes_number",
          "cluster": "smals-75",
          "container": "opensearch",
          "endpoint": "http",
...

Prometheus API endpoint:
$ curl --noproxy "*" -sLk --data-urlencode "query=opensearch_cluster_nodes_number" -H "Authorization: Bearer xxxx" "https://prometheus-k8s-openshift-monitoring.apps.smals-75.paas.acc.cloud.smals.be/api/v1/query" | jq-win64.exe
...
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "opensearch_cluster_nodes_number",
          "cluster": "opensearch-test",
          "container": "opensearch",
          "endpoint": "http",
...

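To compare the two responses programmatically rather than by eye, a minimal Python sketch can parse each /api/v1/query result and collect the "cluster" label of every series. The JSON strings below are trimmed-down reconstructions of the excerpts above: only `__name__` and `cluster` are kept, and the "value" pairs are omitted. `cluster_labels` is a hypothetical helper name. The sketch assumes the standard Prometheus HTTP API response layout (`data.result[].metric`).

```python
import json
from typing import List, Optional

# Trimmed-down versions of the two responses shown above. Only the fields this
# sketch reads are kept; "value" pairs and the remaining labels are omitted.
THANOS_RESPONSE = """
{"status": "success",
 "data": {"resultType": "vector",
          "result": [{"metric": {"__name__": "opensearch_cluster_nodes_number",
                                 "cluster": "smals-75"}}]}}
"""
PROMETHEUS_RESPONSE = """
{"status": "success",
 "data": {"resultType": "vector",
          "result": [{"metric": {"__name__": "opensearch_cluster_nodes_number",
                                 "cluster": "opensearch-test"}}]}}
"""


def cluster_labels(raw: str) -> List[Optional[str]]:
    """Return the "cluster" label of every series in an instant-query response.

    Hypothetical helper: series without a "cluster" label yield None.
    """
    body = json.loads(raw)
    if body.get("status") != "success":
        raise ValueError(f"query failed: {body.get('error')}")
    return [series["metric"].get("cluster") for series in body["data"]["result"]]


if __name__ == "__main__":
    thanos = cluster_labels(THANOS_RESPONSE)
    prom = cluster_labels(PROMETHEUS_RESPONSE)
    print("thanos-querier:", thanos)
    print("prometheus:    ", prom)
    print("labels differ: ", thanos != prom)
```

In practice the two strings would be replaced by the saved output of the two curl commands above, for example by writing each response to a file with `-o` and reading it back.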

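Actual results:

The thanos-querier endpoint returns "cluster": "smals-75" for opensearch_cluster_nodes_number.

Expected results:

The thanos-querier endpoint returns "cluster": "opensearch-test", the same value as the Prometheus endpoint.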

Additional info:

Comment 1 Joao Marcal 2022-08-05 14:14:42 UTC
Can you provide a must-gather? Otherwise it might be difficult for us to understand exactly what is happening.

Comment 2 Joao Marcal 2022-08-05 14:20:55 UTC
Actually, Thanos Querier is working as expected, because it always includes the external labels from Prometheus. I'm not sure how much of an annoyance this is for the customer, but maybe we can improve our documentation to better illustrate the differences between the Thanos and Prometheus APIs.
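The behaviour described in Comment 2 can be modelled as a label merge. This minimal sketch assumes, as the two responses in the description suggest, that when a stored series already carries a label with the same name as an external label, the external value wins. The external label set `{"cluster": "smals-75"}` is inferred from the thanos-querier output and was not read from the cluster's configuration. `attach_external_labels` is a hypothetical name and is not a Thanos or Prometheus API.

```python
from typing import Dict

Labels = Dict[str, str]


def attach_external_labels(series: Labels, external: Labels) -> Labels:
    """Model how a querier might expose a series together with external labels.

    Assumption for illustration: external labels are applied last, so an
    external label overwrites a series label of the same name.
    """
    merged = dict(series)    # copy so the stored series is left unchanged
    merged.update(external)  # external labels take precedence on conflicts
    return merged


# Series labels as returned by the Prometheus API endpoint in the report.
stored = {"__name__": "opensearch_cluster_nodes_number",
          "cluster": "opensearch-test",
          "container": "opensearch",
          "endpoint": "http"}

# Hypothetical external label set, inferred from the thanos-querier output.
external = {"cluster": "smals-75"}

print(attach_external_labels(stored, external)["cluster"])  # prints "smals-75"
```

Under this model, any application label that shares its name with an external label is masked in the querier's view, while the series stored in Prometheus keeps its original value.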

Comment 3 nigsmith 2022-08-05 14:23:34 UTC
Apologies, the case is now linked; the must-gather is attached to the case.

Comment 4 nigsmith 2022-08-05 14:25:51 UTC
Hello Joao, 

> Actually, Thanos querier is working as expected because it will always include the external labels from Prometheus. Not sure how much of an annoyance it is for the customer but maybe we can improve our documentation to illustrate better the differences between Thanos and Prometheus APIs.

Initially I thought this was the case: a documentation bug rather than a code bug.

Is it documented anywhere that this is the expected behaviour?

Thanks

Comment 9 Shiftzilla 2023-03-09 01:27:02 UTC
OpenShift has moved to Jira for its defect tracking! This bug can now be found in the OCPBUGS project in Jira.

https://issues.redhat.com/browse/OCPBUGS-9450