Bug 1883765 - [user workload monitoring] improve latency of Thanos sidecar when streaming read requests
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.7.0
Assignee: Simon Pasquier
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-09-30 07:55 UTC by Simon Pasquier
Modified: 2021-02-24 15:21 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-24 15:21:17 UTC
Target Upstream Version:
Embargoed:


Attachments
CPU utilization dashboard (64.23 KB, image/png)
2020-10-02 10:12 UTC, Simon Pasquier


Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-monitoring-operator pull 969 0 None closed Bug 1883765: Bump Thanos v0.16.0 2021-01-25 02:45:54 UTC
Github openshift thanos pull 40 0 None closed Bug 1883765: bump Thanos to v0.16.0 2021-01-25 02:45:55 UTC
Red Hat Product Errata RHSA-2020:5633 0 None None None 2021-02-24 15:21:38 UTC

Description Simon Pasquier 2020-09-30 07:55:38 UTC
Backport https://github.com/thanos-io/thanos/pull/3146, which cuts latency by half for streaming read requests. This will significantly improve the request latency of Thanos querier (including evaluations from Thanos ruler) when a query returns many series (e.g. several hundred).

The fix should be available in Thanos v0.16.0 upstream.

Comment 1 Simon Pasquier 2020-09-30 11:42:32 UTC
https://github.com/thanos-io/thanos/pull/2783 might also be useful.

Comment 3 Simon Pasquier 2020-10-02 10:12:21 UTC
Created attachment 1718402
CPU utilization dashboard

I've tested locally with Thanos v0.16.0-rc.0 and performance is much better. For the same workload, the overall CPU usage of the prometheus pod drops from 700Mi (right side of the dashboard, 4.6 downstream version) to 450Mi (left side, upstream v0.16.0-rc.0).

Comment 7 Junqi Zhao 2020-11-05 07:18:33 UTC
Tested with 4.7.0-0.nightly-2020-11-05-010603: Thanos sidecar performance is improved and the Thanos version is 0.16.0.

# token=`oc sa get-token prometheus-k8s -n openshift-monitoring`
# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -g -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/query?query=thanos_build_info' | jq
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "thanos_build_info",
          "branch": "rhaos-4.7-rhel-8",
          "container": "oauth-proxy",
          "endpoint": "web",
          "goversion": "go1.15.2",
          "instance": "10.128.2.9:9091",
          "job": "thanos-querier",
          "namespace": "openshift-monitoring",
          "pod": "thanos-querier-66679569c6-gxf7p",
          "revision": "c8ee6fabf9fd5f3e44701f981204a1361965e59d",
          "service": "thanos-querier",
          "version": "0.16.0"
        },
        "value": [
          1604559215.114,
          "1"
        ]
      },
...
  }
}
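
For anyone repeating this verification, the version check on the query output above could be scripted rather than read by eye. The following is only a minimal sketch in Python: the function names thanos_versions and version_at_least are hypothetical helpers (not part of any Thanos or OpenShift tooling), SAMPLE is a trimmed copy of the response shown in this comment, and the 0.16.0 threshold is taken from the bug description.

```python
import json

# Trimmed copy of the instant-query response shown in the comment above.
SAMPLE = """
{"status": "success",
 "data": {"resultType": "vector",
          "result": [{"metric": {"__name__": "thanos_build_info",
                                 "job": "thanos-querier",
                                 "pod": "thanos-querier-66679569c6-gxf7p",
                                 "version": "0.16.0"},
                      "value": [1604559215.114, "1"]}]}}
"""


def thanos_versions(response_text):
    """Map each pod (or instance, if no pod label) to its reported Thanos version."""
    payload = json.loads(response_text)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('status')!r}")
    versions = {}
    for sample in payload["data"]["result"]:
        labels = sample["metric"]
        key = labels.get("pod", labels.get("instance", "unknown"))
        versions[key] = labels.get("version")
    return versions


def version_at_least(version, minimum=(0, 16, 0)):
    """Compare the numeric part of a version string against a minimum.

    Assumption: any pre-release suffix such as "-rc.0" is ignored, so
    "0.16.0-rc.0" is treated the same as "0.16.0".
    """
    core = version.split("-", 1)[0]
    parts = tuple(int(p) for p in core.split("."))
    return parts >= minimum


if __name__ == "__main__":
    for pod, version in thanos_versions(SAMPLE).items():
        status = "ok" if version_at_least(version) else "too old"
        print(f"{pod}: {version} ({status})")
```

In practice the output of the curl command above could be piped into such a script in place of SAMPLE.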

Comment 11 errata-xmlrpc 2021-02-24 15:21:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

