Bug 1801154

Summary: Port changes "throw away unused high cardinality apiserver duration buckets"
Product: OpenShift Container Platform
Component: Monitoring
Version: 4.4
Target Release: 4.4.0
Status: CLOSED ERRATA
Severity: low
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Type: Bug
Doc Type: No Doc Update
Reporter: Lili Cosic <lcosic>
Assignee: Lili Cosic <lcosic>
QA Contact: Junqi Zhao <juzhao>
CC: alegrand, anpicker, erooth, kakkoyun, lcosic, mloibl, pkrupa, surbania
Last Closed: 2020-05-13 21:57:18 UTC

Description Lili Cosic 2020-02-10 11:36:14 UTC
Description of problem:
The changes named in the summary did not make it into OpenShift during the feature phase: they were never applied to the apiserver ServiceMonitors.


How reproducible:
Check that the apiserver ServiceMonitors contain a drop rule with the following regex:
    regex: 'apiserver_request_duration_seconds_bucket;(0.15|0.25|0.3|0.35|0.4|0.45|0.6|0.7|0.8|0.9|1.25|1.5|1.75|2.5|3|3.5|4.5|6|7|8|9|15|25|30|50)'


Expected results:
The unused high-cardinality apiserver_request_duration_seconds_bucket buckets are dropped.
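How the rule works: Prometheus metric relabeling joins the values of the listed sourceLabels (__name__ and le) with the default ";" separator and matches the regex against the whole joined string, because relabel regexes are anchored at both ends. Only apiserver_request_duration_seconds_bucket samples whose le equals one of the listed bounds are therefore dropped. The Python below is a minimal sketch of that drop action for illustration only; is_dropped and the other names are hypothetical, not part of Prometheus or the Prometheus Operator.

```python
import re

# Regex from the apiserver ServiceMonitor drop rule. Prometheus anchors
# relabel regexes at both ends, which re.fullmatch reproduces here.
DROP_REGEX = (
    r"apiserver_request_duration_seconds_bucket;"
    r"(0.15|0.25|0.3|0.35|0.4|0.45|0.6|0.7|0.8|0.9|1.25|1.5|1.75"
    r"|2.5|3|3.5|4.5|6|7|8|9|15|25|30|50)"
)
SOURCE_LABELS = ("__name__", "le")
SEPARATOR = ";"  # Prometheus' default relabel separator


def is_dropped(labels):
    """Hypothetical helper mimicking a relabel `drop` action.

    The source label values are joined with the separator; a missing
    label contributes an empty string, as it does in Prometheus.
    """
    joined = SEPARATOR.join(labels.get(name, "") for name in SOURCE_LABELS)
    return re.fullmatch(DROP_REGEX, joined) is not None


if __name__ == "__main__":
    bucket = "apiserver_request_duration_seconds_bucket"
    print(is_dropped({"__name__": bucket, "le": "0.15"}))  # True
    print(is_dropped({"__name__": bucket, "le": "0.1"}))   # False (kept)
    print(is_dropped({"__name__": bucket, "le": "30"}))    # True
    print(is_dropped({"__name__": bucket, "le": "+Inf"}))  # False (kept)
    # Other series of the histogram carry no `le`, so they never match.
    print(is_dropped({"__name__": "apiserver_request_duration_seconds_count"}))  # False
```

The "." in the regex is unescaped and so matches any character; in practice this is harmless here because the match is anchored and le values are plain decimal strings.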

Additional info:

Upstream PR: https://github.com/coreos/kube-prometheus/pull/387/files

Comment 2 Junqi Zhao 2020-02-11 04:31:23 UTC
Tested with 4.4.0-0.nightly-2020-02-10-215022; the fix is in:
# oc -n openshift-apiserver get servicemonitor/openshift-apiserver -oyaml | grep apiserver_request_duration_seconds_bucket -A3 -B1
    - action: drop
      regex: apiserver_request_duration_seconds_bucket;(0.15|0.25|0.3|0.35|0.4|0.45|0.6|0.7|0.8|0.9|1.25|1.5|1.75|2.5|3|3.5|4.5|6|7|8|9|15|25|30|50)
      sourceLabels:
      - __name__
      - le

Querying Prometheus also shows that the unused high-cardinality apiserver duration buckets are dropped:
count(apiserver_request_duration_seconds_bucket{namespace="openshift-apiserver"}) by (namespace,le)
Element 	Value
{le="0.05",namespace="openshift-apiserver"}	139
{le="0.5",namespace="openshift-apiserver"}	139
{le="10",namespace="openshift-apiserver"}	139
{le="20",namespace="openshift-apiserver"}	139
{le="4",namespace="openshift-apiserver"}	139
{le="40",namespace="openshift-apiserver"}	139
{le="60",namespace="openshift-apiserver"}	139
{le="+Inf",namespace="openshift-apiserver"}	139
{le="0.1",namespace="openshift-apiserver"}	139
{le="0.2",namespace="openshift-apiserver"}	139
{le="1",namespace="openshift-apiserver"}	139
{le="2",namespace="openshift-apiserver"}	139
{le="5",namespace="openshift-apiserver"}	139
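As a cross-check of this output, the sketch below confirms that none of the 13 le values returned by the query matches the drop list, and estimates the series reduction. The estimate rests on an assumption not stated in this report: that before the fix each of the 25 dropped bounds was exported with the same 139 series per bound as the kept ones. Under that assumption, bucket series for this namespace fall from 38 × 139 = 5282 to 13 × 139 = 1807. The function names are hypothetical.

```python
import re

# Alternation from the ServiceMonitor drop rule (le values only).
DROPPED_LE = ("0.15|0.25|0.3|0.35|0.4|0.45|0.6|0.7|0.8|0.9|1.25|1.5|1.75"
              "|2.5|3|3.5|4.5|6|7|8|9|15|25|30|50")

# le values returned by the count(...) by (namespace, le) query.
OBSERVED_LE = ["0.05", "0.5", "10", "20", "4", "40", "60", "+Inf",
               "0.1", "0.2", "1", "2", "5"]

SERIES_PER_BUCKET = 139  # every le row in the query output reports 139


def leaked_buckets(observed):
    """Return observed le values that the drop rule should have removed."""
    return [le for le in observed if re.fullmatch(f"(?:{DROPPED_LE})", le)]


def series_estimate(observed, per_bucket):
    """Return (after, before) bucket series counts, assuming (hypothetically)
    every dropped bound previously had the same series count per bound."""
    kept = len(observed)
    before = kept + len(DROPPED_LE.split("|"))
    return kept * per_bucket, before * per_bucket


if __name__ == "__main__":
    print(leaked_buckets(OBSERVED_LE))                      # []
    print(series_estimate(OBSERVED_LE, SERIES_PER_BUCKET))  # (1807, 5282)
```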

Comment 3 Lili Cosic 2020-04-20 10:03:46 UTC
Already covered in the release notes; no docs update needed.

Comment 5 errata-xmlrpc 2020-05-13 21:57:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581