Bug 1856719 - "code:apiserver_request_count:rate:sum" should be replaced by "code:apiserver_request_total:rate:sum" in telemetry-config configmap
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.5.z
Assignee: Frederic Branczyk
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On: 1859164
Blocks: 1856767
 
Reported: 2020-07-14 09:44 UTC by Junqi Zhao
Modified: 2020-07-30 18:57 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1856767 1859164
Environment:
Last Closed: 2020-07-30 18:56:59 UTC
Target Upstream Version:
Embargoed:


Attachments
telemetry-config configmap file (12.55 KB, text/plain)
2020-07-14 09:44 UTC, Junqi Zhao


Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-monitoring-operator pull 877 0 None closed BUG 1856719: manifests/telemetry: replace apiserver_request_count with apiserver_r… 2021-01-29 09:32:01 UTC
Red Hat Product Errata RHBA-2020:3028 0 None None None 2020-07-30 18:57:26 UTC

Description Junqi Zhao 2020-07-14 09:44:57 UTC
Created attachment 1700998
telemetry-config configmap file

Description of problem:
The telemetry-config configmap lists "code:apiserver_request_count:rate:sum", but no such metric exists; it should be "code:apiserver_request_total:rate:sum".
# oc -n openshift-monitoring get cm telemetry-config -oyaml 
...
    - '{__name__="code:apiserver_request_count:rate:sum"}'
...

# token=`oc sa get-token prometheus-k8s -n openshift-monitoring`
# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep "code:apiserver_request"
    "code:apiserver_request_total:increase30d",
    "code:apiserver_request_total:rate:sum",


Version-Release number of selected component (if applicable):
4.5.0-0.nightly-2020-07-14-022827

How reproducible:
always

Steps to Reproduce:
1. See the description.

Actual results:


Expected results:


Additional info:
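The manual comparison in the description (matcher names in the configmap versus metric names known to Prometheus) could be scripted roughly as follows. This is only a sketch under assumptions: the function name missing_metrics and both input files are hypothetical and not part of cluster-monitoring-operator. The first file is assumed to hold the configmap text, the second one metric name per line.

```shell
#!/bin/sh
# Hypothetical helper (not part of any OpenShift tooling): print every metric
# name that appears in a __name__="..." matcher but is absent from the list
# of names Prometheus reports.
#   $1: file containing the telemetry-config matcher lines
#   $2: file with one known metric name per line
missing_metrics() {
    # Pull the value out of each __name__="..." matcher; comment lines
    # without a matcher produce no output.
    sed -n 's/.*__name__="\([^"]*\)".*/\1/p' "$1" | sort -u |
    while IFS= read -r name; do
        # -x: match the whole line, -F: fixed string, -q: no output
        grep -qxF -- "$name" "$2" || printf '%s\n' "$name"
    done
}
```

Assuming the configmap was saved with "oc -n openshift-monitoring get cm telemetry-config -oyaml > telemetry-config.yaml" and the response of the label-values query above was piped through "jq -r '.data[]' > names.txt", running "missing_metrics telemetry-config.yaml names.txt" would list the matchers that no longer correspond to an existing metric.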

Comment 1 Lili Cosic 2020-07-14 09:58:47 UTC
Thanks Junqi, I was going to open a Bugzilla for this. It should already be fixed in 4.6, but we need another one for 4.4 as well, correct?

Comment 3 Pawel Krupa 2020-07-14 10:05:43 UTC
Fix for 4.6 was included in https://github.com/openshift/cluster-monitoring-operator/pull/821

Comment 4 Junqi Zhao 2020-07-14 12:29:05 UTC
(In reply to Lili Cosic from comment #1)
> Thanks Junqi, I was going to open a Bugzilla for this already, it should be
> fixed in 4.6 already. But we need another one for 4.4 as well, correct?

Yes, the 4.4 bug is https://bugzilla.redhat.com/show_bug.cgi?id=1856767

# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2020-07-12-055624   True        False         5h58m   Cluster version is 4.4.0-0.nightly-2020-07-12-055624

# oc -n openshift-monitoring get cm telemetry-config -oyaml  | grep "code:apiserver_request"
    # code:apiserver_request_count:rate:sum identifies average of occurances
    - '{__name__="code:apiserver_request_count:rate:sum"}'

# token=`oc sa get-token prometheus-k8s -n openshift-monitoring`
# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep "code:apiserver_request"
    "code:apiserver_request_total:rate:sum",

Comment 8 Lili Cosic 2020-07-21 06:33:12 UTC
Reassigned to Frederic.

Comment 9 Junqi Zhao 2020-07-22 02:29:03 UTC
Since the 4.6 bug 1859164 is fixed, setting the Target Release to 4.5.z.

Comment 10 Frederic Branczyk 2020-07-23 08:05:18 UTC
Increasing to high severity because we are unable to observe apiserver request metrics, which are one of our most important signals.

Comment 14 Junqi Zhao 2020-07-24 04:27:22 UTC
Tested with 4.5.0-0.nightly-2020-07-23-201307; the issue is fixed.
# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-07-23-201307   True        False         83m     Cluster version is 4.5.0-0.nightly-2020-07-23-201307

# oc -n openshift-monitoring get cm telemetry-config -oyaml | grep "code:apiserver_request_total:rate:sum"
    # (@openshift/openshift-team-olm) code:apiserver_request_total:rate:sum identifies average of occurences
    - '{__name__="code:apiserver_request_total:rate:sum"}'

# token=`oc sa get-token prometheus-k8s -n openshift-monitoring`
# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep "code:apiserver_request_total:rate:sum"
    "code:apiserver_request_total:rate:sum",

Comment 16 errata-xmlrpc 2020-07-30 18:56:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3028

