Bug 1997926 - container_runtime_crio_operations_latency_microseconds does not have quantile anymore
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.8.z
Assignee: Sascha Grunert
QA Contact: Weinan Liu
URL:
Whiteboard:
Depends On:
Blocks: 2010831 2010841
Reported: 2021-08-26 05:08 UTC by Alan Chan
Modified: 2022-02-16 06:52 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
: 2010831 2010841 (view as bug list)
Environment:
Last Closed: 2022-02-16 06:51:40 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github cri-o cri-o pull 5258 0 None None None 2021-08-27 09:33:02 UTC
Github cri-o cri-o pull 5259 0 None None None 2021-08-30 07:48:59 UTC
Github cri-o cri-o pull 5264 0 None None None 2021-08-30 07:48:59 UTC
Github cri-o cri-o pull 5265 0 None None None 2021-08-30 07:48:59 UTC
Github cri-o cri-o pull 5266 0 None None None 2021-08-30 07:48:59 UTC
Red Hat Product Errata RHBA-2022:0484 0 None None None 2022-02-16 06:51:59 UTC

Description Alan Chan 2021-08-26 05:08:37 UTC
Description of problem:
-----------------------

Starting with 4.6 (and also in 4.7 and 4.8), the container_runtime_crio_operations_latency_microseconds metric no longer exposes quantile values.

There are only container_runtime_crio_operations_latency_microseconds_sum and container_runtime_crio_operations_latency_microseconds_count.

Some customers monitor these quantile metrics on 4.5. With only _sum and _count, quantiles cannot be calculated; only the mean latency can be derived.
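To illustrate why _sum and _count are not enough, here is a minimal Python sketch. The two latency sets are hypothetical (they are not taken from any cluster); they are constructed so that both have count=17 and sum=19510, the Attach values shown in the 4.5 output in this report. The nearest-rank helper is only an illustration and is not how CRI-O or Prometheus computes summary quantiles.

```python
import math


def nearest_rank_quantile(samples, q):
    """Return the q-quantile of samples using the nearest-rank method."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


# Hypothetical latencies in microseconds. Both sets have 17 samples
# that add up to 19510, so their _sum and _count would be identical.
steady = [1147] * 6 + [1148] * 11
spiky = [100] * 16 + [17910]

for name, samples in (("steady", steady), ("spiky", spiky)):
    mean = sum(samples) / len(samples)
    print(
        name,
        "count=%d" % len(samples),
        "sum=%d" % sum(samples),
        "mean=%.1f" % mean,
        "p50=%d" % nearest_rank_quantile(samples, 0.5),
        "p99=%d" % nearest_rank_quantile(samples, 0.99),
    )
```

Both sets report the same mean, yet their medians and 99th percentiles differ widely, which is the information that is lost when the quantile series are missing.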


Version-Release number of selected component (if applicable):
-------------------------------------------------------------

OCP 4.6 (cri-o 1.19), 4.7 (cri-o 1.20), 4.8 (cri-o 1.21)


How reproducible:
-----------------

Always


On 4.5:
-------

The quantile info is still there...

[quicklab@upi-0 ~]$ oc version
Client Version: 4.5.41
Server Version: 4.5.41
Kubernetes Version: v1.18.3+d8ef5ad

[quicklab@upi-0 ~]$ NODEIP=$(oc get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
[quicklab@upi-0 ~]$ oc get --raw /metrics --server http://${NODEIP}:9537 | grep "container_runtime_crio_operations_latency"
# HELP container_runtime_crio_operations_latency_microseconds Latency in microseconds of CRI-O operations. Broken down by operation type.
# TYPE container_runtime_crio_operations_latency_microseconds summary
container_runtime_crio_operations_latency_microseconds{operation_type="Attach",quantile="0.5"} NaN
container_runtime_crio_operations_latency_microseconds{operation_type="Attach",quantile="0.9"} NaN
container_runtime_crio_operations_latency_microseconds{operation_type="Attach",quantile="0.99"} NaN
container_runtime_crio_operations_latency_microseconds_sum{operation_type="Attach"} 19510
container_runtime_crio_operations_latency_microseconds_count{operation_type="Attach"} 17
container_runtime_crio_operations_latency_microseconds{operation_type="ContainerStatus",quantile="0.5"} 32
container_runtime_crio_operations_latency_microseconds{operation_type="ContainerStatus",quantile="0.9"} 54
container_runtime_crio_operations_latency_microseconds{operation_type="ContainerStatus",quantile="0.99"} 92
.
.
.


On 4.8 (same for 4.6, 4.7):
---------------------------

There is no quantile info, only _total_sum & _total_count metrics...

[quicklab@upi-0 ~]$ oc version
Client Version: 4.8.5
Server Version: 4.8.5
Kubernetes Version: v1.21.1+9807387

[quicklab@upi-0 ~]$ NODEIP=$(oc get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
[quicklab@upi-0 ~]$ oc get --raw /metrics --server http://${NODEIP}:9537 | grep "container_runtime_crio_operations_latency" | grep -c quantile
0
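
The `grep -c quantile` check above can also be done per operation type. The following is a rough Python sketch that assumes the metrics text has already been fetched (for example with the `oc get --raw /metrics` command above) and uses a few lines of the 4.5 output from this report as sample input. The function name `count_quantile_samples` is made up for this example, and the label parsing is simplified: it does not handle escaped quotes in label values.

```python
import re
from collections import Counter

METRIC = "container_runtime_crio_operations_latency_microseconds"

# Sample input copied from the 4.5 output in this report.
SAMPLE = """\
# HELP container_runtime_crio_operations_latency_microseconds Latency in microseconds of CRI-O operations. Broken down by operation type.
# TYPE container_runtime_crio_operations_latency_microseconds summary
container_runtime_crio_operations_latency_microseconds{operation_type="Attach",quantile="0.5"} NaN
container_runtime_crio_operations_latency_microseconds{operation_type="Attach",quantile="0.9"} NaN
container_runtime_crio_operations_latency_microseconds{operation_type="Attach",quantile="0.99"} NaN
container_runtime_crio_operations_latency_microseconds_sum{operation_type="Attach"} 19510
container_runtime_crio_operations_latency_microseconds_count{operation_type="Attach"} 17
container_runtime_crio_operations_latency_microseconds{operation_type="ContainerStatus",quantile="0.5"} 32
container_runtime_crio_operations_latency_microseconds{operation_type="ContainerStatus",quantile="0.9"} 54
container_runtime_crio_operations_latency_microseconds{operation_type="ContainerStatus",quantile="0.99"} 92
"""

LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def count_quantile_samples(text, metric):
    """Count samples of `metric` carrying a quantile label, per operation_type."""
    counts = Counter()
    prefix = metric + "{"
    for line in text.splitlines():
        # Skip comments and other series such as <metric>_sum and <metric>_count.
        if not line.startswith(prefix):
            continue
        label_block = line[len(metric):line.index("}") + 1]
        labels = dict(LABEL_RE.findall(label_block))
        if "quantile" in labels:
            counts[labels.get("operation_type", "")] += 1
    return counts


counts = count_quantile_samples(SAMPLE, METRIC)
for operation, n in sorted(counts.items()):
    print(operation, n)
print("total", sum(counts.values()))
```

An empty result corresponds to the `0` printed by `grep -c quantile` on the affected versions.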


Expected results:
-----------------

Quantile numbers should be provided as in 4.5.

Comment 1 Peter Hunt 2021-08-26 14:27:44 UTC
Sascha can you PTAL

Comment 2 Sascha Grunert 2021-08-27 09:32:43 UTC
Confirmed, fix is incoming in https://github.com/cri-o/cri-o/pull/5258. I'll backport the fix if merged.

Comment 3 Sascha Grunert 2021-08-30 07:49:00 UTC
Upstream PR merged, cherry-picking now.

Comment 4 Sascha Grunert 2021-10-07 07:27:41 UTC
The upstream PR got merged into release-1.21 (https://github.com/cri-o/cri-o/pull/5266). This means that the next package build of CRI-O for 4.8 should contain the fix.

Comment 7 Weinan Liu 2022-02-09 11:32:20 UTC
sh-4.4# oc get --raw /metrics --server http://${NODEIP}:9537 | grep "container_runtime_crio_operations_latency" | grep -c quantile
57

Verified to be fixed on:
$ oc get clusterversions
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2022-02-09-031830   True        False         43m     Cluster version is 4.8.0-0.nightly-2022-02-09-031830

Comment 9 errata-xmlrpc 2022-02-16 06:51:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.8.31 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0484

