Description of problem:
-----------------------
Starting with 4.6 (and also in 4.7 and 4.8), the container_runtime_crio_operations_latency_microseconds metric no longer exposes quantile values. Only container_runtime_crio_operations_latency_microseconds_sum and container_runtime_crio_operations_latency_microseconds_count are reported.

Some customers monitor these quantile metrics on 4.5. With only _sum and _count, quantiles cannot be calculated: the two series allow only the mean latency (_sum / _count) to be derived, not the shape of the distribution.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
OCP 4.6 (cri-o 1.19), 4.7 (cri-o 1.20), 4.8 (cri-o 1.21)

How reproducible:
-----------------
Always

On 4.5:
-------
The quantile values are still present:

[quicklab@upi-0 ~]$ oc version
Client Version: 4.5.41
Server Version: 4.5.41
Kubernetes Version: v1.18.3+d8ef5ad
[quicklab@upi-0 ~]$ NODEIP=$(oc get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
[quicklab@upi-0 ~]$ oc get --raw /metrics --server http://${NODEIP}:9537 | grep "container_runtime_crio_operations_latency"
# HELP container_runtime_crio_operations_latency_microseconds Latency in microseconds of CRI-O operations. Broken down by operation type.
# TYPE container_runtime_crio_operations_latency_microseconds summary
container_runtime_crio_operations_latency_microseconds{operation_type="Attach",quantile="0.5"} NaN
container_runtime_crio_operations_latency_microseconds{operation_type="Attach",quantile="0.9"} NaN
container_runtime_crio_operations_latency_microseconds{operation_type="Attach",quantile="0.99"} NaN
container_runtime_crio_operations_latency_microseconds_sum{operation_type="Attach"} 19510
container_runtime_crio_operations_latency_microseconds_count{operation_type="Attach"} 17
container_runtime_crio_operations_latency_microseconds{operation_type="ContainerStatus",quantile="0.5"} 32
container_runtime_crio_operations_latency_microseconds{operation_type="ContainerStatus",quantile="0.9"} 54
container_runtime_crio_operations_latency_microseconds{operation_type="ContainerStatus",quantile="0.99"} 92
.
.
.

On 4.8 (same for 4.6, 4.7):
---------------------------
There is no quantile information, only the _total_sum and _total_count metrics:

[quicklab@upi-0 ~]$ oc version
Client Version: 4.8.5
Server Version: 4.8.5
Kubernetes Version: v1.21.1+9807387
[quicklab@upi-0 ~]$ NODEIP=$(oc get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
[quicklab@upi-0 ~]$ oc get --raw /metrics --server http://${NODEIP}:9537 | grep "container_runtime_crio_operations_latency" | grep -c quantile
0

Expected results:
-----------------
Quantile values should be exposed, as in 4.5.
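To make the _sum/_count limitation concrete: per operation type, the most those two series give you is the mean latency. The following is a minimal sketch, not a CRI-O or oc tool; the function name mean_latency is hypothetical. It assumes Prometheus text exposition format on stdin and an operation_type label, and it matches any series ending in _sum or _count, so it handles both the 4.5 names and the _total_ names seen on 4.8. The example uses the Attach values from the 4.5 output (19510 us over 17 calls).

```shell
#!/bin/sh
# Hypothetical helper: print "<operation_type> <mean latency>" for every
# operation that has both a *_sum and a *_count sample on stdin
# (Prometheus text exposition format). The mean is the only latency
# statistic these two series can provide; quantiles cannot be derived.
mean_latency() {
  awk '
    /^#/ { next }                      # skip HELP/TYPE lines
    match($1, /operation_type="[^"]*"/) {
      # Extract the label value between operation_type=" and the closing quote.
      op = substr($1, RSTART + 16, RLENGTH - 17)
      if ($1 ~ /_sum[{]/)   sum[op] = $2
      if ($1 ~ /_count[{]/) cnt[op] = $2
    }
    END {
      for (op in sum)
        if ((op in cnt) && cnt[op] > 0)
          printf "%s %.1f\n", op, sum[op] / cnt[op]
    }
  ' | sort
}

# Example with the Attach values from the 4.5 output (19510 us over 17 calls).
printf '%s\n' \
  'container_runtime_crio_operations_latency_microseconds_sum{operation_type="Attach"} 19510' \
  'container_runtime_crio_operations_latency_microseconds_count{operation_type="Attach"} 17' |
  mean_latency
# prints: Attach 1147.6
```

Two very different latency distributions with the same total and call count produce identical output here. That is why the quantile series have to be exported by CRI-O itself.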
Sascha, can you PTAL?
Confirmed; a fix is incoming in https://github.com/cri-o/cri-o/pull/5258. I'll backport it once it is merged.
Upstream PR merged, cherry-picking now.
The fix has been merged into the release-1.21 branch (https://github.com/cri-o/cri-o/pull/5266). This means the next CRI-O package build for 4.8 should contain the fix.
sh-4.4# oc get --raw /metrics --server http://${NODEIP}:9537 | grep "container_runtime_crio_operations_latency" | grep -c quantile
57

Verified as fixed on:

$ oc get clusterversions
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2022-02-09-031830   True        False         43m     Cluster version is 4.8.0-0.nightly-2022-02-09-031830
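The grep -c quantile check only yields a total count. A per-operation variant can show which operation types still lack quantile series. This is a minimal sketch that assumes Prometheus text exposition input with an operation_type label; the function name missing_quantiles is hypothetical and not part of CRI-O or oc.

```shell
#!/bin/sh
# Hypothetical helper: list operation types that expose a *_count sample
# but no quantile samples, i.e. the symptom reported in this bug.
# Reads Prometheus text exposition format on stdin.
missing_quantiles() {
  awk '
    /^#/ { next }                      # skip HELP/TYPE lines
    match($1, /operation_type="[^"]*"/) {
      op = substr($1, RSTART + 16, RLENGTH - 17)
      if ($1 ~ /quantile="/) has_q[op] = 1   # NaN quantiles still count as present
      if ($1 ~ /_count[{]/)  seen[op] = 1
    }
    END {
      for (op in seen)
        if (!(op in has_q))
          print op
    }
  ' | sort
}

# Example: ContainerStatus has quantiles, Attach only has _count.
printf '%s\n' \
  'm{operation_type="ContainerStatus",quantile="0.5"} 32' \
  'm_count{operation_type="ContainerStatus"} 10' \
  'm_count{operation_type="Attach"} 17' |
  missing_quantiles
# prints: Attach
```

To check a live node, pipe the output of the oc get --raw /metrics command into missing_quantiles instead. An empty result means every operation type exposes quantiles.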
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.8.31 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:0484