Bug 2037513 - I/O metrics from the Kubernetes/Compute Resources/Cluster Dashboard show as no datapoints found
Summary: I/O metrics from the Kubernetes/Compute Resources/Cluster Dashboard show as no datapoints found
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.10
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.11.0
Assignee: Jayapriya Pai
QA Contact: Junqi Zhao
Docs Contact: Brian Burt
URL:
Whiteboard:
Depends On:
Blocks: 2009709
Reported: 2022-01-05 19:20 UTC by jhusta
Modified: 2023-06-06 01:23 UTC (History)
CC: 16 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Before this update, dashboards that contain queries using the container label on container_fs_* metrics showed no data points, because the container label was dropped in OCP due to its high cardinality. With this update, the affected queries no longer rely on that label and the dashboards display data again. As part of this change, the panels named Storage IO - Distribution and Storage IO - Distribution(Containers) were also dropped, since Prometheus does not collect per-container filesystem metrics.
Clone Of:
Environment:
Last Closed: 2022-08-10 10:41:16 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Screen shots of my webconsole (383.72 KB, application/pdf)
2022-01-05 19:20 UTC, jhusta


Links
System ID Private Priority Status Summary Last Updated
Github kubernetes-monitoring kubernetes-mixin pull 737 0 None Merged Add containerfsSelector and update queries having container_fs* 2022-02-24 04:08:12 UTC
Github openshift cluster-monitoring-operator pull 1554 0 None open Bug 2037513: Fix dashboards having container_fs* metrices in queries 2022-02-09 10:32:43 UTC
Github openshift cluster-monitoring-operator pull 1556 0 None open Bug 2037513: Update jsonnet dependencies and prometheus-operator version 2022-02-23 07:06:21 UTC
Red Hat Product Errata RHSA-2022:5069 0 None None None 2022-08-10 10:41:49 UTC

Description jhusta 2022-01-05 19:20:04 UTC
Created attachment 1849089 [details]
Screen shots of my webconsole

Description of problem:
From the console, under Observe - Dashboard - Kubernetes/Compute Resources/Cluster, the I/O metrics show "no datapoints found".

Version-Release number of selected component (if applicable):
Client Version: 4.10.0-0.nightly-s390x-2022-01-04-235928
Server Version: 4.10.0-0.nightly-s390x-2022-01-04-235928
Kubernetes Version: v1.22.1+6859754


Although the query is different, the same issue exists on 4.9; I went back to check whether there was a difference.


How reproducible:
Reproduced with an I/O-intensive workload running to ensure there were some reads and writes.

Steps to Reproduce:
1. Run an I/O-intensive workload so that reads and writes are generated.
2. Go to Observe - Dashboard - Kubernetes/Compute Resources/Cluster.
3. Page down to the IOPS stats.

Actual results:
The queries return "no datapoints found".

Expected results:
The panels should show values for the I/O metrics.



Additional info:
I will attach screenshots of the Grafana panels and the dashboard queries that are failing.

Please let me know if you need me to gather specific data.

Comment 1 Arunprasad Rajkumar 2022-01-06 06:00:24 UTC
It happens on other hardware as well (e.g. amd64).

Comment 2 Arunprasad Rajkumar 2022-01-06 06:47:32 UTC
container_fs_.* metrics don't have the "container" label after https://github.com/openshift/cluster-monitoring-operator/pull/1402.
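
To illustrate the effect on the dashboards (a sketch of the affected query shape, not the exact panel expression from the mixin): once the container label is gone from the cadvisor filesystem series, any selector that requires it matches nothing, so a panel query of this form returns "no datapoints found":

sum by (namespace) (
  rate(container_fs_reads_total{job="kubelet", metrics_path="/metrics/cadvisor", container!=""}[5m])
  + rate(container_fs_writes_total{job="kubelet", metrics_path="/metrics/cadvisor", container!=""}[5m])
)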

Comment 3 Junqi Zhao 2022-01-06 07:09:43 UTC
(In reply to Arunprasad Rajkumar from comment #2)
> container_fs_.* metrics don't have the "container" label after
> https://github.com/openshift/cluster-monitoring-operator/pull/1402.

they also don't have a cluster label:
count(container_fs_reads_total) by (job,metrics_path,container,cluster)
{job="kubelet", metrics_path="/metrics/cadvisor"} 346

count(container_fs_writes_total) by (job,metrics_path,container,cluster)
{job="kubelet", metrics_path="/metrics/cadvisor"}  346

count(container_fs_reads_bytes_total) by (job,metrics_path,container,cluster)
{job="kubelet", metrics_path="/metrics/cadvisor"}  310

count(container_fs_writes_bytes_total) by (job,metrics_path,container,cluster)
{job="kubelet", metrics_path="/metrics/cadvisor"}  310

Comment 5 Arunprasad Rajkumar 2022-01-06 14:26:43 UTC
(In reply to Junqi Zhao from comment #3)
> (In reply to Arunprasad Rajkumar from comment #2)
> > container_fs_.* metrics don't have the "container" label after
> > https://github.com/openshift/cluster-monitoring-operator/pull/1402.
> 
> they also don't have a cluster label:
> count(container_fs_reads_total) by (job,metrics_path,container,cluster)
> {job="kubelet", metrics_path="/metrics/cadvisor"} 346
> 

@juzhao `cluster=""` is totally fine when the series is missing `cluster` label.

Comment 6 Junqi Zhao 2022-01-07 02:24:01 UTC
(In reply to Arunprasad Rajkumar from comment #5)
> (In reply to Junqi Zhao from comment #3)
> > (In reply to Arunprasad Rajkumar from comment #2)
> > > container_fs_.* metrics don't have the "container" label after
> > > https://github.com/openshift/cluster-monitoring-operator/pull/1402.
> > 
> > they also don't have a cluster label:
> > count(container_fs_reads_total) by (job,metrics_path,container,cluster)
> > {job="kubelet", metrics_path="/metrics/cadvisor"} 346
> > 
> 
> @juzhao `cluster=""` is totally fine when the series is missing
> `cluster` label.

Yes, `cluster=""` is fine even though there is no cluster label.
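
For context, an equality matcher against the empty string also matches series that lack the label entirely, which is why these queries still work; illustrative queries using the metric discussed above:

container_fs_reads_total{cluster=""}    # matches series that carry no cluster label at all
container_fs_reads_total{cluster!=""}   # would exclude such series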

Comment 13 Jayapriya Pai 2022-02-24 04:08:12 UTC
Since cluster-bot uses 4.10 by default, it has an issue bringing up the cluster: it is using 0.53.1, which doesn't have the automountServiceAccount fix needed for the recent jsonnet update.
See https://github.com/openshift/cluster-monitoring-operator/pull/1556#issuecomment-1048750668


We need to wait until https://github.com/openshift/cluster-monitoring-operator/pull/1554 is merged and available in the payload to verify.

Comment 15 Junqi Zhao 2022-05-30 09:46:58 UTC
The issue on the "Kubernetes / Compute Resources / Cluster" and "Kubernetes / Compute Resources / Namespace (Pods)" dashboards is fixed, but on the "Kubernetes / Compute Resources / Pod" dashboard, in the Storage IO - Distribution section, Current Storage IO, the value for Container is "_"; see the attached picture.

Current Storage IO Prometheus expressions:
sum by(container) (rate(container_fs_reads_total{job="kubelet", metrics_path="/metrics/cadvisor", device=~"(/dev.+)|mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+", id!="", cluster="", namespace="openshift-monitoring", pod="node-exporter-5rhmf"}[5m]))
sum by(container) (rate(container_fs_writes_total{job="kubelet", metrics_path="/metrics/cadvisor",device=~"(/dev.+)|mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+", id!="", cluster="", namespace="openshift-monitoring", pod="node-exporter-5rhmf"}[5m]))
sum by(container) (rate(container_fs_reads_total{job="kubelet", metrics_path="/metrics/cadvisor", device=~"(/dev.+)|mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+", id!="", cluster="", namespace="openshift-monitoring", pod="node-exporter-5rhmf"}[5m]) + rate(container_fs_writes_total{job="kubelet", metrics_path="/metrics/cadvisor", device=~"(/dev.+)|mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+", id!="", cluster="", namespace="openshift-monitoring", pod="node-exporter-5rhmf"}[5m]))
sum by(container) (rate(container_fs_reads_bytes_total{job="kubelet", metrics_path="/metrics/cadvisor", device=~"(/dev.+)|mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+", id!="", cluster="", namespace="openshift-monitoring", pod="node-exporter-5rhmf"}[5m]))
sum by(container) (rate(container_fs_writes_bytes_total{job="kubelet", metrics_path="/metrics/cadvisor", device=~"(/dev.+)|mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+", id!="", cluster="", namespace="openshift-monitoring", pod="node-exporter-5rhmf"}[5m]))
sum by(container) (rate(container_fs_reads_bytes_total{job="kubelet", metrics_path="/metrics/cadvisor", device=~"(/dev.+)|mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+", id!="", cluster="", namespace="openshift-monitoring", pod="node-exporter-5rhmf"}[5m]) + rate(container_fs_writes_bytes_total{job="kubelet", metrics_path="/metrics/cadvisor", device=~"(/dev.+)|mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+", id!="", cluster="", namespace="openshift-monitoring", pod="node-exporter-5rhmf"}[5m]))

There is no container label on these metrics:

container_fs_reads_total{container!=""}
No datapoints found.

container_fs_writes_total{container!=""}
No datapoints found.

container_fs_reads_bytes_total{container!=""}
No datapoints found.

container_fs_writes_bytes_total{container!=""}
No datapoints found.

Comment 21 Junqi Zhao 2022-06-24 03:16:04 UTC
Tested with 4.11.0-0.nightly-2022-06-23-092832. On the "Kubernetes / Compute Resources / Pod" dashboard, the panel changed to "Storage IO - Distribution(Pod - Read & Writes)" (see the picture), and there is no issue with the graph. The other dashboards are normal too; see Comment 15.
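
The renamed panel suggests the storage IO queries now aggregate by pod instead of by the dropped container label; a sketch of that shape, reusing the selectors from Comment 15 (an assumption based on the panel name, not necessarily the exact merged expression):

sum by (pod) (
  rate(container_fs_reads_total{job="kubelet", metrics_path="/metrics/cadvisor", device=~"(/dev.+)|mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+", id!="", namespace="openshift-monitoring"}[5m])
  + rate(container_fs_writes_total{job="kubelet", metrics_path="/metrics/cadvisor", device=~"(/dev.+)|mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+", id!="", namespace="openshift-monitoring"}[5m])
)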

Comment 26 errata-xmlrpc 2022-08-10 10:41:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

