Bug 2037513

Summary: I/O metrics from the Kubernetes/Compute Resources/Cluster Dashboard show as no datapoints found
Product: OpenShift Container Platform
Component: Monitoring
Version: 4.10
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Target Milestone: ---
Target Release: 4.11.0
Reporter: jhusta <jhusta>
Assignee: Jayapriya Pai <janantha>
QA Contact: Junqi Zhao <juzhao>
Docs Contact: Brian Burt <bburt>
CC: alchan, amuller, anpicker, aos-bugs, bburt, cruhm, fleber, Holger.Wolf, hongyli, jfajersk, juzhao, kchang, krmoser, spasquie, stwalter, wweber
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Before this update, dashboards containing queries that used the container label on container_fs_* metrics showed no data points, because the container label is dropped in OCP due to its high cardinality. With this update, the affected queries no longer depend on the container label and the dashboards display data again. As part of this change, the panels named "Storage IO - Distribution" and "Storage IO - Distribution(Containers)" were also dropped, since Prometheus does not collect per-container filesystem metrics.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-08-10 10:41:16 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 2009709

Attachments:
Screen shots of my webconsole (flags: none)

Description jhusta 2022-01-05 19:20:04 UTC
Created attachment 1849089 [details]
Screen shots of my webconsole

Description of problem:
From the console, under Observe - Dashboard - Kubernetes/Compute Resources/Cluster, the I/O metrics show as "No datapoints found".

Version-Release number of selected component (if applicable):
Client Version: 4.10.0-0.nightly-s390x-2022-01-04-235928
Server Version: 4.10.0-0.nightly-s390x-2022-01-04-235928
Kubernetes Version: v1.22.1+6859754


Though the query is different, the same issue exists on 4.9; I went back to check whether there was a difference.


How reproducible:
I have an I/O-intensive workload running to ensure there are some reads and writes.

Steps to Reproduce:
1. Go to Observe - Dashboard - Kubernetes/Compute Resources/Cluster.
2. Page down to the IOPS stats.

Actual results:
The queries return "No datapoints found".

Expected results:
I would expect values.



Additional info:
I will attach my screenshots of the Grafana panels and the dashboard queries that are failing.

Please let me know if you need me to gather specific data.

Comment 1 Arunprasad Rajkumar 2022-01-06 06:00:24 UTC
It happens on other hardware as well (e.g. amd64).

Comment 2 Arunprasad Rajkumar 2022-01-06 06:47:32 UTC
container_fs_.* metrics don't have the "container" label after https://github.com/openshift/cluster-monitoring-operator/pull/1402.

Comment 3 Junqi Zhao 2022-01-06 07:09:43 UTC
(In reply to Arunprasad Rajkumar from comment #2)
> container_fs_.* metrics doesn't have "container" label after
> https://github.com/openshift/cluster-monitoring-operator/pull/1402.

they also don't have a cluster label
count(container_fs_reads_total) by (job,metrics_path,container,cluster)
{job="kubelet", metrics_path="/metrics/cadvisor"} 346

count(container_fs_writes_total) by (job,metrics_path,container,cluster)
{job="kubelet", metrics_path="/metrics/cadvisor"}  346

count(container_fs_reads_bytes_total) by (job,metrics_path,container,cluster)
{job="kubelet", metrics_path="/metrics/cadvisor"}  310

count(container_fs_writes_bytes_total) by (job,metrics_path,container,cluster)
{job="kubelet", metrics_path="/metrics/cadvisor"}  310

Comment 5 Arunprasad Rajkumar 2022-01-06 14:26:43 UTC
(In reply to Junqi Zhao from comment #3)
> (In reply to Arunprasad Rajkumar from comment #2)
> > container_fs_.* metrics doesn't have "container" label after
> > https://github.com/openshift/cluster-monitoring-operator/pull/1402.
> 
> also don't have cluster label
> count(container_fs_reads_total) by (job,metrics_path,container,cluster)
> {job="kubelet", metrics_path="/metrics/cadvisor"} 346
> 

@juzhao `cluster=""` is totally fine when the series is missing `cluster` label.
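
For reference, a PromQL matcher such as cluster="" also matches series that carry no cluster label at all, while cluster!="" only matches series with a non-empty label; this is why the dashboard selectors still return data on a single cluster. A minimal sketch of the difference (the result count is a placeholder, not taken from this cluster):

count(container_fs_reads_total{cluster=""}) by (job, metrics_path)
{job="kubelet", metrics_path="/metrics/cadvisor"}  346

count(container_fs_reads_total{cluster!=""}) by (job, metrics_path)
No datapoints found.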

Comment 6 Junqi Zhao 2022-01-07 02:24:01 UTC
(In reply to Arunprasad Rajkumar from comment #5)
> (In reply to Junqi Zhao from comment #3)
> > (In reply to Arunprasad Rajkumar from comment #2)
> > > container_fs_.* metrics doesn't have "container" label after
> > > https://github.com/openshift/cluster-monitoring-operator/pull/1402.
> > 
> > also don't have cluster label
> > count(container_fs_reads_total) by (job,metrics_path,container,cluster)
> > {job="kubelet", metrics_path="/metrics/cadvisor"} 346
> > 
> 
> @juzhao `cluster=""` is totally fine when the series is missing
> `cluster` label.

yes, it's fine with `cluster=""` even though there is no cluster label

Comment 13 Jayapriya Pai 2022-02-24 04:08:12 UTC
Since cluster-bot uses 4.10 by default, it has issues bringing up a cluster: it uses 0.53.1, which doesn't have the automountServiceAccount fix needed for the recent jsonnet update.
See https://github.com/openshift/cluster-monitoring-operator/pull/1556#issuecomment-1048750668


We need to wait until https://github.com/openshift/cluster-monitoring-operator/pull/1554 is merged and available in the payload to verify.

Comment 15 Junqi Zhao 2022-05-30 09:46:58 UTC
issue on "Kubernetes / Compute Resources / Cluster" and "Kubernetes / Compute Resources / Namespace (Pods)" dashboard is fixed, but for "Kubernetes / Compute Resources / Pod" dashboards, Storage IO - Distribution section, Current Storage IO, value for Container is "_",  see from the picture.

Current Storage IO prometheus expr:
sum by(container) (rate(container_fs_reads_total{job="kubelet", metrics_path="/metrics/cadvisor", device=~"(/dev.+)|mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+", id!="", cluster="", namespace="openshift-monitoring", pod="node-exporter-5rhmf"}[5m]))
sum by(container) (rate(container_fs_writes_total{job="kubelet", metrics_path="/metrics/cadvisor",device=~"(/dev.+)|mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+", id!="", cluster="", namespace="openshift-monitoring", pod="node-exporter-5rhmf"}[5m]))
sum by(container) (rate(container_fs_reads_total{job="kubelet", metrics_path="/metrics/cadvisor", device=~"(/dev.+)|mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+", id!="", cluster="", namespace="openshift-monitoring", pod="node-exporter-5rhmf"}[5m]) + rate(container_fs_writes_total{job="kubelet", metrics_path="/metrics/cadvisor", device=~"(/dev.+)|mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+", id!="", cluster="", namespace="openshift-monitoring", pod="node-exporter-5rhmf"}[5m]))
sum by(container) (rate(container_fs_reads_bytes_total{job="kubelet", metrics_path="/metrics/cadvisor", device=~"(/dev.+)|mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+", id!="", cluster="", namespace="openshift-monitoring", pod="node-exporter-5rhmf"}[5m]))
sum by(container) (rate(container_fs_writes_bytes_total{job="kubelet", metrics_path="/metrics/cadvisor", device=~"(/dev.+)|mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+", id!="", cluster="", namespace="openshift-monitoring", pod="node-exporter-5rhmf"}[5m]))
sum by(container) (rate(container_fs_reads_bytes_total{job="kubelet", metrics_path="/metrics/cadvisor", device=~"(/dev.+)|mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+", id!="", cluster="", namespace="openshift-monitoring", pod="node-exporter-5rhmf"}[5m]) + rate(container_fs_writes_bytes_total{job="kubelet", metrics_path="/metrics/cadvisor", device=~"(/dev.+)|mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+", id!="", cluster="", namespace="openshift-monitoring", pod="node-exporter-5rhmf"}[5m]))
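
For context, because the container label has been dropped from the cadvisor container_fs_* series, sum by(container) collapses everything into a single series with no container value, which the table panel then renders as "_". A minimal sketch of this behavior (the numeric result is a placeholder):

sum by(container) (rate(container_fs_reads_total{job="kubelet", metrics_path="/metrics/cadvisor", id!="", namespace="openshift-monitoring", pod="node-exporter-5rhmf"}[5m]))
{}  0.42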

There is no container label on these series:

container_fs_reads_total{container!=""}
No datapoints found.

container_fs_writes_total{container!=""}
No datapoints found.

container_fs_reads_bytes_total{container!=""}
No datapoints found.

container_fs_writes_bytes_total{container!=""}
No datapoints found.
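
Since the per-container filesystem series do not exist, aggregating at the pod level is the natural replacement, which matches the reworked "Storage IO - Distribution(Pod - Read & Writes)" panel mentioned in Comment 21. A hedged sketch of what such pod-level read/write queries could look like (not necessarily the exact expressions shipped in the fixed dashboard):

sum by(pod) (rate(container_fs_reads_total{job="kubelet", metrics_path="/metrics/cadvisor", device=~"(/dev.+)|mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+", id!="", namespace="openshift-monitoring", pod="node-exporter-5rhmf"}[5m]))
sum by(pod) (rate(container_fs_writes_total{job="kubelet", metrics_path="/metrics/cadvisor", device=~"(/dev.+)|mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+", id!="", namespace="openshift-monitoring", pod="node-exporter-5rhmf"}[5m]))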

Comment 21 Junqi Zhao 2022-06-24 03:16:04 UTC
Tested with 4.11.0-0.nightly-2022-06-23-092832. On the "Kubernetes / Compute Resources / Pod" dashboard the panel changes to "Storage IO - Distribution(Pod - Read & Writes)" (see the picture) and there is no issue with the graph; the other dashboards are normal too, see Comment 15.

Comment 26 errata-xmlrpc 2022-08-10 10:41:16 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069