Created attachment 1849089 [details]
Screenshots of my web console

Description of problem:
From the Console, Observe - Dashboards - Kubernetes / Compute Resources / Cluster, the I/O metrics show "No datapoints found".

Version-Release number of selected component (if applicable):
Client Version: 4.10.0-0.nightly-s390x-2022-01-04-235928
Server Version: 4.10.0-0.nightly-s390x-2022-01-04-235928
Kubernetes Version: v1.22.1+6859754
Although the query is different, the same issue exists on 4.9; I went back to check whether there was a difference.

How reproducible:
I have an I/O-intensive workload running to ensure there are some reads and writes.

Steps to Reproduce:
1. Go to Observe - Dashboards - Kubernetes / Compute Resources / Cluster.
2. Page down to the IOPS panels.

Actual results:
The queries return "No datapoints found".

Expected results:
The panels show values.

Additional info:
I will attach screenshots of the Grafana panels and the dashboard queries that are failing. Please let me know if you need me to gather specific data.
It happens on other hardware architectures as well (e.g. amd64).
The container_fs_.* metrics don't have a "container" label after https://github.com/openshift/cluster-monitoring-operator/pull/1402.
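As an illustration (a sketch, assuming a standard Prometheus query UI; the metric name is the same one used by the dashboards below), the missing label can be confirmed with queries like:

count by (job, metrics_path, container) (container_fs_reads_total)

count(container_fs_reads_total{container!=""})

If the label was dropped, the first query shows result series without a container label and the second returns no datapoints.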
(In reply to Arunprasad Rajkumar from comment #2)
> The container_fs_.* metrics don't have a "container" label after
> https://github.com/openshift/cluster-monitoring-operator/pull/1402.

They also don't have the cluster label:

count(container_fs_reads_total) by (job,metrics_path,container,cluster)
  {job="kubelet", metrics_path="/metrics/cadvisor"}  346

count(container_fs_writes_total) by (job,metrics_path,container,cluster)
  {job="kubelet", metrics_path="/metrics/cadvisor"}  346

count(container_fs_reads_bytes_total) by (job,metrics_path,container,cluster)
  {job="kubelet", metrics_path="/metrics/cadvisor"}  310

count(container_fs_writes_bytes_total) by (job,metrics_path,container,cluster)
  {job="kubelet", metrics_path="/metrics/cadvisor"}  310
(In reply to Junqi Zhao from comment #3)
> (In reply to Arunprasad Rajkumar from comment #2)
> > The container_fs_.* metrics don't have a "container" label after
> > https://github.com/openshift/cluster-monitoring-operator/pull/1402.
>
> They also don't have the cluster label:
> count(container_fs_reads_total) by (job,metrics_path,container,cluster)
>   {job="kubelet", metrics_path="/metrics/cadvisor"}  346

@juzhao `cluster=""` is totally fine when the series is missing the `cluster` label.
(In reply to Arunprasad Rajkumar from comment #5)
> (In reply to Junqi Zhao from comment #3)
> > (In reply to Arunprasad Rajkumar from comment #2)
> > > The container_fs_.* metrics don't have a "container" label after
> > > https://github.com/openshift/cluster-monitoring-operator/pull/1402.
> >
> > They also don't have the cluster label:
> > count(container_fs_reads_total) by (job,metrics_path,container,cluster)
> >   {job="kubelet", metrics_path="/metrics/cadvisor"}  346
>
> @juzhao `cluster=""` is totally fine when the series is missing the `cluster` label.

Yes, `cluster=""` is fine even though there is no cluster label.
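For context (this is standard PromQL matcher behaviour, not anything specific to these dashboards): an empty-string matcher also matches series that do not carry the label at all, so a `cluster=""` selector still returns the kubelet series even when they have no cluster label. For example, both of these return the same series on a cluster where the label is absent:

container_fs_reads_total{job="kubelet", metrics_path="/metrics/cadvisor"}

container_fs_reads_total{job="kubelet", metrics_path="/metrics/cadvisor", cluster=""}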
Since cluster-bot uses 4.10 by default, it has an issue bringing up the cluster: it uses 0.53.1, which doesn't have the automountServiceAccount fix needed for the recent jsonnet update. See https://github.com/openshift/cluster-monitoring-operator/pull/1556#issuecomment-1048750668

We need to wait until https://github.com/openshift/cluster-monitoring-operator/pull/1554 is merged and available in the payload to verify.
issue on "Kubernetes / Compute Resources / Cluster" and "Kubernetes / Compute Resources / Namespace (Pods)" dashboard is fixed, but for "Kubernetes / Compute Resources / Pod" dashboards, Storage IO - Distribution section, Current Storage IO, value for Container is "_", see from the picture. Current Storage IO prometheus expr: sum by(container) (rate(container_fs_reads_total{job="kubelet", metrics_path="/metrics/cadvisor", device=~"(/dev.+)|mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+", id!="", cluster="", namespace="openshift-monitoring", pod="node-exporter-5rhmf"}[5m])) sum by(container) (rate(container_fs_writes_total{job="kubelet", metrics_path="/metrics/cadvisor",device=~"(/dev.+)|mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+", id!="", cluster="", namespace="openshift-monitoring", pod="node-exporter-5rhmf"}[5m])) sum by(container) (rate(container_fs_reads_total{job="kubelet", metrics_path="/metrics/cadvisor", device=~"(/dev.+)|mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+", id!="", cluster="", namespace="openshift-monitoring", pod="node-exporter-5rhmf"}[5m]) + rate(container_fs_writes_total{job="kubelet", metrics_path="/metrics/cadvisor", device=~"(/dev.+)|mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+", id!="", cluster="", namespace="openshift-monitoring", pod="node-exporter-5rhmf"}[5m])) sum by(container) (rate(container_fs_reads_bytes_total{job="kubelet", metrics_path="/metrics/cadvisor", device=~"(/dev.+)|mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+", id!="", cluster="", namespace="openshift-monitoring", pod="node-exporter-5rhmf"}[5m])) sum by(container) (rate(container_fs_writes_bytes_total{job="kubelet", metrics_path="/metrics/cadvisor", device=~"(/dev.+)|mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+", id!="", cluster="", namespace="openshift-monitoring", pod="node-exporter-5rhmf"}[5m])) sum by(container) (rate(container_fs_reads_bytes_total{job="kubelet", metrics_path="/metrics/cadvisor", device=~"(/dev.+)|mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+", id!="", cluster="", namespace="openshift-monitoring", pod="node-exporter-5rhmf"}[5m]) + rate(container_fs_writes_bytes_total{job="kubelet", metrics_path="/metrics/cadvisor", device=~"(/dev.+)|mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|dasd.+", id!="", cluster="", namespace="openshift-monitoring", pod="node-exporter-5rhmf"}[5m])) there is not container label for container_fs_reads_total{container!=""} No datapoints found. container_fs_reads_total{container!=""} No datapoints found. container_fs_reads_bytes_total{container!=""} No datapoints found. container_fs_writes_bytes_total{container!=""} No datapoints found.
Tested with 4.11.0-0.nightly-2022-06-23-092832. On the "Kubernetes / Compute Resources / Pod" dashboard the section changes to "Storage IO - Distribution(Pod - Read & Writes)", see the picture, and there is no issue with the graph. The other dashboards are normal too, see Comment 15.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069