Bug 1680517

Summary: node:node_disk_utilisation:avg_irate and node:node_disk_saturation:avg_irate rules do not work on OpenStack
Product: OpenShift Container Platform Reporter: Junqi Zhao <juzhao>
Component: Monitoring Assignee: Christian Heidenreich <cvogel>
Status: CLOSED NEXTRELEASE QA Contact: Junqi Zhao <juzhao>
Severity: medium Docs Contact:
Priority: low    
Version: 3.11.0 CC: mloibl, surbania
Target Milestone: ---   
Target Release: 3.11.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-02-11 10:33:22 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
"No data points" for "Disk IO Utilisation" and "Disk IO Saturation" none

Description Junqi Zhao 2019-02-25 09:24:55 UTC
Created attachment 1538365
"No data points" for "Disk IO Utilisation" and "Disk IO Saturation"

Description of problem:
Tested with ose-cluster-monitoring-operator/images/v3.11.88-1 on OpenStack.
In the Grafana UI, the "K8s / USE Method / Node" page shows "No data points" for the "Disk IO Utilisation" and "Disk IO Saturation" panels.

The "Disk IO Utilisation" panel uses the node:node_disk_utilisation:avg_irate rule and the "Disk IO Saturation" panel uses the node:node_disk_saturation:avg_irate rule. These rules work on AWS/GCE but not on OpenStack, because the device names there are device="dm-0" and device="vda" through "vdd" (vd.+), which the rules' device matcher (sd|xvd|nvme).+ does not select.
********************************************************************************
Disk IO Utilisation
https://grafana-openshift-monitoring.apps.0225-4gx.qe.rhcloud.com/api/datasources/proxy/1/api/v1/query_range?query=node:node_disk_utilisation:avg_irate{node="****"}&start=1551069510&end=1551073140&step=30
********************************************************************************
Disk IO Saturation
https://grafana-openshift-monitoring.apps.0225-4gx.qe.rhcloud.com/api/datasources/proxy/1/api/v1/query_range?query=node:node_disk_saturation:avg_irate{node="****"}&start=1551069510&end=1551073140&step=30
********************************************************************************
record: node:node_disk_utilisation:avg_irate
expr: avg
  by(node) (irate(node_disk_io_time_ms{device=~"(sd|xvd|nvme).+",job="node-exporter"}[1m])
  / 1000 * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)

record: node:node_disk_saturation:avg_irate
expr: avg
  by(node) (irate(node_disk_io_time_weighted{device=~"(sd|xvd|nvme).+",job="node-exporter"}[1m])
  / 1000 * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)
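
The mismatch between the rules' device matcher and the device names in this report can be checked outside Prometheus. The following is a minimal sketch, assuming that Python's re.fullmatch behaves like a Prometheus =~ matcher for this simple pattern (Prometheus anchors regex matchers to the whole label value). The OpenStack device names are taken from this report; the names in OTHER_DEVICES are illustrative assumptions and do not come from this report.

```python
import re

# Device matcher copied from the two recording rules above.
RULE_DEVICE_PATTERN = r"(sd|xvd|nvme).+"

# Device label values reported for the OpenStack nodes in this bug.
OPENSTACK_DEVICES = ["dm-0", "vda", "vdb", "vdc", "vdd"]

# Typical device names on other platforms (assumed for illustration only).
OTHER_DEVICES = ["sda", "xvda", "nvme0n1"]


def selected(pattern: str, device: str) -> bool:
    """Emulate a =~ label matcher: the pattern must match the whole value."""
    return re.fullmatch(pattern, device) is not None


for device in OPENSTACK_DEVICES + OTHER_DEVICES:
    print(f"{device:8} selected={selected(RULE_DEVICE_PATTERN, device)}")
```

None of the OpenStack device names is selected, so the inner irate(...) expression returns no series and the recording rules produce no data for those nodes.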
********************************************************************************
Result of node_disk_io_time_ms{job="node-exporter"} in the Prometheus UI:

Element	Value
node_disk_io_time_ms{device="dm-0",endpoint="https",instance="10.0.76.148:9100",job="node-exporter",namespace="openshift-monitoring",pod="node-exporter-6jvjv",service="node-exporter"}	954685.0000000001
node_disk_io_time_ms{device="dm-0",endpoint="https",instance="10.0.76.15:9100",job="node-exporter",namespace="openshift-monitoring",pod="node-exporter-4jf4k",service="node-exporter"}	474357
node_disk_io_time_ms{device="dm-0",endpoint="https",instance="10.0.76.218:9100",job="node-exporter",namespace="openshift-monitoring",pod="node-exporter-mfg5l",service="node-exporter"}	500224
node_disk_io_time_ms{device="vda",endpoint="https",instance="10.0.76.148:9100",job="node-exporter",namespace="openshift-monitoring",pod="node-exporter-6jvjv",service="node-exporter"}	126423
node_disk_io_time_ms{device="vda",endpoint="https",instance="10.0.76.15:9100",job="node-exporter",namespace="openshift-monitoring",pod="node-exporter-4jf4k",service="node-exporter"}	52844
node_disk_io_time_ms{device="vda",endpoint="https",instance="10.0.76.218:9100",job="node-exporter",namespace="openshift-monitoring",pod="node-exporter-mfg5l",service="node-exporter"}	59845
node_disk_io_time_ms{device="vdb",endpoint="https",instance="10.0.76.15:9100",job="node-exporter",namespace="openshift-monitoring",pod="node-exporter-4jf4k",service="node-exporter"}	2988
node_disk_io_time_ms{device="vdb",endpoint="https",instance="10.0.76.218:9100",job="node-exporter",namespace="openshift-monitoring",pod="node-exporter-mfg5l",service="node-exporter"}	5206
node_disk_io_time_ms{device="vdc",endpoint="https",instance="10.0.76.15:9100",job="node-exporter",namespace="openshift-monitoring",pod="node-exporter-4jf4k",service="node-exporter"}	257
node_disk_io_time_ms{device="vdc",endpoint="https",instance="10.0.76.218:9100",job="node-exporter",namespace="openshift-monitoring",pod="node-exporter-mfg5l",service="node-exporter"}	173.00000000000003
node_disk_io_time_ms{device="vdd",endpoint="https",instance="10.0.76.218:9100",job="node-exporter",namespace="openshift-monitoring",pod="node-exporter-mfg5l",service="node-exporter"}	191
**************************************************************
Result of node_disk_io_time_weighted{job="node-exporter"} in the Prometheus UI:
Element	Value
node_disk_io_time_weighted{device="dm-0",endpoint="https",instance="10.0.76.148:9100",job="node-exporter",namespace="openshift-monitoring",pod="node-exporter-6jvjv",service="node-exporter"}	3873.425
node_disk_io_time_weighted{device="dm-0",endpoint="https",instance="10.0.76.15:9100",job="node-exporter",namespace="openshift-monitoring",pod="node-exporter-4jf4k",service="node-exporter"}	2762.52
node_disk_io_time_weighted{device="dm-0",endpoint="https",instance="10.0.76.218:9100",job="node-exporter",namespace="openshift-monitoring",pod="node-exporter-mfg5l",service="node-exporter"}	3658.87
node_disk_io_time_weighted{device="vda",endpoint="https",instance="10.0.76.148:9100",job="node-exporter",namespace="openshift-monitoring",pod="node-exporter-6jvjv",service="node-exporter"}	1719.608
node_disk_io_time_weighted{device="vda",endpoint="https",instance="10.0.76.15:9100",job="node-exporter",namespace="openshift-monitoring",pod="node-exporter-4jf4k",service="node-exporter"}	1184.056
node_disk_io_time_weighted{device="vda",endpoint="https",instance="10.0.76.218:9100",job="node-exporter",namespace="openshift-monitoring",pod="node-exporter-mfg5l",service="node-exporter"}	1273.034
node_disk_io_time_weighted{device="vdb",endpoint="https",instance="10.0.76.15:9100",job="node-exporter",namespace="openshift-monitoring",pod="node-exporter-4jf4k",service="node-exporter"}	3.911
node_disk_io_time_weighted{device="vdb",endpoint="https",instance="10.0.76.218:9100",job="node-exporter",namespace="openshift-monitoring",pod="node-exporter-mfg5l",service="node-exporter"}	7.839
node_disk_io_time_weighted{device="vdc",endpoint="https",instance="10.0.76.15:9100",job="node-exporter",namespace="openshift-monitoring",pod="node-exporter-4jf4k",service="node-exporter"}	0.293
node_disk_io_time_weighted{device="vdc",endpoint="https",instance="10.0.76.218:9100",job="node-exporter",namespace="openshift-monitoring",pod="node-exporter-mfg5l",service="node-exporter"}	0.193
node_disk_io_time_weighted{device="vdd",endpoint="https",instance="10.0.76.218:9100",job="node-exporter",namespace="openshift-monitoring",pod="node-exporter-mfg5l",service="node-exporter"}	0.247
**************************************************************
Version-Release number of selected component (if applicable):
ose-cluster-monitoring-operator/images/v3.11.88-1

How reproducible:
Depends on the IaaS platform (seen on OpenStack, not on AWS/GCE)

Steps to Reproduce:
1. Open the "K8s / USE Method / Node" page in the Grafana UI and check the "Disk IO Utilisation" and "Disk IO Saturation" panels.

Actual results:
"No data points" for "Disk IO Utilisation" and "Disk IO Saturation"

Expected results:
Both panels should show metric graphs.

Additional info:
The rules in https://github.com/openshift/cluster-monitoring-operator/blob/e536f4fea32bf02944f583d57eb29951e8c70481/assets/prometheus-k8s/rules.yaml
should be backported to 3.11.
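
Any backported matcher would need to cover the device names seen on OpenStack without losing the ones it already selects. The following is a sketch only: CANDIDATE_PATTERN is a hypothetical widening of the current matcher with vd and dm- prefixes, it is not taken from the linked rules.yaml, and the names in PREVIOUSLY_MATCHED are assumed examples.

```python
import re

# Hypothetical widened matcher (assumption, not the upstream rule text).
CANDIDATE_PATTERN = r"(sd|xvd|nvme|vd|dm-).+"

# Device names observed on the OpenStack nodes in this report.
OBSERVED_DEVICES = ["dm-0", "vda", "vdb", "vdc", "vdd"]

# Example names the current matcher already selects (assumed for illustration).
PREVIOUSLY_MATCHED = ["sda", "xvda", "nvme0n1"]


def matched_devices(pattern, devices):
    """Return the devices whose full name matches the pattern."""
    return [d for d in devices if re.fullmatch(pattern, d)]


print(matched_devices(CANDIDATE_PATTERN, OBSERVED_DEVICES))
print(matched_devices(CANDIDATE_PATTERN, PREVIOUSLY_MATCHED))
```

Whether dm- devices belong in the average is a separate design question: device-mapper volumes sit on top of other block devices, so selecting both may count the same I/O twice.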

Comment 1 Junqi Zhao 2019-02-25 09:28:51 UTC
The rules should also consider devices such as {device="dm-0"}.

Comment 5 Junqi Zhao 2019-02-27 02:28:11 UTC
The PR is not merged yet; moving the bug back to ASSIGNED.

Comment 6 Junqi Zhao 2019-03-04 02:54:57 UTC
We need to merge the PR, since this bug is attached to an erratum.
BTW: we should also consider devices such as {device="dm-0"}.

Comment 7 Junqi Zhao 2019-03-04 05:31:42 UTC
Sergiusz, please see Comment 6.