Bug 1680517 - node:node_disk_utilisation:avg_irate and node:node_disk_saturation:avg_irate rules do not work on OpenStack
Summary: node:node_disk_utilisation:avg_irate and node:node_disk_saturation:avg_irate rules do not work on OpenStack
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: ---
Target Release: 3.11.z
Assignee: Christian Heidenreich
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-02-25 09:24 UTC by Junqi Zhao
Modified: 2020-02-11 10:33 UTC
CC: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-02-11 10:33:22 UTC
Target Upstream Version:
Embargoed:


Attachments
"No data points" for "Disk IO Utilisation" and "Disk IO Saturation" (108.49 KB, image/png)
2019-02-25 09:24 UTC, Junqi Zhao

Description Junqi Zhao 2019-02-25 09:24:55 UTC
Created attachment 1538365 [details]
"No data points" for "Disk IO Utilisation" and "Disk IO Saturation"

Description of problem:
Tested with ose-cluster-monitoring-operator/images/v3.11.88-1 on OpenStack.
On the Grafana "K8s / USE Method / Node" dashboard, the "Disk IO Utilisation" and "Disk IO Saturation" panels show "No data points".

Checked "Disk IO Utilisation" used node:node_disk_utilisation:avg_irate rules and "Disk IO Saturation" used node:node_disk_saturation:avg_irate rules, these rules are OK for AWS/GCE, but not for Openstack, since device name is device="dm-0" and device="vda/c/d"(vd.+)
********************************************************************************
Disk IO Utilisation
https://grafana-openshift-monitoring.apps.0225-4gx.qe.rhcloud.com/api/datasources/proxy/1/api/v1/query_range?query=node:node_disk_utilisation:avg_irate{node="****"}&start=1551069510&end=1551073140&step=30
********************************************************************************
Disk IO Saturation
https://grafana-openshift-monitoring.apps.0225-4gx.qe.rhcloud.com/api/datasources/proxy/1/api/v1/query_range?query=node:node_disk_saturation:avg_irate{node="****"}&start=1551069510&end=1551073140&step=30
********************************************************************************
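The same queries can also be run directly in the Prometheus UI; both expressions come back empty on this cluster:

node:node_disk_utilisation:avg_irate{node="****"}
node:node_disk_saturation:avg_irate{node="****"}
********************************************************************************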
record: node:node_disk_utilisation:avg_irate
expr: avg
  by(node) (irate(node_disk_io_time_ms{device=~"(sd|xvd|nvme).+",job="node-exporter"}[1m])
  / 1000 * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)

record: node:node_disk_saturation:avg_irate
expr: avg
  by(node) (irate(node_disk_io_time_weighted{device=~"(sd|xvd|nvme).+",job="node-exporter"}[1m])
  / 1000 * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)
********************************************************************************
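For illustration only, not necessarily the merged fix: broadening the device regex so it also matches OpenStack's virtio (vd*) and device-mapper (dm-*) devices would make the rules produce data. A sketch, keeping the same rule structure (the actual backport should use whatever selector the upstream rules.yaml settles on):

# Illustrative only: device regex broadened to also cover vd* and dm-* devices
record: node:node_disk_utilisation:avg_irate
expr: avg
  by(node) (irate(node_disk_io_time_ms{device=~"(sd|xvd|nvme|vd|dm-).+",job="node-exporter"}[1m])
  / 1000 * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)

record: node:node_disk_saturation:avg_irate
expr: avg
  by(node) (irate(node_disk_io_time_weighted{device=~"(sd|xvd|nvme|vd|dm-).+",job="node-exporter"}[1m])
  / 1000 * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)
********************************************************************************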
node_disk_io_time_ms{job="node-exporter"} in the Prometheus UI

Element	Value
node_disk_io_time_ms{device="dm-0",endpoint="https",instance="10.0.76.148:9100",job="node-exporter",namespace="openshift-monitoring",pod="node-exporter-6jvjv",service="node-exporter"}	954685.0000000001
node_disk_io_time_ms{device="dm-0",endpoint="https",instance="10.0.76.15:9100",job="node-exporter",namespace="openshift-monitoring",pod="node-exporter-4jf4k",service="node-exporter"}	474357
node_disk_io_time_ms{device="dm-0",endpoint="https",instance="10.0.76.218:9100",job="node-exporter",namespace="openshift-monitoring",pod="node-exporter-mfg5l",service="node-exporter"}	500224
node_disk_io_time_ms{device="vda",endpoint="https",instance="10.0.76.148:9100",job="node-exporter",namespace="openshift-monitoring",pod="node-exporter-6jvjv",service="node-exporter"}	126423
node_disk_io_time_ms{device="vda",endpoint="https",instance="10.0.76.15:9100",job="node-exporter",namespace="openshift-monitoring",pod="node-exporter-4jf4k",service="node-exporter"}	52844
node_disk_io_time_ms{device="vda",endpoint="https",instance="10.0.76.218:9100",job="node-exporter",namespace="openshift-monitoring",pod="node-exporter-mfg5l",service="node-exporter"}	59845
node_disk_io_time_ms{device="vdb",endpoint="https",instance="10.0.76.15:9100",job="node-exporter",namespace="openshift-monitoring",pod="node-exporter-4jf4k",service="node-exporter"}	2988
node_disk_io_time_ms{device="vdb",endpoint="https",instance="10.0.76.218:9100",job="node-exporter",namespace="openshift-monitoring",pod="node-exporter-mfg5l",service="node-exporter"}	5206
node_disk_io_time_ms{device="vdc",endpoint="https",instance="10.0.76.15:9100",job="node-exporter",namespace="openshift-monitoring",pod="node-exporter-4jf4k",service="node-exporter"}	257
node_disk_io_time_ms{device="vdc",endpoint="https",instance="10.0.76.218:9100",job="node-exporter",namespace="openshift-monitoring",pod="node-exporter-mfg5l",service="node-exporter"}	173.00000000000003
node_disk_io_time_ms{device="vdd",endpoint="https",instance="10.0.76.218:9100",job="node-exporter",namespace="openshift-monitoring",pod="node-exporter-mfg5l",service="node-exporter"}	191
**************************************************************
node_disk_io_time_weighted{job="node-exporter"} in the Prometheus UI
Element	Value
node_disk_io_time_weighted{device="dm-0",endpoint="https",instance="10.0.76.148:9100",job="node-exporter",namespace="openshift-monitoring",pod="node-exporter-6jvjv",service="node-exporter"}	3873.425
node_disk_io_time_weighted{device="dm-0",endpoint="https",instance="10.0.76.15:9100",job="node-exporter",namespace="openshift-monitoring",pod="node-exporter-4jf4k",service="node-exporter"}	2762.52
node_disk_io_time_weighted{device="dm-0",endpoint="https",instance="10.0.76.218:9100",job="node-exporter",namespace="openshift-monitoring",pod="node-exporter-mfg5l",service="node-exporter"}	3658.87
node_disk_io_time_weighted{device="vda",endpoint="https",instance="10.0.76.148:9100",job="node-exporter",namespace="openshift-monitoring",pod="node-exporter-6jvjv",service="node-exporter"}	1719.608
node_disk_io_time_weighted{device="vda",endpoint="https",instance="10.0.76.15:9100",job="node-exporter",namespace="openshift-monitoring",pod="node-exporter-4jf4k",service="node-exporter"}	1184.056
node_disk_io_time_weighted{device="vda",endpoint="https",instance="10.0.76.218:9100",job="node-exporter",namespace="openshift-monitoring",pod="node-exporter-mfg5l",service="node-exporter"}	1273.034
node_disk_io_time_weighted{device="vdb",endpoint="https",instance="10.0.76.15:9100",job="node-exporter",namespace="openshift-monitoring",pod="node-exporter-4jf4k",service="node-exporter"}	3.911
node_disk_io_time_weighted{device="vdb",endpoint="https",instance="10.0.76.218:9100",job="node-exporter",namespace="openshift-monitoring",pod="node-exporter-mfg5l",service="node-exporter"}	7.839
node_disk_io_time_weighted{device="vdc",endpoint="https",instance="10.0.76.15:9100",job="node-exporter",namespace="openshift-monitoring",pod="node-exporter-4jf4k",service="node-exporter"}	0.293
node_disk_io_time_weighted{device="vdc",endpoint="https",instance="10.0.76.218:9100",job="node-exporter",namespace="openshift-monitoring",pod="node-exporter-mfg5l",service="node-exporter"}	0.193
node_disk_io_time_weighted{device="vdd",endpoint="https",instance="10.0.76.218:9100",job="node-exporter",namespace="openshift-monitoring",pod="node-exporter-mfg5l",service="node-exporter"}	0.247
**************************************************************

Version-Release number of selected component (if applicable):
ose-cluster-monitoring-operator/images/v3.11.88-1
How reproducible:
IaaS-related; always reproducible on OpenStack.

Steps to Reproduce:
1. Check the "K8s / USE Method / Node" dashboard in the Grafana UI on an OpenStack cluster.

Actual results:
"No data points" for "Disk IO Utilisation" and "Disk IO Saturation"

Expected results:
The "Disk IO Utilisation" and "Disk IO Saturation" panels should display metrics graphs.

Additional info:
The updated recording rules in https://github.com/openshift/cluster-monitoring-operator/blob/e536f4fea32bf02944f583d57eb29951e8c70481/assets/prometheus-k8s/rules.yaml should be backported to 3.11.

Comment 1 Junqi Zhao 2019-02-25 09:28:51 UTC
We should also consider devices like {device="dm-0"}.

Comment 5 Junqi Zhao 2019-02-27 02:28:11 UTC
The PR is not merged yet; moving back to ASSIGNED.

Comment 6 Junqi Zhao 2019-03-04 02:54:57 UTC
We need to get the PR merged, since this bug is attached to an errata.
BTW: we should also consider devices like {device="dm-0"}.

Comment 7 Junqi Zhao 2019-03-04 05:31:42 UTC
Sergiusz, please see Comment 6.

