Bug 1673787
| Summary: | Grafana DISK IO metrics are empty due to not matching disk name patterns | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Daein Park <dapark> |
| Component: | Monitoring | Assignee: | Sergiusz Urbaniak <surbania> |
| Status: | CLOSED ERRATA | QA Contact: | Junqi Zhao <juzhao> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.11.0 | CC: | fbranczy, mloibl, surbania |
| Target Milestone: | --- | | |
| Target Release: | 4.1.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-06-04 10:42:43 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1678645 | | |
| Bug Blocks: | | | |
Description
Daein Park
2019-02-08 05:26:18 UTC
Unfortunately, due to how the dependencies work and evolved, it's not trivial to backport this. We're likely to only ship this fix in 4.0, not 3.11.

@Frederic It seems we missed one device. I checked in a 3.11 env and found it has device="dm-0"; there may also be "dm-1", "dm-2" devices, e.g.:

```
$ ls -l /dev/dm*
brw-rw----. 1 root disk 253, 0 Mar  1 08:11 /dev/dm-0
brw-rw----. 1 root disk 253, 1 Mar  1 08:11 /dev/dm-1
brw-rw----. 1 root disk 253, 2 Mar  1 08:11 /dev/dm-2
```

node_disk_io_time_ms in 3.11 also reports this device, e.g.:

```
node_disk_io_time_ms{device="dm-0",endpoint="https",instance="10.0.77.93:9100",job="node-exporter",namespace="openshift-monitoring",pod="node-exporter-9znkd",service="node-exporter"} 933001
node_disk_io_time_ms{device="vda",endpoint="https",instance="10.0.76.252:9100",job="node-exporter",namespace="openshift-monitoring",pod="node-exporter-k5vxn",service="node-exporter"} 59668
```

But the Prometheus rules do not match this kind of device, e.g.:

```
record: node:node_disk_saturation:avg_irate
expr: avg by(node) (irate(node_disk_io_time_weighted_seconds_total{device=~"nvme.+|rbd.+|sd.+|vd.+|xvd.+",job="node-exporter"}[1m]) / 1000 * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)
```

Shall we add this device to the Prometheus rules? Same question as for https://bugzilla.redhat.com/show_bug.cgi?id=1680517#c3

Reference: https://superuser.com/questions/131519/what-is-this-dm-0-device

Yes, let's add them. Given that these are disk IO stats, I think we can safely assume that these are only storage devices (my understanding is devicemapper devices can otherwise be pretty much anything). We'll make sure to adapt.

(In reply to Frederic Branczyk from comment #5)
> Yes let's add them. Given that these are disk io stats, I think we can
> safely assume that these are only storage devices (my understanding is
> devicemapper devices can otherwise be pretty much anything). We'll make sure
> to adapt.
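The root cause can be illustrated outside Prometheus. PromQL label matchers like `device=~"..."` are fully anchored, so Python's `re.fullmatch` mimics the same behavior. A minimal sketch (the device list here is illustrative, taken from the samples above) shows that the old pattern never matches devicemapper devices while the extended one does:

```python
import re

# Matcher from the 3.11 recording rules (PromQL regexes are fully anchored)
OLD = re.compile(r"nvme.+|rbd.+|sd.+|vd.+|xvd.+")
# Extended matcher that also covers devicemapper devices
NEW = re.compile(r"nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+")

# Device names observed in the bug report
devices = ["vda", "dm-0", "dm-1", "dm-2"]

for dev in devices:
    old_ok = OLD.fullmatch(dev) is not None
    new_ok = NEW.fullmatch(dev) is not None
    print(f"{dev}: old={old_ok} new={new_ok}")
```

Running this prints `old=False` for every `dm-*` device, which is exactly why the saturation/utilization recording rules (and hence the Grafana disk IO panels) come up empty on hosts whose IO goes through devicemapper.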
Thanks. We also need to backport this to 3.11, since 3.11 has the same issue, as already mentioned in Bug 1680517.

Device names are correct now; devicemapper devices are also included:

```
device=~"nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+"
```

Payload: 4.0.0-0.nightly-2019-03-06-074438

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758
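For reference, applying the extended matcher to the recording rule quoted in the description would look roughly like the following (a sketch only; the exact rule file shipped in the fixed payload may differ in formatting or surrounding rules):

```yaml
# Corrected recording rule: device matcher now includes devicemapper (dm-.+)
record: node:node_disk_saturation:avg_irate
expr: avg by(node) (irate(node_disk_io_time_weighted_seconds_total{device=~"nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+",job="node-exporter"}[1m]) / 1000 * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)
```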