While debugging a high-priority CI failure, we realized that a number of key host-level IO metrics returned by cAdvisor are set to zero:

container_fs_io_time_seconds_total
container_fs_read_seconds_total
container_fs_reads_merged_total
container_fs_io_current

These are zero for all containers AND for the root device (which is usually special). This worked in 3.11 and is a key tool for debugging IO; it has regressed somewhere in recent releases. It needs to be fixed and backported to 4.4 at minimum.
The disk stats from one of the boxes reporting 0:

sh-4.4# cat /proc/diskstats
259 0 nvme0n1 3890442 1503 208168451 2536364 155312820 84262481 13280576806 152625519 0 17300601 58405818 0 0 0 0
259 1 nvme0n1p1 144 0 8802 46 14 0 40 8 0 50 6 0 0 0 0
259 2 nvme0n1p2 443 0 11953 227 1 0 1 0 0 54 13 0 0 0 0
259 3 nvme0n1p3 26 0 208 5 0 0 0 0 0 7 0 0 0 0 0
259 4 nvme0n1p4 3889753 1503 208143872 2536064 155312805 84262481 13280576765 152625510 0 17300508 58405798 0 0 0 0
253 0 dm-0 3891145 0 208140230 2457857 239575286 0 13280576765 249604868 0 17302451 252062725 0 0 0 0
259 5 nvme1n1 465396 19 3752698 378287 402540 531025 9171280 423786 0 373694 353150 0 0 0 0
259 6 nvme2n1 176 0 6634 87 1081 1380 20384 620 0 680 39 0 0 0 0

Also note that the metrics for the LUKS device are returned, but not for the root device. Is devicemapper messing up our metrics after the introduction of the encryption stuff?
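As a quick cross-check that the kernel itself is still accounting IO for dm-0, here is a minimal stdlib-only Go sketch (illustrative, not cAdvisor's code) that dumps the io_time column for every device in /proc/diskstats:

package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	f, err := os.Open("/proc/diskstats")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	sc := bufio.NewScanner(f)
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 13 {
			continue // skip short or truncated lines
		}
		// fields[2] is the device name; fields[12] is the 13th column,
		// io_time: total time spent doing I/Os, in milliseconds.
		fmt.Printf("%-12s io_time=%s ms\n", fields[2], fields[12])
	}
	if err := sc.Err(); err != nil {
		panic(err)
	}
}

Against the dump above this prints io_time=17302451 ms for dm-0, i.e. the raw counters are there, so whatever zeroes the metric is happening downstream of the kernel.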
I wasn't able to spend an adequate amount of time on this during the sprint due to other higher-priority work; I will look into it next sprint.
These metrics are fundamental to debugging. You may not move this to 4.6.
On 4.1.1:

# uname -a
Linux ip-10-0-136-166 4.18.0-80.1.2.el8_0.x86_64 #1 SMP Sun Apr 28 09:21:22 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

# lsblk -fs
NAME  FSTYPE LABEL UUID                                 MOUNTPOINT
xvda1
└─xvda
xvda2 ext4   boot  51b1d71e-c3f9-4a6a-b22e-407ff7412d06 /boot
└─xvda
xvda3 xfs    root  77a6c49a-0403-474d-b727-80e256f7c564 /sysroot
└─xvda

# cat /proc/diskstats
202 0 xvda 5296 20 622167 7711 152379 60718 8599959 574473 0 53334 501001 0 0 0 0
202 1 xvda1 28 0 224 5 0 0 0 0 0 11 0 0 0 0 0
202 2 xvda2 165 0 9466 65 7 1 64 3 0 71 12 0 0 0 0
202 3 xvda3 5042 20 609413 7607 152372 60717 8599895 574469 0 53310 500976 0 0 0 0

For the Prometheus query container_fs_io_time_seconds_total{id="/"} (column 13 in /proc/diskstats), I can see the corresponding metrics:

container_fs_io_time_seconds_total{device="/dev/xvda2",endpoint="https-metrics",id="/",instance="10.0.128.87:10250",job="kubelet",node="ip-10-0-128-87.ap-south-1.compute.internal",service="kubelet"} 0.000000063
container_fs_io_time_seconds_total{device="/dev/xvda2",endpoint="https-metrics",id="/",instance="10.0.136.166:10250",job="kubelet",node="ip-10-0-136-166.ap-south-1.compute.internal",service="kubelet"} 0.000000071
container_fs_io_time_seconds_total{device="/dev/xvda2",endpoint="https-metrics",id="/",instance="10.0.146.253:10250",job="kubelet",node="ip-10-0-146-253.ap-south-1.compute.internal",service="kubelet"} 0.000000072
container_fs_io_time_seconds_total{device="/dev/xvda2",endpoint="https-metrics",id="/",instance="10.0.151.178:10250",job="kubelet",node="ip-10-0-151-178.ap-south-1.compute.internal",service="kubelet"} 0.000000071
container_fs_io_time_seconds_total{device="/dev/xvda2",endpoint="https-metrics",id="/",instance="10.0.165.81:10250",job="kubelet",node="ip-10-0-165-81.ap-south-1.compute.internal",service="kubelet"} 0.000000063
container_fs_io_time_seconds_total{device="/dev/xvda2",endpoint="https-metrics",id="/",instance="10.0.175.194:10250",job="kubelet",node="ip-10-0-175-194.ap-south-1.compute.internal",service="kubelet"} 0.000000065
container_fs_io_time_seconds_total{device="/dev/xvda3",endpoint="https-metrics",id="/",instance="10.0.128.87:10250",job="kubelet",node="ip-10-0-128-87.ap-south-1.compute.internal",service="kubelet"} 0.000181785
container_fs_io_time_seconds_total{device="/dev/xvda3",endpoint="https-metrics",id="/",instance="10.0.136.166:10250",job="kubelet",node="ip-10-0-136-166.ap-south-1.compute.internal",service="kubelet"} 0.000049739
container_fs_io_time_seconds_total{device="/dev/xvda3",endpoint="https-metrics",id="/",instance="10.0.146.253:10250",job="kubelet",node="ip-10-0-146-253.ap-south-1.compute.internal",service="kubelet"} 0.00003107
container_fs_io_time_seconds_total{device="/dev/xvda3",endpoint="https-metrics",id="/",instance="10.0.151.178:10250",job="kubelet",node="ip-10-0-151-178.ap-south-1.compute.internal",service="kubelet"} 0.000179717
container_fs_io_time_seconds_total{device="/dev/xvda3",endpoint="https-metrics",id="/",instance="10.0.165.81:10250",job="kubelet",node="ip-10-0-165-81.ap-south-1.compute.internal",service="kubelet"} 0.000171187
container_fs_io_time_seconds_total{device="/dev/xvda3",endpoint="https-metrics",id="/",instance="10.0.175.194:10250",job="kubelet",node="ip-10-0-175-194.ap-south-1.compute.internal",service="kubelet"} 0.000035655

While on 4.4.6:

# uname -a
Linux debug-plxnc-w-a-br6nb.c.openshift-gce-devel.internal 4.18.0-147.8.1.el8_1.x86_64 #1 SMP Wed Feb 26 03:08:15 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

# lsblk -fs
NAME                     FSTYPE      LABEL        UUID                                 MOUNTPOINT
sda1                     ext4        boot         fd9689d9-fbcc-488a-a70e-145fddf245e7 /boot
└─sda
sda2                     vfat        EFI-SYSTEM   4FC5-6A46                            /boot/efi
└─sda
sda3
└─sda
coreos-luks-root-nocrypt xfs         root         d2d725c8-d0aa-4f58-a6fb-811d90240135 /sysroot
└─sda4                   crypto_LUKS crypt_rootfs 00000000-0000-4000-a000-000000000002
  └─sda

# cat /proc/diskstats
8 0 sda 7512 11 827015 7436 321336 33931 10380796 582825 0 186009 438510 0 0 0 0
8 1 sda1 144 0 8802 84 9 5 40 9 0 97 24 0 0 0 0
8 2 sda2 443 0 11953 804 1 0 1 0 0 89 616 0 0 0 0
8 3 sda3 26 0 208 17 0 0 0 0 0 14 8 0 0 0 0
8 4 sda4 6823 11 802436 6487 255736 33926 10380755 573503 0 168713 436019 0 0 0 0
253 0 dm-0 6816 0 798794 6565 355249 0 10941122 819340 0 186266 825905 0 0 0 0

# ls -l /dev/mapper/coreos-luks-root-nocrypt
lrwxrwxrwx. 1 root root 7 Jun 4 06:43 /dev/mapper/coreos-luks-root-nocrypt -> ../dm-0

The same Prometheus query, container_fs_io_time_seconds_total{id="/"}, looks like:

container_fs_io_time_seconds_total{device="/dev/mapper/coreos-luks-root-nocrypt",endpoint="https-metrics",id="/",instance="10.0.0.3:10250",job="kubelet",metrics_path="/metrics/cadvisor",node="debug-plxnc-m-2.c.openshift-gce-devel.internal",service="kubelet"} 0
container_fs_io_time_seconds_total{device="/dev/mapper/coreos-luks-root-nocrypt",endpoint="https-metrics",id="/",instance="10.0.0.4:10250",job="kubelet",metrics_path="/metrics/cadvisor",node="debug-plxnc-m-1.c.openshift-gce-devel.internal",service="kubelet"} 0
container_fs_io_time_seconds_total{device="/dev/mapper/coreos-luks-root-nocrypt",endpoint="https-metrics",id="/",instance="10.0.0.5:10250",job="kubelet",metrics_path="/metrics/cadvisor",node="debug-plxnc-m-0.c.openshift-gce-devel.internal",service="kubelet"} 0
container_fs_io_time_seconds_total{device="/dev/mapper/coreos-luks-root-nocrypt",endpoint="https-metrics",id="/",instance="10.0.32.2:10250",job="kubelet",metrics_path="/metrics/cadvisor",node="debug-plxnc-w-c-ntcdh.c.openshift-gce-devel.internal",service="kubelet"} 0
container_fs_io_time_seconds_total{device="/dev/mapper/coreos-luks-root-nocrypt",endpoint="https-metrics",id="/",instance="10.0.32.3:10250",job="kubelet",metrics_path="/metrics/cadvisor",node="debug-plxnc-w-a-br6nb.c.openshift-gce-devel.internal",service="kubelet"} 0
container_fs_io_time_seconds_total{device="/dev/mapper/coreos-luks-root-nocrypt",endpoint="https-metrics",id="/",instance="10.0.32.4:10250",job="kubelet",metrics_path="/metrics/cadvisor",node="debug-plxnc-w-b-w6tbg.c.openshift-gce-devel.internal",service="kubelet"} 0
container_fs_io_time_seconds_total{device="/dev/sda1",endpoint="https-metrics",id="/",instance="10.0.0.3:10250",job="kubelet",metrics_path="/metrics/cadvisor",node="debug-plxnc-m-2.c.openshift-gce-devel.internal",service="kubelet"} 0.000000131
container_fs_io_time_seconds_total{device="/dev/sda1",endpoint="https-metrics",id="/",instance="10.0.0.4:10250",job="kubelet",metrics_path="/metrics/cadvisor",node="debug-plxnc-m-1.c.openshift-gce-devel.internal",service="kubelet"} 0.000000079
container_fs_io_time_seconds_total{device="/dev/sda1",endpoint="https-metrics",id="/",instance="10.0.0.5:10250",job="kubelet",metrics_path="/metrics/cadvisor",node="debug-plxnc-m-0.c.openshift-gce-devel.internal",service="kubelet"} 0.000000103
container_fs_io_time_seconds_total{device="/dev/sda1",endpoint="https-metrics",id="/",instance="10.0.32.2:10250",job="kubelet",metrics_path="/metrics/cadvisor",node="debug-plxnc-w-c-ntcdh.c.openshift-gce-devel.internal",service="kubelet"} 0.000000106
container_fs_io_time_seconds_total{device="/dev/sda1",endpoint="https-metrics",id="/",instance="10.0.32.3:10250",job="kubelet",metrics_path="/metrics/cadvisor",node="debug-plxnc-w-a-br6nb.c.openshift-gce-devel.internal",service="kubelet"} 0.000000097
container_fs_io_time_seconds_total{device="/dev/sda1",endpoint="https-metrics",id="/",instance="10.0.32.4:10250",job="kubelet",metrics_path="/metrics/cadvisor",node="debug-plxnc-w-b-w6tbg.c.openshift-gce-devel.internal",service="kubelet"} 0.000000101

So in the case of 4.1.1 (root device /dev/xvda3) we get the metrics, but in the case of 4.4.6 (root device /dev/mapper/coreos-luks-root-nocrypt) we do not. Looking into it.
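One plausible failure mode (an assumption at this point, not a confirmed root cause): the metric label carries the mapper path /dev/mapper/coreos-luks-root-nocrypt, while /proc/diskstats only knows the device as dm-0, so a stats lookup keyed on the unresolved name would come up empty. A small Go sketch of the resolution step that reconciles the two names:

package main

import (
	"fmt"
	"path/filepath"
)

func main() {
	// Mapper path taken from the lsblk/ls output above.
	const mapper = "/dev/mapper/coreos-luks-root-nocrypt"

	// Resolve the /dev/mapper symlink to the real node, e.g. /dev/dm-0.
	resolved, err := filepath.EvalSymlinks(mapper)
	if err != nil {
		panic(err)
	}

	// filepath.Base yields "dm-0", the name /proc/diskstats uses.
	fmt.Printf("%s -> %s (diskstats name: %s)\n", mapper, resolved, filepath.Base(resolved))
}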
For the record and reference, adding the mapping between each metric and the corresponding column number in /proc/diskstats:

# cat /proc/diskstats
8 0 sda 7512 11 827015 7436 321336 33931 10380796 582825 0 186009 438510 0 0 0 0
8 1 sda1 144 0 8802 84 9 5 40 9 0 97 24 0 0 0 0
8 2 sda2 443 0 11953 804 1 0 1 0 0 89 616 0 0 0 0
8 3 sda3 26 0 208 17 0 0 0 0 0 14 8 0 0 0 0
8 4 sda4 6823 11 802436 6487 255736 33926 10380755 573503 0 168713 436019 0 0 0 0
253 0 dm-0 6816 0 798794 6565 355249 0 10941122 819340 0 186266 825905 0 0 0 0

container_fs_io_time_seconds_total - 13th column
container_fs_read_seconds_total - 7th column
container_fs_reads_merged_total - 5th column
container_fs_io_current - 12th column

Meaning of the individual columns:
1 - major number
2 - minor number
3 - device name
4 - reads completed successfully
5 - reads merged
6 - sectors read
7 - time spent reading (ms)
8 - writes completed
9 - writes merged
10 - sectors written
11 - time spent writing (ms)
12 - I/Os currently in progress
13 - time spent doing I/Os (ms)
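To make the mapping concrete, a small Go sketch (again illustrative, not cAdvisor's actual code) that pulls the four relevant columns out of the dm-0 line above. Note that columns 7 and 13 are raw millisecond counters; how the exporter scales them into the *_seconds_total values is a separate question:

package main

import (
	"fmt"
	"strconv"
	"strings"
)

func main() {
	// The dm-0 line from the /proc/diskstats dump above.
	line := "253 0 dm-0 6816 0 798794 6565 355249 0 10941122 819340 0 186266 825905 0 0 0 0"

	f := strings.Fields(line)
	col := func(n int) uint64 { // 1-based column number, matching the list above
		v, _ := strconv.ParseUint(f[n-1], 10, 64)
		return v
	}

	fmt.Println("container_fs_reads_merged_total    <- col 5  =", col(5))
	fmt.Println("container_fs_read_seconds_total    <- col 7  =", col(7), "(ms of read time)")
	fmt.Println("container_fs_io_current            <- col 12 =", col(12))
	fmt.Println("container_fs_io_time_seconds_total <- col 13 =", col(13), "(ms of io time)")
}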
Still needs a bump in Origin:
https://github.com/openshift/google-cadvisor/pull/10
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196