Description of problem
======================
The IOPS chart in the Disk Load section of the Brick Dashboard shows no data
when it should (while copying data into a particular brick up to 100%
utilization and then reading it all back, with profiling enabled).

Note that there is another IOPS chart on the Brick Dashboard, in the At a
Glance section, which seems to report data as expected.

Reported during retesting of BZ 1581736.

Version-Release number of selected component
============================================
tendrl-monitoring-integration-1.6.3-5.el7rhgs.noarch

```
[root@mbukatov-usm1-server ~]# rpm -qa | grep tendrl | sort
tendrl-ansible-1.6.3-5.el7rhgs.noarch
tendrl-api-1.6.3-3.el7rhgs.noarch
tendrl-api-httpd-1.6.3-3.el7rhgs.noarch
tendrl-commons-1.6.3-7.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-5.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-5.el7rhgs.noarch
tendrl-node-agent-1.6.3-7.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-4.el7rhgs.noarch

[root@mbukatov-usm1-gl1 ~]# rpm -qa | grep tendrl | sort
tendrl-collectd-selinux-1.5.4-2.el7rhgs.noarch
tendrl-commons-1.6.3-7.el7rhgs.noarch
tendrl-gluster-integration-1.6.3-5.el7rhgs.noarch
tendrl-node-agent-1.6.3-7.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
```

How reproducible
================
100%

Steps to Reproduce
==================
1. prepare a Gluster trusted storage pool with at least one volume
2. install WA using tendrl-ansible
3. mount the volume on a dedicated client machine
4. on the client, copy a large tarball into the volume while observing the
   IOPS chart in the Disk Load section of the Brick Dashboard for the
   affected brick (the brick which will store the data)
5. on the client, run md5sum on the tarball you just copied into the volume,
   while observing the IOPS chart in the Disk Load section of the Brick
   Dashboard for the affected brick (a sketch of the client-side workload
   for steps 4 and 5 follows below)
6. wait about half an hour

Actual results
==============
At first, I see no data reported on the IOPS chart in the Disk Load section
of the Brick Dashboard. See screenshot 1, where you can see a peak in write
and then read activity in various charts, but the IOPS chart in the Disk
Load section reports no data.

Then, after the long wait (step 6), see screenshot 2, I can finally see some
data there, but it's unclear what it means, as:

* there is no traffic on the brick during this time
* values reported here are floats smaller than one (which doesn't really
  make much sense for IOPS at first sight)

Expected results
================
Not sure; the description of this chart should be clarified first, and based
on its purpose, the chart may need to be fixed.
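For reference, a minimal sketch of the client-side workload from steps 4 and
5 (the mount point and tarball path are illustrative assumptions, not the
exact commands used during testing):

```
# Assumed mount point of the volume on the client and an assumed large
# tarball -- adjust both to your environment.
MNT=/mnt/volume_beta_arbiter
TARBALL=~/big-data.tar

# Step 4: write workload -- copy the tarball into the volume.
cp "$TARBALL" "$MNT/"

# Step 5: read workload -- checksum the copy to read it all back.
md5sum "$MNT/$(basename "$TARBALL")"
```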
Linking related BZ about IOPS.
Created attachment 1453564 [details] screenshot 1
Created attachment 1453565 [details] screenshot 2
Additional info
===============
Inspecting the timestamps from the screenshots, I can see that I waited just
16 minutes in step 6.

<wild-guess>
It's also possible that this is caused by some sync issue and bad timestamps
somewhere.
</wild-guess>
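Should the wild guess about bad timestamps need checking, a quick way to
rule out clock skew between the nodes would be something like the following
(a sketch, assuming chrony is in use, as is usual on RHEL 7):

```
# Check clock synchronization status on each storage node and on the server.
chronyc tracking

# Compare wall-clock time across nodes in one shot (hostnames taken from the
# rpm listings above).
for h in mbukatov-usm1-server mbukatov-usm1-gl1; do
    ssh "$h" 'hostname; date +%s.%N'
done
```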
Maybe important detail: I actually ran out of free space on the brick.
Additional Info
===============
The IOPS chart in question has labels "vdd-read" and "vdd-write". So for
reference, I checked that this device really hosts the brick:

```
[root@mbukatov-usm1-gl1 ~]# lsblk /dev/vdd
NAME                                            MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vdd                                             253:48   0  10G  0 disk
├─vg_beta_arbiter_3-pool_beta_arbiter_3_tmeta   252:6    0  52M  0 lvm
│ └─vg_beta_arbiter_3-pool_beta_arbiter_3-tpool 252:17   0  10G  0 lvm
│   ├─vg_beta_arbiter_3-pool_beta_arbiter_3     252:19   0  10G  0 lvm
│   └─vg_beta_arbiter_3-lv_beta_arbiter_3       252:23   0  10G  0 lvm  /mnt/brick_beta_arbiter_3
└─vg_beta_arbiter_3-pool_beta_arbiter_3_tdata   252:9    0  10G  0 lvm
  └─vg_beta_arbiter_3-pool_beta_arbiter_3-tpool 252:17   0  10G  0 lvm
    ├─vg_beta_arbiter_3-pool_beta_arbiter_3     252:19   0  10G  0 lvm
    └─vg_beta_arbiter_3-lv_beta_arbiter_3       252:23   0  10G  0 lvm  /mnt/brick_beta_arbiter_3
```
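As an OS-level cross-check independent of the dashboard, IOPS for this
device can be sampled directly with iostat (a sketch; assumes the sysstat
package is installed):

```
# Print extended device statistics for vdd every 5 seconds; the r/s and w/s
# columns are read and write IOPS as seen by the kernel, and should roughly
# match what the "vdd-read" and "vdd-write" series report.
iostat -x 5 /dev/vdd
```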
Created attachment 1454683 [details]
screenshot 3: with profiling disabled

I also noticed that this chart showed data even when profiling is disabled,
see screenshot 3.
(In reply to Martin Bukatovic from comment #9)
> Created attachment 1454683 [details]
> screenshot 3: with profiling disabled
>
> I also noticed that this chart showed data even when profiling is disabled,
> see screenshot 3.

Workload shown in the screenshot: extracting articles from the
enwiki-latest-pages-articles.xml.bz2 archive into individual files, for
about 20 hours.
Providing QE ack in the hope that the patches linked to this BZ fix the
problem. I will provide the `pstack {brick pid}` details during verification.
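For the record, a sketch of how the `pstack {brick pid}` details could be
collected (the volume and brick names below are illustrative;
`gluster volume status` lists the brick PIDs):

```
# Find the PID of the glusterfsd process serving the affected brick.
gluster volume status volume_beta_arbiter | grep brick_beta_arbiter_3

# Alternatively, match on the brick path in the process arguments.
BRICK_PID=$(pgrep -f 'glusterfsd.*brick_beta_arbiter_3')

# Dump the stack of every thread in the brick process (pstack wraps gdb).
pstack "$BRICK_PID"
```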
IOPS and disk data in the Grafana dashboards are now reported at the same
time, right from the start. While a write is happening on the bricks, the
graphs reflect it as expected.
Created attachment 1476180 [details]
screenshot 4: verification

Testing with
============
```
[root@mbukatov-usm1-server ~]# rpm -qa | grep tendrl | sort
tendrl-ansible-1.6.3-6.el7rhgs.noarch
tendrl-api-1.6.3-5.el7rhgs.noarch
tendrl-api-httpd-1.6.3-5.el7rhgs.noarch
tendrl-commons-1.6.3-12.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-10.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-10.el7rhgs.noarch
tendrl-node-agent-1.6.3-10.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-10.el7rhgs.noarch

[root@mbukatov-usm1-gl1 ~]# rpm -qa | grep tendrl | sort
tendrl-collectd-selinux-1.5.4-2.el7rhgs.noarch
tendrl-commons-1.6.3-12.el7rhgs.noarch
tendrl-gluster-integration-1.6.3-9.el7rhgs.noarch
tendrl-node-agent-1.6.3-10.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
```

Results
=======
When I perform the steps to reproduce, the IOPS chart in the Disk Load
section of the Brick Dashboard now shows data immediately, without any
delay, which includes both:

* zero or very small values (when no traffic from the client is happening)
* IOPS data matching the other charts on the dashboard during the actual
  workload

Note: only a single value (accounting for both reads and writes) is reported.
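To double-check the stored data points independently of Grafana, the
underlying Graphite series can be queried through the render API (a sketch;
the host, port, and metric path below are assumptions for illustration, not
a verified Tendrl naming scheme):

```
# Fetch the last hour of a suspected disk series as JSON; adjust the URL and
# the target expression to the actual Graphite instance and metric tree.
curl -s -G 'http://mbukatov-usm1-server:10080/render' \
     --data-urlencode 'target=tendrl.*.*.nodes.*.disk-vdd.disk_ops.*' \
     --data-urlencode 'format=json' \
     --data-urlencode 'from=-1h'
```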
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2616