Description of problem
======================
The IOPS chart in the Disk Load section of the Brick Dashboard shows no data
when it should (while copying data into a particular brick up to 100%
utilization and then reading it all back, with profiling enabled).

Note that there is another IOPS chart on the Brick Dashboard, in the At a
Glance section, which seems to report data as expected.

Reported during retesting of BZ 1581736.

Version-Release number of selected component
============================================
tendrl-monitoring-integration-1.6.3-5.el7rhgs.noarch

```
[root@mbukatov-usm1-server ~]# rpm -qa | grep tendrl | sort
tendrl-ansible-1.6.3-5.el7rhgs.noarch
tendrl-api-1.6.3-3.el7rhgs.noarch
tendrl-api-httpd-1.6.3-3.el7rhgs.noarch
tendrl-commons-1.6.3-7.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-5.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-5.el7rhgs.noarch
tendrl-node-agent-1.6.3-7.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-4.el7rhgs.noarch

[root@mbukatov-usm1-gl1 ~]# rpm -qa | grep tendrl | sort
tendrl-collectd-selinux-1.5.4-2.el7rhgs.noarch
tendrl-commons-1.6.3-7.el7rhgs.noarch
tendrl-gluster-integration-1.6.3-5.el7rhgs.noarch
tendrl-node-agent-1.6.3-7.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
```

How reproducible
================
100%

Steps to Reproduce
==================
1. prepare a Gluster trusted storage pool with at least one volume
2. install WA using tendrl-ansible
3. mount the volume on a dedicated client machine
4. on the client, copy a large tarball into the volume while observing the
   IOPS chart in the Disk Load section of the Brick Dashboard for the
   affected brick (the brick which will store the data)
5. on the client, run md5sum on the tarball you just copied into the volume,
   while observing the IOPS chart in the Disk Load section of the Brick
   Dashboard for the affected brick (a sketch of the client-side workload
   for steps 4 and 5 follows below)
6. wait about half an hour

Actual results
==============
At first, I see no data reported on the IOPS chart in the Disk Load section
of the Brick Dashboard. See screenshot 1, where you can see a peak in write
and then read activity in various charts, but the IOPS chart in the Disk
Load section reports no data.

Then, after the long wait (step 6), see screenshot 2, I can finally see some
data there, but it's unclear what it means, as:

* there is no traffic on the brick during this time
* values reported here are floats smaller than one (which doesn't really
  make much sense for IOPS at first sight)

Expected results
================
Not sure; the description of this chart should be clarified first, and based
on its purpose, the chart may need to be fixed.
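For reference, a minimal sketch of the client-side workload from steps 4 and
5 (the mount point and tarball path are illustrative assumptions, not the
exact commands used during testing):

```
# Assumed mount point of the volume on the client and an assumed large
# tarball -- adjust both to your environment.
MNT=/mnt/volume_beta_arbiter
TARBALL=~/big-data.tar

# Step 4: write workload -- copy the tarball into the volume.
cp "$TARBALL" "$MNT/"

# Step 5: read workload -- checksum the copy to read it all back.
md5sum "$MNT/$(basename "$TARBALL")"
```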
Linking related BZ about IOPS.
Created attachment 1453564 [details] screenshot 1
Created attachment 1453565 [details] screenshot 2
Additional info
===============
Inspecting the timestamps from the screenshots, I can see that I waited just
16 minutes in step 6.

<wild-guess>
It's also possible that this is caused by some sync issue and bad timestamps
somewhere.
</wild-guess>
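Should the wild guess about bad timestamps need checking, a quick way to
rule out clock skew between the nodes would be something like the following
(a sketch, assuming chrony is in use, as is usual on RHEL 7):

```
# Check clock synchronization status on each storage node and on the server.
chronyc tracking

# Compare wall-clock time across nodes in one shot (hostnames taken from the
# rpm listings above).
for h in mbukatov-usm1-server mbukatov-usm1-gl1; do
    ssh "$h" 'hostname; date +%s.%N'
done
```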
Maybe important detail: I actually ran out of free space on the brick.
Additional Info
===============
The IOPS chart in question has labels "vdd-read" and "vdd-write". So for
reference, I checked that this device really hosts the brick:

```
[root@mbukatov-usm1-gl1 ~]# lsblk /dev/vdd
NAME                                            MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vdd                                             253:48   0  10G  0 disk
├─vg_beta_arbiter_3-pool_beta_arbiter_3_tmeta   252:6    0  52M  0 lvm
│ └─vg_beta_arbiter_3-pool_beta_arbiter_3-tpool 252:17   0  10G  0 lvm
│   ├─vg_beta_arbiter_3-pool_beta_arbiter_3     252:19   0  10G  0 lvm
│   └─vg_beta_arbiter_3-lv_beta_arbiter_3       252:23   0  10G  0 lvm  /mnt/brick_beta_arbiter_3
└─vg_beta_arbiter_3-pool_beta_arbiter_3_tdata   252:9    0  10G  0 lvm
  └─vg_beta_arbiter_3-pool_beta_arbiter_3-tpool 252:17   0  10G  0 lvm
    ├─vg_beta_arbiter_3-pool_beta_arbiter_3     252:19   0  10G  0 lvm
    └─vg_beta_arbiter_3-lv_beta_arbiter_3       252:23   0  10G  0 lvm  /mnt/brick_beta_arbiter_3
```
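As an OS-level cross-check independent of the dashboard, IOPS for this
device can be sampled directly with iostat (a sketch; assumes the sysstat
package is installed):

```
# Print extended device statistics for vdd every 5 seconds; the r/s and w/s
# columns are read and write IOPS as seen by the kernel, and should roughly
# match what the "vdd-read" and "vdd-write" series report.
iostat -x 5 /dev/vdd
```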
Created attachment 1454683 [details]
screenshot 3: with profiling disabled

I also noticed that this chart showed data even when profiling is disabled,
see screenshot 3.
(In reply to Martin Bukatovic from comment #9)
> Created attachment 1454683 [details]
> screenshot 3: with profiling disabled
>
> I also noticed that this chart showed data even when profiling is disabled,
> see screenshot 3.

Workload shown in the screenshot: extracting articles from the
enwiki-latest-pages-articles.xml.bz2 archive into individual files, for
about 20 hours.
Providing QE ack in the hope that the patches linked to this BZ fix the
problem. I will provide the `pstack {brick pid}` details during verification.
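For the record, a sketch of how the `pstack {brick pid}` details could be
collected (the volume and brick names below are illustrative;
`gluster volume status` lists the brick PIDs):

```
# Find the PID of the glusterfsd process serving the affected brick.
gluster volume status volume_beta_arbiter | grep brick_beta_arbiter_3

# Alternatively, match on the brick path in the process arguments.
BRICK_PID=$(pgrep -f 'glusterfsd.*brick_beta_arbiter_3')

# Dump the stack of every thread in the brick process (pstack wraps gdb).
pstack "$BRICK_PID"
```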
IOPS and disk data in the Grafana dashboards are now reported at the same
time, right from the start. While a write is happening on the bricks, the
graphs reflect it as expected.
Created attachment 1476180 [details]
screenshot 4: verification

Testing with
============
```
[root@mbukatov-usm1-server ~]# rpm -qa | grep tendrl | sort
tendrl-ansible-1.6.3-6.el7rhgs.noarch
tendrl-api-1.6.3-5.el7rhgs.noarch
tendrl-api-httpd-1.6.3-5.el7rhgs.noarch
tendrl-commons-1.6.3-12.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-10.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-10.el7rhgs.noarch
tendrl-node-agent-1.6.3-10.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-10.el7rhgs.noarch

[root@mbukatov-usm1-gl1 ~]# rpm -qa | grep tendrl | sort
tendrl-collectd-selinux-1.5.4-2.el7rhgs.noarch
tendrl-commons-1.6.3-12.el7rhgs.noarch
tendrl-gluster-integration-1.6.3-9.el7rhgs.noarch
tendrl-node-agent-1.6.3-10.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
```

Results
=======
When I perform the steps to reproduce, the IOPS chart in the Disk Load
section of the Brick Dashboard now shows data immediately, without any
delay, which includes both:

* zero or very small values (when no traffic from the client is happening)
* IOPS data matching the other charts on the dashboard during the actual
  workload

Note: only a single value (accounting for both reads and writes) is reported.
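To double-check the stored data points independently of Grafana, the
underlying Graphite series can be queried through the render API (a sketch;
the host, port, and metric path below are assumptions for illustration, not
a verified Tendrl naming scheme):

```
# Fetch the last hour of a suspected disk series as JSON; adjust the URL and
# the target expression to the actual Graphite instance and metric tree.
curl -s -G 'http://mbukatov-usm1-server:10080/render' \
     --data-urlencode 'target=tendrl.*.*.nodes.*.disk-vdd.disk_ops.*' \
     --data-urlencode 'format=json' \
     --data-urlencode 'from=-1h'
```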
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2616