Bug 1549146 - Some huge numbers reported by grafana are hard to read and understand
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: web-admin-tendrl-monitoring-integration
Version: rhgs-3.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: RHGS 3.4.0
Assignee: Ankush Behl
QA Contact: Daniel Horák
URL:
Whiteboard:
Depends On:
Blocks: 1503137
Reported: 2018-02-26 14:01 UTC by Daniel Horák
Modified: 2018-09-04 07:02 UTC (History)

Fixed In Version: tendrl-monitoring-integration-1.6.3-10.el7rhgs.noarch
Doc Type: Bug Fix
Doc Text:
Previously, the Grafana dashboard reported unusual and unrealistic values in several performance panels, and performance units were missing. The affected panels included Weeks Remaining, IOPS, and throughput. With this fix, the Grafana panels, including the IOPS and Weeks Remaining panels, display realistic, readable values with the appropriate performance units. Additionally, the Inode panels were removed from the brick-level and volume-level dashboards of Grafana.
Clone Of:
Environment:
Last Closed: 2018-09-04 07:00:53 UTC
Embargoed:


Attachments
Grafana dashboards (438.52 KB, application/x-gzip)
2018-02-26 14:01 UTC, Daniel Horák
IOPS Dashboard - units prefix B (41.30 KB, image/png)
2018-07-18 11:23 UTC, Daniel Horák
Grafana Dashboard - Weeks Remaining showing huge number (tendrl-monitoring-integration-1.6.3-7.el7rhgs.noarch) (9.58 KB, image/png)
2018-07-23 14:00 UTC, Daniel Horák


Links
System ID Private Priority Status Summary Last Updated
Github Tendrl monitoring-integration issues 383 0 None None None 2018-03-27 14:20:09 UTC
Github Tendrl monitoring-integration pull 535 0 None None None 2018-08-13 03:31:46 UTC
Github https://github.com/Tendrl monitoring-integration issues 201 0 None None None 2018-03-29 12:40:38 UTC
Red Hat Bugzilla 1581718 0 unspecified CLOSED Weekly growth rate and week remaining metrics are not accurate 2021-02-22 00:41:40 UTC
Red Hat Bugzilla 1587804 1 None None None 2021-11-05 16:58:06 UTC
Red Hat Product Errata RHSA-2018:2616 0 None None None 2018-09-04 07:02:03 UTC

Internal Links: 1581718 1587804

Description Daniel Horák 2018-02-26 14:01:39 UTC
Created attachment 1400847 [details]
Grafana dashboards

Description of problem:
  Some of the numbers displayed on Grafana dashboards can be very high,
  and it is quite difficult to read and understand the magnitude of a
  particular number. See the attached screenshots, for example:
    - IOPS
    - Weeks Remaining
    - Throughput

Version-Release number of selected component (if applicable):
  grafana-4.3.2-3.el7rhgs.x86_64
  tendrl-ansible-1.5.4-7.el7rhgs.noarch
  tendrl-api-1.5.4-4.el7rhgs.noarch
  tendrl-api-httpd-1.5.4-4.el7rhgs.noarch
  tendrl-commons-1.5.4-9.el7rhgs.noarch
  tendrl-grafana-plugins-1.5.4-14.el7rhgs.noarch
  tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
  tendrl-monitoring-integration-1.5.4-14.el7rhgs.noarch
  tendrl-node-agent-1.5.4-16.el7rhgs.noarch
  tendrl-notifier-1.5.4-6.el7rhgs.noarch
  tendrl-selinux-1.5.4-2.el7rhgs.noarch
  tendrl-ui-1.5.4-6.el7rhgs.noarch

How reproducible:
  100%

Steps to Reproduce:
1. Install a Gluster cluster (ideally in some real-world scenario).
2. Install Web Administration (Tendrl) and import the Gluster cluster.
3. Utilize the Gluster cluster (generate some load, read/write operations...).
4. Let it run for a few days, so there will be some meaningful data.
5. Check Grafana Dashboards.

Actual results:
  Some of the numbers displayed by Grafana are very high and it's difficult
  to understand and imagine the magnitude of the number.
  For example "Weeks Remaining": 1629077.

Expected results:
  The numbers displayed by Grafana dashboards will be nicely human readable.

Additional info:
  There are multiple ways to accomplish this, and the right one will probably
  depend on the type of the displayed value.
  
  For example:
  "Weeks Remaining": it probably doesn't make much sense to show exact numbers
  for more than a few years (and it might make sense to convert the number of
  weeks to months or years for longer periods).

  "Inode available": as this number doesn't have any unit (to prefix it with
  k, M, G...), it might make sense to show the number in a form like:
  3124138767 ~ 3.1*10^9
  or something like that.
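
  The two suggestions above could be sketched roughly as follows. This is only
  an illustration in Python, not how Tendrl or Grafana implement it: the helper
  names (humanize_count, humanize_weeks), the 52-weeks-per-year approximation
  and the 10-year cap are all assumptions made for the example.

```python
WEEKS_PER_YEAR = 52  # approximation, assumed for this sketch


def humanize_count(value: int) -> str:
    """Render a unitless integer as mantissa*10^exponent for large values."""
    digits = str(abs(value))
    exponent = len(digits) - 1
    if exponent < 3:
        # Small numbers are readable as they are.
        return str(value)
    mantissa = round(value / 10 ** exponent, 1)
    if abs(mantissa) >= 10:
        # Rounding pushed the mantissa to 10.0 (e.g. 999999), renormalize.
        mantissa /= 10
        exponent += 1
    return f"{mantissa:.1f}*10^{exponent}"


def humanize_weeks(weeks: float, max_years: int = 10) -> str:
    """Show weeks for short periods, years for longer ones, capped at max_years."""
    if weeks < WEEKS_PER_YEAR:
        return f"{weeks:.0f} weeks"
    years = weeks / WEEKS_PER_YEAR
    if years > max_years:
        # Hypothetical cap: exact values this far out are not meaningful.
        return f"> {max_years} years"
    return f"~{years:.1f} years"
```

  With these helpers, the inode example from this report becomes 3.1*10^9, and
  the reported "Weeks Remaining" value of 1629077 collapses into the capped form.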

Comment 3 Ju Lim 2018-03-20 17:55:52 UTC
Related bug in GitHub: https://github.com/Tendrl/monitoring-integration/issues/383

Comment 4 Martin Bukatovic 2018-03-29 13:44:19 UTC
I'm providing qe_ack with the following assumptions:

* devel team will provide a module to generate mock data for tendrl ui component
  so that integration testing of units of measure feature is possible
  (this mitigates the risk of qe team not being able to simulate all scenarios
  during system testing, especially for IOPS, throughput and weeks remaining)
* qe team will verify selected edge cases during system testing to make sure
  that values reported are matching reality (eg. IOPS, weeks remaining)

This has been agreed on during "RHGS WA Team status meeting" on 2018-03-29.

Comment 7 Martin Bukatovic 2018-05-02 08:53:34 UTC
Asking about requires_doc_text flag, as this changes how numbers are reported in
the web interface.

Comment 8 Martin Bukatovic 2018-05-25 16:21:00 UTC
Moving back to assigned because the dev team hasn't provided the mock module
as agreed in comment 4.

Provide the module along with a description of how it was used during dev
testing, and move this BZ into ON QA again.

Comment 9 Nishanth Thomas 2018-05-28 10:21:38 UTC
@ankush, do we have the means to provide a mock module for verification?
@Daniel, BTW, how did you do this when the bug was found?

Comment 10 Ankush Behl 2018-05-28 10:26:33 UTC
@nthomas We don't have any script right now for pushing the mock data.
If we want something like this, then we need to put some effort into writing the script.

And yes, it's better if we have a script for pushing the data to graphite; that will help both QE and Dev in testing.

Comment 11 Daniel Horák 2018-05-28 11:48:19 UTC
If I remember it correctly, the bug was filed against cluster with 24 or 36
Storage Nodes.

We are able to test the one (same/similar) scenario for which it was reported,
but we are not easily able to retest multiple different scenarios and edge cases.
All our resources are very limited, so for example we are able to simulate
big disks, but we are not able to fill them, as we don't physically have enough
space.

That was the reason why Martin asked for a module to generate mock data.

Comment 12 Ankush Behl 2018-06-04 17:34:43 UTC
@dahorak @mbukatov I have written a script that can help QE generate and push mock data in carbon/graphite.

Steps to use the script:
1. Clone the carbon-random repo[1].
2. Open carbon_random.py.
3. Provide the IP address of the graphite/carbon server.
4. Enter the name of the metric you want to search for (in this bug, we want to push data to the IOPS metric).
5. Select the lower and upper bound values for the random integer data you want to push.
6. Select the interval at which you want to push data to graphite.
7. Save and run "python carbon_random.py".

[1]. https://github.com/cloudbehl/carbon-random
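
For readers without access to the script, the general idea can be sketched as below. This is not the carbon-random code; it is a minimal illustration that assumes the carbon plaintext protocol ("metric value timestamp" per line) on the default plaintext port 2003. The function names and the example host and metric path are hypothetical.

```python
import random
import socket
import time


def build_line(metric: str, value: int, timestamp: int) -> str:
    """Format one datapoint in the carbon plaintext protocol."""
    return f"{metric} {value} {timestamp}\n"


def push_random(host: str, metric: str, lower: int, upper: int,
                interval: float, count: int, port: int = 2003) -> None:
    """Send `count` random integers in [lower, upper] to carbon, one per interval."""
    with socket.create_connection((host, port)) as sock:
        for _ in range(count):
            value = random.randint(lower, upper)
            line = build_line(metric, value, int(time.time()))
            sock.sendall(line.encode("ascii"))
            time.sleep(interval)


# Example call (hypothetical host and metric path, adjust to your setup):
# push_random("192.0.2.10", "mock.cluster.iops", 10**6, 10**9, interval=10, count=60)
```

The bounds, interval and metric name correspond to steps 4 to 6 above; the actual metric paths used by Tendrl have to be looked up in graphite.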

Comment 13 Daniel Horák 2018-07-18 11:23:54 UTC
Created attachment 1459687 [details]
IOPS Dashboard - units prefix B

Comment 14 Daniel Horák 2018-07-18 11:39:52 UTC
@Ju, the IOPS values are now displayed with the following suffixes for the number:
    iops
  K iops
  M iops
  B iops
  T iops

What does the "B" suffix mean? (see also attachment 1459687 [details])
If it is based on SI prefixes[1], shouldn't it be "G" instead of "B"?
And shouldn't "k" be lowercase?

[1] https://en.wikipedia.org/wiki/Metric_prefix
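
To make the difference concrete, here is a small sketch comparing the two suffix sets. It assumes that "B" stands for billion (short scale), which is not confirmed in this report, and the function name format_iops is made up for the example.

```python
SHORT_SCALE = ["", "K", "M", "B", "T"]  # thousand, million, billion (assumed), trillion
SI_PREFIXES = ["", "k", "M", "G", "T"]  # kilo, mega, giga, tera


def format_iops(value: float, suffixes=SHORT_SCALE) -> str:
    """Scale value by steps of 1000 and append the matching suffix and unit."""
    index = 0
    while abs(value) >= 1000 and index < len(suffixes) - 1:
        value /= 1000
        index += 1
    parts = [f"{value:.1f}"]
    if suffixes[index]:
        parts.append(suffixes[index])
    parts.append("iops")
    return " ".join(parts)
```

The same value, 2500000000, comes out as "2.5 B iops" with the first list and as "2.5 G iops" with the SI list.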

Comment 16 Daniel Horák 2018-07-23 14:00:24 UTC
Created attachment 1469965 [details]
Grafana Dashboard - Weeks Remaining showing huge number (tendrl-monitoring-integration-1.6.3-7.el7rhgs.noarch)

Comment 25 Daniel Horák 2018-08-21 14:15:01 UTC
Tested and Verified on:
  Red Hat Enterprise Linux Server release 7.5 (Maipo)
  carbon-selinux-1.5.4-2.el7rhgs.noarch
  collectd-5.7.2-3.1.el7rhgs.x86_64
  collectd-ping-5.7.2-3.1.el7rhgs.x86_64
  grafana-4.3.2-3.el7rhgs.x86_64
  libcollectdclient-5.7.2-3.1.el7rhgs.x86_64
  python-carbon-0.9.15-2.1.el7rhgs.noarch
  tendrl-ansible-1.6.3-7.el7rhgs.noarch
  tendrl-api-1.6.3-5.el7rhgs.noarch
  tendrl-api-httpd-1.6.3-5.el7rhgs.noarch
  tendrl-commons-1.6.3-12.el7rhgs.noarch
  tendrl-grafana-plugins-1.6.3-10.el7rhgs.noarch
  tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
  tendrl-monitoring-integration-1.6.3-10.el7rhgs.noarch
  tendrl-node-agent-1.6.3-10.el7rhgs.noarch
  tendrl-notifier-1.6.3-4.el7rhgs.noarch
  tendrl-selinux-1.5.4-2.el7rhgs.noarch
  tendrl-ui-1.6.3-11.el7rhgs.noarch

* All IOPS panels show values with a proper suffix (as described in Comment 14)
* Weeks Remaining panels show numbers between 1-520[1],
  for numbers lower than 0, it shows: Growth rate is negative
  and for numbers higher than 520 it shows:
    Insufficient data collected for forecast
* Inode Available panels were removed

[1] The behaviour of the Weeks Remaining panel is discussed further and
    tracked in Bug 1581718.
    There is also one small issue described in Bug 1581718 comment 52,
    which will be processed as part of that bug.
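
The verified Weeks Remaining behaviour can be summarized with the sketch below. It only mirrors what was observed on the panel; the function name and the rounding are assumptions, and what the panel shows for values between 0 and 1 was not checked here (the sketch simply displays the number).

```python
MAX_WEEKS = 520  # upper bound observed on the panel (roughly 10 years)


def weeks_remaining_text(weeks: float) -> str:
    """Return the text a Weeks Remaining panel would show for a forecast value."""
    if weeks < 0:
        return "Growth rate is negative"
    if weeks > MAX_WEEKS:
        return "Insufficient data collected for forecast"
    return str(round(weeks))
```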

>> VERIFIED

Comment 27 Ankush Behl 2018-09-03 16:10:02 UTC
Rakesh,

Overall it looks good, but this bug also covers the removal of the Inode panels from the brick and volume dashboards.

So I think we should also include some text about the Inode panel removal.

Comment 30 errata-xmlrpc 2018-09-04 07:00:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2616

