Bug 2026144 - Utilization values are misreported and isn't matching
Summary: Utilization values are misreported and isn't matching
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ocs-operator
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ODF 4.12.0
Assignee: arun kumar mohan
QA Contact: Tiffany Nguyen
URL:
Whiteboard:
Depends On:
Blocks: 2027681
Reported: 2021-11-23 20:44 UTC by Aman Agrawal
Modified: 2023-08-09 17:00 UTC (History)

Fixed In Version: 4.12.0-100
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 2027681
Environment:
Last Closed: 2023-02-08 14:06:28 UTC
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | red-hat-storage/ocs-operator/pull/1433 | 0 | None | Merged | Decreasing the timeout of 'ODF_standardized_metrics.rules' group to 30s | 2022-09-15 11:17:54 UTC
Github | red-hat-storage/ocs-operator/pull/1554 | 0 | None | Merged | Bug 2026144: [release-4.10] Decreasing the timeout of 'ODF_standardized_metrics.rules' group to 30s | 2022-09-15 11:17:55 UTC
Github | red-hat-storage/ocs-operator/pull/1761 | 0 | None | open | Fix utilization values mismatch queries | 2022-09-15 11:18:00 UTC
Github | red-hat-storage/ocs-operator/pull/1839 | 0 | None | open | Bug 2026144: [release-4.12] Fix utilization values mismatch queries | 2022-10-12 05:47:34 UTC
Github | red-hat-storage/ocs-operator/pull/1881 | 0 | None | open | Bug 2026144: [release-4.12] Sync all the libsonnet changes to metrics deploy files | 2022-11-04 09:13:37 UTC
Github | red-hat-storage/odf-console/pull/352 | 0 | None | open | Fix Utilization card queries | 2022-09-15 11:17:58 UTC

Comment 2 Nitin Goyal 2021-11-24 05:21:55 UTC
@badhikar Can you pls take a look?

Comment 3 Bipul Adhikari 2021-11-24 06:13:51 UTC
The Utilization card reports the logical (non-replicated) utilization of storage; the actual raw usage should be 3 x value + staticUsage. The main dashboard, by contrast, shows the exact raw capacity used by the system.
One of the reasons we cannot show the logical (non-replicated) capacity in the main ODF Overview dashboard is that the total logical capacity depends on factors such as the number of pools and their replica counts, which can change the total capacity of the system at any time.
This is a technical limitation and I see no workarounds at this point.
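
The relationship mentioned above (raw usage = 3 x value + staticUsage) can be illustrated with a small arithmetic sketch. It assumes a single pool with a replica count of 3 and a fixed static overhead; the function name, the GiB units, and the sample numbers are hypothetical and are not taken from the ODF code.

```python
def raw_capacity_used(logical_used_gib: float,
                      replica_count: int = 3,
                      static_usage_gib: float = 0.0) -> float:
    """Estimate raw usage from logical (non-replicated) usage.

    Mirrors the relationship from the comment:
    raw = replica_count x logical + static usage.
    """
    return replica_count * logical_used_gib + static_usage_gib


# Hypothetical numbers: 10 GiB shown on the Utilization card, 2 GiB static usage.
logical = 10.0
raw = raw_capacity_used(logical, replica_count=3, static_usage_gib=2.0)
print(raw)  # 32.0
```

With more than one pool, or with pools that use different replica counts, a single multiplier like this no longer applies, which is the limitation described in the comment.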

Comment 7 Bipul Adhikari 2021-12-08 04:58:57 UTC
I have sent the findings of the bug on an email thread.

Comment 9 arun kumar mohan 2021-12-20 18:44:32 UTC
Sent a PR to OCS-Operator: https://github.com/red-hat-storage/ocs-operator/pull/1433
This PR will reduce the collection interval of that particular rule group, so that the metrics collected will be more granular.
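
A rough sketch of why a shorter interval gives more granular data: it lists the points in time at which a rule group would be evaluated within a fixed window. The 30s value comes from the PR title; the 60s "before" interval and the helper name are hypothetical, since the bug does not state the previous setting.

```python
def evaluation_times(window_s: int, interval_s: int) -> list[int]:
    """Offsets (in seconds) at which a rule group is evaluated within a window."""
    return list(range(0, window_s + 1, interval_s))


# 60s is an assumed previous interval; 30s is the value named in the PR title.
before = evaluation_times(window_s=120, interval_s=60)
after = evaluation_times(window_s=120, interval_s=30)

print(before)  # [0, 60, 120]
print(after)   # [0, 30, 60, 90, 120]
```

Under these assumptions, a value read from the rule can be at most one interval old, so halving the interval halves the worst-case staleness.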

Comment 12 Nitin Goyal 2022-03-01 07:34:55 UTC
Moving it to ocs as it is fixed there.

Comment 16 Mudit Agarwal 2022-03-14 03:31:40 UTC
Not a 4.10 blocker, moving it out.

Arun, PTAL.

Comment 20 arun kumar mohan 2022-07-01 19:16:34 UTC
Some general observations:

The ODF Overview page graphs (IOPS, Throughput and Latency) show the sum of read and write I/O, whereas the StorageSystem's -> Overview Block & File page shows separate Read and Write graphs. So the two sets of graphs won't have a one-to-one correlation.

For each metric:
IOPS
      The ODF Overview page and the StorageSystem Overview page use different time intervals.

Throughput
      The underlying queries are different, with different collection intervals.

Latency
      Again, two totally different queries (one takes an average and the other takes a sum).

For IOPS we can correct the time interval and make it the same; for the other two we need to investigate the queries a bit further.
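
To make the latency mismatch concrete, here is a minimal sketch that combines the same read and write values in two ways. It assumes the difference is between summing and averaging the read and write figures; the actual queries are not shown in this bug, and the function names and sample values are hypothetical.

```python
def combined_by_sum(read_ms: float, write_ms: float) -> float:
    """Combine read and write latency by adding them."""
    return read_ms + write_ms


def combined_by_average(read_ms: float, write_ms: float) -> float:
    """Combine read and write latency by averaging them."""
    return (read_ms + write_ms) / 2


# Hypothetical sample: 2 ms read latency, 6 ms write latency.
print(combined_by_sum(2.0, 6.0))      # 8.0
print(combined_by_average(2.0, 6.0))  # 4.0
```

Even with identical input data, the two pages would report different numbers until both use the same aggregation.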

Comment 23 arun kumar mohan 2022-08-01 10:08:34 UTC
PRs submitted to both the ODF-Console and OCS-Operator repos:

https://github.com/red-hat-storage/ocs-operator/pull/1761
https://github.com/red-hat-storage/odf-console/pull/352

Comment 24 Tiffany Nguyen 2022-11-21 21:32:23 UTC
Using version 4.12.0-114: IOPS, Throughput and Latency from the Performance card and the Used Capacity value from System Capacity are very close to each other.

