Bug 2272597

Summary: [ODF used capacity trend] average usage and days left predictions are incorrect
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation Reporter: Sanjal Katiyar <skatiyar>
Component: management-console    Assignee: Divyansh Kamboj <dkamboj>
Status: CLOSED ERRATA QA Contact: Nagendra Reddy <nagreddy>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.16    CC: asriram, dkamboj, ebenahar, hnallurv, nagreddy, nthomas, odf-bz-bot, sheggodu
Target Milestone: ---   
Target Release: ODF 4.17.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 4.17.0-89 Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2024-10-30 14:27:23 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Sanjal Katiyar 2024-04-02 06:18:50 UTC
Description of problem (please be detailed as possible and provide log
snippets):
To calculate average usage and days left (until storage is full), we rely on the "ocs-metrics-exporter" "uptime": https://github.com/red-hat-storage/odf-console/blob/master/packages/ocs/queries/ceph-storage.ts#L338. This is incorrect: the exporter's "uptime" is not equal to the "retention period" of the Prometheus instance, and since it is an ever-increasing value, it will skew the average if the exporter has been running for (say) a year.
Another issue: if the exporter restarts for some reason, the "uptime" resets as well.

Version of all relevant components (if applicable):
ODF 4.16


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?
Yes, whenever "uptime" > "retention period" of Prometheus


Can this issue be reproduced from the UI?
Yes


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1.
2.
3.


Actual results:


Expected results:


Additional info:
Epic: https://issues.redhat.com/browse/RHSTOR-4624.

Maybe we can look into reading retention period directly from the OCP ConfigMap (https://docs.openshift.com/container-platform/4.13/monitoring/configuring-the-monitoring-stack.html#modifying-retention-time-and-size-for-prometheus-metrics-data_configuring-the-monitoring-stack).

Comment 9 Divyansh Kamboj 2024-04-29 14:40:11 UTC
The other two BZs mentioned occur due to manual failover; you can go ahead and test this in a clean setup.

Comment 16 Sunil Kumar Acharya 2024-08-30 15:19:19 UTC
Please evaluate and provide qa_ack.

Comment 20 Sunil Kumar Acharya 2024-09-03 05:33:26 UTC
Please update the RDT flag/text appropriately.

Comment 23 errata-xmlrpc 2024-10-30 14:27:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.17.0 Security, Enhancement, & Bug Fix Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:8676

Comment 24 Red Hat Bugzilla 2025-02-28 04:25:09 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days