Bug 1851203 - [GSS] [RFE] Need a simpler representation of capacity breakdown in total usage and per project breakdown in OCS 4 dashboard
Summary: [GSS] [RFE] Need a simpler representation of capacity breakdown in total usag...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Console Storage Plugin
Version: 4.5
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.7.0
Assignee: Yuval
QA Contact: Elena Bondarenko
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-06-25 18:40 UTC by Sonal
Modified: 2023-10-06 20:51 UTC
CC List: 14 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-24 15:13:08 UTC
Target Upstream Version:
Embargoed:


Attachments
Screenshot of OCS dashboard (257.99 KB, image/png)
2020-06-25 18:42 UTC, Sonal
Must-gather-ocs-4.3 (8.49 MB, application/gzip)
2020-06-25 19:33 UTC, Sonal


Links
System ID Private Priority Status Summary Last Updated
Github openshift console pull 7404 0 None closed Bug 1866320: Changes to Capacity Metrics in OCS Persistent Storage Dashboard 2021-02-18 15:35:35 UTC
Red Hat Bugzilla 1787392 1 None None None 2021-01-20 06:05:38 UTC
Red Hat Bugzilla 1847581 0 urgent CLOSED Inconsistency in breakdown and utilization card with respect to shown capacities 2021-02-22 00:41:40 UTC
Red Hat Product Errata RHSA-2020:5633 0 None None None 2021-02-24 15:13:12 UTC

Description Sonal 2020-06-25 18:40:11 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

The aggregate of usage per project is not equal to total usage shown in the dashboard in the capacity breakdown. (Attached Screenshots from the dashboard)

This representation of total usage and usage per project in OCS dashboard is confusing and it should show them in a simpler manner.

- The total usage shown in the dashboard is calculated from the `USED` space in the `ceph osd df` output.

For example, in the snippet below from an OCS 4.3 cluster, the total usage from the dashboard is 35.85 GiB, which is the aggregate usage of the three OSDs:

Dashboard data:
----
Total usage:
35.85 GiB out of 1.5 TiB
1.47 TiB available

Per project usage:
openshift-logging: 6.30 GiB
openshift-storage: 372.5 MiB
----

The total usage of the three OSDs from the `ceph osd df` output is 36 GiB (data) + 3 GiB (meta):
----
sh-4.4# ceph df detail
RAW STORAGE:
    CLASS     SIZE        AVAIL       USED       RAW USED     %RAW USED 
    ssd       1.5 TiB     1.5 TiB     36 GiB       39 GiB          2.55 
    TOTAL     1.5 TiB     1.5 TiB     36 GiB       39 GiB          2.55 
 
POOLS:
    POOL                                           ID     STORED      OBJECTS     USED        %USED     MAX AVAIL     QUOTA OBJECTS     QUOTA BYTES     DIRTY     USED COMPR     UNDER COMPR 
    ocs-storagecluster-cephblockpool                1      18 GiB       5.10k      36 GiB      2.51       700 GiB     N/A               N/A             5.10k            0 B             0 B 
    ocs-storagecluster-cephfilesystem-metadata      2     4.1 KiB          22     384 KiB         0       700 GiB     N/A               N/A                22            0 B             0 B 
    ocs-storagecluster-cephfilesystem-data0         3         0 B           0         0 B         0       467 GiB     N/A               N/A                 0            0 B             0 B 


sh-4.4# ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZE    RAW USE DATA    OMAP    META     AVAIL   %USE VAR  PGS STATUS 
 2   ssd 0.49899  1.00000 511 GiB  19 GiB  18 GiB  52 KiB 1024 MiB 492 GiB 3.68 1.45  24     up 
 0   ssd 0.49899  1.00000 511 GiB 1.4 GiB 401 MiB  27 KiB 1024 MiB 510 GiB 0.27 0.11   0   down 
 1   ssd 0.49899  1.00000 511 GiB  19 GiB  18 GiB  31 KiB 1024 MiB 492 GiB 3.68 1.45  24     up 
                    TOTAL 1.5 TiB  39 GiB  36 GiB 112 KiB  3.0 GiB 1.5 TiB 2.55        

---
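The aggregation described above can be sketched numerically (a minimal illustration, not the dashboard's actual code; the per-OSD figures are copied from the `ceph osd df` snippet, and all names are illustrative):

```python
# Rough reconstruction of the dashboard's "total usage" from the per-OSD
# figures in the `ceph osd df` output (values in GiB).
osds = {
    "osd.2": {"data": 18.0, "meta": 1.0},
    "osd.0": {"data": 0.4, "meta": 1.0},   # ~401 MiB of data, rounded
    "osd.1": {"data": 18.0, "meta": 1.0},
}

# Aggregate of the DATA column (~36 GiB) vs. the RAW USE column (~39 GiB),
# which additionally includes per-OSD metadata.
data_total = sum(o["data"] for o in osds.values())
raw_total = sum(o["data"] + o["meta"] for o in osds.values())

print(f"data used: {data_total:.1f} GiB")
print(f"raw used:  {raw_total:.1f} GiB")
```

The dashboard's 35.85 GiB figure lines up with the aggregated DATA column (which ceph rounds to 36 GiB), not with the RAW USE column that also counts OSD metadata.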


- Usage per project shown in the dashboard is calculated from the used space (`df -h`) of the storage provisioned to individual pods.

For example, from the dashboard, the usage of the openshift-logging project is 6.30 GiB, which is roughly three times the usage reported by an individual pod: 2.2 GiB * 3 = 6.6 GiB
----
# oc rsh <elastic-search-pod>
df -h|grep persistent
/dev/rbd0                             184G  2.2G  181G   2% /elasticsearch/persistent
/dev/rbd0                             184G  2.2G  181G   2% /elasticsearch/persistent
/dev/rbd1                             184G  2.2G  181G   2% /elasticsearch/persistent
----
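The project figure can be approximated by summing the used space of each PVC in the project (a sketch under that assumption; the values come from the `df -h` output above):

```python
# Per-project usage on the dashboard is (approximately) the sum of the used
# space reported by `df -h` for each PVC mounted by the project's pods.
pvc_used_gib = [2.2, 2.2, 2.2]  # three elasticsearch PVCs from the snippet above

project_total = sum(pvc_used_gib)
print(f"openshift-logging estimate: {project_total:.1f} GiB")
```

Note that this sums logical, pre-replication bytes, which is why the per-project figures cannot be expected to add up to the raw total derived from `ceph osd df`.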

The same representation of usage is observed in the OCS 4.4 dashboard as well.

Version of all relevant components (if applicable):

OCS 4.4

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?

Yes, this representation of usage is confusing to the end customer.

Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
3

Is this issue reproducible?
Yes - 100% reproducible

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:
- 

Steps to Reproduce:
- Create projects and run workloads on them. I deployed elasticsearch pods.
- Check the capacity breakdown in the OCS dashboard.

Actual results:
The aggregate of usage per project is not equal to the total usage shown in the OCS dashboard.

Expected results:
There should be a simpler representation of usage in the dashboard.

Comment 2 Sonal 2020-06-25 18:42:29 UTC
Created attachment 1698823 [details]
Screenshot of OCS dashboard

Comment 3 Sonal 2020-06-25 19:33:42 UTC
Created attachment 1698846 [details]
Must-gather-ocs-4.3

Comment 7 Nishanth Thomas 2020-07-01 07:16:43 UTC
Per https://bugzilla.redhat.com/show_bug.cgi?id=1851203#c4, to some extent this is handled (the representation has already been redesigned). The issues mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1851203#c5 need to be analysed further and require time. This cannot be handled in 4.5, moving out to 4.6.

Comment 10 Anmol Sachan 2020-08-23 16:58:02 UTC
As mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1851203#c4 and https://bugzilla.redhat.com/show_bug.cgi?id=1851203#c9, the following things were done to make things simpler:
1) Total Capacity was removed from the Persistent Storage dashboard, as the total capacity is shared by MCG and RGW as well, so that representation did not make sense.
2) The replication factor was removed to make it simpler for the user to understand the exact utilization and the capacity left on OCS for PVCs. Now we just show Used and Available capacity for PVCs on the Persistent Storage dashboard.

Thus to some extent it should make monitoring OCS PVC usage much easier now.

IMO the dashboard was designed for 4.2.0, and since then a lot of new features have been added to OCS, so in a future release the dashboard might require a complete overhaul. Moving this to the UX team to get their views. If the UX team thinks what we have is sufficient then we can close this; otherwise this bug can serve as a basis for future work and UX changes.

Comment 11 Yuval 2020-11-19 14:28:25 UTC
Taking care of this as part of bug https://bugzilla.redhat.com/show_bug.cgi?id=1866320

Comment 15 Bipul Adhikari 2021-01-19 06:34:33 UTC
@Mudit yes, we should move it to OCP. Moving it to OCP.

Comment 16 Martin Bukatovic 2021-02-04 23:37:57 UTC
I see that the question from comment 14 is still not answered.

Comment 17 Elena Bondarenko 2021-02-05 15:13:47 UTC
@Martin, I believe the details were described in https://bugzilla.redhat.com/show_bug.cgi?id=1866320#c4

Comment 18 Elena Bondarenko 2021-02-05 22:12:36 UTC
The old total usage value has been removed from the dashboard. In the latest version, the Raw Used Capacity value corresponds to the RAW USED column shown by the "ceph df detail" command. A tooltip has been added to the Used Capacity Breakdown card to explain the difference between the raw used capacity and the used capacity broken down by projects or other Kubernetes resources. So I assume that the original issue is fixed. The issue described in https://bugzilla.redhat.com/show_bug.cgi?id=1851203#c5 needs more time as stated in https://bugzilla.redhat.com/show_bug.cgi?id=1851203#c7, so I suggest tracking it in a separate BZ.

Comment 21 errata-xmlrpc 2021-02-24 15:13:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

Comment 22 Red Hat Bugzilla 2023-09-15 00:33:14 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days

