Description of problem (please be as detailed as possible and provide log snippets):

The aggregate of per-project usage is not equal to the total usage shown in the capacity breakdown of the dashboard (screenshots from the dashboard are attached). This representation of total usage and per-project usage in the OCS dashboard is confusing; it should be shown in a simpler manner.

- The total usage shown in the dashboard is calculated from the USED space in the `ceph osd df` output. For example, in the snippet below from an OCS 4.3 cluster, the total usage from the dashboard is 35.85 GiB, which is the aggregate usage of the three OSDs:

Dashboard data:
----
Total usage: 35.85 GB out of 1.5 TB, 1.47 TB available
Per project usage:
  openshift-logging: 6.30 GB
  openshift-storage: 372.5 MB
----

Total usage of the three OSDs from the `ceph osd df` output is: 36 GiB (data) + 3 GiB (meta)

----
sh-4.4# ceph df detail
RAW STORAGE:
    CLASS     SIZE        AVAIL       USED       RAW USED     %RAW USED
    ssd       1.5 TiB     1.5 TiB     36 GiB     39 GiB       2.55
    TOTAL     1.5 TiB     1.5 TiB     36 GiB     39 GiB       2.55

POOLS:
    POOL                                           ID     STORED      OBJECTS     USED        %USED     MAX AVAIL     QUOTA OBJECTS     QUOTA BYTES     DIRTY     USED COMPR     UNDER COMPR
    ocs-storagecluster-cephblockpool                1     18 GiB      5.10k       36 GiB      2.51      700 GiB       N/A               N/A             5.10k     0 B            0 B
    ocs-storagecluster-cephfilesystem-metadata      2     4.1 KiB     22          384 KiB     0         700 GiB       N/A               N/A             22        0 B            0 B
    ocs-storagecluster-cephfilesystem-data0         3     0 B         0           0 B         0         467 GiB       N/A               N/A             0         0 B            0 B

sh-4.4# ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZE    RAW USE DATA    OMAP    META     AVAIL   %USE VAR  PGS STATUS
 2   ssd 0.49899  1.00000 511 GiB  19 GiB  18 GiB  52 KiB 1024 MiB 492 GiB 3.68 1.45  24     up
 0   ssd 0.49899  1.00000 511 GiB 1.4 GiB 401 MiB  27 KiB 1024 MiB 510 GiB 0.27 0.11   0   down
 1   ssd 0.49899  1.00000 511 GiB  19 GiB  18 GiB  31 KiB 1024 MiB 492 GiB 3.68 1.45  24     up
                    TOTAL 1.5 TiB  39 GiB  36 GiB 112 KiB  3.0 GiB 1.5 TiB 2.55
----

- Per-project usage shown in the dashboard is calculated from the used space (`df -h`) of the storage provisioned to the individual pods. For example, from the dashboard, the usage of the openshift-logging project is 6.30 GB, which is roughly three times the 2.2 GB used on each individual mount: 2.2 GB * 3 = 6.6 GB (see the quick arithmetic check at the end of this description).

----
# oc rsh <elastic-search-pod> df -h | grep persistent
/dev/rbd0       184G  2.2G  181G   2% /elasticsearch/persistent
/dev/rbd0       184G  2.2G  181G   2% /elasticsearch/persistent
/dev/rbd1       184G  2.2G  181G   2% /elasticsearch/persistent
----

The same representation of usage is observed in the OCS 4.4 dashboard as well.

Version of all relevant components (if applicable):
OCS 4.4

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes, this representation of usage is confusing to the end customer.

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
3

Is this issue reproducible?
Yes - 100% reproducible

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:
-

Steps to Reproduce:
- Create projects and run workloads on them. I have deployed Elasticsearch pods.
- Check the capacity breakdown on the OCS dashboard.

Actual results:
The sum of per-project usage is not equal to the total usage shown on the OCS dashboard.

Expected results:
There should be a simpler representation of usage in the dashboard.
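Quick arithmetic check of the figures above. This is a minimal sketch added for illustration; it assumes `bc` is available in the shell and simply reuses the numbers quoted from the dashboard and the command output:

----
# Per-project usage follows the filesystem usage inside each PVC, so the three
# Elasticsearch mounts at 2.2 GB each roughly add up to the 6.30 GB shown for
# openshift-logging:
echo "2.2 * 3" | bc              # 6.6
# The dashboard total instead follows Ceph's replicated usage (36 GiB DATA in
# `ceph osd df`), so the per-project values can never add up to it:
echo "35.85 - 6.6 - 0.37" | bc   # 28.88 GB left unexplained by the project breakdown
----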
Created attachment 1698823 [details] Screenshot of OCS dashboard
Created attachment 1698846 [details] Must-gather-ocs-4.3
Per https://bugzilla.redhat.com/show_bug.cgi?id=1851203#c4, this is handled to some extent (the representation has already been redesigned). The issues mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1851203#c5 need to be analysed further and require more time. This cannot be handled in 4.5, so moving it out to 4.6.
As mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1851203#c4 and https://bugzilla.redhat.com/show_bug.cgi?id=1851203#c9, the following things were done to make things simpler:

1) Total Capacity was removed from the Persistent Storage dashboard because the total capacity is shared with MCG and RGW as well, so that representation did not make sense.

2) The replication factor was removed to make it simpler for the user to understand the exact utilization and the capacity left on OCS for PVCs (a worked example follows below).

Now we just show Used and Available capacity for PVCs on the Persistent Storage dashboard, so to some extent monitoring OCS PVC usage should be much easier now.

IMO the dashboard was designed for 4.2.0 and a lot of new features have been added to OCS since then, so in a future release the dashboard might require a complete overhaul. Moving this to the UX team to get their views on it. If the UX team thinks what we have is sufficient then we can close this; otherwise this bug can serve as one of the bases for future work and UX changes.
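To make the replication-factor change concrete, here is a worked example added for illustration, reusing the block pool figures from the original description (STORED 18 GiB, roughly 36-39 GiB raw); the 3x factor for the default block pool is an assumption on my part:

----
# Before: the dashboard reflected replicated raw consumption.
echo "18 * 3" | bc   # 54 -> raw GiB that full 3-way replication would consume
                     #       (the snapshot above shows only ~36-39 GiB raw,
                     #        likely because one OSD was down at the time)
# After: the logical 18 GiB is what counts toward "Used" for PVCs, which is
# much closer to what users see from inside their pods.
----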
Taking care of this as part of bug https://bugzilla.redhat.com/show_bug.cgi?id=1866320.
@Mudit yes, we should move it to OCP. Moving it to OCP.
I see that the question from comment 14 is still not answered.
@Martin, I believe the details were described in https://bugzilla.redhat.com/show_bug.cgi?id=1866320#c4
The old total usage value has been removed from the dashboard. In the latest version, the Raw Used Capacity value corresponds to the RAW USED column shown by the "ceph df detail" command. A tooltip has been added to the Used Capacity Breakdown card to explain the difference between raw used capacity and the used capacity broken down by projects or other Kubernetes resources. So I assume that the original issue is fixed. The issue described in https://bugzilla.redhat.com/show_bug.cgi?id=1851203#c5 needs more time as stated in https://bugzilla.redhat.com/show_bug.cgi?id=1851203#c7, so I suggest tracking it in a separate BZ.
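For anyone verifying this, a minimal way to cross-check the new card from the CLI (a sketch; the tools pod name is a placeholder and it assumes the rook-ceph tools pod is running in openshift-storage):

----
# Compare the dashboard's "Raw Used Capacity" with Ceph's own view:
oc -n openshift-storage rsh <rook-ceph-tools-pod> ceph df detail | grep -A 3 "RAW STORAGE"
# RAW USED here should match the dashboard card. The per-project breakdown is
# derived from filesystem usage inside the PVCs, so it will always be smaller,
# since it excludes replication and Ceph metadata overhead.
----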
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days