Description of problem
======================
During installation of an ODF StorageSystem via the OCP Console web UI, the status of the storage system is reported as a bare "-" instead of a clear "Progressing" or "Installing". This is a regression compared to the StorageCluster CRD behaviour of OCS 4.8.

Version-Release number of selected component
============================================
OCP 4.9.0-0.nightly-2021-09-14-200602
LSO 4.9.0-202109132154
ODF 4.9.0-139.ci

How reproducible
================
2/2

Steps to Reproduce
==================
1. Install OCP cluster.
2. Install OCS/ODF (OpenShift Data Foundation) operator.
3. Install LSO operator.
4. Start the "Create a StorageSystem" wizard in the OCP Console web UI and complete the process.
5. Observe the status of the new storage system in the OCP Console.

Actual results
==============
While the cluster is being installed, the status of the new storage system is reported as "-"; only when the installation finishes does the status change to "Ready". See screenshot #1.

Expected results
================
The status is reported clearly as "Progressing" (this is the behaviour in <= OCS 4.8) or "Installing". That way it is immediately clear that the installation is still in progress without any problem, and this state can be distinguished from a possible problem with fetching the status or other issues.
Created attachment 1823589 [details] screenshot #2: clear status of the StorageCluster CR in OCS 4.8 during installation (for comparison)
There is no phase even in the backend in the StorageSystem CR. We have conditions only, and based on those conditions plus the StorageCluster phase we can set the state in the UI. I don't know how it is done today. Looping in @afrahman, who can help us understand this better.
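As a rough sketch of what the comment above suggests, the console could derive a displayed status from the StorageSystem conditions combined with the StorageCluster phase. The condition types (`Available`, `Progressing`), the phase strings, and the precedence order below are assumptions for illustration only, not the shipped implementation:

```typescript
// Hypothetical helper: map StorageSystem conditions plus the underlying
// StorageCluster phase to a human-readable status. Condition type names
// and the precedence order are illustrative assumptions.
type Condition = { type: string; status: 'True' | 'False' | 'Unknown' };

const isTrue = (conds: Condition[], type: string): boolean =>
  conds.some((c) => c.type === type && c.status === 'True');

const getDisplayStatus = (
  conditions: Condition[],
  storageClusterPhase?: string,
): string => {
  // Surface an explicit error from the StorageCluster first, so that a
  // failed installation is never rendered as a bare "-".
  if (storageClusterPhase === 'Error') return 'Error';
  if (isTrue(conditions, 'Available')) return 'Ready';
  if (isTrue(conditions, 'Progressing') || storageClusterPhase === 'Progressing')
    return 'Progressing';
  // Anything else is surfaced as Unknown rather than a dash.
  return 'Unknown';
};
```

With such a mapping, an in-progress install would show "Progressing" and a failed StorageCluster would show "Error", which addresses both the installation and the failure cases described in this bug.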
Additional information
======================
When installation fails and the StorageCluster ends up in Error state, the StorageSystem and its representation in the UI still report "-", and it is not directly possible to see that the installation failed (you need to know that there is a StorageCluster CR to check).
@badhikar Can you please change the component accordingly?
I think the component is correct. Isn't the metrics exporter part of odf-operator? @amohan could you please take a look?
The same problem applies to uninstallation. Based on discussion during the triage meeting yesterday, I'm expanding the scope of this bug to cover both installation and uninstallation. As Nitin notes, the StorageSystem CR doesn't communicate status on purpose, and using metrics for this purpose instead (as currently implemented in the UI) is wrong, because metrics can communicate status only after ODF is successfully installed, and so can't provide status during installation, especially when something goes wrong, which is exactly when such status is most important to communicate. See also a negative use case mentioned in comment 7 (to reproduce, one can mislabel machines or miss any other necessary step before creation of the StorageCluster CR ...).
The console relies on metrics being provided to it. If the backend is not exporting metrics then we will show a `-`. The metrics must be reported as per the agreed upon standardization scheme. Moving it back to ODF Operator as this is an issue with metrics exporter.
(In reply to Bipul Adhikari from comment #10) > I think the component is correct. Isn't metrics exporter part of > odf-operator. @amohan could you please take a look? Yes, `metric-exporter` is a part of odf-operator.
As Martin noted in comment 14, this is not something we should just blindly punt to odf-operator. It feels kind of ridiculous to expect operand state from a metric! Time spent in a state is a metric, but not the state itself, especially if the metric reporting lags the actual state change. So, sorry Bipul, but I'm throwing this back at you. Why on earth are we using metrics for this? The StorageSystem should already be accurately reporting its state in the Status Conditions. My understanding is that the page in question is part of the console plugin, so we should still be able to fix this. Please don't punt this back until we actually figure this out. :P
The console team has always been an advocate of not standardizing metrics, as every storage system is unique and can have its own set of steps required to report information. We were in favor of each storage system vendor pushing its own logic (via extensions in the UI) to determine state or other information. However, this idea was rejected and we are working with standardized metrics. Hence the UI cannot accommodate such logic at this point. The metrics exporter will have to figure out a way to convey this message. So the reason we are using the metrics exporter is that the dashboard and list page were designed with standardized metrics in mind. We cannot expect the dashboard to accommodate custom logic for each storage provider.
I think the issue here is that using the metrics to determine status is incorrect. They are meant for a different purpose.
This was brought up in the orchestration forum today. There's a rough plan, but first step is to talk to QE to get more details. I'll follow up with this offline and report back to this BZ with the results.
*** Bug 2008143 has been marked as a duplicate of this bug. ***
Verifying based on the attachment above.
When deleting it to try installing it again, the same status was observed.
*** Bug 2019652 has been marked as a duplicate of this bug. ***
Fixing this issue fixes https://bugzilla.redhat.com/show_bug.cgi?id=2019652 https://bugzilla.redhat.com/show_bug.cgi?id=2005014 The backport of this BZ to 4.9 should fix all the issues in 4.9 as well. Although from the description these bugs look unrelated, the source of the problem for the aforementioned bugs is solved with this fix, hence marking them as duplicates.
I'm not sure I understand how this bug would be a duplicate of bz 2019652.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056