Bug 2005014
Summary:          state of ODF StorageSystem is misreported during installation or uninstallation
Product:          OpenShift Container Platform
Component:        Console Storage Plugin
Reporter:         Martin Bukatovic <mbukatov>
Assignee:         Bipul Adhikari <badhikar>
QA Contact:       Anna Sandler <asandler>
Status:           CLOSED ERRATA
Severity:         high
Priority:         high
Version:          4.9
Target Release:   4.10.0
Target Milestone: ---
Keywords:         Regression
Hardware:         Unspecified
OS:               Unspecified
CC:               afrahman, amagrawa, amohan, aos-bugs, asandler, badhikar, jarrpa, jefbrown, madam, muagarwa, nibalach, niding, nthomas, ocs-bugs, rcyriac
Type:             Bug
Last Closed:      2022-03-10 16:11:32 UTC
Bug Blocks:       2006760, 2017717
Description
Martin Bukatovic 2021-09-16 14:42:22 UTC

Created attachment 1823589 [details]
screenshot #2: clear status of StorageCluster CR in OCS 4.8 during installation (for comparison)
There is no phase even in the backend in the StorageSystem CR. We have only conditions, and based on those conditions plus the StorageCluster phase we can set the state in the UI. I don't know how it is done today. Looping in @afrahman, who can help us understand this better.

Additional information
======================
When installation fails and the StorageCluster ends up in an Error state, the StorageSystem and its representation in the UI still report "-", and it is not directly possible to tell that the installation failed (you need to know that there is a StorageCluster CR to check). @badhikar Can you please change the component accordingly?

I think the component is correct. Isn't the metrics exporter part of odf-operator? @amohan, could you please take a look?

The same problem applies to uninstallation. Based on discussion during yesterday's triage meeting, I'm expanding the scope of this bug to cover both installation and uninstallation.

As Nitin notes, the StorageSystem CR does not communicate status on purpose, and using metrics for this purpose instead (as currently implemented in the UI) is wrong: metrics can only communicate status after ODF is successfully installed, so they can't provide status during installation, especially when something goes wrong, which is exactly when status is most important to communicate. See also the negative use case mentioned in comment 7 (to reproduce, one can mislabel machines or miss any other necessary step before creating the StorageCluster CR).

The console relies on metrics being provided to it. If the backend is not exporting metrics, we show a `-`. The metrics must be reported per the agreed-upon standardization scheme. Moving this back to ODF Operator, as this is an issue with the metrics exporter.

(In reply to Bipul Adhikari from comment #10)
> I think the component is correct. Isn't metrics exporter part of
> odf-operator. @amohan could you please take a look?

Yes, `metric-exporter` is a part of odf-operator.
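The fix direction sketched in the first comment (deriving a display state from the StorageCluster phase plus the CR's status conditions, rather than from metrics) could look roughly like the following. This is a minimal sketch only; the condition type, phase values, and function name are illustrative assumptions, not the actual ODF or console plugin API.

```typescript
// Hypothetical sketch: derive a UI status cell from CR data instead of metrics.
// Condition/phase names here are assumptions for illustration, not the real ODF API.

interface Condition {
  type: string;
  status: 'True' | 'False' | 'Unknown';
}

interface StorageClusterStatus {
  phase?: string;            // e.g. 'Ready', 'Error', 'Progressing' (assumed values)
  conditions?: Condition[];
}

function displayState(status: StorageClusterStatus | undefined): string {
  if (!status) return '-';                        // CR not found (or not created yet)
  if (status.phase === 'Error') return 'Error';   // surface failed installs immediately
  const available = status.conditions?.find((c) => c.type === 'Available');
  if (available?.status === 'True') return 'Ready';
  if (status.phase) return status.phase;          // e.g. 'Progressing' during install
  return '-';
}
```

Unlike a metrics lookup, this mapping can report an Error state the moment the StorageCluster controller records it, with no dependency on the exporter being up.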
As Martin noted in comment 14, this is not something we should just blindly punt to odf-operator. It feels kind of ridiculous to expect operand state from a metric! Time spent in a state is a metric, but the state itself is not, especially if the metric reporting lags the actual state change. So, sorry Bipul, but I'm throwing this back at you. Why on earth are we using metrics for this? The StorageSystem should already be accurately reporting its state in its Status Conditions. My understanding is that the page in question is part of the console plugin, so we should still be able to fix this. Please don't punt this back until we actually figure this out. :P

The console team has always advocated against standardizing metrics, because every storage system is unique and can have its own set of steps required for it to report information. We were in favor of each storage system vendor pushing its own logic (via extensions in the UI) to determine state or other information. However, this idea was rejected and we are working with standardized metrics, so the UI cannot accommodate such logic at this point. The metrics exporter will have to figure out a way to convey this message. The reason we are using the metrics exporter is that the dashboard and list page were designed with standardized metrics in mind. We cannot expect the dashboard to accommodate custom logic for each storage provider.

I think the issue here is that using the metrics to determine status is incorrect; they are meant for a different purpose.

This was brought up in the orchestration forum today. There's a rough plan, but the first step is to talk to QE to get more details. I'll follow up on this offline and report back to this BZ with the results.

*** Bug 2008143 has been marked as a duplicate of this bug. ***

Verifying by the attachment above: when deleting it to try and install it again, the status was observed as well.

*** Bug 2019652 has been marked as a duplicate of this bug.
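The objection above can be made concrete: a metrics-driven status lookup has nothing to return until the exporter is running, which only happens after a successful installation, so a failed install is indistinguishable from "no data yet". A hypothetical sketch of that failure mode (the sample shape and function name are assumptions, not the real console code):

```typescript
// Hypothetical sketch of the metrics-driven approach being criticized:
// the status cell falls back to '-' whenever no metric sample exists,
// which is exactly the case during a failed or still-in-progress install.

type MetricSamples = Record<string, string>; // systemName -> state reported by the exporter

function metricsDrivenState(samples: MetricSamples, system: string): string {
  // Before ODF is installed (or when installation fails), the exporter is
  // not running, so there is no sample at all -- the UI can only show '-'.
  return samples[system] ?? '-';
}
```

With an empty sample set, both "install failed" and "install not started" collapse into the same `-`, which is the ambiguity this bug is about.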
*** Fixing this issue also fixes https://bugzilla.redhat.com/show_bug.cgi?id=2019652 and https://bugzilla.redhat.com/show_bug.cgi?id=2005014. The backport of this BZ to 4.9 should fix all the issues in 4.9 as well. Although from the descriptions these bugs look unrelated, the source of the problem for the aforementioned bugs is resolved by this fix, hence marking them as duplicates.

I'm not sure I understand how this bug would be a duplicate of bz 2019652.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056