Created attachment 1868151
pods before uninstall

Description of problem:
==========================================================
Allow uninstall of the Consumer add-on even if the storagecluster has not been created or is not in Ready state, i.e. if onboarding failed for any reason.

Currently, the add-on uninstall is implemented such that it won't proceed if the storagecluster is absent or in Error state. See bug 2065032, which requests a change to the current logic.

For a consumer cluster, however, this hard check is definitely a problem, since there can be multiple reasons why onboarding a consumer failed. We should therefore be able to uninstall the add-on and try again.

Version-Release number of selected component (if applicable):
==================================================================
oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.23    True        False         9h      Cluster version is 4.9.23

➜  internal-410 oc get csv -n openshift-storage -o json ocs-operator.v4.10.0 | jq '.metadata.labels["full_version"]'
"4.10.0-203"

How reproducible:
======================
Always

Steps to Reproduce:
========================
1. Create a provider and a consumer cluster using the steps from [1].
2. Install the add-on on the consumer. To reproduce the issue, either:
   a) create a provider cluster with an incorrect public key so that onboarding the consumer fails - https://chat.google.com/room/AAAASHA9vWs/gtIYwCL0fn0
   b) OR install the add-on with incorrect details so that onboarding fails and no cephcluster is created (the storagecluster is in Error state).
3. Check that the consumer is not onboarded, either from the logs or with:
   $ oc get storageconsumer -n openshift-storage (in provider)
   <No output>
4. Uninstall the add-on in the consumer while the cluster is in bad shape.

[1] https://docs.google.com/document/d/1ehNBscWgLGNYqnnZUp6RPnkR9ByYU69BgXvr_z2n5sE/edit?hl=en&forcehl=1#

Actual results:
======================
Uninstall fails to start, and we have to apply the manual workaround of deleting the namespace.
However, even after that, the add-on stays in the "Uninstalling" state for a very long time (until OCM marks it as uninstalled).

Expected results:
=====================
Uninstall should be allowed.

Workaround
================
Since the uninstall of the add-on didn't proceed from the UI:

➜  oc delete namespace openshift-storage
namespace "openshift-storage" deleted

<had to patch finalizers of some resources for the namespace deletion to succeed>
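
The finalizer patching mentioned in the workaround could look roughly like the sketch below. The report does not say which resources were patched or how, so the helper name `clear_finalizers`, the `DRY_RUN` switch, and any resource reference passed to it are assumptions made for illustration. It relies only on `oc patch` with a JSON merge patch that sets `metadata.finalizers` to null.

```shell
#!/bin/sh
# Sketch: clear the finalizers on one resource that is blocking namespace deletion.
# The resource reference passed in is hypothetical; the report does not name
# the resources that were actually patched.

NAMESPACE="${NAMESPACE:-openshift-storage}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the command, 0 = run it against the cluster

clear_finalizers() {
    # $1: resource reference in kind/name form
    if [ "$DRY_RUN" = "1" ]; then
        # Print the command so it can be reviewed before anything is changed.
        printf '%s\n' "oc patch $1 -n $NAMESPACE --type=merge -p '{\"metadata\":{\"finalizers\":null}}'"
    else
        # A merge patch with finalizers set to null removes all finalizers.
        oc patch "$1" -n "$NAMESPACE" --type=merge -p '{"metadata":{"finalizers":null}}'
    fi
}
```

With the default `DRY_RUN=1`, calling `clear_finalizers storagecluster/example` (a made-up resource name) only prints the `oc patch` command. Removing finalizers skips whatever cleanup they were guarding, so this is a last resort rather than a routine step.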
This needs to be examined very carefully, mainly because we expect SRE to react to the error state in consumer installations. The preferred way would be to allow SRE to fix the issue and then uninstall; we have a lot of experience with issues arising when we force-uninstall ODF while it is not in Ready state. In these cases (when things go wrong), we do not expect SRE to solve the issue, and this will require more experienced intervention. My own opinion is that we should close this with WONTFIX.
This is the expected behaviour, as the chances of a successful uninstallation of ODF are highest when the storageCluster is Ready. Closing this bug as WONTFIX.
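
For reference, the readiness condition discussed above can be approximated from the CLI. This is only a sketch of the idea, not the add-on's actual uninstall logic: the helper name `uninstall_allowed` is hypothetical, and it assumes the StorageCluster resource reports its state in `status.phase` with the value `Ready` when healthy.

```shell
#!/bin/sh
# Sketch: treat uninstall as safe only when the StorageCluster phase is "Ready".
# An empty phase (no StorageCluster found) or any other value counts as not ready.

uninstall_allowed() {
    # $1: phase string, e.g. as read from the cluster with (assumed field path):
    #   oc get storagecluster -n openshift-storage -o jsonpath='{.items[*].status.phase}'
    [ "$1" = "Ready" ]
}
```

A caller could use it as `if uninstall_allowed "$phase"; then ...; fi` and route everything else to SRE for investigation before the uninstall is attempted.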