Bug 2068184

Summary: Allow uninstall of add-on in Consumer if onboarding fails
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation Reporter: Neha Berry <nberry>
Component: odf-managed-service Assignee: Ohad <omitrani>
Status: CLOSED WONTFIX QA Contact: Neha Berry <nberry>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.10 CC: aeyal, dbindra, ocs-bugs, odf-bz-bot
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-01-23 06:42:42 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
pods before uninstall none

Description Neha Berry 2022-03-24 15:34:06 UTC
Created attachment 1868151
pods before uninstall

Description of problem:
==========================================================
Allow uninstall of the Consumer add-on even if the storagecluster was not created or is not in Ready state, i.e. if onboarding failed for any reason.

Currently, add-on uninstall is implemented such that it won't proceed if the storagecluster is absent or in Error state. See bug 2065032, which requests a change to the current logic.
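
The gate described above can be sketched roughly as follows. This is only an illustration of the reported behaviour, not the add-on's actual implementation: the helper names `uninstall_allowed` and `current_phase` are hypothetical, and the sketch assumes the StorageCluster reports its state in `.status.phase`.

```shell
#!/bin/sh
# Sketch only: mirrors the gate described in this report, not the deployer's real code.
NS="${NS:-openshift-storage}"

# Hypothetical helper: succeeds only when the given phase is exactly "Ready".
# An empty phase (no StorageCluster) or "Error" therefore blocks the uninstall.
uninstall_allowed() {
    [ "$1" = "Ready" ]
}

# Hypothetical helper: read the phase from the cluster.
# Assumes the state is exposed under .status.phase; prints nothing
# when no StorageCluster exists in the namespace.
current_phase() {
    oc get storagecluster -n "$NS" -o jsonpath='{.items[*].status.phase}'
}

# Example use on a live cluster:
#   if uninstall_allowed "$(current_phase)"; then
#       echo "uninstall would proceed"
#   else
#       echo "uninstall blocked"
#   fi
```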

However, for a consumer cluster this hard check is definitely a problem, since there could be multiple reasons why onboarding a consumer failed. Hence we should be able to uninstall the add-on and try again.


Version-Release number of selected component (if applicable):
==================================================================
oc get clusterversion                                                                   
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.23    True        False         9h      Cluster version is 4.9.23
➜  internal-410 oc get csv -n openshift-storage -o json ocs-operator.v4.10.0 | jq '.metadata.labels["full_version"]'
"4.10.0-203"



How reproducible:
======================
Always

Steps to Reproduce:
========================
1. Create a provider and consumer cluster using the steps from [1]
2. Install the add-on on the consumer. To reproduce the issue, either:
a) create a provider cluster with an incorrect public key so that onboarding the consumer fails - https://chat.google.com/room/AAAASHA9vWs/gtIYwCL0fn0
b) OR install the add-on with incorrect details so that onboarding fails and no cephcluster is created (Storagecluster is in Error state).

3. Check that the consumer is not onboarded by checking logs or
$ oc get storageconsumer -n openshift-storage (in provider)
<No output> 

4. Uninstall the add-on in the consumer while the cluster is in a bad state

[1]https://docs.google.com/document/d/1ehNBscWgLGNYqnnZUp6RPnkR9ByYU69BgXvr_z2n5sE/edit?hl=en&forcehl=1#

Actual results:
======================
Uninstall fails to start and we have to apply the manual workaround of deleting the namespace.
However, even after that, the add-on stays in "Uninstalling" state for a very long time (until OCM marks it as uninstalled).

Expected results:
=====================
Uninstall should be allowed.


Workaround
================

Since uninstall of the add-on didn't proceed from the UI:

➜  oc delete namespace openshift-storage
namespace "openshift-storage" deleted

<had to patch finalizers of some resources for the namespace deletion to succeed>
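
For reference, a minimal sketch of the kind of finalizer patch mentioned above. The report does not say which resources were patched, so the kind and name are placeholders supplied by the caller, and `clear_finalizers` and `list_remaining` are hypothetical helper names. Removing finalizers skips the owning controller's cleanup, so this is a last resort.

```shell
#!/bin/sh
# Sketch only: the affected resources are not listed in this report.
NS="${NS:-openshift-storage}"
OC="${OC:-oc}"   # set OC=echo to print the command instead of running it

# List every namespaced resource still present in the namespace,
# to see what is holding up the namespace deletion.
list_remaining() {
    "$OC" api-resources --verbs=list --namespaced -o name |
        xargs -n 1 "$OC" get --show-kind --ignore-not-found -n "$NS"
}

# Clear the finalizers on one resource so its deletion can complete.
# $1 = resource kind, $2 = resource name (both placeholders)
clear_finalizers() {
    "$OC" patch "$1" "$2" -n "$NS" --type=merge \
        -p '{"metadata":{"finalizers":null}}'
}

# Example use on a live cluster:
#   list_remaining
#   clear_finalizers <kind> <name>
```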

Comment 2 Ohad 2022-03-25 12:32:12 UTC
This needs to be examined very carefully, mainly because we expect SRE to react to the error state in consumer installations.
The preferred way would be to allow SRE to fix the issue and then uninstall; we have a lot of experience with issues when we force uninstall ODF in cases where it is not in Ready state.

In these cases (when things go wrong), we do not expect SRE to solve the issue and this will require more experienced intervention. 

My own opinion is that we should close this with WONTFIX

Comment 9 Dhruv Bindra 2023-01-23 06:42:42 UTC
This is the expected behaviour, as the chances of a successful uninstallation of ODF are highest when the storageCluster is Ready. Closing this bug as WONTFIX.