Bug 2068184 - Allow uninstall of add-on in Consumer if onboarding fails
Summary: Allow uninstall of add-on in Consumer if onboarding fails
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: odf-managed-service
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Ohad
QA Contact: Neha Berry
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-03-24 15:34 UTC by Neha Berry
Modified: 2023-08-09 17:00 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-01-23 06:42:42 UTC
Embargoed:


Attachments
pods before uninstall (4.88 KB, text/plain)
2022-03-24 15:34 UTC, Neha Berry

Description Neha Berry 2022-03-24 15:34:06 UTC
Created attachment 1868151
pods before uninstall

Description of problem:
==========================================================
Allow uninstall of the Consumer add-on even if the StorageCluster is not created or not in Ready state, i.e. if onboarding failed for any reason.

Currently, the add-on uninstall is implemented such that it won't proceed if the StorageCluster is absent or in Error state. See bug 2065032, which requests a change to this logic.

However, for a consumer cluster this hard check is definitely a problem, since onboarding a consumer can fail for multiple reasons, and hence we should be able to uninstall the add-on and try again.
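
For context, the StorageCluster state that the uninstall logic gates on can be checked directly on the consumer; a minimal sketch (the Error output below is illustrative, not captured from this cluster):

$ oc get storagecluster -n openshift-storage -o jsonpath='{.items[*].status.phase}'
Error

Per the current logic, anything other than Ready here (or no StorageCluster at all) blocks the uninstall.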


Version-Release number of selected component (if applicable):
==================================================================
oc get clusterversion                                                                   
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.23    True        False         9h      Cluster version is 4.9.23
➜  internal-410 oc get csv -n openshift-storage -o json ocs-operator.v4.10.0 | jq '.metadata.labels["full_version"]'
"4.10.0-203"



How reproducible:
======================
Always

Steps to Reproduce:
========================
1. Create a provider and a consumer cluster using the steps from [1]
2. Install the add-on on the consumer. To reproduce the issue:
a) either create the provider cluster with an incorrect public key so that onboarding the consumer fails - https://chat.google.com/room/AAAASHA9vWs/gtIYwCL0fn0
b) OR install the add-on with incorrect details so that onboarding fails and no CephCluster is created (the StorageCluster is in Error state).

3. Check that the consumer is not onboarded by checking the logs (see the sketch after this list) or:
$ oc get storageconsumer -n openshift-storage (in provider)
<No output> 

4. Uninstall the add-on on the consumer while the cluster is in this bad state

[1] https://docs.google.com/document/d/1ehNBscWgLGNYqnnZUp6RPnkR9ByYU69BgXvr_z2n5sE/edit?hl=en&forcehl=1#
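
As a sketch of the log check mentioned in step 3: the ocs-operator logs on the consumer can be grepped for onboarding errors (the deployment name and the grep pattern here are assumptions, not confirmed output):

$ oc logs -n openshift-storage deployment/ocs-operator | grep -i onboard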

Actual results:
======================
Uninstall fails to start, and we have to apply the manual workaround of deleting the namespace.
However, even after that, the add-on stays in the "Uninstalling" state for a very long time (until OCM marks it as uninstalled).

Expected results:
=====================
Uninstall should be allowed even when the StorageCluster is absent or not in Ready state.


Workaround
================

Since the uninstall of the add-on didn't proceed from the UI:

➜  oc delete namespace openshift-storage
namespace "openshift-storage" deleted

(Had to patch the finalizers of some resources for the namespace deletion to complete successfully.)
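
For reference, a sketch of the finalizer patch that was needed; the resource kind and name below are examples only (the default StorageCluster name is assumed), not an exhaustive list of what had to be patched:

$ oc patch storagecluster ocs-storagecluster -n openshift-storage --type=merge -p '{"metadata":{"finalizers":null}}'

Once the finalizers are cleared, the pending namespace deletion can complete.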

Comment 2 Ohad 2022-03-25 12:32:12 UTC
This needs to be examined very carefully, mainly because we expect SRE to react to the error state in consumer installations.
The preferred way would be to allow SRE to fix the issue and then uninstall; we have a lot of experience with issues arising when we force-uninstall ODF while it is not in Ready state.

In these cases (when things go wrong), we do not expect SRE to solve the issue, and this will require more experienced intervention.

My own opinion is that we should close this as WONTFIX.

Comment 9 Dhruv Bindra 2023-01-23 06:42:42 UTC
This is the expected behaviour, as the chances of a successful uninstallation of ODF are highest when the StorageCluster is Ready. Closing this bug as WONTFIX.

