Bug 2012120 - Console deletes operands regardless of interdependency
Summary: Console deletes operands regardless of interdependency
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Management Console
Version: 4.9
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ---
: 4.10.0
Assignee: Jakub Hadvig
QA Contact: Yadan Pei
URL:
Whiteboard:
: 2011897 2042268 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-10-08 10:34 UTC by Simone Tiraboschi
Modified: 2022-11-14 12:15 UTC (History)
9 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-11-14 12:15:19 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Dialog (185.41 KB, image/png)
2021-10-08 10:35 UTC, Simone Tiraboschi
no flags Details
Errors (196.31 KB, image/png)
2021-10-08 10:35 UTC, Simone Tiraboschi
no flags Details

Description Simone Tiraboschi 2021-10-08 10:34:30 UTC
Description of problem:
On OCP 4.9 UI there is a new optional flag (see the attached uninstall dialog) to delete operands as a side effect of uninstalling an operator.

The feature got introduced with:
https://github.com/openshift/console/pull/9142
and it's completely implemented on console component.

If the user check `delete all operands` checkbox then the console is trying to   directly delete all operands.

The console doesn't implement any dry run check before attempting the real deletion.

I don't know if it's really a matter of ordering of the delete requests (the CSV and its direct dependencies only after the operands), or simply having the console waiting for the first object to get really disappeared (and not simply flagged for deletion) before sending the next request.
(see https://github.com/kubernetes/kubectl/blob/master/pkg/cmd/delete/delete.go#L366-L388 to compare with what `kubectl delete` is internally doing by default).

The results is that operands:
- protected by a validating webhook that denies the delete request according to operator specific logics
- or protected by a finalizer that should be removed by an operator after a set of steps,

are going to be leftovers on the cluster while operators and validating webhooks are instead removed as a side effects of the removal of the CSV.

Those leftovers requires complex manual recovery flows to be effectively removed .

See the second screenshot for an example when trying to remove CNV with running VMs.

Version-Release number of selected component (if applicable):
OCP 4.9.0

How reproducible:
100%

Steps to Reproduce:
1. Deploy CNV on OCP 4.9
2. Start a VM
3. Try to uninstall CNV from the UI with `delete all operands` options.

Actual results:
The CSV and so operators and webhooks deployed by the OLM disappears,
other operands protected by a deny from a validating webhook or by a finbalizer managed by an operator remains on the cluster as hard to be removed leftovers.
We also see bad errors from webhook failing because a rolebinding got deleted (as a side effect of deleting the CSV) before the webhook can act.


Expected results:
1. The removal of operands is safe:
- all or nothing logic with a pre fly check in dry run mode
- deletion of the CSV only at the end when we are 100% sure that all the operands really disappeared


2. An opt-out mechanism for specific operators with complex removal logic

Additional info:
I got it on CNV but I suspect it cann affects many products.

Comment 2 Simone Tiraboschi 2021-10-08 10:35:43 UTC
Created attachment 1830777 [details]
Errors

Comment 3 Simone Tiraboschi 2021-10-08 10:38:50 UTC
*** Bug 2011897 has been marked as a duplicate of this bug. ***

Comment 4 David Taylor 2021-10-11 16:39:45 UTC
Currently, console executes:

k8sKill(subscription)
k8sKill(csv)
k8sKill(operand,  ...for each operand

asynchronously, we need to await for all subscriptions and operands to be removed first, before attempting to remove the csv, ie:

await k8sKill(subscription);  <-- should happen as the first step to ensure that OLM doesn't initiate an upgrade during this process
await k8sKill(operands);
await k8sKill(csv);

There is a UX question about how to handle if operand(s) can't be deleted?  Do we do a dry-run and show 'not deletable' for those operands?  Do we give users an 'opt out' if errors discovered?

Comment 5 Fabian Deutsch 2021-10-12 12:36:23 UTC
rhbz #2012971 seems to be a follow up to this bug

Comment 6 David Taylor 2021-12-13 18:51:31 UTC
Hi amobrem, amobrem, kdoberst,
Wondering if the BZ is still considered 'severity=High' meaning it should be treated as a high SLO canidate?
In light of new HAC/OCM priorities and the additional work needed to complete this bug (the ability for operator to opt. out of deleting operands); should this be a 'severity=medium' or perhaps an RFE bug?

Comment 7 Dan Kenigsberg 2021-12-14 15:14:43 UTC
Please keep the severity=high. As an owner of a complex operator, I would like to opt out of this feature - Console (or anyone) should not delete anything in openshift-cnv but the top-level CR. I would like to annotate my CSV so that the option to delete parts of the resources is not exposed in UI.

Please share with us the opt-out annotation as soon as you can, because we have only one sprint for our own API freeze.

Comment 9 Joe Lanford 2022-04-28 14:20:08 UTC
I put this comment in the Console PR related to this bug, but I wanted to capture it here as well. This is the context for what the OLM team thinks is missing (allowing operator teams to opt-out of operand deletion):

> The OLM team decided to stick with the original decision of using an annotation in the CSV to allow operator versions to opt-out of operand deletion. The primary reason for this decision was to make it as easy as possible for operator authors to set the opt-out. Using only operator conditions would force operator authors to make code changes. This decision does not preclude further discussion around allowing dynamic opt-out via operator conditions. The annotation and operator condition mechanisms are not mutually exclusive.
> 
> The chosen annotation name is "console.openshift.io/disable-operand-delete"

The other important context for this bug is that operator teams will need to have time after this bug merges but before a GA release of OpenShift to be able to test the operand deletion and opt-out functionality. If this doesn't land before the OpenShift feature freeze date, it will likely need to be punted to the following release.

Comment 10 Joe Caiani 2022-05-04 15:57:59 UTC
@jhadvig since Dave is off on ocm land, can we reassign this to an ocp team member and assess whether we can get this into 4.11?

Comment 12 Simone Tiraboschi 2022-06-06 05:58:58 UTC
*** Bug 2042268 has been marked as a duplicate of this bug. ***

Comment 14 Jakub Hadvig 2022-11-14 12:15:19 UTC
This issue is being tracked by https://issues.redhat.com/browse/CONSOLE-3237
PR for fixing is up - https://github.com/openshift/console/pull/12234

Closing...


Note You need to log in before you can comment on or make changes to this bug.