Description of problem: On OCP 4.9 UI there is a new optional flag (see the attached uninstall dialog) to delete operands as a side effect of uninstalling an operator. The feature got introduced with: https://github.com/openshift/console/pull/9142 and it's completely implemented on console component. If the user check `delete all operands` checkbox then the console is trying to directly delete all operands. The console doesn't implement any dry run check before attempting the real deletion. I don't know if it's really a matter of ordering of the delete requests (the CSV and its direct dependencies only after the operands), or simply having the console waiting for the first object to get really disappeared (and not simply flagged for deletion) before sending the next request. (see https://github.com/kubernetes/kubectl/blob/master/pkg/cmd/delete/delete.go#L366-L388 to compare with what `kubectl delete` is internally doing by default). The results is that operands: - protected by a validating webhook that denies the delete request according to operator specific logics - or protected by a finalizer that should be removed by an operator after a set of steps, are going to be leftovers on the cluster while operators and validating webhooks are instead removed as a side effects of the removal of the CSV. Those leftovers requires complex manual recovery flows to be effectively removed . See the second screenshot for an example when trying to remove CNV with running VMs. Version-Release number of selected component (if applicable): OCP 4.9.0 How reproducible: 100% Steps to Reproduce: 1. Deploy CNV on OCP 4.9 2. Start a VM 3. Try to uninstall CNV from the UI with `delete all operands` options. Actual results: The CSV and so operators and webhooks deployed by the OLM disappears, other operands protected by a deny from a validating webhook or by a finbalizer managed by an operator remains on the cluster as hard to be removed leftovers. We also see bad errors from webhook failing because a rolebinding got deleted (as a side effect of deleting the CSV) before the webhook can act. Expected results: 1. The removal of operands is safe: - all or nothing logic with a pre fly check in dry run mode - deletion of the CSV only at the end when we are 100% sure that all the operands really disappeared 2. An opt-out mechanism for specific operators with complex removal logic Additional info: I got it on CNV but I suspect it cann affects many products.
Created attachment 1830777 [details] Errors
*** Bug 2011897 has been marked as a duplicate of this bug. ***
Currently, console executes: k8sKill(subscription) k8sKill(csv) k8sKill(operand, ...for each operand asynchronously, we need to await for all subscriptions and operands to be removed first, before attempting to remove the csv, ie: await k8sKill(subscription); <-- should happen as the first step to ensure that OLM doesn't initiate an upgrade during this process await k8sKill(operands); await k8sKill(csv); There is a UX question about how to handle if operand(s) can't be deleted? Do we do a dry-run and show 'not deletable' for those operands? Do we give users an 'opt out' if errors discovered?
rhbz #2012971 seems to be a follow up to this bug
Hi amobrem, amobrem, kdoberst, Wondering if the BZ is still considered 'severity=High' meaning it should be treated as a high SLO canidate? In light of new HAC/OCM priorities and the additional work needed to complete this bug (the ability for operator to opt. out of deleting operands); should this be a 'severity=medium' or perhaps an RFE bug?
Please keep the severity=high. As an owner of a complex operator, I would like to opt out of this feature - Console (or anyone) should not delete anything in openshift-cnv but the top-level CR. I would like to annotate my CSV so that the option to delete parts of the resources is not exposed in UI. Please share with us the opt-out annotation as soon as you can, because we have only one sprint for our own API freeze.
I put this comment in the Console PR related to this bug, but I wanted to capture it here as well. This is the context for what the OLM team thinks is missing (allowing operator teams to opt-out of operand deletion): > The OLM team decided to stick with the original decision of using an annotation in the CSV to allow operator versions to opt-out of operand deletion. The primary reason for this decision was to make it as easy as possible for operator authors to set the opt-out. Using only operator conditions would force operator authors to make code changes. This decision does not preclude further discussion around allowing dynamic opt-out via operator conditions. The annotation and operator condition mechanisms are not mutually exclusive. > > The chosen annotation name is "console.openshift.io/disable-operand-delete" The other important context for this bug is that operator teams will need to have time after this bug merges but before a GA release of OpenShift to be able to test the operand deletion and opt-out functionality. If this doesn't land before the OpenShift feature freeze date, it will likely need to be punted to the following release.
@jhadvig since Dave is off on ocm land, can we reassign this to an ocp team member and assess whether we can get this into 4.11?
*** Bug 2042268 has been marked as a duplicate of this bug. ***
This issue is being tracked by https://issues.redhat.com/browse/CONSOLE-3237 PR for fixing is up - https://github.com/openshift/console/pull/12234 Closing...