Bug 1745998

Summary: VMs and DVs (user data) are getting deleted during HCO uninstall
Product: Container Native Virtualization (CNV) Reporter: Asher Shoshan <ashoshan>
Component: Virtualization Assignee: Roman Mohr <rmohr>
Status: CLOSED ERRATA QA Contact: zhe peng <zpeng>
Severity: high Docs Contact:
Priority: high    
Version: 2.1.0 CC: cnv-qe-bugs, ipinto, msluiter, rmohr, sgordon, sgott, stirabos
Target Milestone: ---   
Target Release: 2.3.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: hco-bundle-registry-container-v2.2.0-445 virt-operator-container-v2.3.0-33 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-05-04 19:10:36 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Asher Shoshan 2019-08-27 12:34:45 UTC
Description of problem:

Deleting the CR hyperconverged-cluster of kind HyperConverged causes cascade deletion of the kubevirt-hyperconverged-cluster CR and all meta-operators (virt-handler, virt-api, ...), and also deletes all currently running VMs.

Expected results:

Should the VM be kept alive and accessible?
Or is this the intended effect of removing the whole KubeVirt product and its by-products?


Additional info:

Comment 1 Fabian Deutsch 2019-08-27 12:38:02 UTC
It's by design.
But I wonder if we should add some safety measures here.

I.e. only permit deleting the HCO CR (or KubeVirt CR)
- if there are no VMs.
- if there are no DataVolumes.

Steve, thoughts?

Comment 2 Asher Shoshan 2019-08-27 12:54:58 UTC
Kubevirt cr is protected (with finalizer), when trying to delete it;
But on the other hand, it's so easy to delete it by deleting its owner.

Comment 3 sgott 2019-08-30 20:34:12 UTC
With regard to Comment #2, does "owner" refer to the account that created the KubeVirt CR? Or the operator (in this case HCO) that created it?

Comment 4 Marc Sluiter 2019-09-02 08:30:15 UTC
> Kubevirt cr is protected (with finalizer), when trying to delete it

The finalizer on KubeVirt's CR is (currently) only used for ensuring that all KubeVirt components are deleted before the CR is deleted. It does not prevent deletion.
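
To illustrate the distinction, here is a minimal Python sketch of general Kubernetes finalizer semantics. It is a toy in-memory model, not KubeVirt code; the Resource and Store classes and the finalizer name "example.io/cleanup" are hypothetical. A delete request on an object with finalizers is accepted and only marks the object as terminating; the object goes away once its last finalizer is removed.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Resource:
    """Toy stand-in for a Kubernetes object (hypothetical, for illustration)."""
    name: str
    finalizers: list = field(default_factory=list)
    deletion_timestamp: Optional[str] = None


class Store:
    """Toy API server: a delete is always accepted, finalizers only delay removal."""

    def __init__(self):
        self.objects = {}

    def create(self, obj):
        self.objects[obj.name] = obj

    def delete(self, name, now="2019-09-02T08:30:15Z"):
        obj = self.objects[name]
        if obj.finalizers:
            # The request is NOT rejected; the object is only marked as terminating.
            obj.deletion_timestamp = now
        else:
            del self.objects[name]

    def remove_finalizer(self, name, finalizer):
        obj = self.objects[name]
        obj.finalizers.remove(finalizer)
        # Once the last finalizer is gone, a terminating object is removed.
        if obj.deletion_timestamp and not obj.finalizers:
            del self.objects[name]


store = Store()
store.create(Resource("kubevirt", finalizers=["example.io/cleanup"]))

store.delete("kubevirt")  # accepted, object is now terminating
terminating = store.objects["kubevirt"].deletion_timestamp is not None

store.remove_finalizer("kubevirt", "example.io/cleanup")  # cleanup finished
gone = "kubevirt" not in store.objects
```

In this model nothing ever refuses the delete call, which matches the point above: a finalizer orders the cleanup, it does not act as a guard.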

Comment 5 Stephen Gordon 2019-09-03 22:08:49 UTC
(In reply to Fabian Deutsch from comment #1)
> It's by design.
> But I wonder if we should add some safety measures here.
> 
> I.e. only permit deleting the HCO CR (or KubeVirt CR)
> - if there are no VMs.
> - if there are no DataVolumes.
> 
> Steve, thoughts?

I do think we need to protect the user, but I also think this might be a bit painful for the case where they really want to remove it and now have to manually go and clean up all the objects first? Is there a way to basically make them do a --force or "Are you really sure?" in all cases?

Comment 6 Fabian Deutsch 2019-09-04 12:30:50 UTC
Federico had a good point: Only an operator can uninstall components. And it's as easy to uninstall OCS as it is to uninstall CNV.

To me these two points are strong enough to say that we can defer a pragmatic solution to 2.1.1.

A pragmatic solution: KubeVirt can not be uninstalled as long as any VM or VMI is defined.

Comment 11 sgott 2019-10-23 13:44:34 UTC
The solution to this is likely a 2-part fix. virt-operator does not own the CR, so attempting to prevent deletion is tricky. However, KubeVirt can definitely check for the existence of VMs/DVs and create a condition. This would make for a cleaner API and separation of responsibility. Whatever entity created the KubeVirt CR can monitor for that condition and act accordingly (or optionally ignore it).
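
A rough Python sketch of the condition idea described here, under stated assumptions: the condition type "WorkloadsPresent", the reason strings, and both function names are hypothetical and do not come from KubeVirt. One side reports whether workloads exist, and the owner of the CR decides what to do with that information.

```python
def workload_condition(vm_count, dv_count):
    """Build a status condition describing whether user workloads exist.

    The condition type and reason strings are hypothetical.
    """
    exists = (vm_count + dv_count) > 0
    return {
        "type": "WorkloadsPresent",
        "status": "True" if exists else "False",
        "reason": "WorkloadsExist" if exists else "NoWorkloads",
        "message": f"{vm_count} VM(s) and {dv_count} DataVolume(s) found",
    }


def owner_may_delete(conditions, ignore_condition=False):
    """Owner-side decision: proceed unless workloads are reported.

    ignore_condition models the owner optionally ignoring the condition.
    """
    for cond in conditions:
        if cond["type"] == "WorkloadsPresent" and cond["status"] == "True":
            return ignore_condition
    return True


busy = [workload_condition(vm_count=2, dv_count=1)]
idle = [workload_condition(vm_count=0, dv_count=0)]
```

With these inputs, owner_may_delete(busy) is False, owner_may_delete(busy, ignore_condition=True) is True, and owner_may_delete(idle) is True.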

Comment 12 Asher Shoshan 2019-10-23 15:22:55 UTC
It's not the CR (kind KubeVirt). When this CR is deleted, all owned resources (virt-handler, virt-api, virt-controller) are cascade-deleted.

Resources created by the user, such as VMs and VMIs, are not owned by this CR.

virt-operator explicitly deletes all user-created resources when the KubeVirt-kind CR is deleted - why?
(I can still work with the VM once virt-handler, virt-api, etc. are recreated.)
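
The ownership argument above can be sketched with a toy Python model of cascade deletion by owner reference. This is only an illustration: the function is hypothetical, and the object named "my-vm" is a made-up user VM. Deleting the root removes everything that is transitively owned by it, while objects without an owner survive.

```python
def cascade_delete(owners, root):
    """Return the names that remain after deleting root and all it owns.

    owners maps each object name to its owner's name, or None if unowned.
    """
    doomed = {root}
    changed = True
    while changed:
        changed = False
        for name, owner in owners.items():
            if owner in doomed and name not in doomed:
                doomed.add(name)
                changed = True
    return sorted(name for name in owners if name not in doomed)


owners = {
    "kubevirt": None,               # the KubeVirt CR
    "virt-api": "kubevirt",
    "virt-controller": "kubevirt",
    "virt-handler": "kubevirt",
    "my-vm": None,                  # user-created, not owned by the CR
}
remaining = cascade_delete(owners, "kubevirt")
```

In this model only "my-vm" remains, so owner-based garbage collection alone would not explain the loss of user VMs; that has to come from an explicit delete, as the comment says.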

Comment 13 Roman Mohr 2020-01-21 13:11:03 UTC
We just recently merged https://github.com/kubevirt/kubevirt/pull/2976 in kubevirt/kubevirt. It allows setting a new field (spec.uninstallStrategy) in the KubeVirt CR to the value "BlockUninstallIfWorkloadsExist".

When a user then tries to delete the KubeVirt CR when workloads (VM, VMI, VMIRS) still exist, the deletion is blocked on the webhook level (so no deletion timestamp set).
HCO will have to pick it up and set it explicitly, since we did not want to change the default behaviour yet upstream.
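
As a rough illustration of the behaviour described here, the following Python sketch models the decision a validating webhook could make on a delete request. It is not the code from the pull request: the function name, the shape of the workload counts, and the message text are assumptions; only the two strategy values and the workload kinds come from this report.

```python
BLOCK = "BlockUninstallIfWorkloadsExist"
REMOVE = "RemoveWorkloads"


def validate_delete(uninstall_strategy, workload_counts):
    """Decide whether deleting the KubeVirt CR is allowed.

    workload_counts maps a workload kind (e.g. "VM", "VMI", "VMIRS")
    to how many objects of that kind exist. Returns (allowed, message).
    """
    if uninstall_strategy != BLOCK:
        # Any other strategy does not block the uninstall in this sketch.
        return True, ""
    present = [kind for kind, count in workload_counts.items() if count > 0]
    if present:
        # Rejected at admission time, so no deletion timestamp is ever set.
        return False, "uninstall rejected, workloads still present: " + ", ".join(present)
    return True, ""


blocked = validate_delete(BLOCK, {"VM": 1, "VMI": 1, "VMIRS": 0})
allowed_empty = validate_delete(BLOCK, {"VM": 0, "VMI": 0, "VMIRS": 0})
allowed_remove = validate_delete(REMOVE, {"VM": 1, "VMI": 1, "VMIRS": 0})
```

Because the request is denied before the object is modified, the CR never enters a terminating state, unlike the finalizer-based flow.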

Simone, I guess we want to handle the update in HCO in the context of this bugzilla?

Comment 14 sgott 2020-02-19 17:20:33 UTC
Roman,

Does this require a PR in HCO? Is that already done?

Comment 15 Roman Mohr 2020-02-19 17:22:28 UTC
Yes. For kubevirt the PR was done here: https://github.com/kubevirt/hyperconverged-cluster-operator/pull/454

Comment 18 zhe peng 2020-03-13 06:06:04 UTC
verify with build:
$ oc version
Client Version: 4.4.0-0.nightly-2020-02-17-022408
Server Version: 4.4.0-0.nightly-2020-03-06-170328
Kubernetes Version: v1.17.1

step:
1 deploy cnv
2 create a dv and vm
3 check kv default value
$ oc get kv kubevirt-kubevirt-hyperconverged -o yaml
.....
spec:
  uninstallStrategy: BlockUninstallIfWorkloadsExist
.....
4 try to delete kv
Error from server: admission webhook "kubevirt-validator.kubevirt.io" denied the request: Rejecting the uninstall request, since there are still Virtual Machine Instances present. Either delete all KubeVirt related workloads or change the uninstall strategy before uninstalling KubeVirt.

change strategy to
spec:
    uninstallStrategy: RemoveWorkloads
delete kv again
$ oc delete kv kubevirt-kubevirt-hyperconverged
kubevirt.kubevirt.io "kubevirt-kubevirt-hyperconverged" deleted
check vm is removed
$ oc get vm
No resources found in openshift-cnv namespace.

move to verified
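
The strategy change in the steps above could be expressed as a JSON merge patch. A minimal Python sketch that builds the patch body follows; the helper name is hypothetical, and the set of accepted values is limited to the two strategies that appear in this report.

```python
import json

# Only the two strategy values mentioned in this report are accepted here.
KNOWN_STRATEGIES = {"BlockUninstallIfWorkloadsExist", "RemoveWorkloads"}


def uninstall_strategy_patch(strategy):
    """Return a JSON merge patch body that sets spec.uninstallStrategy."""
    if strategy not in KNOWN_STRATEGIES:
        raise ValueError(f"unknown uninstall strategy: {strategy}")
    return json.dumps({"spec": {"uninstallStrategy": strategy}})


patch = uninstall_strategy_patch("RemoveWorkloads")
```

Assuming the same resource name as in the steps above, the resulting string could be passed to `oc patch kv kubevirt-kubevirt-hyperconverged --type merge -p "$PATCH"`.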

Comment 21 errata-xmlrpc 2020-05-04 19:10:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:2011