Bug 1894897

Summary: [v2v][VMIO] VMimport CR is not reported as failed when target VM is deleted during the import
Product: Container Native Virtualization (CNV)
Reporter: Maayan Hadasi <mguetta>
Component: V2V
Assignee: Sam Lucidi <slucidi>
Status: CLOSED ERRATA
QA Contact: Amos Mastbaum <amastbau>
Severity: medium
Priority: medium
Version: 2.5.0
CC: amastbau, cnv-qe-bugs, fdupont, istein, slucidi
Target Milestone: ---
Target Release: 2.6.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-03-10 11:18:59 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
- vm-import-controller yaml
- vmware-vmimport-1-describe
- vm-import-controller log
- importer-vmware-import-1-harddisk1 pod log

Description Maayan Hadasi 2020-11-05 11:51:11 UTC
Description of problem:
The VMware VMimport CR does not reach the expected "failed" status after the target VM is deleted during the disk copy/conversion stage.
In the UI, the VM is still shown with status "importing".


Version-Release number of selected component (if applicable):
CNV 2.5.0-413 (iib-24150)
OCP 4.6.1


How reproducible:
100%


Steps to Reproduce:
1. Have a running VM in VMware
2. Create VMimport CR via API
3. After the source VM is powered off and the disk copy has started, delete the target VM: oc delete vm <vm-name>
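
For reference, a minimal sketch of the VMimport CR from step 2 (a sketch only: field names follow the v2v.kubevirt.io/v1beta1 API, but the secret name, target VM name, and source VM placeholder are illustrative assumptions):

```yaml
apiVersion: v2v.kubevirt.io/v1beta1
kind: VirtualMachineImport
metadata:
  name: vmware-vmimport-1
  namespace: default
spec:
  providerCredentialsSecret:
    name: vmware-credentials    # assumed Secret holding vCenter URL/user/password
    namespace: default
  targetVmName: vmware-import-1
  startVm: false
  source:
    vmware:
      vm:
        name: <source-vm-name>  # the running VM in VMware from step 1
```

Step 3 then deletes the target VM while the importer pod is still copying the disk: oc delete vm vmware-import-1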


Actual results:
VMimport status is "Processing"
UI: the VMimport is displayed on the VM page with status "importing"
The VMimport stays in this state until the VMimport CR itself is deleted


Expected results:
The VMimport should report status "failed" because the target VM was deleted
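
In other words, a terminal condition along these lines would be expected on the CR (a sketch only; the exact condition type, reason, and message are assumptions, loosely matching the message later reported against the fixed build in comment 12):

```yaml
status:
  conditions:
  - type: Succeeded
    status: "False"
    reason: VMNotFound
    message: 'target VM vmware-import-1 not found'
```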


Additional info:
* Tested using an NFS storage class
* Regarding pods created during the import:
- Deleting the VM during the disk_copy stage -> the importer pod keeps running and is eventually removed (as if it had completed), but no vmimport.v2v.kubevirt conversion pod is created afterwards.
- Deleting the VM during the conversion stage -> the vmimport.v2v.kubevirt pod keeps running and completes.
* The source VM stays powered off until the VMimport CR itself is deleted


Attachments:
logs: vm-import-controller, importer pod
vm-import-controller yaml
vmimport CR describe

Comment 1 Maayan Hadasi 2020-11-05 11:52:19 UTC
Created attachment 1726847 [details]
vm-import-controller yaml

Comment 2 Maayan Hadasi 2020-11-05 11:53:01 UTC
Created attachment 1726848 [details]
vmware-vmimport-1-describe

Comment 3 Maayan Hadasi 2020-11-05 11:53:51 UTC
Created attachment 1726849 [details]
vm-import-controller log

Comment 4 Maayan Hadasi 2020-11-05 11:55:27 UTC
Created attachment 1726850 [details]
importer-vmware-import-1-harddisk1 pod log

Comment 5 Fabien Dupont 2020-11-06 08:04:29 UTC
In the vm-import-controller log, we can see the following message:

{"level":"error","ts":1604566865.2775548,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"virtualmachineimport-controller","name":"vmware-vmimport-1","namespace":"default","error":"VirtualMachine.kubevirt.io \"vmware-import-1\" not found","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/kubevirt/vm-import-operator/vendor/github.com/go-logr/zapr/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/kubevirt/vm-import-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:248\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/kubevirt/vm-import-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:222\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/go/src/github.com/kubevirt/vm-import-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:201\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/github.com/kubevirt/vm-import-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/kubevirt/vm-import-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/github.com/kubevirt/vm-import-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}

So, the vm-import-controller knows that the VM has been deleted. It could then delete the DataVolume and mark the import as failed with a meaningful message.
When the DataVolume is deleted, I guess that the importer pod is terminated too. Something to verify.
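
A minimal, self-contained Go sketch of that idea (this stubs the cluster client with a plain function and a sentinel error; the real controller would check apierrors.IsNotFound from apimachinery against a live client, so all names here are illustrative):

```go
package main

import (
	"errors"
	"fmt"
)

// errVMNotFound stands in for the NotFound error the API server returns,
// matching the message seen in the vm-import-controller log above.
var errVMNotFound = errors.New(`VirtualMachine.kubevirt.io "vmware-import-1" not found`)

// importStatus models the subset of the VMImport status the controller updates.
type importStatus struct {
	Phase   string
	Reason  string
	Message string
}

// reconcile sketches the proposed behavior: when fetching the target VM
// returns NotFound, mark the import failed instead of requeueing forever.
func reconcile(getVM func() error, status *importStatus) error {
	if err := getVM(); err != nil {
		if errors.Is(err, errVMNotFound) {
			// Terminal condition: record failure, do not requeue.
			// (The real fix would also clean up the DataVolume here.)
			status.Phase = "Failed"
			status.Reason = "VMNotFound"
			status.Message = err.Error()
			return nil
		}
		return err // transient error: let controller-runtime requeue
	}
	status.Phase = "Processing"
	return nil
}

func main() {
	st := &importStatus{}
	_ = reconcile(func() error { return errVMNotFound }, st)
	fmt.Println(st.Phase, st.Reason)
}
```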

Comment 6 Maayan Hadasi 2020-11-08 08:31:24 UTC
(In reply to Fabien Dupont from comment #5)
> In the vm-import-controller log, we can see the following message:
> 
> [quoted reconciler error log snipped; see comment #5]
> 
> So, the vm-import-controller knows that the VM has been deleted. It could
> then delete the DataVolume and mark the import as failed with a meaningful
> message.
> When the DataVolume is deleted, I guess that the importer pod is terminated
> too. Something to verify.

It seems the importer pod is terminated too. Please see "Additional info" in the bug description.

Comment 7 Fabien Dupont 2020-11-18 15:27:31 UTC
It is not related to a specific provider. Marking BZ#1894900 as duplicate to reduce admin work.

Comment 8 Fabien Dupont 2020-11-18 15:27:51 UTC
*** Bug 1894900 has been marked as a duplicate of this bug. ***

Comment 9 Fabien Dupont 2021-01-25 10:20:28 UTC
@slucidi do you think this could be fixed in CNV 2.6.0? If not, do you think it's worth fixing in CNV at all?

Comment 10 Sam Lucidi 2021-01-25 16:13:33 UTC
I think I'll have time to fix it for 2.6.

Comment 11 Fabien Dupont 2021-01-28 16:56:05 UTC
The fix should be in hco-bundle-registry-container-v2.6.0-521 and onwards. Moving to ON_QA.

Comment 12 Amos Mastbaum 2021-02-04 10:17:09 UTC
verified build: iib-42945 hco-v2.6.0-523
ovirt+vmware

VMNotFound: target VM XXX-for-tests not found


Comment 16 errata-xmlrpc 2021-03-10 11:18:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 2.6.0 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0799