Description of problem:
MTV migration plan with a single source VMware VM with 2 networks (VM Network, Mgmt Network), both mapped to the Pod network. The VM import failed with the error:
"Import error (RHV) mini-rhel7-istein could not be imported. DataVolumeCreationFailed: Error while importing disk image: . VirtualMachine.kubevirt.io "" not found"
This error does not indicate the actual problem: 2 source networks mapped to the Pod network.

Version-Release number of selected component (if applicable):
CNV-2.5
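For reference, a minimal sketch of the kind of network mapping that triggers this, assuming the forklift.konveyor.io NetworkMap CR used by later MTV builds (the group/version, field names, and object names below are illustrative and may differ from the failing plan, especially in the CNV-2.5 era):

apiVersion: forklift.konveyor.io/v1beta1
kind: NetworkMap
metadata:
  name: vmware-networkmap        # illustrative name
  namespace: openshift-migration
spec:
  map:
  - source:
      name: VM Network
    destination:
      type: pod                  # first NIC mapped to the pod network
  - source:
      name: Mgmt Network
    destination:
      type: pod                  # second NIC also mapped to the pod network -> invalid target VM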
This bug seems to be related to bug 1891440.
Is the same behavior observed when creating a VMImport directly?
Created attachment 1749366 [details] migration.yaml
This bug was reported for CNV-2.5. Testing it on CNV-2.6.0/HCO-502:
1. VM import in this flow now fails with: Import error (VMware) v2v-rh8-2disks2nics could not be imported. VMCreationFailed: Error while creating virtual machine openshift-migration/v2v-rh8-2disks2nics: admission webhook "virtualmachine-validator.kubevirt.io" denied the request: more than one interface is connected to a pod network in spec.template.spec.interfaces
2. Running it via an MTV plan causes the VM import to fail in the same way, but the migration plan is displayed as failed with an option to Restart, and there is no error message saying why it failed.
The main problem here is that the VM import error is not propagated to the migration plan UI. Attaching the MTV migration/plan CRs.
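To illustrate why the webhook denies the request, here is a minimal sketch of the target VirtualMachine template that such a mapping produces; interface/network names are made up for the example, and KubeVirt only allows a single interface to be connected to the pod network:

apiVersion: kubevirt.io/v1       # kubevirt.io/v1alpha3 on older CNV releases
kind: VirtualMachine
metadata:
  name: v2v-rh8-2disks2nics
  namespace: openshift-migration
spec:
  template:
    spec:
      domain:
        devices:
          interfaces:
          - name: nic-0
            masquerade: {}       # bound to the pod network
          - name: nic-1
            masquerade: {}       # second pod-network interface -> webhook denies the VM
      networks:
      - name: nic-0
        pod: {}
      - name: nic-1
        pod: {}                  # only one 'pod' network is allowed per VM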
Created attachment 1749367 [details] plan.yaml
Based on comment #4, moving the bug to MTV to handle propagation of the VM import error to the MTV migration UI.
@Ilanit, can you provide an MTV installation with a plan in this state that I can inspect? It is expected that error details won't be on the Plans page (the same place as the Restart button); you have to click through to the VM migration details for that plan to see errors on a per-VM level. Are the errors still not appearing at that level either?
I have a reproducer env that I can share. Looking inside the VM in the migration plan, the error is indeed displayed. However, it is displayed as if the VM failed at the conversion stage, while the conversion stage was never reached. See the "Migration plan/VM failure" screenshot attached. The VM import failed immediately - see the "vm import failure" screenshot attached. The importer and v2v pods never ran.
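For context, the failure is surfaced on the VirtualMachineImport CR rather than by the importer/v2v pods; roughly as follows (condition types and reasons come from the VM Import Operator and may vary by version, and the message is abbreviated):

apiVersion: v2v.kubevirt.io/v1beta1
kind: VirtualMachineImport
metadata:
  name: ...                      # created by the MTV Migration controller
status:
  conditions:
  - type: Succeeded
    status: "False"
    reason: VMCreationFailed
    message: 'Error while creating virtual machine ...: admission webhook
      "virtualmachine-validator.kubevirt.io" denied the request: more than
      one interface is connected to a pod network'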
@istein I'm not seeing an attached screenshot, but I believe you :) If an error is being reported at the wrong pipeline step, that's a controller issue. The UI simply displays whatever pipeline status the API provides. cc @jortel
Created attachment 1754819 [details] vm import failure
Created attachment 1754820 [details] Migration plan/VM failure
Thanks for the screenshots Ilanit. @Jeff if you look at that screenshot "Migration plan/VM failure", is that error appearing on the wrong pipeline step or is it working as intended?
(In reply to Mike Turley from comment #12)
> Thanks for the screenshots Ilanit.
>
> @Jeff if you look at that screenshot "Migration plan/VM failure", is that
> error appearing on the wrong pipeline step or is it working as intended?

Based on the screenshot, the error is reported against the image-conversion step, which seems correct to me.
@Jeff, in the "Migration plan/VM failure" attachment we see the failure displayed at the conversion step. However, the import fails even before the "importer" and "conversion" stages. If an "init" stage were displayed in the migration UI, the error would belong there. Do we have plans to add it for GA?
After discussing this with Fabien Dupont, it seems that it actually failed at the importer stage, not at an "initial" stage. VMIO wrongly reported the stage as "conversion", while the actual stage was "disk copy" (importer). A separate VMIO bug needs to be reported for this. This bug is actually about the error not being displayed in the migration UI, and that part works OK in the latest MTV migration UI. Therefore closing as not a bug.
This indeed happens at the "CreateImport" stage, even before the DiskTransfer stage that is shown in the UI. Here, the VMImport CR is rejected by the admission webhook, so the error should be "Could not initiate VM import process", or similar. @jortel, would it be possible to add an "Initialize" (or other name) stage for whatever is done before the VMImport CR is created and retrieved by the Migration CR?
(In reply to Fabien Dupont from comment #16)
> This indeed happens at the "CreateImport" stage, even before the
> DiskTransfer stage that is shown in the UI.
>
> Here, the VMImport CR is rejected by the admission webhook, so the error
> should be "Could not initiate VM import process", or similar.
>
> @jortel, would it be possible to add an "Initialize" (or other
> name) stage for whatever is done before the VMImport CR is created and
> retrieved by the Migration CR?

It depends. If we plan to absorb the VMIO functionality into MTV for GA, then the failure will be dealt with and reported naturally. If not, yes, we could add a step in the pipeline to align with and report this failure.
Even if we absorb the VMIO functionality, I think it makes sense to have a first stage for initialization/validation. The controller will always have to do something before triggering data transfer, and that would allow reporting errors that happen very early in the migration.
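As a rough illustration of the proposal, the per-VM pipeline in the Plan status could gain a first step that owns VMImport CR creation/validation; the step names and status layout below are assumptions for the sketch, not the current MTV schema:

status:
  migration:
    vms:
    - name: v2v-rh8-2disks2nics
      pipeline:
      - name: Initialize         # proposed: create and validate the VMImport CR
        phase: Failed
        error:
          reasons:
          - 'admission webhook "virtualmachine-validator.kubevirt.io" denied the request: ...'
      - name: DiskTransfer
        phase: Pending           # never reached in this failure
      - name: ImageConversion
        phase: Pending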
Seems like this particular bug (the v2v-rh8-2disks2nics image is not in the registry on the destination cluster) should be caught and reported during provider validation. I propose we add a validation/condition on the Provider CR that checks that the image is available. As for the new initialization/validation step in the pipeline for each VM, it's not clear to me what this step would provide beyond the contextual validations we discussed (which I cannot remember ATM). I'm not opposed to adding the step, but I would want to do it in the context of a specific use case, which I think is separate from this BZ.
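A minimal sketch of what such a Provider validation could surface; the condition type is hypothetical and the field layout only mirrors the general shape of MTV conditions:

kind: Provider
status:
  conditions:
  - type: ConversionImageNotAvailable   # hypothetical condition type
    status: "True"
    category: Critical
    message: The conversion image is not available in the destination cluster registry.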
*** Bug 1902028 has been marked as a duplicate of this bug. ***
I agree with Jeff. For this specific error, the message should now be correctly displayed. For the additional "validation" step, we'll track the implementation in BZ 1902028. Moving to ON_QA.
MTV 2.0.0-20 (iib:69034)
HCO image: registry.redhat.io/container-native-virtualization/hyperconverged-cluster-operator@sha256:79bb6a11201f2b867b7a928220dfe628ad28acd2e70fe35933dbe115f8ce1b23
HCO operator version: v2.6.2-1
HCO index image: registry-proxy.engineering.redhat.com/rh-osbs/iib:67945
CNV version: 2.6.2
CSV creation time: 2021-04-17 03:38:06
KubeVirt: v0.36.2-22-g95642d8
CDI: v1.28.3-18-g2acbfb2
OpenShift: 4.7.7
OCS: 4.7.0-353.ci
OpenShift networkType: OpenShiftSDN
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (MTV 2.0.0 images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2021:2381