Bug 1904797 - [VMIO][vmware] A migrated RHEL/Windows VM starts in emergency mode/safe mode when target storage is NFS and target namespace is NOT "default"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: V2V
Version: 2.6.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 2.6.0
Assignee: Sam Lucidi
QA Contact: Daniel Gur
URL:
Whiteboard:
Depends On: 1913756
Blocks: 1916333
Reported: 2020-12-06 13:50 UTC by Ilanit Stein
Modified: 2021-03-10 11:20 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1916333
Environment:
Last Closed: 2021-03-10 11:19:48 UTC
Target Upstream Version:
Embargoed:
istein: needinfo+
istein: needinfo+


Attachments
Emergency mode screenshot (72.72 KB, image/png)
2020-12-06 13:50 UTC, Ilanit Stein
MTV-vm-import-CR.yaml (3.22 KB, text/plain)
2020-12-06 13:52 UTC, Ilanit Stein
vm-import-via-api.yaml (2.91 KB, text/plain)
2020-12-06 13:55 UTC, Ilanit Stein
v2v-virt.log (2.43 KB, text/plain)
2021-01-20 15:19 UTC, Amos Mastbaum
v2v-virt-full.log (898.27 KB, text/plain)
2021-01-20 16:23 UTC, Amos Mastbaum
good-v2v-virt.log (932.93 KB, text/plain)
2021-01-20 16:36 UTC, Amos Mastbaum
storage logs for import to default/non default namespace (13.28 KB, application/zip)
2021-01-21 09:18 UTC, Ilanit Stein


Links
Red Hat Product Errata RHSA-2021:0799 (last updated 2021-03-10 11:20:57 UTC)

Description Ilanit Stein 2020-12-06 13:50:10 UTC
Created attachment 1736957 [details]
Emergency mode screenshot

Description of problem:
A migration plan with a single RHEL7/8 VM from VMware completes, but the migrated VM starts in "Emergency mode".

The same RHEL7/8 VM, imported through the single VM import from VMware via the API (using VMIO), results in an imported VM that starts normally (not in emergency mode).
This is the case with either internal or external mapping.

Version-Release number of selected component (if applicable):
MTV-2.0

How reproducible:
100%
Tried it on a couple of VMs and from two VMware setups.

Comment 1 Ilanit Stein 2020-12-06 13:52:37 UTC
Created attachment 1736959 [details]
MTV-vm-import-CR.yaml

Comment 2 Ilanit Stein 2020-12-06 13:55:17 UTC
Created attachment 1736960 [details]
vm-import-via-api.yaml

Comment 3 Fabien Dupont 2020-12-10 20:56:01 UTC
Can you please provide the VM CR in both cases?

Comment 4 Sam Lucidi 2020-12-11 19:06:28 UTC
Both CRs look normal to me, and I have not been able to reproduce this issue with my own RHEL8 VM. It's possible that the resource mappings are wrong in some way, but it looks like the source VM has been deleted, so I am unable to check. Ilanit, can you provide a reproducer environment?

Comment 7 Amos Mastbaum 2021-01-04 13:40:09 UTC
We suspect it might happen only when the target VM namespace is NOT "default".

Comment 8 Amos Mastbaum 2021-01-05 13:28:45 UTC
I can confirm the issue is reproducible when the target VM namespace is something other than the default namespace, regardless of whether VMIO is used directly or through MTV.
We only tested in PSI.
This is a VMIO issue, not an MTV issue.

Comment 10 Fabien Dupont 2021-01-08 13:40:32 UTC
I suspect that it comes from the issue with ftruncate. We'll wait for BZ#1913756 to be fixed and try to reproduce.

Comment 12 Amos Mastbaum 2021-01-13 15:57:08 UTC
Note there is no issue when importing from RHV.

Comment 13 Ilanit Stein 2021-01-14 14:36:57 UTC
This bug occurs on CNV-2.5.3 too.
It was tested by importing a RHEL7 VM to the "default" namespace and to a non-"default" namespace.
In the non-"default" namespace, the imported VM started in emergency mode,
while in the "default" namespace, the imported VM started OK.
Therefore, cloning it to 2.5.z.

Comment 14 Sam Lucidi 2021-01-14 20:21:54 UTC
Does this reproduce with storage classes other than NFS? I've been attempting to reproduce on my local cluster with the default cinder storage class, but it always imports successfully.

Comment 15 Amos Mastbaum 2021-01-19 12:15:35 UTC
We can confirm this doesn't happen with cephfs/Filesystem (tested on both PSI and BM).

Comment 16 Ilanit Stein 2021-01-19 17:03:17 UTC
@Fabien,

This bug is marked as depending on bug 1913756.
However, bug 1913756 is verified on CNV-2.6 hco-500, and this bug still reproduces on that same version.
Can we remove the dependency on bug 1913756?

Thanks.

Comment 18 Amos Mastbaum 2021-01-20 15:19:07 UTC
Created attachment 1749095 [details]
v2v-virt.log

Comment 19 Natalie Gavrielov 2021-01-20 16:15:37 UTC
Ilanit, can we please have the following:
pvc yaml (oc get pvc pvc_name -oyaml)
data volume yaml (oc get dv dv_name -oyaml)
importer pod log (oc logs -f importer_pod_name)

Comment 20 Amos Mastbaum 2021-01-20 16:23:20 UTC
Created attachment 1749114 [details]
v2v-virt-full.log

Comment 21 Amos Mastbaum 2021-01-20 16:36:36 UTC
Created attachment 1749116 [details]
good-v2v-virt.log

Comment 22 Daniel Gur 2021-01-20 16:48:30 UTC
We see a file-open permission error from qemu-img at the end of the v2v log:


"
Conversion successful. Committing all overlays to local disks.
    (0.00/100%)

qemu-img: Could not reopen file: Permission denied

Commit successful. Cleaning up.

"

Comment 23 Matthew Arnold 2021-01-20 17:25:57 UTC
I think this error comes from our virt-v2v-cdi entrypoint. It fails to commit the virt-v2v overlays to the base disk image, but counts it as successful anyway. This should probably fail the import, because the resulting VM will not have virtio drivers in the initramfs. I'm not sure why there is an NFS permissions issue on non-default namespaces, but that can be a separate bug.
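
The mismatch described here, a qemu-img error followed by a success message, can also be detected after the fact by scanning the conversion log. The following Python sketch is illustrative only and is not part of the entrypoint: the function name `find_false_success` is hypothetical, and it assumes that error lines begin with `qemu-img:` and that the success marker is the literal text `Commit successful`, as in the excerpt quoted in comment 22.

```python
def find_false_success(log_text):
    """Return qemu-img error lines that are followed by 'Commit successful'.

    An empty list means no reported success was preceded by a qemu-img error.
    """
    pending_errors = []  # qemu-img errors seen since the last success marker
    contradicted = []    # errors that a later success marker glossed over
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("qemu-img:"):
            pending_errors.append(stripped)
        elif stripped.startswith("Commit successful"):
            contradicted.extend(pending_errors)
            pending_errors = []
    return contradicted
```

Applied to the excerpt in comment 22, this returns the single "Could not reopen file: Permission denied" line, which marks the import as suspect even though the log claims success.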

Comment 24 Fabien Dupont 2021-01-20 21:06:42 UTC
It reminds me of previous errors in Single VM Import. IIRC, the solution was to run the pod in privileged mode. It's possible that the default namespace has a different RBAC.
As a short term fix, we could force the pod to run in privileged mode.

@istein @amastbau when did this start to happen? The code responsible for this has not changed in the last 3 months, so I would expect it to have been caught earlier.

Comment 25 Fabien Dupont 2021-01-20 21:13:21 UTC
@marnold the entrypoint code is here: https://github.com/kubevirt/vm-import-operator/blob/master/build/virtv2v/bin/entrypoint. Do you think you can make it more reliable?

Comment 26 Matthew Arnold 2021-01-20 21:19:02 UTC
At a minimum I can make the script fail when qemu-img fails to commit the overlays, so it will not report a successful import. I am testing changes for this right now. I'm not sure I can do anything about NFS permissions from the entrypoint though.
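
The principle behind this change (fail the import whenever the commit command exits with a non-zero status) can be sketched as follows. This is a Python illustration under stated assumptions, not the actual entrypoint, which is a separate script in the vm-import-operator repository. The names `run_checked`, `commit_overlay`, `import_exit_status`, and `CommitError` are hypothetical, and the use of `qemu-img commit` with a single overlay path is an assumption about how the overlays are committed.

```python
import subprocess
import sys


class CommitError(RuntimeError):
    """Raised when a command exits with a non-zero status."""


def run_checked(cmd):
    """Run cmd and raise CommitError instead of silently ignoring a failure."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise CommitError(
            f"{cmd[0]} exited with status {result.returncode}: "
            f"{result.stderr.strip()}"
        )
    return result.stdout


def commit_overlay(overlay_path):
    # Assumption: the overlay is committed to its backing file with
    # `qemu-img commit <overlay>`; the real entrypoint may differ.
    return run_checked(["qemu-img", "commit", overlay_path])


def import_exit_status(overlay_path):
    """Return 0 only if the commit succeeded, 1 otherwise."""
    try:
        commit_overlay(overlay_path)
    except (CommitError, OSError) as err:
        # OSError also covers the case where qemu-img is not installed.
        print(f"Commit failed: {err}", file=sys.stderr)
        return 1
    print("Commit successful.")
    return 0
```

With a check like this, the success message is printed only after the command has actually returned status 0, so a "Permission denied" failure would stop the import instead of producing a VM without the committed changes.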

Comment 27 Daniel Gur 2021-01-21 07:56:28 UTC
Fabien, we have seen it since at least CNV 2.5.2.
We did not notice the emergency mode in the regression runs because the automation was not checking for it. It has now been enhanced to do so.
So it may indeed be that this bug was introduced some months ago.
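
How the automation was enhanced is not described in this bug. As a minimal sketch of one possible check, assuming the test harness can capture the guest's console output as text (an assumption), a regression test could look for the phrase "emergency mode" in that output. The function name `booted_into_emergency_mode` is hypothetical, and the check covers only the RHEL case, not Windows safe mode.

```python
import re

# Assumption: a RHEL guest that falls back to emergency mode prints a
# console message containing the phrase "emergency mode".
EMERGENCY_PATTERN = re.compile(r"emergency mode", re.IGNORECASE)


def booted_into_emergency_mode(console_text):
    """Return True if the captured console output mentions emergency mode."""
    return EMERGENCY_PATTERN.search(console_text) is not None
```

A test would then fail the migration scenario whenever this function returns True for the console output of the imported VM.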

Comment 28 Richard W.M. Jones 2021-01-21 09:10:52 UTC
> qemu-img: Could not reopen file: Permission denied

Can confirm that this message isn't coming from virt-v2v.

qemu-img ought to exit with an error when this happens, so whatever runs qemu-img
should check for that exit code.

Unrelated, but that error message was removed upstream recently:

https://github.com/qemu/qemu/commit/b18a24a9f889bcf722754046130507d744a1b0b9#diff-7ada4d307c3081b49c8044bba958219d3fa6cbb513d80ac9de506980c69d741b

Comment 29 Ilanit Stein 2021-01-21 09:18:26 UTC
Created attachment 1749329 [details]
storage logs for import to default/non default namespace

Comment 31 Fabien Dupont 2021-01-28 08:17:10 UTC
The downstream images are part of hco-bundle-registry-container-v2.6.0-520 and onward. Moving to ON_QA.

Comment 32 Ilanit Stein 2021-01-28 20:17:48 UTC
Verified on bare metal, OCP-4.7/CNV iib-42387 hco-v2.6.0-520

Comment 35 errata-xmlrpc 2021-03-10 11:19:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 2.6.0 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0799

