Created attachment 1736957 [details]
Emergency mode screenshot

Description of problem:
A migration plan with a single RHEL7/8 VM from VMware works, but the VM starts in "emergency mode". The same RHEL7/8 VM, imported with the single-VM import from VMware via the API (using VMIO), results in an imported VM that starts just fine (no emergency mode). This happens with both internal and external mappings.

Version-Release number of selected component (if applicable):
MTV-2.0

How reproducible:
100%. Tried it on a couple of VMs and from 2 VMware setups.
Created attachment 1736959 [details] MTV-vm-import-CR.yaml
Created attachment 1736960 [details] vm-import-via-api.yaml
Can you please provide the VM CR in both cases?
Both CRs look normal to me, and I have not been able to reproduce this issue with my own RHEL8 VM. It's possible that the resource mappings are wrong in some way, but it looks like the source VM has been deleted, so I am unable to check. Ilanit, can you provide a reproducer environment?
We suspect it might happen only when the target VM namespace is NOT "default".
I can confirm the issue is reproducible when the target VM namespace is something other than the default namespace, regardless of whether VMIO is used directly or through MTV. We only tested on PSI. This is a VMIO issue, not an MTV issue.
I suspect it comes from the ftruncate issue. We'll wait for BZ#1913756 to be fixed and then try to reproduce again.
Note there is no issue when importing from RHV.
This bug occurs on CNV-2.5.3 too. It was tested using a RHEL7 VM import, to the default namespace vs. to a non-default namespace. The import to the non-default namespace had the imported VM starting in emergency mode, while the import to "default" started OK. Therefore, cloning this to 2.5.z.
Does this reproduce with storage classes other than NFS? I've been attempting to reproduce on my local cluster with the default cinder storage class, but it always imports successfully.
We can confirm this doesn't happen with CephFS/Filesystem (tested on both PSI & BM).
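For anyone trying to narrow this down further, one quick way to see which storage class and volume mode a given import PVC actually landed on (placeholder names, not taken from the attached logs):

  oc get pvc <pvc_name> -n <target_namespace> \
    -o jsonpath='{.spec.storageClassName}{" "}{.spec.volumeMode}{"\n"}'

NFS- and CephFS-backed classes will both typically report Filesystem mode here, so the interesting difference is the storage class itself.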
@Fabien, this bug is marked as depending on bug 1913756. However, bug 1913756 is verified on CNV-2.6 hco-500, and yet this bug still reproduces on that same version. Can we remove the dependency on bug 1913756? Thanks.
Created attachment 1749095 [details] v2v-virt.log
Ilanit, can we please have the following:
- the PVC YAML (oc get pvc <pvc_name> -o yaml)
- the DataVolume YAML (oc get dv <dv_name> -o yaml)
- the importer pod log (oc logs -f <importer_pod_name>)
Created attachment 1749114 [details] v2v-virt-full.log
Created attachment 1749116 [details] good-v2v-virt.log
We see a file-open permission issue at the end of the v2v log, coming from qemu-img:

  Conversion successful.
  Committing all overlays to local disks.
  (0.00/100%)
  qemu-img: Could not reopen file: Permission denied
  Commit successful.
  Cleaning up.
I think this error comes from our virt-v2v-cdi entrypoint. It fails to commit the virt-v2v overlays to the base disk image, but counts it as successful anyway. This should probably fail the import, because the resulting VM will not have virtio drivers in the initramfs. I'm not sure why there is an NFS permissions issue on non-default namespaces, but that can be a separate bug.
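For context, the failing step in that entrypoint boils down to something like this (a simplified sketch based on my reading of the script, not the literal code; paths are placeholders): virt-v2v writes qcow2 overlays on top of the base disk images, and the script commits them back with qemu-img without checking the exit status, so a "Permission denied" is silently swallowed:

  # Sketch of the overlay-commit step; overlay paths are placeholders.
  for overlay in /var/tmp/*.qcow2; do
    # 'qemu-img commit' merges the overlay back into its backing file.
    # If it fails (e.g. "Could not reopen file: Permission denied"),
    # nothing checks the exit code, so the import still reports success.
    qemu-img commit "$overlay"
  done
  echo "Commit successful."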
It reminds me of previous errors in Single VM Import. IIRC, the solution was to run the pod in privileged mode. It's possible that the default namespace has different RBAC. As a short-term fix, we could force the pod to run in privileged mode (see the sketch below). @istein @amastbau, when did this start happening? The code responsible for this has not changed in the last 3 months, so I would have expected it to be caught earlier.
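For reference, that short-term workaround would be a pod-spec fragment along these lines (a sketch, assuming we control the conversion pod template; the namespace's SCC must also permit privileged pods):

  # Hypothetical securityContext for the conversion container:
  securityContext:
    privileged: true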
@marnold, the entrypoint code is here: https://github.com/kubevirt/vm-import-operator/blob/master/build/virtv2v/bin/entrypoint. Do you think you can make it more reliable?
At a minimum I can make the script fail when qemu-img fails to commit the overlays, so it will not report a successful import. I am testing changes for this right now. I'm not sure I can do anything about NFS permissions from the entrypoint though.
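Concretely, the minimal hardening is to stop ignoring the qemu-img exit status, e.g. (a sketch of the intended change, not the final patch):

  set -euo pipefail   # abort the import on any failed command

  for overlay in /var/tmp/*.qcow2; do
    if ! qemu-img commit "$overlay"; then
      echo "Failed to commit overlay $overlay" >&2
      exit 1          # fail the import instead of reporting success
    fi
  done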
Fabien, we see it at least from CNV 2.5.2. We did not notice the emergency mode in the regression runs because the automation was not checking for it; it has now been enhanced to do so. So indeed, this bug could have been introduced some months ago.
> qemu-img: Could not reopen file: Permission denied

Can confirm that this message isn't coming from virt-v2v. qemu-img ought to exit with an error when this happens, so whatever runs qemu-img should check its exit code. Unrelated, but that error message was removed upstream recently: https://github.com/qemu/qemu/commit/b18a24a9f889bcf722754046130507d744a1b0b9#diff-7ada4d307c3081b49c8044bba958219d3fa6cbb513d80ac9de506980c69d741b
Created attachment 1749329 [details] storage logs for import to default/non default namespace
https://github.com/kubevirt/vm-import-operator/pull/466
The downstream images are part of hco-bundle-registry-container-v2.6.0-520 and onward. Moving to ON_QA.
Verified on bare metal: OCP-4.7, CNV iib-42387, hco-v2.6.0-520.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Virtualization 2.6.0 security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0799