Description of problem:

Started a VM import from RHV of a Fedora 32 VM (OS type RHEL8) to the target "standard" storage class. While that VM's disk was being copied, started a second VM import of the same VM, this time to a target NFS storage class.

Result:

The first VM import was displayed in the UI as if it was stuck at 85%.

The second VM import failed: at first it failed on the disk being locked, and then, after a few minutes, it reported this other failure:

"The virtual machine could not be imported. DataVolumeCreationFailed: Data volume default/fedora32-nfs-b870c429-11e0-4630-b3df-21da551a48c0 creation failed: Internal error occurred: failed calling webhook "datavolume-mutate.cdi.kubevirt.io": Post https://cdi-api.openshift-cnv.svc:443/datavolume-mutate?timeout=30s: no endpoints available for service "cdi-api""

Piotr Kliczewski: I checked your environment and found one importer pod (importer-fedora32-1-b870c429-11e0-4630-b3df-21da551a48c0) in Terminating status, with this message:

  message: 'Unable to connect to imageio data source: Fault reason is "Operation Failed".
    Fault detail is "[Cannot transfer Virtual Disk: The following disks are locked: GlanceDisk-f6c31e5.
    Please try again in a few minutes.]". HTTP response code is "409". HTTP response message
    is "409 Conflict".'

I suspect it failed because the same disk was already being imported. The DV (fedora32-1-b870c429-11e0-4630-b3df-21da551a48c0) is still in:

  phase: ImportInProgress
  progress: 100.00%

The corresponding VMImport contains:

  kind: VirtualMachineImport
  metadata:
    annotations:
      vmimport.v2v.kubevirt.io/progress: "85"
      vmimport.v2v.kubevirt.io/source-vm-initial-state: down
    name: vm-import-fedora32-1-xfhww
  ....
  dataVolumes:
  - name: fedora32-1-b870c429-11e0-4630-b3df-21da551a48c0

The interesting point is that the second import failed to create its DV:

  kind: VirtualMachineImport
  name: vm-import-fedora32-nfs-p22xd
  ...
  message: 'Data volume default/fedora32-nfs-b870c429-11e0-4630-b3df-21da551a48c0 creation
    failed: Internal error occurred: failed calling webhook "datavolume-mutate.cdi.kubevirt.io":
    Post https://cdi-api.openshift-cnv.svc:443/datavolume-mutate?timeout=30s: no endpoints
    available for service "cdi-api"'
  reason: DataVolumeCreationFailed

Further steps on this same environment:

VM import of a RHEL-7 VM from RHV to CNV, using the default storage class, which is standard (default) - kubernetes.io/cinder. The VM import seemed stuck in the UI at 10% progress, and the UI gave NO indication of why it was not progressing. There were errors in the vm-import-controller log like:
"ovirt client panicked: runtime error: invalid memory address or nil pointer dereference"

I then tried to import another RHEL-7 VM to NFS storage. That VM import finished successfully. I then deleted this VM and tried to import it again to NFS storage. That failed with an import error related to oVirt; I think the error shown in the UI was something like "invalid memory address or nil pointer dereference".

I removed this VM import resource and tried again to run a VM import from RHV, but it now gets stuck on the "Checking RHV API credentials" stage forever. That is, it is no longer possible to do a VM import from RHV.

Checking VM import from the VMware provider, which worked before the above steps: for both an existing and a new VMware provider, it is "stuck" in the "Checking vCenter credentials" stage.

Version-Release number of selected component (if applicable):
CNV-2.4 from July 9 2020.

Additional info:
standard (default) kubernetes.io/cinder: this comes by default with OCP 4.4+ and should provision volumes on top of OpenStack Platform (RHOSP) Cinder.
In our case it is not functional, because it requires configuration with OpenStack, which we do not do at the moment.
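For reference, a minimal triage sketch for the "no endpoints available for service cdi-api" failure, using the namespaces and resource names quoted in this report (the grep patterns are just convenience assumptions):

  # Check whether the cdi-api service has any ready endpoints in openshift-cnv
  # (the webhook call above failed because it had none at the time).
  oc get endpoints cdi-api -n openshift-cnv
  oc get pods -n openshift-cnv | grep cdi

  # Inspect the stuck DataVolume and its importer pod in the target namespace.
  oc get dv -n default
  oc describe dv fedora32-1-b870c429-11e0-4630-b3df-21da551a48c0 -n default
  oc get pods -n default | grep importer-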
Created attachment 1700746 [details] vm-import-controller.log
Ilanit, there is not enough information in the provided log to understand why the client panicked. Please use quay.io/pkliczewski/vm-import-controller:latest, which has an updated debug statement, to reproduce the issue.
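In case it helps, a hedged sketch of how the controller could be pointed at that debug image; the deployment and container names are assumptions (adjust to the actual names), and the CNV operator may reconcile the image back:

  # Swap the running controller's image to the debug build.
  oc set image deployment/vm-import-controller \
      vm-import-controller=quay.io/pkliczewski/vm-import-controller:latest \
      -n openshift-cnv
  # Follow the controller log while reproducing the import.
  oc logs -f deployment/vm-import-controller -n openshift-cnv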
@Ilanit, recently we have been seeing the DataVolumeCreationFailed failure occur more often. It does not seem to be related to cinder.
Based on the known issue with naming length limits, this seems to be related to BZ #1857165. Here is the reason: "error occurred: failed calling webhook "datavolume-mutate.cdi.kubevirt.io""
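To illustrate the length concern, a rough sketch assuming the usual 63-character limit on Kubernetes DNS-label names (the exact limit hit in BZ #1857165 may differ):

  # The generated DV name is "<vm-name>-<vm-id>", and the importer pod adds an
  # "importer-" prefix on top of it (see the pod name quoted in the description).
  dv_name="fedora32-nfs-b870c429-11e0-4630-b3df-21da551a48c0"
  echo "${#dv_name}"             # 49 characters - still under 63
  echo "$(( ${#dv_name} + 9 ))"  # 58 with the "importer-" prefix; longer VM names overflow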
@Piotr, I had to redeploy, as this PSI environment became unusable and inaccessible. I will try to reproduce the issues mentioned in the bug description. Testing a VM import to "standard" storage on its own shows that it never actually reaches the disk-copy stage: the importer pod waits forever for its PVC to be bound.
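For the pending-PVC case, a minimal check sketch (namespace taken from this report; the PVC name is a placeholder):

  # Confirm which storage class is the default and whether it can provision at all.
  oc get storageclass
  # Look at the import PVC and its events for the provisioning error.
  oc get pvc -n default
  oc describe pvc <pvc-name> -n default
  oc get events -n default --sort-by=.lastTimestamp | tail -20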
I understand that the flow is different, but the issue should be fixed by the behaviour described in BZ #1857784.
As part of this bug we have improved the error message in the log, so it is clear to the user that the RHV instance is down and we cannot connect. The other issues in this bug are related to bug 1857784 and should be verified there.
Please add the fixed-in version.
Tried to verify on CNV-2.5 from OSBS, Sep 30, 2020. The oVirt panic was not reproduced. Moving to VERIFIED on the basis that the debug messages were added; if this oVirt panic reproduces, hopefully we will have more debug information.