Description of problem:
VM import from VMware of any VM (tested: RHEL7, Windows 2016) fails after running for ~20 minutes with a remote host certificate error. This happened on 2 different CNV-2.4 environments, and for 2 different VMware providers.

v2v-log error messages:
error: [NFC ERROR] NfcNewAuthdConnectionEx: Failed to connect to peer. Error: The remote host certificate has these problems:\n'
error: * self signed certificate in certificate chain\n'
2020-07-12 18:07:39,902 - root - DEBUG - b"nbdkit: vddk[3]: debug: NBD_ClientOpen: Couldn't connect to <Vmware Esxi>:902 The remote host certificate has these problems:\n"
debug: VixDiskLib: Detected DiskLib error 2338 (NBD_ERR_NETWORK_CONNECT).\n'
2020-07-12 18:07:39,904 - root - DEBUG - b'nbdkit: vddk[3]: debug: VixDiskLib: VixDiskLibQueryBlockList: Fail to start query process. Error 14009 (The server refused connection) (DiskLib error 2338: NBD_ERR_NETWORK_CONNECT) at 543.\n'

Version-Release number of selected component (if applicable):
vddk-7.0 or vddk-6.5
CNV-2.4 from July 12 2020
OpenShift Version 4.5.0
Kubernetes Version v1.18.3+3415b61
kubevirt-v2v-conversion versions:
virt-v2v-1.40.2-22.module+el8.2.0+6029+618ef2ec.x86_64
libguestfs-1.40.2-22.module+el8.2.0+6029+618ef2ec.x86_64
libvirt-client-6.0.0-17.2.module+el8.2.0+6629+3fc0f2c2.x86_64
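Since the conversion log is long, a small filter can help pull out the VDDK failures quoted in the description. The following is only a rough sketch: the function names `disklib_errors` and `summarize` are made up for illustration, and the regular expression is derived solely from the two message shapes seen in this report ("DiskLib error 2338 (NAME)" and "DiskLib error 2338: NAME"), so other VDDK message formats may not match.

```python
import re
from collections import Counter

# Pattern inferred from the log excerpts in this report only (an assumption):
#   "Detected DiskLib error 2338 (NBD_ERR_NETWORK_CONNECT)."
#   "(DiskLib error 2338: NBD_ERR_NETWORK_CONNECT)"
DISKLIB_ERROR = re.compile(r"DiskLib error (\d+)(?::\s*|\s*\()(\w+)")


def disklib_errors(lines):
    """Yield (code, name) for every DiskLib error mentioned in the lines."""
    for line in lines:
        for match in DISKLIB_ERROR.finditer(line):
            yield int(match.group(1)), match.group(2)


def summarize(lines):
    """Return (Counter of DiskLib errors, whether a self-signed cert was reported)."""
    lines = list(lines)  # allow generators; the lines are scanned twice
    errors = Counter(disklib_errors(lines))
    cert_problem = any("self signed certificate" in line for line in lines)
    return errors, cert_problem
```

It could be run over the attached pod log with something like `summarize(open(path, errors="replace"))`, where `path` points at a local copy of the log.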
Created attachment 1700744: kubevirt-v2v-conversion-v2v-rhel7-vm.log
It's quite hard to follow the virt-v2v output, but I think the error is (from VDDK):

2020-07-12 18:30:21,529 - root - DEBUG - b'nbdkit: vddk[3]: debug: Cnx_Connect: Error message: Host address lookup for server f02-h26-000-r620.rdu2.scalelab.redhat.com failed: Name or service not known\n'
2020-07-12 18:30:21,529 - root - DEBUG - b'nbdkit: vddk[3]: error: [NFC ERROR] NfcNewAuthdConnectionEx: Failed to connect to peer. Error: Host address lookup for server f02-h26-000-r620.rdu2.scalelab.redhat.com failed: Name or service not known\n'

which would indeed indicate some kind of network problem. I think that is the source ESXi server?
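One way to tell a name-resolution failure apart from a refused or blocked connection is to test the two steps separately. The sketch below is an assumption-laden illustration, not part of virt-v2v or VDDK: `check_endpoint` is a hypothetical helper, port 902 is taken from the log excerpt in the description, and the result is only meaningful if it is run from the same network context as the conversion pod.

```python
import socket


def check_endpoint(host, port=902, timeout=5.0,
                   resolve=socket.getaddrinfo,
                   connect=socket.create_connection):
    """Classify reachability of host:port as 'ok', 'dns-failure' or 'connect-failure'.

    `resolve` and `connect` are injectable only so the logic can be
    exercised without network access; the defaults use the real stack.
    """
    try:
        # Step 1: name resolution ("Name or service not known" fails here).
        resolve(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return f"dns-failure: {exc}"
    try:
        # Step 2: plain TCP connect; this does not validate any certificate.
        with connect((host, port), timeout=timeout):
            return "ok"
    except OSError as exc:
        return f"connect-failure: {exc}"
```

For example, `check_endpoint("f02-h26-000-r620.rdu2.scalelab.redhat.com")` would be expected to return a `dns-failure` result in an environment where the lookup fails as in the log above.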
Since this is a regression, and we do not allow regressions between releases, I'm setting the target to 2.4 and will propose it as a blocker.
On the same CNV environment, the VM imports that were failing yesterday are now in status "off". That is, there was an automatic retry, and they were eventually migrated successfully. Also, I just ran a VM import of another RHEL7 VM successfully. I will try again to run the VM import for the same RHEL7/Win2016 VMs that were failing before, and update here.
Tried the same VMs again, and now the import is failing for me with the same reported error. There might be a network instability issue here. In the CNV UI the status shown is: "The virtual machine could not be imported. Terminated with Error (exit code 2)." and a link to the pod logs is provided. Opening the pod logs shows the failure reported in this bug.
If needed, I can provide details of the CNV environment where the problem reproduces.
Tested 10 VM imports from VMware to CNV BM, and could not reproduce this bug. @Brett, I also managed to import, from the VMware you provided, the same VM ("fdupont-test-migration", RHEL7 with 2 disks) that for some reason I could not import on the PSI environment before.
Closing this BZ per QE as it wasn't reproducible on a stable env.