Bug 1856105

Summary: [v2v][VMware to CNV VM import] Any VM import from VMware fails on [NFC ERROR] NfcNewAuthdConnectionEx: Failed to connect to peer.
Product: Container Native Virtualization (CNV)
Reporter: Ilanit Stein <istein>
Component: V2V
Assignee: Brett Thurber <bthurber>
Status: CLOSED NOTABUG
QA Contact: Ilanit Stein <istein>
Severity: high
Priority: high
Version: 2.4.0
CC: cnv-qe-bugs, dagur, ncredi, ptoscano, pvauter, rjones
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2020-07-16 18:21:30 UTC
Type: Bug
Attachments:
  kubevirt-v2v-conversion-v2v-rhel7-vm.log (flags: none)

Description Ilanit Stein 2020-07-12 18:57:35 UTC
Description of problem:
Importing any VM from VMware (tested: RHEL7, Windows 2016) fails after running for about 20 minutes, with a remote host certificate error.

This happened on 2 different CNV-2.4 environments and with 2 different VMware providers.
 
Error messages from the v2v log:
error: [NFC ERROR] NfcNewAuthdConnectionEx: Failed to connect to peer. Error: The remote host certificate has these problems:\n'

error: * self signed certificate in certificate chain\n'
2020-07-12 18:07:39,902 - root - DEBUG - b"nbdkit: vddk[3]: debug: NBD_ClientOpen: Couldn't connect to <Vmware Esxi>:902 The remote host certificate has these problems:\n"

debug: VixDiskLib: Detected DiskLib error 2338 (NBD_ERR_NETWORK_CONNECT).\n'
2020-07-12 18:07:39,904 - root - DEBUG - b'nbdkit: vddk[3]: debug: VixDiskLib: VixDiskLibQueryBlockList: Fail to start query process. Error 14009 (The server refused connection) (DiskLib error 2338: NBD_ERR_NETWORK_CONNECT) at 543.\n'

Version-Release number of selected component (if applicable):
vddk-7.0 or vddk-6.5
CNV-2.4 from July 12 2020
OpenShift Version 4.5.0
Kubernetes Version v1.18.3+3415b61
kubevirt-v2v-conversion versions: 
  virt-v2v-1.40.2-22.module+el8.2.0+6029+618ef2ec.x86_64
  libguestfs-1.40.2-22.module+el8.2.0+6029+618ef2ec.x86_64
  libvirt-client-6.0.0-17.2.module+el8.2.0+6629+3fc0f2c2.x86_64

Comment 1 Ilanit Stein 2020-07-12 19:00:19 UTC
Created attachment 1700744 [details]
kubevirt-v2v-conversion-v2v-rhel7-vm.log

Comment 3 Richard W.M. Jones 2020-07-13 09:44:57 UTC
It's quite hard to follow the virt-v2v output, but I think the error is (from VDDK):

2020-07-12 18:30:21,529 - root - DEBUG - b'nbdkit: vddk[3]: debug: Cnx_Connect: Error message: Host address lookup for server f02-h26-000-r620.rdu2.scalelab.redhat.com failed: Name or service not known\n'
2020-07-12 18:30:21,529 - root - DEBUG - b'nbdkit: vddk[3]: error: [NFC ERROR] NfcNewAuthdConnectionEx: Failed to connect to peer. Error: Host address lookup for server f02-h26-000-r620.rdu2.scalelab.redhat.com failed: Name or service not known\n'

which would indeed indicate some kind of network problem: the server's host name does not resolve. I think that is the source ESXi server?
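To separate a name-resolution failure ("Host address lookup ... failed") from a plain connectivity failure, one could run a small check from a pod on the cluster network. The sketch below is a hypothetical helper, not part of virt-v2v, nbdkit or VDDK: it only uses Python's standard `socket` module, assumes port 902 (the port seen in the log above) as the target, and does not perform the NFC handshake or any certificate validation.

```python
import socket


def check_esxi_host(host: str, port: int = 902, timeout: float = 5.0) -> tuple[str, str]:
    """Hypothetical diagnostic: does `host` resolve, and does it accept TCP on `port`?

    Returns (status, detail) where status is "dns_failed", "connect_failed" or "ok".
    This does not speak NFC or check certificates; it only tests DNS and TCP.
    """
    try:
        # The same kind of lookup that fails in the VDDK log above.
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return "dns_failed", str(exc)
    try:
        # create_connection tries each resolved address in turn.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            peer = sock.getpeername()
            return "ok", f"{peer[0]}:{peer[1]}"
    except OSError as exc:
        return "connect_failed", str(exc)
```

For example, `check_esxi_host("f02-h26-000-r620.rdu2.scalelab.redhat.com")` returning `"dns_failed"` from inside the cluster would point at DNS rather than at certificates, while `"connect_failed"` would suggest routing or firewall problems. Because the failure here was intermittent, running the check several times is probably more informative than a single run.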

Comment 4 Nelly Credi 2020-07-13 09:52:43 UTC
Since this is a regression, and we do not allow regressions between releases, I'm setting the target to 2.4 and will suggest it as a blocker.

Comment 5 Ilanit Stein 2020-07-13 11:59:25 UTC
On the same CNV environment, the VM imports that were failing yesterday are now in status "off". That is, there was an automatic retry, and they were eventually migrated successfully.

Also, I just successfully imported another RHEL7 VM.

I will try again to import the same RHEL7/Win2016 VMs that were failing before, and update here.

Comment 6 Ilanit Stein 2020-07-13 14:30:22 UTC
I tried the same VMs again, and now they fail for me with the same reported error.

There might be a network instability issue here.

The CNV UI shows this status:
  The virtual machine could not be imported.
  Terminated with Error (exit code 2).
along with a link to the pod logs.

Opening the pod logs shows the failure reported in this bug.

Comment 8 Ilanit Stein 2020-07-13 17:40:52 UTC
If needed, I can provide details of the CNV environment where the problem reproduces.

Comment 14 Ilanit Stein 2020-07-16 13:42:50 UTC
I tested 10 VM imports from VMware to a CNV BM environment and could not reproduce this bug.

@Brett,
I also managed to import, from the VMware you provided, the same VM ("fdupont-test-migration", RHEL7 with 2 disks) that I could not import on the PSI environment before, for some reason.

Comment 16 Brett Thurber 2020-07-16 18:21:30 UTC
Closing this BZ per QE, as it wasn't reproducible on a stable environment.