1856105 – [v2v][VMware to CNV VM import] Any VM import from VMware fails on [NFC ERROR] NfcNewAuthdConnectionEx: Failed to connect to peer.

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Container Native Virtualization (CNV)
Classification:	Red Hat
Component:	V2V
Sub Component:
Version:	2.4.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Brett Thurber
QA Contact:	Ilanit Stein
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-07-12 18:57 UTC by Ilanit Stein
Modified:	2020-07-16 18:21 UTC
CC List:	6 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-07-16 18:21:30 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
kubevirt-v2v-conversion-v2v-rhel7-vm.log (4.45 MB, text/plain) 2020-07-12 19:00 UTC, Ilanit Stein

Description Ilanit Stein 2020-07-12 18:57:35 UTC

Description of problem:
Importing any VM from VMware (tested: RHEL7, Windows 2016) fails after running for ~20 minutes, with an error about the remote host certificate.

This happened on 2 different CNV-2.4 environments, and for 2 different VMware providers.
 
v2v-log error messages:
error: [NFC ERROR] NfcNewAuthdConnectionEx: Failed to connect to peer. Error: The remote host certificate has these problems:\n'

error: * self signed certificate in certificate chain\n'
2020-07-12 18:07:39,902 - root - DEBUG - b"nbdkit: vddk[3]: debug: NBD_ClientOpen: Couldn't connect to <Vmware Esxi>:902 The remote host certificate has these problems:\n"

debug: VixDiskLib: Detected DiskLib error 2338 (NBD_ERR_NETWORK_CONNECT).\n'
2020-07-12 18:07:39,904 - root - DEBUG - b'nbdkit: vddk[3]: debug: VixDiskLib: VixDiskLibQueryBlockList: Fail to start query process. Error 14009 (The server refused connection) (DiskLib error 2338: NBD_ERR_NETWORK_CONNECT) at 543.\n'
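Since the conversion log is large (the attached one is 4.45 MB), a small filter can help pull out just the VDDK failures. The following is a rough sketch only: the regular expression is an assumption derived from the excerpts above, not a documented nbdkit or VDDK log format, and `vddk_error_lines` and `summarize` are hypothetical helper names.

```python
import re

# Assumed pattern, based only on the log excerpts in this report:
# nbdkit VDDK plugin lines that are either tagged "error:" or are debug
# lines carrying an NBD_ERR_* code or a numeric "Error NNNN".
VDDK_ERROR = re.compile(
    r"nbdkit: vddk\[\d+\]: (error:|debug: .*(NBD_ERR_\w+|Error \d+))"
)


def vddk_error_lines(lines):
    """Yield only the lines that look like VDDK errors."""
    for line in lines:
        if VDDK_ERROR.search(line):
            yield line.rstrip("\n")


def summarize(path):
    """Print the VDDK error lines found in a conversion log file."""
    with open(path, errors="replace") as log:
        for line in vddk_error_lines(log):
            print(line)
```

For example, `summarize("kubevirt-v2v-conversion-v2v-rhel7-vm.log")` would print only the matching lines from the attached log, assuming it has been downloaded to the current directory.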

Version-Release number of selected component (if applicable):
vddk-7.0 or vddk-6.5
CNV-2.4 from July 12 2020
OpenShift Version 4.5.0
Kubernetes Version v1.18.3+3415b61
kubevirt-v2v-conversion versions: 
  virt-v2v-1.40.2-22.module+el8.2.0+6029+618ef2ec.x86_64
  libguestfs-1.40.2-22.module+el8.2.0+6029+618ef2ec.x86_64
  libvirt-client-6.0.0-17.2.module+el8.2.0+6629+3fc0f2c2.x86_64

Comment 1 Ilanit Stein 2020-07-12 19:00:19 UTC

Created attachment 1700744
kubevirt-v2v-conversion-v2v-rhel7-vm.log

Comment 3 Richard W.M. Jones 2020-07-13 09:44:57 UTC

It's quite hard to follow the virt-v2v output but I think the error is (from VDDK):

2020-07-12 18:30:21,529 - root - DEBUG - b'nbdkit: vddk[3]: debug: Cnx_Connect: Error message: Host address lookup for server f02-h26-000-r620.rdu2.scalelab.redhat.com failed: Name or service not known\n'
2020-07-12 18:30:21,529 - root - DEBUG - b'nbdkit: vddk[3]: error: [NFC ERROR] NfcNewAuthdConnectionEx: Failed to connect to peer. Error: Host address lookup for server f02-h26-000-r620.rdu2.scalelab.redhat.com failed: Name or service not known\n'

which would indeed indicate some kind of network problem.  I think
that is the source ESXi server?
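
One way to check this from the network where the conversion pod runs is sketched below. It is an illustration only: `check_endpoint` is a hypothetical helper, it assumes Python is available in that environment, and port 902 is taken from the NBD_ClientOpen line in the description.

```python
import socket


def check_endpoint(host, port=902, timeout=5.0):
    """Return (resolved, connected, detail) for a host/port pair.

    First resolves the host name, then attempts a plain TCP connect.
    This does not speak the NFC/NBD protocol or validate certificates;
    it only separates name-lookup failures from connection failures.
    """
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return (False, False, f"lookup failed: {exc}")

    # Use the first resolved address; sockaddr is (ip, port, ...).
    address = infos[0][4]
    try:
        with socket.create_connection(address[:2], timeout=timeout):
            return (True, True, f"connected to {address[0]}:{port}")
    except OSError as exc:
        return (True, False, f"connect failed: {exc}")
```

Calling `check_endpoint("f02-h26-000-r620.rdu2.scalelab.redhat.com")` would report a lookup failure if the name does not resolve, which would match the "Name or service not known" message above.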

Comment 4 Nelly Credi 2020-07-13 09:52:43 UTC

Since this is a regression, and we do not allow regressions between releases, I'm setting the target to 2.4
and will suggest it as a blocker.

Comment 5 Ilanit Stein 2020-07-13 11:59:25 UTC

On the same CNV environment, the VM imports that were failing yesterday are now in status "off". That is, there was an automatic retry, and they were eventually migrated successfully.

Also, I just ran VM import of some other RHEL7 VM successfully.

I will try again to run the VM import for the same rhel7/win2016 VMs that were failing before,
and update here.

Comment 6 Ilanit Stein 2020-07-13 14:30:22 UTC

Tried the same VMs again, and now the import is failing for me with the same reported error.

There might be a network instability issue here.

In CNV UI the status shown is:
The virtual machine could not be imported.
Terminated with Error (exit code 2).
and a link to the pod logs is provided.

Opening the pod logs shows the failure reported in this bug.

Comment 8 Ilanit Stein 2020-07-13 17:40:52 UTC

If needed, I can provide details of the CNV environment where the problem reproduces.

Comment 14 Ilanit Stein 2020-07-16 13:42:50 UTC

Tested 10 VM imports from VMware to CNV BM, and could not reproduce this bug.

@Brett,
I also managed to import, from the VMware you provided, the same VM ("fdupont-test-migration", rhel7 with 2 disks) that I could not import on the PSI environment before, for some reason.

Comment 16 Brett Thurber 2020-07-16 18:21:30 UTC

Closing this BZ per QE as it wasn't reproducible on a stable env.
