Description of problem:

MTV migration plan with 2 VMs fails with errors in the importer pod log:

I0721 09:28:24.883919 1 vddk-datasource.go:200] Log line from nbdkit: nbdkit: vddk[1]: error: [NFC ERROR]NfcFssrvrProcessErrorMsg: received NFC error 5 from server: Failed to allocate the requested 24117272 bytes
I0721 09:28:24.883976 1 vddk-datasource.go:200] Log line from nbdkit: nbdkit: vddk[1]: error: VixDiskLib_Read: Memory allocation failed. Out of memory.
I0721 09:28:31.988105 1 vddk-datasource.go:200] Log line from nbdkit: nbdkit: vddk[1]: error: [NFC ERROR]NfcFssrvrProcessErrorMsg: received NFC error 5 from server: Failed to allocate the requested 24117272 bytes

A migration plan with only one of the VMs did pass, but the copy was extremely slow, the above error appeared in the log, and the VM did not start automatically after migration. It was possible to start it manually, though. Other VMs from the same VMware fail to migrate as well.

Richard Jones:
=============
The one above seems to indicate an error inside the VDDK library when allocating memory. It's pretty clearly running out of memory inside VDDK. It happens quite quickly too, probably within the first few read requests. Of course VDDK is a black box, so we don't know specifically what's going on inside it, but it wouldn't be a surprise if it needs to allocate memory during a read.

If this is running inside a container, try increasing the cgroup limits on the amount of RAM the container is allowed to use. I'm not clear if this is virt-v2v or you're using nbdkit directly, but for virt-v2v there are some guidelines here:
https://libguestfs.org/virt-v2v.1.html#compute-power-and-ram
If using nbdkit directly, you shouldn't need nearly that much RAM, but clearly you need more than you're giving it now.

Matthew Arnold:
==============
Thanks Rich, it's helpful to know that it's the local VDDK side that's failing. Just to confirm, this log is indeed from CDI running nbdkit in a container. The container is created and managed by CDI itself, though, so I don't know of an easy way to adjust its cgroup limits on a live system. I will try to reproduce the bug with modifications to the code that creates the container, unless anyone else knows a trick for changing these limits as soon as it starts.

Richard Jones:
=============
Changing the limits is the best thing to do. However, there's another thing that you could try if that turns out to be impossible.

nbdkit doesn't normally break up large requests from the NBD client, e.g. if the client makes a request to read the maximum size block (32M), then it will pass that to the plugin, which will request that VDDK makes a 32M read. In other words, VixDiskLib_Read is being called here with count = 32M (actually in sectors, so divided by 512):
https://gitlab.com/nbdkit/nbdkit/-/blob/e510b9c0a061966d07e3f56c975a968f277913d1/plugins/vddk/vddk.c#L726

If we theorize that this is causing VDDK to allocate 32M per request, you could adjust the client to make smaller requests, e.g. nbdcopy lets you adjust the maximum request size using the --request-size flag. Or if you can't do that, then insert the blocksize filter into the chain of filters, which will break up large requests:
https://libguestfs.org/nbdkit-blocksize-filter.1.html
(eg: --filter=blocksize ... maxdata=1M)

Similarly, if the client is making multiple requests in parallel (which could allocate N * 32M), either reduce the amount of parallelism in the client or use this filter:
https://libguestfs.org/nbdkit-noparallel-filter.1.html
(--filter=noparallel ... serialize=all-requests)
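For illustration only, a rough sketch of what the workarounds suggested above could look like on the command line. The vddk connection parameters (server, user, password file, VM moref, datastore path, socket path) are placeholders invented for this sketch, not values from this report, and the CDI importer builds its nbdkit command internally, so this only shows the technique, not the actual fix:

# Combine both suggestions: the blocksize filter caps each read handed to the
# vddk plugin at 1M, and the noparallel filter serializes requests so that
# multiple in-flight reads cannot each allocate memory at the same time.
# All vddk connection parameters below are placeholders.
nbdkit -U /tmp/vddk.sock \
       --filter=blocksize --filter=noparallel \
       vddk \
       server=vcenter.example.com user=administrator password=+/tmp/passwd \
       vm=moref=vm-1234 file='[datastore1] guest/guest.vmdk' \
       maxdata=1M serialize=all-requests

# Alternatively, leave nbdkit unchanged and make the client issue smaller
# requests, e.g. with nbdcopy's --request-size flag (value in bytes):
nbdcopy --request-size=1048576 'nbd+unix:///?socket=/tmp/vddk.sock' disk.img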
Version-Release number of selected component (if applicable):
OCP-4.7 / CNV-2.6.6-44 / MTV-2.4 release
VMware 6.5

How reproducible:
The issue does not reproduce on the same OCP cluster using another VMware 6.5 instance.

Additional info:
This issue is NOT related to the NFC maxMemory setting below. Based on the info in https://bugzilla.redhat.com/show_bug.cgi?id=1614276#c24, we checked the maxMemory value set on the ESXi host, and it was already set to 1000000000:

<!-- The nfc service -->
<nfcsvc>
  <path>libnfcsvc.so</path>
  <enabled>true</enabled>
  <maxMemory>1000000000</maxMemory>
  <maxStreamMemory>10485760</maxStreamMemory>
</nfcsvc>
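For reference, a minimal way to check the nfcsvc block quoted above on the ESXi host; the config file path is an assumption based on a standard ESXi install and is not stated in this report:

# Run on the ESXi host (path assumed; adjust if hostd's config lives elsewhere):
grep -A 6 '<nfcsvc>' /etc/vmware/hostd/config.xml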
Changing the component to Storage, since the fix is in CDI.
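Since the discussion above centers on the cgroup memory limit of the importer container that CDI creates, a generic sketch of how to see what limit such a pod actually received; the pod name and namespace are placeholders (CDI creates one importer pod per disk in the namespace of the target DataVolume):

# Pod name and namespace are placeholders for this sketch:
oc get pod importer-example-dv -n my-vm-namespace \
    -o jsonpath='{.spec.containers[*].resources.limits.memory}'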
*** Bug 1973193 has been marked as a duplicate of this bug. ***
Verified on CNV-4.8.1-18 by migrating the same 2 VMs with 2 disks, from the same VMware, for which this bug was reported. Migration to NFS target storage passed, and the VMs were successfully started on the OpenShift side.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Virtualization 4.8.1 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3259