Created attachment 1791789 [details] vm32 failed to migrate in first try - have OOM Errors Description of problem: While trying to migrate VMs (5/10 VMs), the importer pod restart due to OutOfMemory. Tested flows: Migrate 2 VMs using 2 ESX (1 VM per ESX) - Passed Migrate 4 VMs using 2 ESX (2 VMs per ESX) - 2 VMs were passed. 1 VM from each ESX Migrate 2 VMs using 1 ESX (Using 2 migration plans) - 1 VM passed without importer restart. 2nd VM passed after importer pod restart (Attached importer log vm32_1.log & vm32_2.log) Version-Release number of selected component (if applicable): CNV_2.6.3-16 MTV STAGE How reproducible: Steps to Reproduce: 1. Create a migration plan with 5/10 VMs using 1 ESX over VMware65 provider 2. Run the plan 3. Monitor importer pod status Actual results: Importer pods were restarted, failed to migrate VMs. Expected results: Migrate the VMs successfully. Additional info: Migrate 2 VMs from 1 ESX host (using 2 plans) Attached importer logs: (Migrate same VM) vm32_1.log - migrate VM failed vm32_2.log - migrate VM passed NFC ESX Configuration: <nfcsvc> <path>libnfcsvc.so</path> <enabled>true</enabled> <maxMemory>1000000000</maxMemory> <maxStreamMemory>10485760</maxStreamMemory> </nfcsvc> vpxa log: 2021-06-16T16:29:12.970Z warning vpxa[B3FFB70] [Originator@6876 sub=Libs opID=lro-556-67c68bef] [NFC ERROR] NfcCheckAndReserveMem: Cannot allocate any more memory as NFC is already using 48234835 and allocating 24117272 will make it more than the maximum allocated: 50331648. Please close some sessions and try again 2021-06-16T16:29:12.970Z warning vpxa[B3FFB70] [Originator@6876 sub=Libs opID=lro-556-67c68bef] [NFC ERROR] NfcFssrvrReadEx: Failed to allocate the temporary buffer of size 24117272 2021-06-16T16:29:12.970Z warning vpxa[B3FFB70] [Originator@6876 sub=Libs opID=lro-556-67c68bef] [NFC ERROR] Sending Nfc error 5: Failed to allocate the requested 24117272 bytes 2021-06-16T16:29:12.981Z warning vpxa[B3FFB70] [Originator@6876 sub=Libs opID=lro-556-67c68bef] [NFC ERROR] NfcNetTcpRead: EOF should not be reached 2021-06-16T16:29:12.981Z warning vpxa[B3FFB70] [Originator@6876 sub=Libs opID=lro-556-67c68bef] [NFC ERROR] NfcNet_Recv: requested 264, recevied only 0 bytes 2021-06-16T16:29:12.981Z warning vpxa[B3FFB70] [Originator@6876 sub=Libs opID=lro-556-67c68bef] [NFC ERROR] NfcGetMessage: recv failed: 2021-06-16T16:29:12.981Z warning vpxa[B3FFB70] [Originator@6876 sub=Libs opID=lro-556-67c68bef] [NFC ERROR] NfcServerProcessClientMsg: Failed to get message from client: NFC_NETWORK_ERROR (The operation experienced a network error) 2021-06-16T16:29:12.981Z warning vpxa[B3FFB70] [Originator@6876 sub=Libs opID=lro-556-67c68bef] [NFC ERROR] NfcServerLoop: NfcServer_HandleRead returned an error: NFC_NETWORK_ERROR (The operation experienced a network error) 2021-06-16T16:29:12.984Z error vpxa[B3FFB70] [Originator@6876 sub=vpxNfcServer opID=lro-556-67c68bef] Nfc server failed with return value : NFC_NETWORK_ERROR 2021-06-16T16:29:12.984Z verbose vpxa[B3FFB70] [Originator@6876 sub=vpxNfcServer opID=lro-556-67c68bef] Closing NFC session 2021-06-16T16:29:12.984Z info vpxa[B3FFB70] [Originator@6876 sub=vpxLro opID=lro-556-67c68bef] [VpxLRO] -- FINISH lro-556 2021-06-16T16:29:12.984Z info vpxa[B3FFB70] [Originator@6876 sub=Default opID=lro-556-67c68bef] [VpxLRO] -- ERROR lro-556 -- -- VpxNfcServerLro: vmodl.fault.SystemError: --> Result: --> (vmodl.fault.SystemError) { --> faultCause = (vmodl.MethodFault) null, --> faultMessage = <unset>, --> reason = "Nfc server failed with return value : NFC_NETWORK_ERROR" --> msg = "" --> } --> Args: --> 2021-06-16T16:29:13.170Z warning vpxa[94ACB70] [Originator@6876 sub=Libs opID=lro-559-53940489] [NFC ERROR] NfcCheckAndReserveMem: Cannot allocate any more memory as NFC is already using 48234738 and allocating 24117272 will make it more than the maximum allocated: 50331648. Please close some sessions and try again 2021-06-16T16:29:13.170Z warning vpxa[94ACB70] [Originator@6876 sub=Libs opID=lro-559-53940489] [NFC ERROR] NfcFssrvrReadEx: Failed to allocate the temporary buffer of size 24117272 2021-06-16T16:29:13.170Z warning vpxa[94ACB70] [Originator@6876 sub=Libs opID=lro-559-53940489] [NFC ERROR] Sending Nfc error 5: Failed to allocate the requested 24117272 bytes
Created attachment 1791791 [details] vm32 passed to migrate in a second try
could you please provide the updates for the bug ?
We're too late for 2.6.6 or 4.8.0. Targeting 4.8.1.
If I read the VMware error message, it says that it cannot allocate more than 50331648, which is the default value for the NFC service. It looks like the maxMemory value is not applied. I have found this VMware discussion where the person has a similar issue: https://communities.vmware.com/t5/VMware-vSphere-Discussions/NFC-session-memory-limit/td-p/2757411. The person has also set the value to 1000000000 and the ESXi host seems to not use it, possibly because it would be beyond an undocumented limit. The recommendation in the discussion is to increase the maxMemory progressively and verify how many concurrent sessions can be opened. Could you try to set the value to 62914560 and check whether the log reflects it when it fails?
Running plan with 10VMs , 1 ESX (F01-H06) 1. Set the max memory to 60MB <!-- The nfc service --> <nfcsvc> <path>libnfcsvc.so</path> <enabled>true</enabled> <maxMemory>62914560</maxMemory> <maxStreamMemory>10485760</maxStreamMemory> </nfcsvc> 2. Restart hosted & vpxa services on ESX host Problem reproduced. (Using vsphere Endpoint) Still used the default value - 50331648 (48MB) vpxa.log -------- 2021-07-08T09:04:41.454Z verbose vpxa[10C8EB70] [Originator@6876 sub=vpxNfcServer opID=lro-416-6466af9e] Starting NFC server loop 2021-07-08T09:04:41.561Z verbose vpxa[10C8EB70] [Originator@6876 sub=vpxNfcServer opID=lro-416-6466af9e] Received request to access ds:///vmfs/volumes/602bc26a-b99c5ac8-88e0-bc97e17cf0f0/dc65-iscsi-scalevm-50gb-70usage-40/dc65-iscsi-scalevm-50gb-70usage-40.vmdk 2021-07-08T09:04:42.081Z warning vpxa[BEB2B70] [Originator@6876 sub=Libs opID=lro-412-61bbfb41] [NFC ERROR] NfcCheckAndReserveMem: Cannot allocate any more memory as NFC is already using 48234932 and allocating 24117272 will make it more than the maximum allocated: 50331648. Please close some sessions and try again 2021-07-08T09:04:42.081Z warning vpxa[BEB2B70] [Originator@6876 sub=Libs opID=lro-412-61bbfb41] [NFC ERROR] NfcFssrvrReadEx: Failed to allocate the temporary buffer of size 24117272 2021-07-08T09:04:42.081Z warning vpxa[BEB2B70] [Originator@6876 sub=Libs opID=lro-412-61bbfb41] [NFC ERROR] Sending Nfc error 5: Failed to allocate the requested 24117272 bytes 2021-07-08T09:04:42.084Z verbose vpxa[BEB2B70] [Originator@6876 sub=vpxNfcServer opID=lro-412-61bbfb41] Closing NFC session 2021-07-08T09:04:42.084Z info vpxa[BEB2B70] [Originator@6876 sub=vpxLro opID=lro-412-61bbfb41] [VpxLRO] -- FINISH lro-412 2021-07-08T09:04:43.197Z verbose vpxa[BFBAB70] [Originator@6876 sub=vpxNfcServer opID=lro-407-474dd277] Closing NFC session 2021-07-08T09:04:43.197Z info vpxa[BFBAB70] [Originator@6876 sub=vpxLro opID=lro-407-474dd277] [VpxLRO] -- FINISH lro-407 2021-07-08T09:04:43.241Z warning vpxa[10C2BB70] [Originator@6876 sub=Libs opID=lro-414-3e1554e2] [NFC ERROR] NfcCheckAndReserveMem: Cannot allocate any more memory as NFC is already using 48234835 and allocating 24117272 will make it more than the maximum allocated: 50331648. Please close some sessions and try again 2021-07-08T09:04:43.241Z warning vpxa[10C2BB70] [Originator@6876 sub=Libs opID=lro-414-3e1554e2] [NFC ERROR] NfcFssrvrReadEx: Failed to allocate the temporary buffer of size 24117272 2021-07-08T09:04:43.241Z warning vpxa[10C2BB70] [Originator@6876 sub=Libs opID=lro-414-3e1554e2] [NFC ERROR] Sending Nfc error 5: Failed to allocate the requested 24117272 bytes 2021-07-08T09:04:43.355Z info vpxa[BF99B70] [Originator@6876 sub=vpxLro opID=74741185-91] [VpxLRO] -- BEGIN lro-417 -- vpxa -- vpxapi.VpxaService.nfcMakeTicketByFiles -- 5270d3b4-8ab2-6837-c9e4-b5bc3021d95a 2021-07-08T09:04:43.356Z info vpxa[BF99B70] [Originator@6876 sub=vpxLro opID=74741185-91] [VpxLRO] -- FINISH lro-417 importer.log ------------ I0708 09:03:58.674976 1 vddk-datasource.go:807] Transferred 12353273856/53687091200 bytes (23%) at 143642719 bytes/second overall, last 1% was 530579456 bytes at 176859818 bytes/second I0708 09:04:02.607429 1 vddk-datasource.go:807] Transferred 12884901888/53687091200 bytes (24%) at 143165576 bytes/second overall, last 1% was 531628032 bytes at 177209344 bytes/second I0708 09:04:03.495210 1 nbdkit.go:217] Log line from nbdkit: nbdkit: vddk[1]: error: [NFC ERROR]NfcFssrvrProcessErrorMsg: received NFC error 5 from server: Failed to allocate the requested 24117272 bytes I0708 09:04:03.495244 1 nbdkit.go:217] Log line from nbdkit: nbdkit: vddk[1]: error: VixDiskLib_Read: Memory allocation failed. Out of memory. I0708 09:04:08.019716 1 nbdkit.go:217] Log line from nbdkit: nbdkit: vddk[1]: error: [NFC ERROR]NfcFssrvrProcessErrorMsg: received NFC error 5 from server: Failed to allocate the requested 24117272 bytes I0708 09:04:08.019752 1 nbdkit.go:217] Log line from nbdkit: nbdkit: vddk[1]: error: VixDiskLib_Read: Memory allocation failed. Out of memory. I0708 09:04:16.126019 1 nbdkit.go:217] Log line from nbdkit: nbdkit: vddk[1]: error: [NFC ERROR]NfcFssrvrProcessErrorMsg: received NFC error 5 from server: Failed to allocate the requested 24117272 bytes I0708 09:04:16.126071 1 nbdkit.go:217] Log line from nbdkit: nbdkit: vddk[1]: error: VixDiskLib_Read: Memory allocation failed. Out of memory. I0708 09:04:20.659840 1 nbdkit.go:217] Log line from nbdkit: nbdkit: vddk[1]: error: [NFC ERROR]NfcFssrvrProcessErrorMsg: received NFC error 5 from server: Failed to allocate the requested 24117272 bytes I0708 09:04:20.659859 1 nbdkit.go:217] Log line from nbdkit: nbdkit: vddk[1]: error: VixDiskLib_Read: Memory allocation failed. Out of memory. Unable to transfer source data to target file kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause /remote-source/app/pkg/importer/data-processor.go:205 kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData /remote-source/app/pkg/importer/data-processor.go:165 main.main /remote-source/app/cmd/cdi-importer/importer.go:189 runtime.main /usr/lib/golang/src/runtime/proc.go:204 runtime.goexit /usr/lib/golang/src/runtime/asm_amd64.s:1374 E0708 09:26:28.754095 1 importer.go:191] pread: nbd_pread: read: command failed: Input/output error Unable to transfer source data to target file kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause /remote-source/app/pkg/importer/data-processor.go:205 kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData /remote-source/app/pkg/importer/data-processor.go:165 main.main /remote-source/app/cmd/cdi-importer/importer.go:189 runtime.main /usr/lib/golang/src/runtime/proc.go:204 runtime.goexit /usr/lib/golang/src/runtime/asm_amd64.s:1374
Do you plan to have this merged and backported by Aug 10? If not we will need to push it out to 4.8.2.
Yes, I have a pull request almost ready for 1984775, which should also fix this bug once it is merged to the right branch. It might be a good idea to make this one a duplicate of 1984775, or the other way around.
*** This bug has been marked as a duplicate of bug 1984775 ***