As of today, support for NUMA topology is somewhat rudimentary. Roman did some of the basic implementation (https://github.com/kubevirt/user-guide/pull/457/files), but it has nondeterministic behavior in the sense that KubeVirt cannot provide specific cores from specific NUMA nodes. This "might work" for the single big VM scenario, but it is not sufficient to support multiple VMs. We need to be able to reliably acquire certain cores from specific NUMA nodes.

What we need is a way to define how many NUMA nodes (and sockets), cores, and threads a VM should have, with the allocated resources mapped 1:1 onto the underlying hardware: the virtual CPU cores come from the same physical NUMA nodes as their bare-metal counterparts, and the same holds for the memory attached to them. The sizes we should plan for are 1, 2, 4, ... NUMA nodes per VM, and if we can plan for sub-NUMA-node entities as well (e.g. "half" a CPU socket), that would be great.

The current implementation is specified as follows:

spec:
  domain:
    cpu:
      cores: 10
      sockets: 4
      threads: 2
      dedicatedCpuPlacement: true
      isolateEmulatorThread: true
      model: host-passthrough
      numa:
        guestMappingPassthrough: {}

Either this gets enhanced with a NUMA node count attribute, e.g.:

spec:
  domain:
    cpu:
      cores: 10
      sockets: 4
      threads: 2
      dedicatedCpuPlacement: true
      isolateEmulatorThread: true
      model: host-passthrough
      numa:
        guestMappingPassthrough: {}
        nodes: 4

or it is implicitly derived from the socket count when guestMappingPassthrough is set. I guess that's up for discussion, since there might be platforms where there is no 1:1 mapping between socket and NUMA node.
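For reference, a minimal, self-contained VirtualMachineInstance manifest built around the current guestMappingPassthrough API might look like the sketch below. The VMI name, memory size, and hugepages page size are illustrative assumptions, not taken from this RFE; the NUMA passthrough feature is documented to additionally expect dedicated CPU placement and hugepages-backed memory (and, in current releases, the corresponding feature gate enabled in the KubeVirt CR).

apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: numa-vmi                        # illustrative name
spec:
  domain:
    cpu:
      cores: 10
      sockets: 4
      threads: 2
      dedicatedCpuPlacement: true
      isolateEmulatorThread: true
      model: host-passthrough
      numa:
        guestMappingPassthrough: {}     # vNUMA mirrors the pNUMA placement of the pinned CPUs
    memory:
      hugepages:
        pageSize: 2Mi                   # guestMappingPassthrough expects hugepages-backed memory
    resources:
      requests:
        memory: 8Gi                     # illustrative size
    devices: {}

Note that nothing in this manifest lets the user choose which pNUMA nodes or which cores the VM ends up on; that is exactly the nondeterminism described above.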
The "basic implementation" is actually almost everything that KubeVirt has to do. Almost all other limitations, including the non-deterministic behavior, exist due to Kubernetes limitations. In order to stay consistent with Kubernetes, it was an intentional decision to a) reflect the pNUMA topoloy in guests (vNUMA) as this is the best the virtualization layer can do. A second, but differently prioritized goal b) was to enhance Kubernetes to gain better NUMA awareness to do more optimal pNUMA assignments to pods. If a better pNUMA assignment to pods happens, then KubeVirt and it's VMs will directly benefit. Because it's Kubernetes responsibility, your RFE must not be solved in KubeVirt.
This got moved to https://issues.redhat.com/browse/CNV-21084. Possibly this RFE can be addressed by leveraging NUMA-aware scheduling: https://docs.openshift.com/container-platform/4.11/scalability_and_performance/cnf-numa-aware-scheduling.html
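As a rough illustration of how a workload would opt in to the NUMA-aware scheduling described in the linked docs: the sketch below assumes the secondary scheduler is installed and uses the default name topo-aware-scheduler (the actual name comes from the scheduler CR per the documentation), and uses a placeholder pod name and image. The pod requests Guaranteed QoS (requests equal to limits) so that CPU pinning and NUMA alignment apply. For a KubeVirt VM, the same schedulerName would ultimately have to end up on the virt-launcher pod that backs the VMI.

apiVersion: v1
kind: Pod
metadata:
  name: numa-aligned-workload              # illustrative name
spec:
  schedulerName: topo-aware-scheduler      # assumed default name of the NUMA-aware secondary scheduler
  containers:
  - name: app
    image: registry.example.com/app:latest # placeholder image
    resources:
      requests:                            # requests == limits -> Guaranteed QoS, eligible for CPU pinning
        cpu: "10"
        memory: 8Gi
      limits:
        cpu: "10"
        memory: 8Gi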