When there is a performance issue with large VMs, a common recommendation is to use vCPU and NUMA node pinning to achieve higher performance and better predictability.
Pinning complicates management since CPUs and NUMA nodes need to be manually selected depending on the host where the VM will run. Live migration to a new host requires manually checking CPU and NUMA node availability on the new host and then reconfiguring pinning for the VM with new CPU and NUMA node IDs.
For these reasons, pinning is not practical for RHV customers who require live migration and the ability to start a VM on more than one specific host.
Description of enhancement
Auto-pinning should allow VMs to run with vCPU and NUMA pinning without manual configuration. The vCPUs should be pinned to free host CPUs with regard to host NUMA topology. The vNUMA topology should be automatically calculated based on vCPU count and guest RAM size. vNUMA nodes should be pinned to host NUMA nodes that have sufficient free memory.
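The topology calculation described above might be sketched, in simplified form, as follows. This is illustrative only, not RHV code: the class, function, and field names are assumptions, and it assumes the vCPU count divides evenly across the chosen node count.

```python
# Illustrative sketch: derive a vNUMA topology from vCPU count and RAM size,
# then pin each vNUMA node to a host NUMA node with sufficient free memory.
from dataclasses import dataclass

@dataclass
class HostNumaNode:
    node_id: int
    free_mem_mb: int
    free_cpus: list  # host CPU IDs not yet pinned to any VM

def build_vnuma_topology(vcpus, ram_mb, host_nodes):
    """Split vCPUs and RAM evenly across as many vNUMA nodes as there are
    host NUMA nodes (capped by the vCPU count), preferring host nodes with
    the most free memory. Assumes vcpus is divisible by the node count."""
    node_count = min(len(host_nodes), vcpus)
    per_node_vcpus = vcpus // node_count
    per_node_mem = ram_mb // node_count
    candidates = sorted(host_nodes, key=lambda n: n.free_mem_mb, reverse=True)
    topology = []
    for i in range(node_count):
        host = candidates[i]
        if host.free_mem_mb < per_node_mem or len(host.free_cpus) < per_node_vcpus:
            raise RuntimeError("insufficient free host resources")
        topology.append({
            "vnuma_node": i,
            "vcpus": list(range(i * per_node_vcpus, (i + 1) * per_node_vcpus)),
            "mem_mb": per_node_mem,
            "host_node": host.node_id,
            "host_cpus": host.free_cpus[:per_node_vcpus],
        })
    return topology
```

For example, a 4-vCPU, 8 GiB VM on a two-node host would yield two vNUMA nodes of 2 vCPUs and 4 GiB each, pinned to distinct host nodes.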
Upon live migration, the vCPU pinning will be updated to free host CPUs on the new host. NUMA pinning will be updated to host NUMA nodes with sufficient free memory on the new host. The vNUMA topology will remain unchanged.
When a new VM does not fit into available host resources it may be possible to re-pin existing VMs on the host to free up host CPUs and NUMA nodes for the new VM. This ensures that host resources are still used effectively after many VMs have been started and shut down on a host, which could have caused fragmentation.
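The defragmentation idea can be illustrated with a simple packing sketch. This is a hypothetical simplification (single resource dimension, CPUs only, names invented for illustration): existing VMs are re-pinned onto the lowest host CPUs so that a contiguous block is freed for the new VM.

```python
# Hypothetical sketch of re-pinning to defragment host CPUs. Existing VMs are
# packed onto the lowest CPU IDs; the remaining block goes to the new VM.
def repack_pinnings(host_cpus, vms, new_vm_vcpus):
    """host_cpus: list of all host CPU IDs.
    vms: {vm_name: vcpu_count} for VMs already pinned on the host.
    Returns (new_pinnings_per_vm, cpus_for_new_vm), or raises if the
    new VM cannot fit even after re-pinning."""
    cursor = 0
    pinnings = {}
    for name, count in sorted(vms.items()):
        pinnings[name] = host_cpus[cursor:cursor + count]
        cursor += count
    remaining = host_cpus[cursor:]
    if len(remaining) < new_vm_vcpus:
        raise RuntimeError("host cannot fit the new VM even after re-pinning")
    return pinnings, remaining[:new_vm_vcpus]
```

For instance, two running VMs with 2 and 3 vCPUs on an 8-CPU host may hold scattered CPUs after churn; repacking them onto CPUs 0-4 frees CPUs 5-7 for a new 3-vCPU VM.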
When a VM is started or live migrated, there may be insufficient free host CPUs and NUMA nodes to avoid overcommit. The desired behavior can be set by a new auto-pinning policy field that takes the values "preferred" or "strict" (these terms are inspired by the NUMA tuning modes with the same names). "Preferred" means that free host CPUs and NUMA nodes are preferred, but the VM will be allowed to run with overcommit should this be necessary due to a resource shortage. "Strict" means that dedicated free CPUs and NUMA nodes are required, and the VM will refuse to run when there are insufficient resources available. This setting could also be called "Allow overcommit? Yes/No".
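The proposed policy semantics could be sketched like this (function and parameter names are illustrative, not an actual RHV interface):

```python
# Illustrative sketch of the "preferred" vs. "strict" auto-pinning policy.
# "strict" refuses to place the VM without dedicated free CPUs;
# "preferred" falls back to overcommitted (already-pinned) CPUs.
def choose_pinning(vcpus_needed, free_cpus, all_cpus, policy="preferred"):
    if len(free_cpus) >= vcpus_needed:
        return free_cpus[:vcpus_needed]  # dedicated CPUs available
    if policy == "strict":
        raise RuntimeError("strict auto-pinning: not enough free host CPUs")
    # "preferred": allow overcommit by also using CPUs pinned to other VMs
    used = [c for c in all_cpus if c not in free_cpus]
    return (free_cpus + used)[:vcpus_needed]
```

With "preferred", a 3-vCPU VM on a host with only one free CPU still starts, sharing two already-pinned CPUs; with "strict", the same request fails.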
Observability via API
The ability to monitor free host CPUs and NUMA node memory from external tools such as Cloudforms is required so that operations teams can observe utilization and make decisions based on it. This may require API enhancements in RHV.
Should an auto-pinned VM run on a new host where a different vNUMA topology is required, there are two possibilities:
1. The new vNUMA topology can be applied if the VM is powered down and restarted. This will ensure the best performance.
2. If the VM keeps running (across live migration), it can retain its existing vNUMA topology, and the UI should indicate that it is running in a sub-optimal state.
The vCPU and NUMA node pinning features in RHV require a high degree of manual configuration that renders them impractical when live migration or HA is required.
The proposed auto-pinning feature offers the same performance advantages as manual vCPU and NUMA node pinning but eliminates these management problems.
Customers running large VMs that exceed the host NUMA node size require auto-pinning so that RHV performs well in their environment.
It's quite a complicated feature; I'd say it's not really usable without defining priorities and different weights for different situations. It seems to me rather a task for something like ovirt-optimizer, which would need to be resurrected...
Might make sense to start with smaller blocks, e.g. auto vNUMA creation/suggestion, auto create CPU pinning based on vNUMA, runtime on-demand CPU pinning update, etc.
Auto-pinning has been introduced in the REST API by bug 1862968. Bug 1887356 takes it further, applying it to High Performance VMs automatically.
Scheduling-related changes are not yet planned.