Bug 2168910
| Summary: | Incorrect CPU affinity with numatune mode strict | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Adrian Tomasov <atomasov> |
| Component: | libvirt | Assignee: | Virtualization Maintenance <virt-maint> |
| libvirt sub component: | General | QA Contact: | liang cong <lcong> |
| Status: | CLOSED MIGRATED | Docs Contact: | |
| Severity: | unspecified | ||
| Priority: | unspecified | CC: | aokuliar, atomasov, eskultet, jhladky, lmen, mprivozn, osabart, virt-maint |
| Version: | 9.2 | Keywords: | MigratedToJIRA, Triaged |
| Target Milestone: | rc | Flags: | pm-rhel: mirror+ |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-07-07 21:32:03 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Yeah, I remember talking with Erik Skultety about this yesterday. And I've found what might look like evidence in the code that this used to work, because I couldn't recall whether we used to set affinity on vCPUs just based on the <numatune>. Meanwhile, explicit vCPU pinning should work, if you need a workaround.

(In reply to Michal Privoznik from comment #4)
> Yeah, I remember talking with Erik Skultety about this yesterday. And I've found what might look like evidence in the code that this used to work, because I couldn't recall whether we used to set affinity on vCPUs just based on the <numatune>. Meanwhile, explicit vCPU pinning should work, if you need a workaround.

Hi Michal,
In the NUMA node tuning part of the libvirt docs (https://libvirt.org/formatdomain.html#numa-node-tuning) we can only see that the memory allocation is decided; there is no information about CPU affinity. AFAIK, if the auto placement mode is set, the numad service decides the nodeset advisory, which we can see with:

# cat /run/libvirt/qemu/vm1.xml | grep nodeset
<numad nodeset='X' cpuset='1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47'/>

That would be the same as the CPU affinity. Even if we set the NUMA memory mode strict to node 1, the nodeset advisory may be different from that.

(In reply to Adrian Tomasov from comment #0)
> Description of problem:
> We have been trying to set the NUMA affinity of specific VM using libvirt by adding this into the VM config:
> <numatune>
>   <memory mode='strict' nodeset='1'/>
> </numatune>
> This should set the correct memory and vcpu affinity to a specific NUMA node. The memory affinity was right:
> PID              Node 0 Node 1 Node 2 Node 3 Node 4 Node 5 Node 6 Node 7 Total
> ---------------  ------ ------ ------ ------ ------ ------ ------ ------ -----
> 7103 (qemu-kvm)       0   2126      2      2      2      2      1      2  2198
> However, the emulator and vcpus have set affinity to a different NUMA node:
> # taskset -cp 7103
> pid 7103's current affinity list: 12-15,44-47
> Version-Release number of selected component (if applicable): libvirt-8.5.0-7.el9_1.x86_64
> How reproducible: always
> Steps to Reproduce:
> 1. Install hypervisor with RHEL9.1.0 and virtualization packages
> 2. Install VM
> 3. Add this into the VM config:
> <numatune>
>   <memory mode='strict' nodeset='1'/>
> </numatune>
> <vcpu placement='auto'>8</vcpu>
> 4. Start VM
> 5. Check memory affinity using numastat -c qemu-kvm
> 6. Check CPU affinity using taskset -cp <PID>
> Actual results: Correct memory, but incorrect CPU affinity.
> Expected results: Correct memory and CPU affinity list.
> Additional info: We can provide you with a hypervisor with many NUMA nodes for further investigation of this issue.

Hi Adrian,
Could you check whether the following command reports the CPU affinity you saw in your test, "12-15,44-47"? Thanks.

# cat /run/libvirt/qemu/${domain-name}.xml | grep nodeset

(In reply to liang cong from comment #5)
> (In reply to Michal Privoznik from comment #4)
> > Yeah, I remember talking with Erik Skultety about this yesterday. And I've found what might look like evidence in the code that this used to work, because I couldn't recall whether we used to set affinity on vCPUs just based on the <numatune>. Meanwhile, explicit vCPU pinning should work, if you need a workaround.
>
> Hi Michal,
> In the NUMA node tuning part of the libvirt docs (https://libvirt.org/formatdomain.html#numa-node-tuning) we can only see that the memory allocation is decided; there is no information about CPU affinity. AFAIK, if the auto placement mode is set, the numad service decides the nodeset advisory, which we can see with:
> # cat /run/libvirt/qemu/vm1.xml | grep nodeset
> <numad nodeset='X' cpuset='1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47'/>
> That would be the same as the CPU affinity. Even if we set the NUMA memory mode strict to node 1, the nodeset advisory may be different from that.

Yeah, we do not document this behavior in the public docs. But I think @eskultet found this somewhere in the RHEL docs. Nevertheless, the code behaves oddly: first it sets the affinity, but then it undoes it. This is the root cause of the problem. And I do agree that the ordering of affinity sources should be: 1) domain XML, 2) numad recommendation, 3) <numatune/>.

I mean, it only makes sense to set the affinity of vCPU threads so that they are local to the memory they work with. Ideally, the kernel would move threads around to achieve locality, but apparently that isn't happening and the scheduler needs a bit of help. Mind you, affinity is a "recommendation", not a hard restriction. The kernel can still schedule a thread to run on a different host CPU (should it need to), but the CPUs from the affinity set are preferred.

The thing is, as I noticed recently, those docs were for RHEL-7 (https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/virtualization_tuning_and_optimization_guide/sect-virtualization_tuning_optimization_guide-numa-numa_and_libvirt#sect-Virtualization_Tuning_Optimization_Guide-NUMA-NUMA_and_libvirt-Domain_Processes) and I could not find anything similar in the RHEL-8/9 docs (was that just a mistake in the docs back then?). Still, even though affinity is only a recommendation, it was clear from the testing that, in the code that handles this in libvirt, we could do better; after that it is up to the kernel.

IMO, the setting from the description does not really demonstrate the problem:
<numatune>
<memory mode='strict' nodeset='1'/>
</numatune>
<vcpu placement='auto'>8</vcpu>
For this setting, <vcpu placement='auto'>8</vcpu> means the domain process will be pinned to the advisory nodeset obtained by querying numad. That advisory nodeset may differ from the nodeset '1' specified in the memory tuning (one way to check this is sketched below).
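A minimal way to compare the two on a running guest, assuming the domain is named vm1 and <PID> is the qemu-kvm process ID as in the report (virsh vcpupin and virsh emulatorpin without a cpulist only query the current setting):

# Advisory placement that numad handed back for this guest (placement='auto')
grep nodeset /run/libvirt/qemu/vm1.xml

# vCPU and emulator affinity as libvirt sees it
virsh vcpupin vm1
virsh emulatorpin vm1

# Affinity of the whole QEMU process, as in the report
taskset -cp <PID>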
So I think the setting should be:
<numatune>
<memory mode='strict' nodeset='1'/>
</numatune>
<vcpu placement='static'>8</vcpu>
But if we set it like this, then, as the libvirt doc says, "if placement is 'static', but no cpuset is specified, the domain process will be pinned to all the available physical CPUs", and that is the result I got when I tried it on build libvirt-9.0.0-3.el9.x86_64.
So the current behavior matches the documented logic.
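For completeness, the explicit vCPU pinning workaround Michal mentions in comment #4 can be expressed directly in the domain XML. This is only a sketch: the cpuset value 4-7,36-39 is a placeholder and has to be replaced with the host CPUs that actually belong to node 1 on this machine (see virsh capabilities or lscpu).

<vcpu placement='static' cpuset='4-7,36-39'>8</vcpu>
<cputune>
  <emulatorpin cpuset='4-7,36-39'/>
</cputune>
<numatune>
  <memory mode='strict' nodeset='1'/>
</numatune>

With explicit pinning like this, the vCPU and emulator affinity no longer depends on numad's advisory nodeset.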
Description of problem:
We have been trying to set the NUMA affinity of specific VM using libvirt by adding this into the VM config:
<numatune>
  <memory mode='strict' nodeset='1'/>
</numatune>
This should set the correct memory and vcpu affinity to a specific NUMA node. The memory affinity was right:

PID              Node 0 Node 1 Node 2 Node 3 Node 4 Node 5 Node 6 Node 7 Total
---------------  ------ ------ ------ ------ ------ ------ ------ ------ -----
7103 (qemu-kvm)       0   2126      2      2      2      2      1      2  2198

However, the emulator and vcpus have set affinity to a different NUMA node:
# taskset -cp 7103
pid 7103's current affinity list: 12-15,44-47

Version-Release number of selected component (if applicable):
libvirt-8.5.0-7.el9_1.x86_64

How reproducible:
always

Steps to Reproduce:
1. Install hypervisor with RHEL9.1.0 and virtualization packages
2. Install VM
3. Add this into the VM config:
<numatune>
  <memory mode='strict' nodeset='1'/>
</numatune>
<vcpu placement='auto'>8</vcpu>
4. Start VM
5. Check memory affinity using numastat -c qemu-kvm
6. Check CPU affinity using taskset -cp <PID>

Actual results:
Correct memory, but incorrect CPU affinity.

Expected results:
Correct memory and CPU affinity list.

Additional info:
We can provide you with a hypervisor with many NUMA nodes for further investigation of this issue.
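When re-running the reproducer, the mismatch can also be made visible per thread rather than only for the main process. A small sketch, assuming <PID> is the qemu-kvm PID from step 6:

# Affinity of every QEMU thread (vCPU and emulator threads)
grep Cpus_allowed_list /proc/<PID>/task/*/status

# Which host CPUs belong to which NUMA node, to compare against nodeset='1'
lscpu | grep -i numa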