Bug 2185184
Summary: | Specifying restrictive numa tuning mode per each guest numa node doesn't work | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 9 | Reporter: | liang cong <lcong> |
Component: | libvirt | Assignee: | Martin Kletzander <mkletzan> |
libvirt sub component: | General | QA Contact: | liang cong <lcong> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | medium | | |
Priority: | medium | CC: | jdenemar, lmen, mkletzan, mprivozn, virt-maint |
Version: | 9.2 | Keywords: | AutomationTriaged, Triaged |
Target Milestone: | rc | | |
Target Release: | --- | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | libvirt-9.3.0-1.el9 | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2023-11-07 08:31:17 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | 9.3.0 |
Embargoed: | | | |
Description liang cong 2023-04-07 11:13:49 UTC
Yeah, I don't think we can use mode="restrictive" for individual guest NUMA nodes (/numatune/memnode). A NUMA node is not a separate thread or process, i.e. not a unit that cgroups understand. Martin, what do you think?

With `restrictive` the only setting we can do (and this mode was introduced precisely for this reason) is to limit the vCPU threads (not the emulator thread) with cpuset.mems and hope for the best: either the allocation happens after the setting is applied, or the memory might migrate later. It depends on the system and is done this way so that we can change the NUMA node(s) during runtime (which is also not guaranteed to migrate the memory). Instead of looking into `emulator/cpuset.mems`, peek into `vcpu*/cpuset.mems`. Also, in the first example the node has only 244MB of allocated memory; if you allocate more while it is running it should, potentially, if there is enough room, allocate it from the right node, *if* you are also making sure you are allocating that memory from the right guest NUMA node.

One more note: with cgroups v1 we explicitly set `cpuset.memory_migrate` to `1`, but cgroups v2 behaves differently. When a task is migrated to a cgroup, its resources (including memory allocations) are not migrated with it, but once anyone writes to `cpuset.mems` the memory is migrated. I will check whether we write to that file before or after the vCPU is moved there. Anyway, it is all based on the assumption that what is using the node's memory is the vCPU of that node, and we can't do much more.

I posted a fix for this: https://www.mail-archive.com/libvir-list@redhat.com/msg237420.html

Fixed upstream with v9.2.0-271-g383caddea103 and v9.2.0-272-g2f4f381871d2:

commit 383caddea103eaab7bb495ec446b43748677f749
Author: Martin Kletzander <mkletzan>
Date:   Fri Apr 14 12:08:59 2023 +0200

    qemu, ch: Move threads to cgroup dir before changing parameters

commit 2f4f381871d253e3ec34f32b452c32570459bdde
Author: Martin Kletzander <mkletzan>
Date:   Thu Apr 20 08:51:14 2023 +0200

    docs: Clarify restrictive numatune mode

Pre-verified on upstream libvirt v9.2.0-277-gd063389f10.

Test steps:

Scenario 1: restrictive mode

1.1 Define and start a guest with numatune and numa config xml:

    <numatune>
      <memory mode='restrictive' nodeset='1'/>
      <memnode cellid="0" mode="restrictive" nodeset="0"/>
    </numatune>
    <cpu>
      <numa>
        <cell id='0' cpus='0' memory='1024000' unit='KiB'/>
        <cell id='1' cpus='1' memory='1048576' unit='KiB'/>
      </numa>
      ...
    </cpu>

1.2 Check the cgroup cpuset.mems config:

    # cat /sys/fs/cgroup/machine.slice/machine-qemu\\x2d2\\x2dvm1.scope/libvirt/emulator/cpuset.mems
    1
    # cat /sys/fs/cgroup/machine.slice/machine-qemu\\x2d2\\x2dvm1.scope/libvirt/vcpu0/cpuset.mems
    0
    # cat /sys/fs/cgroup/machine.slice/machine-qemu\\x2d2\\x2dvm1.scope/libvirt/vcpu1/cpuset.mems
    1

1.3 Consume the memory in the guest:

    # swapoff -a
    # memhog 1200000KB
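As a side check, the cpuset.mems files above only show the cgroup restriction, not where the allocations actually landed. A minimal sketch of a host-side cross-check, assuming the domain is named vm1 and the numactl package (which provides numastat) is installed; the QEMU_PID variable and the pgrep pattern are illustrative:

    # Find the QEMU process of the domain (pattern is illustrative)
    QEMU_PID=$(pgrep -f "qemu.*guest=vm1" | head -n1)
    # What libvirt currently reports for the domain-wide numa tuning
    virsh numatune vm1
    # Per-NUMA-node breakdown of the QEMU process's resident memory
    numastat -p "$QEMU_PID"
    # Per-mapping memory policy and per-node page counts
    grep -E "bind|interleave|N[0-9]+=" /proc/"$QEMU_PID"/numa_maps | head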
Scenario 2: interleave mode

2.1 Define and start a guest with numatune and numa config xml:

    <numatune>
      <memory mode='interleave' nodeset='1'/>
      <memnode cellid="0" mode="interleave" nodeset="0"/>
    </numatune>
    <cpu>
      <numa>
        <cell id='0' cpus='0' memory='1024000' unit='KiB'/>
        <cell id='1' cpus='1' memory='1048576' unit='KiB'/>
      </numa>
      ...
    </cpu>

2.2 Check the cgroup cpuset.mems config (all three files are empty):

    # cat /sys/fs/cgroup/machine.slice/machine-qemu\\x2d4\\x2dvm1.scope/libvirt/emulator/cpuset.mems
    # cat /sys/fs/cgroup/machine.slice/machine-qemu\\x2d4\\x2dvm1.scope/libvirt/vcpu0/cpuset.mems
    # cat /sys/fs/cgroup/machine.slice/machine-qemu\\x2d4\\x2dvm1.scope/libvirt/vcpu1/cpuset.mems

2.3 Consume the memory in the guest:

    # swapoff -a
    # memhog 1200000KB

Scenario 3: strict mode

3.1 Define and start a guest with numatune and numa config xml:

    <numatune>
      <memory mode='strict' nodeset='0-1'/>
      <memnode cellid="0" mode="strict" nodeset="0"/>
    </numatune>
    <cpu>
      <numa>
        <cell id='0' cpus='0' memory='1024000' unit='KiB'/>
        <cell id='1' cpus='1' memory='1048576' unit='KiB'/>
      </numa>
      ...
    </cpu>

3.2 Check the cgroup cpuset.mems config:

    # cat /sys/fs/cgroup/machine.slice/machine-qemu\\x2d6\\x2dvm1.scope/libvirt/emulator/cpuset.mems
    0-1
    # cat /sys/fs/cgroup/machine.slice/machine-qemu\\x2d6\\x2dvm1.scope/libvirt/vcpu0/cpuset.mems
    0-1
    # cat /sys/fs/cgroup/machine.slice/machine-qemu\\x2d6\\x2dvm1.scope/libvirt/vcpu1/cpuset.mems
    0-1

3.3 Consume the memory in the guest:

    # swapoff -a
    # memhog 1200000KB

Also checked other scenarios, such as: with vcpupin, with emulatorpin, and changing the numa tuning of a domain in restrictive mode (a sketch of that last check follows below).

Hi Martin, I tested the code change with the test steps above; do you think any other test scenarios need to be covered?

And regarding the doc update:

    Note that for ``memnode`` this will only guide the memory access for the vCPU
    threads or similar mechanism and is very hypervisor-specific. This does not
    guarantee the placement of the node's memory allocation. For proper restriction
    other means should be used (e.g. different mode, preallocated hugepages).

IMO this explanation only applies to memnode with restrictive mode, right? If so, I think we'd better say that in the doc, thanks.

I don't think restrictive mode for memnodes needs much testing; of course the matrix can explode very easily. This explanation is meant for memnode, but docs were added for numatune/memory as well, although in the latter case the whole domain is restricted before launch and that should work a bit better.

Marking it tested as per comment 9.
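Regarding the "changing the numa tuning of a domain in restrictive mode" scenario mentioned above: as noted, restrictive mode exists precisely so that the nodeset can be changed at runtime, and on cgroups v2 the memory is migrated when cpuset.mems is rewritten. A minimal sketch of such a check, assuming the domain is named vm1 and the host has NUMA nodes 0-1 (the exact outputs depend on the host and configuration):

    # Show the current domain-wide numa_mode / numa_nodeset
    virsh numatune vm1
    # Change the domain-wide nodeset on the running domain
    virsh numatune vm1 --nodeset 0-1 --live
    # Re-check the per-thread cpuset.mems files to see what was updated
    grep -H . /sys/fs/cgroup/machine.slice/machine-qemu*vm1.scope/libvirt/emulator/cpuset.mems \
              /sys/fs/cgroup/machine.slice/machine-qemu*vm1.scope/libvirt/vcpu*/cpuset.mems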
Verified on:

    # rpm -q libvirt qemu-kvm
    libvirt-9.3.0-2.el9.x86_64
    qemu-kvm-8.0.0-3.el9.x86_64

Test steps:

Scenario 1: restrictive mode

1.1 Define and start a guest with numatune and numa config xml:

    <numatune>
      <memory mode='restrictive' nodeset='1'/>
      <memnode cellid="0" mode="restrictive" nodeset="0"/>
    </numatune>
    <cpu>
      <numa>
        <cell id='0' cpus='0' memory='1024000' unit='KiB'/>
        <cell id='1' cpus='1' memory='1048576' unit='KiB'/>
      </numa>
      ...
    </cpu>

1.2 Check the cgroup cpuset.mems config:

    # cat /sys/fs/cgroup/machine.slice/machine-qemu\\x2d7\\x2dvm1.scope/libvirt/emulator/cpuset.mems
    1
    # cat /sys/fs/cgroup/machine.slice/machine-qemu\\x2d7\\x2dvm1.scope/libvirt/vcpu0/cpuset.mems
    0
    # cat /sys/fs/cgroup/machine.slice/machine-qemu\\x2d7\\x2dvm1.scope/libvirt/vcpu1/cpuset.mems
    1

1.3 Consume the memory in the guest:

    # swapoff -a
    # memhog 1200000KB

Scenario 2: restrictive with interleave mode

2.1 Define and start a guest with numatune and numa config xml:

    <numatune>
      <memory mode='interleave' nodeset='1'/>
      <memnode cellid="0" mode="restrictive" nodeset="0"/>
    </numatune>
    <cpu>
      <numa>
        <cell id='0' cpus='0' memory='1024000' unit='KiB'/>
        <cell id='1' cpus='1' memory='1048576' unit='KiB'/>
      </numa>
      ...
    </cpu>

2.2 Check the cgroup cpuset.mems config (only vcpu0 shows a nodeset; the other files are empty):

    # cat /sys/fs/cgroup/machine.slice/machine-qemu\\x2d8\\x2dvm1.scope/libvirt/emulator/cpuset.mems
    # cat /sys/fs/cgroup/machine.slice/machine-qemu\\x2d8\\x2dvm1.scope/libvirt/vcpu0/cpuset.mems
    0
    # cat /sys/fs/cgroup/machine.slice/machine-qemu\\x2d8\\x2dvm1.scope/libvirt/vcpu1/cpuset.mems

2.3 Consume the memory in the guest:

    # swapoff -a
    # memhog 1200000KB

Scenario 3: strict mode

3.1 Define and start a guest with numatune and numa config xml:

    <numatune>
      <memory mode='strict' nodeset='0-1'/>
      <memnode cellid="0" mode="strict" nodeset="0"/>
    </numatune>
    <cpu>
      <numa>
        <cell id='0' cpus='0' memory='1024000' unit='KiB'/>
        <cell id='1' cpus='1' memory='1048576' unit='KiB'/>
      </numa>
      ...
    </cpu>

3.2 Check the cgroup cpuset.mems config:

    # cat /sys/fs/cgroup/machine.slice/machine-qemu\\x2d9\\x2dvm1.scope/libvirt/emulator/cpuset.mems
    0-1
    # cat /sys/fs/cgroup/machine.slice/machine-qemu\\x2d9\\x2dvm1.scope/libvirt/vcpu0/cpuset.mems
    0-1
    # cat /sys/fs/cgroup/machine.slice/machine-qemu\\x2d9\\x2dvm1.scope/libvirt/vcpu1/cpuset.mems
    0-1

3.3 Consume the memory in the guest:

    # swapoff -a
    # memhog 1200000KB

Also checked other scenarios, such as: with vcpupin, with emulatorpin, and changing the numa tuning of a domain in restrictive mode.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: libvirt security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6409
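Repeating the cpuset.mems comparison across the scenarios above involves a lot of identical cat commands, so a small helper can be handy. This is only a sketch, assuming cgroups v2 and a single running domain named vm1; the DOM and SCOPE variable names are illustrative:

    # Dump emulator and per-vCPU cpuset.mems for one running domain (cgroup v2 layout assumed)
    DOM=vm1
    SCOPE=$(ls -d /sys/fs/cgroup/machine.slice/machine-qemu*"${DOM}".scope)
    for f in "${SCOPE}"/libvirt/emulator/cpuset.mems "${SCOPE}"/libvirt/vcpu*/cpuset.mems; do
        # Print "<path> -> <value>"; an empty value means no nodeset is set at this level
        printf '%s -> %s\n' "$f" "$(cat "$f")"
    done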