Bug 2151064
Summary: | Setting multiple nodes for preferred guest specified numa tuning mode doesn't show any error | |
---|---|---|---
Product: | Red Hat Enterprise Linux 9 | Reporter: | liang cong <lcong>
Component: | libvirt | Assignee: | Michal Privoznik <mprivozn>
libvirt sub component: | General | QA Contact: | liang cong <lcong>
Status: | CLOSED ERRATA | Docs Contact: |
Severity: | unspecified | |
Priority: | unspecified | CC: | dzheng, jdenemar, lmen, mprivozn, virt-maint
Version: | 9.2 | Keywords: | Triaged
Target Milestone: | rc | |
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | libvirt-9.1.0-1.el9 | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | |
: | 2166650 (view as bug list) | Environment: |
Last Closed: | 2023-11-07 08:30:47 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: | 9.0.0
Embargoed: | | |
Bug Depends On: | 2166650 | |
Bug Blocks: | | |
Description
liang cong
2022-12-06 03:26:08 UTC
Alright, so the problem is that for guest NUMA nodes a memory-backend-* object is used, e.g. like this:

  -object '{"qom-type":"memory-backend-file","id":"ram-node0","mem-path":"/dev/hugepages/libvirt/qemu/2-fedora","size":2147483648,"host-nodes":[0,1],"policy":"preferred"}'

whereas for the overall <memory/> element, virNumaSetupMemoryPolicy() is called just before exec()-ing QEMU. The error you see comes from that function, because it uses the numa_set_preferred() API, which supports only one node. Under the hood libnuma calls __NR_set_mempolicy with MPOL_PREFERRED, which does support only one node. However, kernel commit v5.15-rc1~107^2~21 introduced a new MPOL_PREFERRED_MANY mode which allows specifying multiple nodes, and libnuma commit v2.0.15~24 then exposed it as numa_set_preferred_many().

QEMU's command line, however, is also a bit suspicious. QEMU accepts multiple "host-nodes", but under the hood it calls mbind() (see host_memory_backend_memory_complete() in backends/hostmem.c), which is documented as:

  MPOL_PREFERRED
    This mode sets the preferred node for allocation. The kernel will try to allocate pages from this node first and fall back to other nodes if the preferred node is low on free memory. If nodemask specifies more than one node ID, the first node in the mask will be selected as the preferred node.

So in the end QEMU will also configure guest memory to prefer just the first node (node #0 in our example).

Nevertheless, what we can do here is teach libvirt to use the new libnuma API if possible. I'm not decided on what to do when either the kernel or libnuma is not new enough, though. I worry that we might break existing configs (although one can argue that those never really worked). Either way, if we don't error out, this bug will (silently) fix itself as users upgrade to a newer kernel and libnuma.

Patch posted on the list: https://listman.redhat.com/archives/libvir-list/2022-December/236225.html

QEMU patch posted here: https://lists.gnu.org/archive/html/qemu-devel/2022-12/msg01354.html

Libvirt patch merged here:

  53369ad062 virnuma: Allow multiple nodes for preferred policy
  v8.10.0-124-g53369ad062
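For illustration, a minimal sketch of that approach (not libvirt's actual virNumaSetupMemoryPolicy() implementation). It assumes a build against numactl >= 2.0.15, which provides numa_set_preferred_many() and, to the best of my knowledge, a numa_has_preferred_many() probe for runtime kernel support; on an older kernel it falls back to the single-node numa_set_preferred():

```c
/* Minimal sketch, assuming numactl >= 2.0.15; not libvirt's actual code.
 * Build: gcc prefer_many.c -o prefer_many -lnuma
 */
#include <stdio.h>
#include <numa.h>

int main(void)
{
    struct bitmask *nodes;
    unsigned int i;
    int first = -1;

    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this host\n");
        return 1;
    }

    /* Mirrors nodeset="0-1" from the domain XML. */
    nodes = numa_parse_nodestring("0-1");
    if (!nodes) {
        fprintf(stderr, "failed to parse nodeset\n");
        return 1;
    }

    if (numa_has_preferred_many() > 0) {
        /* Kernel >= 5.15: MPOL_PREFERRED_MANY via libnuma. */
        numa_set_preferred_many(nodes);
        printf("preferred policy set for nodes 0-1\n");
    } else {
        /* Older kernel: MPOL_PREFERRED accepts a single node only,
         * so pick the first node set in the mask. */
        for (i = 0; i < nodes->size; i++) {
            if (numa_bitmask_isbitset(nodes, i)) {
                first = i;
                break;
            }
        }
        numa_set_preferred(first);
        printf("fallback: preferred policy set for node %d only\n", first);
    }

    numa_bitmask_free(nodes);
    return 0;
}
```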
Hi michal,
IMO for the fix, now the preferred mode should support multiple nodes, right?

And I did preverification on upstream build v8.10.0-130-gb271d6f3b0, kernel 6.1.0-65.fc38.x86_64, numactl:

  # rpm -qa numa*
  numactl-libs-2.0.16-1.fc38.x86_64
  numactl-devel-2.0.16-1.fc38.x86_64
  numad-0.5-37.20150602git.fc37.x86_64

Verify steps:

1. Prepare a guest with the below numa node tuning xml:
  # virsh dumpxml vm1 --xpath '//numatune'
  <numatune>
    <memory mode="preferred" nodeset="0-1"/>
  </numatune>

2. Start the guest vm:
  # virsh start vm1
  Domain 'vm1' started

3. Check the numa tuning config:
  # virsh dumpxml vm1 --xpath '//numatune'
  <numatune>
    <memory mode="preferred" nodeset="0-1"/>
  </numatune>

4. Get the numa tuning setting with the virsh numatune cmd:
  # virsh numatune vm1
  numa_mode      : preferred
  numa_nodeset   : 0-1

5. Virsh edit the vm config to change the numa node tuning xml as below:
  <numatune>
    <memnode cellid="0" mode="preferred" nodeset="0-1"/>
  </numatune>

6. Shut off the guest vm:
  # virsh destroy vm1
  Domain 'vm1' destroyed

7. Check the numa tuning config:
  # virsh dumpxml vm1 --xpath '//numatune'
  <numatune>
    <memnode cellid="0" mode="preferred" nodeset="0-1"/>
  </numatune>

8. Start the guest vm:
  # virsh start vm1
  Domain 'vm1' started

9. Check the numa tuning config:
  # virsh dumpxml vm1 --xpath '//numatune'
  <numatune>
    <memnode cellid="0" mode="preferred" nodeset="0-1"/>
  </numatune>

Also checked the below scenarios:
1. preferred mode with only one node
2. basic check for strict, interleave, restrictive modes
3. without numa topology setting
4. with an unavailable host numa node
5. with hugepage memory setting

(In reply to liang cong from comment #5)
> Hi michal,
> IMO for the fix, now the preferred mode should support multiple nodes, right?

Correct. The old behaviour stemmed from the limitations in the kernel. But now that the kernel allows multiple preferred nodes, there's no need for us to keep this limitation. We can just use newer APIs to talk to the kernel.

> And I did preverification on upstream build: v8.10.0-130-gb271d6f3b0

Perfect, thank you!

Test on build libvirt-9.0.0-2.el9.x86_64; other related dependency versions are listed below:

  # rpm -qa kernel numa*
  numactl-libs-2.0.14-9.el9.x86_64
  kernel-5.14.0-244.el9.x86_64
  numad-0.5-36.20150602git.el9.x86_64

Test steps:

1. Prepare a guest with the below numa node tuning xml:
  # virsh dumpxml vm1 --xpath '//numatune'
  <numatune>
    <memory mode="preferred" nodeset="0-1"/>
  </numatune>

2. Start the guest vm:
  # virsh start vm1
  error: Failed to start domain 'vm1'
  error: internal error: Process exited prior to exec: libvirt: error : internal error: NUMA memory tuning in 'preferred' mode only supports single node

3. Get the numa tuning setting with the virsh numatune cmd:
  # virsh numatune vm1
  numa_mode      : preferred
  numa_nodeset   : 0-1

4. Virsh edit the vm config to change the numa node tuning xml as below:
  <numatune>
    <memnode cellid="0" mode="preferred" nodeset="0-1"/>
  </numatune>

5. Start the guest vm:
  # virsh start vm1
  Domain 'vm1' started

Hi michal,
In my testing, the preferred mode config behaves differently for <memory mode="preferred" nodeset="0-1"/> and <memnode cellid="0" mode="preferred" nodeset="0-1"/>. I think that should be fixed; could you help to identify? Thanks.

(In reply to liang cong from comment #10)
> Test on build: libvirt-9.0.0-2.el9.x86_64
> other related dependency versions are listed below:
> # rpm -qa kernel numa*
> numactl-libs-2.0.14-9.el9.x86_64

This is the problem. We need numactl-libs-2.0.15, which added support for multiple preferred nodes. I don't think there's a rebase planned for numactl. Should we create a rebase bug and move this to the next RHEL?
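For comparison, a rough illustration of the guest-node (<memnode>/memory-backend) path described in the first comment, where the policy is applied to an already-mapped RAM region via mbind() rather than via set_mempolicy(). This is only a hedged sketch, not QEMU's actual host_memory_backend_memory_complete() code, and it assumes numaif.h from numactl >= 2.0.15 so that MPOL_PREFERRED_MANY is defined:

```c
/* Rough sketch of the mbind()-based path, assuming numaif.h from
 * numactl >= 2.0.15 (which defines MPOL_PREFERRED_MANY); this is not
 * QEMU's actual backends/hostmem.c code.
 * Build: gcc mbind_many.c -o mbind_many -lnuma
 */
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <sys/mman.h>
#include <numa.h>
#include <numaif.h>   /* mbind(), MPOL_* */

int main(void)
{
    size_t len = 2UL * 1024 * 1024;   /* stand-in for a guest RAM block */
    struct bitmask *nodes = numa_parse_nodestring("0-1");
    void *mem;

    if (!nodes) {
        fprintf(stderr, "failed to parse nodeset\n");
        return 1;
    }

    mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

#ifdef MPOL_PREFERRED_MANY
    /* Multi-node preferred policy; kernels < 5.15 reject it with EINVAL. */
    if (mbind(mem, len, MPOL_PREFERRED_MANY,
              nodes->maskp, nodes->size + 1, 0) == 0)
        printf("MPOL_PREFERRED_MANY applied for nodes 0-1\n");
    else
        fprintf(stderr, "mbind(MPOL_PREFERRED_MANY): %s\n", strerror(errno));
#else
    /* Old headers: MPOL_PREFERRED silently prefers only the first node
     * of the mask, which is the behaviour discussed in this bug. */
    if (mbind(mem, len, MPOL_PREFERRED,
              nodes->maskp, nodes->size + 1, 0) != 0)
        fprintf(stderr, "mbind(MPOL_PREFERRED): %s\n", strerror(errno));
#endif

    munmap(mem, len);
    numa_bitmask_free(nodes);
    return 0;
}
```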
update DTM and ITM according to dependent bug#2166650

Verified on:

  # rpm -q libvirt qemu-kvm
  libvirt-9.3.0-2.el9.x86_64
  qemu-kvm-8.0.0-3.el9.x86_64

Other dependencies:

  # rpm -qa kernel numa*
  numactl-libs-2.0.16-1.el9.x86_64
  kernel-5.14.0-311.el9.x86_64
  numad-0.5-36.20150602git.el9.x86_64

Verify steps:

1. Prepare a guest with the below numa node tuning xml:
  # virsh dumpxml vm1 --xpath '//numatune'
  <numatune>
    <memory mode="preferred" nodeset="0-1"/>
  </numatune>

2. Start the guest vm:
  # virsh start vm1
  Domain 'vm1' started

3. Check the numa tuning config:
  # virsh dumpxml vm1 --xpath '//numatune'
  <numatune>
    <memory mode="preferred" nodeset="0-1"/>
  </numatune>

4. Get the numa tuning setting with the virsh numatune cmd:
  # virsh numatune vm1
  numa_mode      : preferred
  numa_nodeset   : 0-1

5. Virsh edit the vm config to change the numa node tuning xml as below:
  <numatune>
    <memnode cellid="0" mode="preferred" nodeset="0-1"/>
  </numatune>

6. Shut off the guest vm:
  # virsh destroy vm1
  Domain 'vm1' destroyed

7. Check the numa tuning config:
  # virsh dumpxml vm1 --xpath '//numatune'
  <numatune>
    <memnode cellid="0" mode="preferred" nodeset="0-1"/>
  </numatune>

8. Start the guest vm:
  # virsh start vm1
  Domain 'vm1' started

9. Check the numa tuning config:
  # virsh dumpxml vm1 --xpath '//numatune'
  <numatune>
    <memnode cellid="0" mode="preferred" nodeset="0-1"/>
  </numatune>

Also checked the below scenarios:
1. preferred mode with only one node
2. basic check for strict, interleave, restrictive modes
3. without numa topology setting
4. with an unavailable host numa node
5. with hugepage memory setting

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: libvirt security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6409