Bug 2151064

Summary: Setting multiple nodes for 'preferred' mode in guest-node-specific NUMA tuning doesn't show any error
Product: Red Hat Enterprise Linux 9
Reporter: liang cong <lcong>
Component: libvirt
Assignee: Michal Privoznik <mprivozn>
libvirt sub component: General
QA Contact: liang cong <lcong>
Status: CLOSED ERRATA
Docs Contact:
Severity: unspecified
Priority: unspecified
CC: dzheng, jdenemar, lmen, mprivozn, virt-maint
Version: 9.2
Keywords: Triaged
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: libvirt-9.1.0-1.el9
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 2166650 (view as bug list)
Environment:
Last Closed: 2023-11-07 08:30:47 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version: 9.0.0
Embargoed:
Bug Depends On: 2166650    
Bug Blocks:    

Description liang cong 2022-12-06 03:26:08 UTC
Description of problem: Setting multiple nodes for 'preferred' mode in guest-node-specific NUMA tuning (<memnode>) doesn't show any error


Version-Release number of selected component (if applicable):
# rpm -q libvirt qemu-kvm
libvirt-8.10.0-1.el9.x86_64
qemu-kvm-7.1.0-6.el9.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Define a guest VM with the below XML (see the placement sketch after these steps):
<numatune>
  <memnode cellid='0' mode='preferred' nodeset='0-1'/>
</numatune>
...
<numa>
  <cell id='0' cpus='0-3' memory='2097152' unit='KiB' />
</numa>


2. Start the guest vm
# virsh start vm1
Domain 'vm1' started
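
For context, the two fragments from step 1 live in different parts of the domain XML: <numatune> is a direct child of <domain>, while the <numa> cell definition sits under <cpu>. A trimmed sketch (other mandatory domain elements omitted):

<domain type='kvm'>
  ...
  <numatune>
    <memnode cellid='0' mode='preferred' nodeset='0-1'/>
  </numatune>
  <cpu>
    <numa>
      <cell id='0' cpus='0-3' memory='2097152' unit='KiB'/>
    </numa>
  </cpu>
  ...
</domain>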


Actual results:
The guest vm starts without any error


Expected results:
The guest VM fails to start, with an error indicating that 'preferred' mode only supports a single node

Additional info:
When setting the domain-wide NUMA memory tuning XML like below:
<numatune>
  <memory mode='preferred' nodeset='0-1'/>
</numatune>

Then there is error shown when starting the guest vm:
# virsh start vm1
error: Failed to start domain 'vm1'
error: internal error: Process exited prior to exec: libvirt:  error : internal error: NUMA memory tuning in 'preferred' mode only supports single node

Comment 1 Michal Privoznik 2022-12-09 12:09:48 UTC
Alright, so the problem is that for guest NUMA nodes, a memory-backend-* object is used, e.g. like this:

-object '{"qom-type":"memory-backend-file","id":"ram-node0","mem-path":"/dev/hugepages/libvirt/qemu/2-fedora","size":2147483648,"host-nodes":[0,1],"policy":"preferred"}'

whereas for the overall <memory/> tuning, virNumaSetupMemoryPolicy() is called just before exec()-ing QEMU. Now, the error you see comes from that function, because it uses the numa_set_preferred() API, which supports only one node. Under the hood libnuma calls __NR_set_mempolicy with MPOL_PREFERRED, which supports only one node. However, kernel commit v5.15-rc1~107^2~21 introduced a new MPOL_PREFERRED_MANY mode which allows specifying multiple nodes. This was then implemented in libnuma commit v2.0.15~24 as numa_set_preferred_many().
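
For illustration, here is a minimal, hypothetical sketch (not libvirt's actual code) of how a process could apply such a multi-node preferred policy to its own future allocations via libnuma. It assumes numactl-devel >= 2.0.15 at build time for the *_many API, and it falls back to the single-node call when the running kernel lacks MPOL_PREFERRED_MANY:

/* prefer_many.c - sketch only; assumes numactl-devel >= 2.0.15.
 * Build: gcc prefer_many.c -o prefer_many -lnuma
 */
#include <stdio.h>
#include <numa.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "libnuma is unusable on this host\n");
        return 1;
    }

    /* Nodeset '0-1' from the example domain XML. */
    struct bitmask *nodes = numa_allocate_nodemask();
    numa_bitmask_setbit(nodes, 0);
    numa_bitmask_setbit(nodes, 1);

    if (numa_has_preferred_many()) {
        /* Kernel >= 5.15: MPOL_PREFERRED_MANY accepts the whole mask. */
        numa_set_preferred_many(nodes);
    } else {
        /* Older kernels: MPOL_PREFERRED honours a single node only. */
        numa_set_preferred(0);
    }

    numa_free_nodemask(nodes);
    return 0;
}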

QEMU's command line, however, is also a bit suspicious: QEMU accepts multiple "host-nodes", but under the hood it calls mbind() (see host_memory_backend_memory_complete() in backends/hostmem.c), which is documented as:

  MPOL_PREFERRED
    This mode sets the preferred node for allocation. The kernel will try to allocate pages from this node first and fall back to other nodes if the preferred node is low on free memory. If nodemask specifies more than one node ID, the first node in the mask will be selected as the preferred node.

So in the end, QEMU will also configure guest memory to prefer just the first node (node #0 in our example).
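
One rough way to see which policy the guest memory actually ended up with is to inspect the qemu process's mappings (a sketch; the binary name qemu-kvm and a single running guest are assumptions):

# grep prefer /proc/$(pidof qemu-kvm)/numa_maps

With the behaviour described above, the backend mapping should show only the first node as preferred, whereas a true multi-node preferred policy would list the whole nodeset.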

Nevertheless, what we can do here is teach libvirt to use the new libnuma API if possible. I'm undecided on what to do when either the kernel or libnuma is not new enough, though. I worry that we might break existing configs (although one can argue those never really worked). Either way, if we don't error out, this bug will (silently) fix itself as users upgrade to a newer kernel and libnuma.

Comment 2 Michal Privoznik 2022-12-09 16:09:06 UTC
Patch posted on the list:

https://listman.redhat.com/archives/libvir-list/2022-December/236225.html

Comment 3 Michal Privoznik 2022-12-09 17:08:48 UTC
QEMU patch posted here:

https://lists.gnu.org/archive/html/qemu-devel/2022-12/msg01354.html

Comment 4 Michal Privoznik 2022-12-14 15:14:01 UTC
Libvirt patch merged here:

53369ad062 virnuma: Allow multiple nodes for preferred policy

v8.10.0-124-g53369ad062

Comment 5 liang cong 2022-12-16 10:44:48 UTC
Hi Michal,
If I understand the fix correctly, 'preferred' mode should now support multiple nodes, right?

And I did pre-verification on upstream build v8.10.0-130-gb271d6f3b0
kernel:6.1.0-65.fc38.x86_64
numactl:
# rpm -qa numa*
numactl-libs-2.0.16-1.fc38.x86_64
numactl-devel-2.0.16-1.fc38.x86_64
numad-0.5-37.20150602git.fc37.x86_64

Verify steps:
1. Prepare a guest with below numa node tuning xml:
# virsh dumpxml vm1 --xpath '//numatune'
<numatune>
  <memory mode="preferred" nodeset="0-1"/>
</numatune>

2. Start the guest vm
# virsh start vm1
Domain 'vm1' started

3. Check the numa tuning config
# virsh dumpxml vm1 --xpath '//numatune'
<numatune>
  <memory mode="preferred" nodeset="0-1"/>
</numatune>

4. Get the NUMA tuning setting with the virsh numatune command
# virsh numatune vm1
numa_mode      : preferred
numa_nodeset   : 0-1

5. Virsh edit vm config to change numa node tuning xml as below:
<numatune>
  <memnode cellid="0" mode="preferred" nodeset="0-1"/>
</numatune>

6. Shut off guest vm
# virsh destroy vm1
Domain 'vm1' destroyed

7. Check the numa tuning config
# virsh dumpxml vm1 --xpath '//numatune'
<numatune>
  <memnode cellid="0" mode="preferred" nodeset="0-1"/>
</numatune>

8. Start guest vm
# virsh start vm1
Domain 'vm1' started


9. Check the numa tuning config
# virsh dumpxml vm1 --xpath '//numatune'
<numatune>
  <memnode cellid="0" mode="preferred" nodeset="0-1"/>
</numatune>

Also checked the below scenarios (rough XML shapes for these modes are sketched after this list):
1. preferred mode with only one node
2. basic check for strict, interleave, restrictive modes
3. without NUMA topology setting
4. with an unavailable host NUMA node
5. with hugepage memory setting
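
For reference, the per-cell variants exercised above look roughly like the following (alternative <numatune> blocks, used one at a time; the nodeset values are placeholders):

<numatune>
  <memnode cellid='0' mode='strict' nodeset='0'/>
</numatune>

<numatune>
  <memnode cellid='0' mode='interleave' nodeset='0-1'/>
</numatune>

The 'restrictive' and single-node 'preferred' cases follow the same memnode shape ('restrictive' may additionally require the domain-wide <memory> mode to be 'restrictive').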

Comment 6 Michal Privoznik 2022-12-16 11:41:30 UTC
(In reply to liang cong from comment #5)
> Hi Michal,
> If I understand the fix correctly, 'preferred' mode should now support multiple nodes, right?

Correct. The old behaviour stemmed from limitations in the kernel. But now that the kernel allows multiple preferred nodes, there's no need for us to keep this limitation. We can just use the newer APIs to talk to the kernel.

> 
> And I did pre-verification on upstream build v8.10.0-130-gb271d6f3b0

Perfect, thank you!

Comment 10 liang cong 2023-02-02 01:55:16 UTC
Tested on build: libvirt-9.0.0-2.el9.x86_64
Other related dependency versions are listed below:
# rpm -qa kernel numa*
numactl-libs-2.0.14-9.el9.x86_64
kernel-5.14.0-244.el9.x86_64
numad-0.5-36.20150602git.el9.x86_64


Test steps:
1. Prepare a guest with below numa node tuning xml:
# virsh dumpxml vm1 --xpath '//numatune'
<numatune>
  <memory mode="preferred" nodeset="0-1"/>
</numatune>

2. Start the guest vm
# virsh start vm1
error: Failed to start domain 'vm1'
error: internal error: Process exited prior to exec: libvirt:  error : internal error: NUMA memory tuning in 'preferred' mode only supports single node

3. Get the NUMA tuning setting with the virsh numatune command
# virsh numatune vm1
numa_mode      : preferred
numa_nodeset   : 0-1

4. Virsh edit vm config to change numa node tuning xml as below:
<numatune>
  <memnode cellid="0" mode="preferred" nodeset="0-1"/>
</numatune>


5. Start guest vm
# virsh start vm1
Domain 'vm1' started

Hi Michal,
In my testing, the 'preferred' mode config behaves differently for <memory mode="preferred" nodeset="0-1"/> and <memnode cellid="0" mode="preferred" nodeset="0-1"/>. I think that should be fixed; could you help identify the cause? Thanks

Comment 11 Michal Privoznik 2023-02-02 11:17:12 UTC
(In reply to liang cong from comment #10)
> Tested on build: libvirt-9.0.0-2.el9.x86_64
> Other related dependency versions are listed below:
> # rpm -qa kernel numa*
> numactl-libs-2.0.14-9.el9.x86_64

This is the problem. We need numactl-libs 2.0.15, which added support for multiple preferred nodes. I don't think there's a rebase planned for numactl. Should we create a rebase bug and move this to the next RHEL?
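
A quick way to check whether the installed numactl-libs exposes the new API is to look for the symbol in the shared library (a sketch; the library path is the usual RHEL 9 x86_64 location and nm comes from binutils):

# rpm -q numactl-libs
# nm -D /usr/lib64/libnuma.so.1 | grep preferred_many

No output from the grep means the library predates numa_set_preferred_many(), and libvirt keeps the old single-node restriction (the error seen in step 2 above).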

Comment 14 liang cong 2023-04-10 07:57:16 UTC
Updated DTM and ITM according to dependent bug #2166650.

Comment 18 liang cong 2023-05-18 05:47:16 UTC
Verified on:
# rpm -q libvirt qemu-kvm
libvirt-9.3.0-2.el9.x86_64
qemu-kvm-8.0.0-3.el9.x86_64

other dependencies:
# rpm -qa kernel numa*
numactl-libs-2.0.16-1.el9.x86_64
kernel-5.14.0-311.el9.x86_64
numad-0.5-36.20150602git.el9.x86_64

Verify steps:
1. Prepare a guest with below numa node tuning xml:
# virsh dumpxml vm1 --xpath '//numatune'
<numatune>
  <memory mode="preferred" nodeset="0-1"/>
</numatune>

2. Start the guest vm
# virsh start vm1
Domain 'vm1' started

3. Check the numa tuning config
# virsh dumpxml vm1 --xpath '//numatune'
<numatune>
  <memory mode="preferred" nodeset="0-1"/>
</numatune>

4. Get the NUMA tuning setting with the virsh numatune command
# virsh numatune vm1
numa_mode      : preferred
numa_nodeset   : 0-1

5. Virsh edit vm config to change numa node tuning xml as below:
<numatune>
  <memnode cellid="0" mode="preferred" nodeset="0-1"/>
</numatune>

6. Shut off guest vm
# virsh destroy vm1
Domain 'vm1' destroyed

7. Check the numa tuning config
# virsh dumpxml vm1 --xpath '//numatune'
<numatune>
  <memnode cellid="0" mode="preferred" nodeset="0-1"/>
</numatune>

8. Start guest vm
# virsh start vm1
Domain 'vm1' started


9. Check the numa tuning config
# virsh dumpxml vm1 --xpath '//numatune'
<numatune>
  <memnode cellid="0" mode="preferred" nodeset="0-1"/>
</numatune>

Also checked the below scenarios:
1. preferred mode with only one node
2. basic check for strict, interleave, restrictive modes
3. without NUMA topology setting
4. with an unavailable host NUMA node
5. with hugepage memory setting

Comment 20 errata-xmlrpc 2023-11-07 08:30:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: libvirt security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6409