Bug 2151064 - Setting multiple nodes for guest-specified NUMA tuning in 'preferred' mode doesn't show any error
Summary: Setting multiple nodes for guest-specified NUMA tuning in 'preferred' mode doesn't show any error
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: libvirt
Version: 9.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Michal Privoznik
QA Contact: liang cong
URL:
Whiteboard:
Depends On: 2166650
Blocks:
 
Reported: 2022-12-06 03:26 UTC by liang cong
Modified: 2023-11-07 09:37 UTC
CC List: 5 users

Fixed In Version: libvirt-9.1.0-1.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 2166650 (view as bug list)
Environment:
Last Closed: 2023-11-07 08:30:47 UTC
Type: Bug
Target Upstream Version: 9.0.0
Embargoed:


Attachments: none


Links
System                  ID               Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker   LIBVIRTAT-14315  0        None      None    None     2023-06-01 04:06:56 UTC
Red Hat Issue Tracker   RHELPLAN-141371  0        None      None    None     2022-12-06 03:32:42 UTC
Red Hat Product Errata  RHSA-2023:6409   0        None      None    None     2023-11-07 08:31:29 UTC

Description liang cong 2022-12-06 03:26:08 UTC
Description of problem: Setting multiple nodes for guest-specified NUMA tuning in 'preferred' mode doesn't show any error


Version-Release number of selected component (if applicable):
# rpm -q libvirt qemu-kvm
libvirt-8.10.0-1.el9.x86_64
qemu-kvm-7.1.0-6.el9.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Define a guest vm with the xml below:
<numatune>
  <memnode cellid='0' mode='preferred' nodeset='0-1'/>
</numatune>
...
<numa>
  <cell id='0' cpus='0-3' memory='2097152' unit='KiB' />
</numa>


2. Start the guest vm
# virsh start vm1
Domain 'vm1' started


Actual results:
The guest vm starts without any error


Expected results:
The guest vm fails to start with an error implying that 'preferred' mode only supports a single node

Additional info:
When setting the numa node tuning xml like below:
<numatune>
  <memory mode='preferred' nodeset='0-1'/>
</numatune>

Then there is error shown when starting the guest vm:
# virsh start vm1
error: Failed to start domain 'vm1'
error: internal error: Process exited prior to exec: libvirt:  error : internal error: NUMA memory tuning in 'preferred' mode only supports single node

Comment 1 Michal Privoznik 2022-12-09 12:09:48 UTC
Alright, so the problem is that for guest NUMA nodes, a memory-backend-* object is used, e.g. like this:

-object '{"qom-type":"memory-backend-file","id":"ram-node0","mem-path":"/dev/hugepages/libvirt/qemu/2-fedora","size":2147483648,"host-nodes":[0,1],"policy":"preferred"}'

whereas for the overall <memory/> element, virNumaSetupMemoryPolicy() is called just before exec()-ing QEMU. Now, the error you see comes from that function, and that's because it uses the numa_set_preferred() API, which supports only one node. Under the hood libnuma calls __NR_set_mempolicy with MPOL_PREFERRED, which supports only a single node. However, kernel commit v5.15-rc1~107^2~21 introduced a new MPOL_PREFERRED_MANY mode which allows specifying multiple nodes. This was then implemented in libnuma commit v2.0.15~24 as numa_set_preferred_many().

QEMU's command line, however, is also a bit suspicious. I mean, QEMU accepts multiple "host-nodes" but under the hood it calls mbind() (see host_memory_backend_memory_complete() in backends/hostmem.c), which is documented as:

  MPOL_PREFERRED
    This mode sets the preferred node for allocation. The kernel will try to allocate pages from this node first and fall back to other nodes if the preferred node is low on free memory. If nodemask specifies more than one node ID, the first node in the mask will be selected as the preferred node.

So in the end, QEMU will also configure guest memory to prefer just the first node (node #0 in our example).
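
For illustration, here is a minimal standalone sketch (an assumption for clarity, not QEMU's actual code) of what such an mbind() call boils down to: with MPOL_PREFERRED and a nodemask covering nodes 0-1, the kernel ends up preferring only the first node, exactly as the man page excerpt above says. Build with gcc -lnuma.

/* Illustrative sketch only: mbind() with MPOL_PREFERRED and a two-node mask.
 * The kernel will prefer just the first node in the mask (node 0 here). */
#include <numaif.h>      /* mbind(), MPOL_PREFERRED; link with -lnuma */
#include <sys/mman.h>
#include <stdio.h>

int main(void)
{
    size_t len = 2UL * 1024 * 1024;
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    unsigned long nodemask = (1UL << 0) | (1UL << 1);   /* nodes 0-1 */

    if (mem == MAP_FAILED)
        return 1;

    /* MPOL_PREFERRED accepts the whole mask but only node 0 ends up
     * preferred; MPOL_PREFERRED_MANY (kernel >= 5.15) honors all of it. */
    if (mbind(mem, len, MPOL_PREFERRED, &nodemask,
              sizeof(nodemask) * 8, 0) != 0) {
        perror("mbind");
        return 1;
    }
    return 0;
}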

Nevertheless, what we can do here is teach libvirt to use the new libnuma API if possible. I'm not decided on what to do when either the kernel or libnuma is not new enough, though. I mean, I worry that we might break existing configs (although one can argue that those never really worked). Either way, if we don't error out, this bug will (silently) fix itself as users upgrade to a newer kernel and libnuma.
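
As a rough sketch of what using the newer libnuma API could look like (an illustration assuming numactl >= 2.0.15 headers and a kernel with MPOL_PREFERRED_MANY; this is not the actual libvirt patch):

/* Sketch: prefer multiple NUMA nodes via libnuma when the kernel supports it,
 * otherwise fall back to the old single-node behaviour. Link with -lnuma. */
#include <numa.h>
#include <stdio.h>

static int set_preferred_nodes(const char *nodes)      /* e.g. "0-1" */
{
    struct bitmask *mask;
    int ret = -1;

    if (numa_available() < 0)
        return -1;

    if (!(mask = numa_parse_nodestring(nodes)))
        return -1;

    if (numa_has_preferred_many() > 0) {
        /* MPOL_PREFERRED_MANY path: every node in the mask is preferred. */
        numa_set_preferred_many(mask);
        ret = 0;
    } else if (numa_bitmask_weight(mask) == 1) {
        /* Old MPOL_PREFERRED path: only a single node can be preferred. */
        int node;
        for (node = 0; node <= numa_max_node(); node++) {
            if (numa_bitmask_isbitset(mask, node)) {
                numa_set_preferred(node);
                ret = 0;
                break;
            }
        }
    } else {
        fprintf(stderr, "'preferred' mode only supports a single node\n");
    }

    numa_bitmask_free(mask);
    return ret;
}

int main(void)
{
    return set_preferred_nodes("0-1") == 0 ? 0 : 1;
}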

Comment 2 Michal Privoznik 2022-12-09 16:09:06 UTC
Patch posted on the list:

https://listman.redhat.com/archives/libvir-list/2022-December/236225.html

Comment 3 Michal Privoznik 2022-12-09 17:08:48 UTC
QEMU patch posted here:

https://lists.gnu.org/archive/html/qemu-devel/2022-12/msg01354.html

Comment 4 Michal Privoznik 2022-12-14 15:14:01 UTC
Libvirt patch merged here:

53369ad062 virnuma: Allow multiple nodes for preferred policy

v8.10.0-124-g53369ad062

Comment 5 liang cong 2022-12-16 10:44:48 UTC
Hi Michal,
IMO with the fix, the preferred mode should now support multiple nodes, right?

And I did preverification on upstream build: v8.10.0-130-gb271d6f3b0
kernel:6.1.0-65.fc38.x86_64
numactl:
# rpm -qa numa*
numactl-libs-2.0.16-1.fc38.x86_64
numactl-devel-2.0.16-1.fc38.x86_64
numad-0.5-37.20150602git.fc37.x86_64

Verify steps:
1. Prepare a guest with the numa node tuning xml below:
# virsh dumpxml vm1 --xpath '//numatune'
<numatune>
  <memory mode="preferred" nodeset="0-1"/>
</numatune>

2. Start the guest vm
# virsh start vm1
Domain 'vm1' started

3. Check the numa tuning config
# virsh dumpxml vm1 --xpath '//numatune'
<numatune>
  <memory mode="preferred" nodeset="0-1"/>
</numatune>

4. Get numa tuning setting with virsh numatune cmd
# virsh numatune vm1
numa_mode      : preferred
numa_nodeset   : 0-1

5. Virsh edit vm config to change numa node tuning xml as below:
<numatune>
  <memnode cellid="0" mode="preferred" nodeset="0-1"/>
</numatune>

6. Shut off guest vm
# virsh destroy vm1
Domain 'vm1' destroyed

7. Check the numa tuning config
# virsh dumpxml vm1 --xpath '//numatune'
<numatune>
  <memnode cellid="0" mode="preferred" nodeset="0-1"/>
</numatune>

8. Start guest vm
# virsh start vm1
Domain 'vm1' started


9. Check the numa tuning config
# virsh dumpxml vm1 --xpath '//numatune'
<numatune>
  <memnode cellid="0" mode="preferred" nodeset="0-1"/>
</numatune>

Also check below scenarios:
1. preferred mode with only one node
2. basic check for strict, interleave, restrictive mode
3. without numa topology setting
4. with unavailable host numa node
5. with hugepage memory setting

Comment 6 Michal Privoznik 2022-12-16 11:41:30 UTC
(In reply to liang cong from comment #5)
> Hi Michal,
> IMO with the fix, the preferred mode should now support multiple nodes, right?

Correct. The old behaviour stemmed from limitations in the kernel. But now that the kernel allows multiple preferred nodes, there's no need for us to keep this limitation. We can just use newer APIs to talk to the kernel.

> 
> And I did preverification on upstream build: v8.10.0-130-gb271d6f3b0

Perfect, thank you!

Comment 10 liang cong 2023-02-02 01:55:16 UTC
Test on build: libvirt-9.0.0-2.el9.x86_64
other related dependency versions are listed below:
# rpm -qa kernel numa*
numactl-libs-2.0.14-9.el9.x86_64
kernel-5.14.0-244.el9.x86_64
numad-0.5-36.20150602git.el9.x86_64


Test steps:
1. Prepare a guest with the numa node tuning xml below:
# virsh dumpxml vm1 --xpath '//numatune'
<numatune>
  <memory mode="preferred" nodeset="0-1"/>
</numatune>

2. Start the guest vm
# virsh start vm1
error: Failed to start domain 'vm1'
error: internal error: Process exited prior to exec: libvirt:  error : internal error: NUMA memory tuning in 'preferred' mode only supports single node

3. Get numa tuning setting with virsh numatune cmd
# virsh numatune vm1
numa_mode      : preferred
numa_nodeset   : 0-1

4. Virsh edit vm config to change numa node tuning xml as below:
<numatune>
  <memnode cellid="0" mode="preferred" nodeset="0-1"/>
</numatune>


5. Start guest vm
# virsh start vm1
Domain 'vm1' started

Hi Michal,
From my test results, the preferred mode config behaves differently for <memory mode="preferred" nodeset="0-1"/> and <memnode cellid="0" mode="preferred" nodeset="0-1"/>. I think that should be fixed; could you help identify the cause? Thanks

Comment 11 Michal Privoznik 2023-02-02 11:17:12 UTC
(In reply to liang cong from comment #10)
> Test on build: libvirt-9.0.0-2.el9.x86_64
> other related dependencies version are listed below:
> # rpm -qa kernel numa*
> numactl-libs-2.0.14-9.el9.x86_64

This is the problem. We need numactl-libs-2.0.15, which added support for multiple preferred nodes. I don't think there's a rebase planned for numactl. Should we create a rebase bug and move this to the next RHEL?

Comment 14 liang cong 2023-04-10 07:57:16 UTC
Update DTM and ITM according to dependent bug #2166650.

Comment 18 liang cong 2023-05-18 05:47:16 UTC
Verified on:
# rpm -q libvirt qemu-kvm
libvirt-9.3.0-2.el9.x86_64
qemu-kvm-8.0.0-3.el9.x86_64

other dependencies:
# rpm -qa kernel numa*
numactl-libs-2.0.16-1.el9.x86_64
kernel-5.14.0-311.el9.x86_64
numad-0.5-36.20150602git.el9.x86_64

Verify steps:
1. Prepare a guest with the numa node tuning xml below:
# virsh dumpxml vm1 --xpath '//numatune'
<numatune>
  <memory mode="preferred" nodeset="0-1"/>
</numatune>

2. Start the guest vm
# virsh start vm1
Domain 'vm1' started

3. Check the numa tuning config
# virsh dumpxml vm1 --xpath '//numatune'
<numatune>
  <memory mode="preferred" nodeset="0-1"/>
</numatune>

4. Get numa tuning setting with virsh numatune cmd
# virsh numatune vm1
numa_mode      : preferred
numa_nodeset   : 0-1

5. Virsh edit vm config to change numa node tuning xml as below:
<numatune>
  <memnode cellid="0" mode="preferred" nodeset="0-1"/>
</numatune>

6. Shut off guest vm
# virsh destroy vm1
Domain 'vm1' destroyed

7. Check the numa tuning config
# virsh dumpxml vm1 --xpath '//numatune'
<numatune>
  <memnode cellid="0" mode="preferred" nodeset="0-1"/>
</numatune>

8. Start guest vm
# virsh start vm1
Domain 'vm1' started


9. Check the numa tuning config
# virsh dumpxml vm1 --xpath '//numatune'
<numatune>
  <memnode cellid="0" mode="preferred" nodeset="0-1"/>
</numatune>

Also check below scenarios:
1. preferred mode with only one node
2. basic check for strict, interleave, restrictive mode
3. without numa topology setting
4. with unavailable host numa node
5. with hugepage memory setting

Comment 20 errata-xmlrpc 2023-11-07 08:30:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: libvirt security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6409

