Bug 1437559 - [RFE] Explicitly assign all CPUs to NUMA nodes
Summary: [RFE] Explicitly assign all CPUs to NUMA nodes
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: Backend.Core
Version: future
Hardware: Unspecified
OS: Unspecified
Importance: medium high
Target Milestone: ovirt-4.4.0
Assignee: Steven Rosenberg
QA Contact: Polina
URL:
Whiteboard:
Duplicates: 1794304
Depends On: 1794304
Blocks: 1792944
 
Reported: 2017-03-30 14:19 UTC by Milan Zamazal
Modified: 2020-05-20 19:59 UTC
CC List: 8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, when a virtual machine was started, the Manager sent domain XML whose NUMA configuration CPU list contained only the current CPU IDs. As a result, libvirt/QEMU issued a warning that the NUMA configuration CPU list was incomplete and should contain IDs for all of the virtual CPUs. In this release, the NUMA configuration lists all of the virtual CPUs, and the warning no longer appears in the log.
Clone Of:
Environment:
Last Closed: 2020-05-20 19:59:29 UTC
oVirt Team: Virt
Embargoed:
pagranat: needinfo+
rbarry: ovirt-4.4?
rule-engine: planning_ack?
pm-rhel: devel_ack+
pm-rhel: testing_ack+


Attachments
logs (1.28 MB, application/gzip)
2020-01-22 14:27 UTC, Polina


Links
System: oVirt gerrit 100543, Status: MERGED, Summary: engine: Numa config CPUs needs to include all CPUs, Last Updated: 2021-02-14 13:06:59 UTC
System: oVirt gerrit 106521 (master), Status: MERGED, Summary: engine: Numa config CPUs needs to include all CPUs, Last Updated: 2021-02-14 13:06:59 UTC

Description Milan Zamazal 2017-03-30 14:19:35 UTC
oVirt assigns all the initial VM CPUs to a single NUMA node, or permits optional pinning of CPUs to NUMA nodes. However, in certain NUMA arrangements or when hot plugging new CPUs, some CPUs are not explicitly assigned to NUMA nodes. For instance, if no NUMA configuration is selected and a CPU is hot plugged, the hot plugged CPU may be assigned to a different node than all the other CPUs until the VM is shut down and started again. Similarly, with explicit NUMA configurations some CPUs may end up on undesired nodes.

oVirt Engine should make sure that all CPUs, both the initially available and the hot plugged ones, are assigned to appropriate NUMA nodes. That means that unassigned (e.g. hot plugged) CPUs are assigned to the same NUMA node as all the other CPUs when no explicit NUMA configuration is defined, or are spread proportionally among the NUMA nodes specified for the initially available CPUs otherwise. That assignment should be automatic (without bothering the user) and made at the time a VM is started.
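
A minimal sketch of the proportional spreading described above (illustrative only; the actual engine code is Java, and the helper name here is made up):

    # Illustrative sketch, not ovirt-engine code: distribute CPU IDs that no
    # NUMA cell covers across the defined cells, proportionally to how many
    # CPUs each cell already holds.
    def spread_unassigned_cpus(cells, max_vcpus):
        """cells: dict mapping cell id -> set of already assigned CPU ids."""
        if not cells:
            return {0: set(range(max_vcpus))}   # no NUMA config: one node
        assigned = set().union(*cells.values())
        unassigned = [c for c in range(max_vcpus) if c not in assigned]
        order = sorted(cells, key=lambda cid: len(cells[cid]), reverse=True)
        total = sum(len(cells[cid]) for cid in order)
        quotas = {cid: round(len(cells[cid]) / total * len(unassigned))
                  for cid in order}
        it = iter(unassigned)
        for cid in order:
            for _ in range(quotas[cid]):
                try:
                    cells[cid].add(next(it))
                except StopIteration:
                    return cells
        for cpu in it:                          # remainder from rounding
            cells[order[0]].add(cpu)
        return cells

    # Example: two cells holding 3 and 1 CPUs, 16 possible vCPUs; the 12
    # unassigned IDs are split roughly 9/3 between the two cells.
    print(spread_unassigned_cpus({0: {0, 1, 2}, 1: {3}}, 16))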

Comment 1 Michal Skrivanek 2018-09-14 14:45:25 UTC
This needs to get fixed eventually.

Comment 2 Milan Zamazal 2019-02-18 14:33:13 UTC
It should also, hopefully, remedy the QEMU warning that confuses users when QEMU crashes or other problems occur:

qemu-kvm: warning: All CPU(s) up to maxcpus should be described in NUMA config, ability to start up with partial NUMA mappings is obsoleted and will be removed in future

Comment 3 Steven Rosenberg 2019-06-02 09:51:22 UTC
As I understand it, the actual error is reported by QEMU via the warning:

"warning: CPU(s) not present in any NUMA nodes: CPU 1 [socket-id: 1, core-id: 0, thread-id: 0], CPU 2 ..., CPU 15 [socket-id: 15, core-id: 0, thread-id: 0]"

This is said to be because the <numa> section of the engine's <cpu> element sends only the current CPUs rather than a range covering the maximum CPUs, as follows:

<cpu match="exact">
    <model>SandyBridge</model>
    <topology cores="1" sockets="16" threads="1" />
    <numa>
        <cell cpus="0" id="0" memory="1048576" />
    </numa>
</cpu>

The <vcpu> section, by contrast, does send both the current and the maximum number of CPUs, as follows:

<vcpu current='1'>16</vcpu>

As I understand it, the cpus list should contain only the IDs of the current CPUs, as it currently does, not the maximum range; therefore the issue lies in libvirt, which should be performing the calculation, not in the engine.
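
For reference, a standalone Python sketch (illustrative only, not engine code; the helper name is made up) that expands the <numa> cell cpus lists and compares them against the <vcpu> maximum, which is the gap the warning above refers to:

    import xml.etree.ElementTree as ET

    # Standalone illustration: which CPU IDs up to the <vcpu> maximum are not
    # covered by any <numa> cell? These are the CPUs QEMU warns about.
    cpu_xml = """
    <cpu match="exact">
        <model>SandyBridge</model>
        <topology cores="1" sockets="16" threads="1"/>
        <numa>
            <cell cpus="0" id="0" memory="1048576"/>
        </numa>
    </cpu>
    """
    max_vcpus = 16  # from <vcpu current='1'>16</vcpu>

    def parse_cpu_list(spec):
        """Expand a libvirt cpus attribute like '0-2,7' into a set of IDs."""
        ids = set()
        for part in spec.split(","):
            lo, _, hi = part.partition("-")
            ids.update(range(int(lo), int(hi or lo) + 1))
        return ids

    covered = set()
    for cell in ET.fromstring(cpu_xml).findall("./numa/cell"):
        covered |= parse_cpu_list(cell.get("cpus"))

    missing = sorted(set(range(max_vcpus)) - covered)
    print("CPUs not present in any NUMA cell:", missing)  # CPUs 1..15 here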

Please confirm.

Comment 4 Ryan Barry 2019-06-03 09:08:05 UTC
Libvirt does perform the calculation, IF placement="auto" is used. It also tries to do the right thing if a cpuset is specified. We are not doing it. Test this.

This cannot be done blindly, since we allow users to manage NUMA assignment themselves, and that case needs to be checked for.

Comment 5 Steven Rosenberg 2019-06-04 13:21:35 UTC
On reviewing this issue further: libvirt/QEMU will no longer support a NUMA configuration CPU list that includes only the current CPUs; it requires all of the CPUs, per the complete warnings given here:

2019-06-04T08:50:27.937095Z qemu-kvm: warning: CPU(s) not present in any NUMA nodes: CPU 2 [socket-id: 2, core-id: 0, thread-id: 0], CPU 3 [socket-id: 3, core-id: 0, thread-id: 0], CPU 4 [socket-id: 4, core-id: 0, thread-id: 0], CPU 5 [socket-id: 5, core-id: 0, thread-id: 0], CPU 6 [socket-id: 6, core-id: 0, thread-id: 0], CPU 7 [socket-id: 7, core-id: 0, thread-id: 0], CPU 8 [socket-id: 8, core-id: 0, thread-id: 0], CPU 9 [socket-id: 9, core-id: 0, thread-id: 0], CPU 10 [socket-id: 10, core-id: 0, thread-id: 0], CPU 11 [socket-id: 11, core-id: 0, thread-id: 0], CPU 12 [socket-id: 12, core-id: 0, thread-id: 0], CPU 13 [socket-id: 13, core-id: 0, thread-id: 0], CPU 14 [socket-id: 14, core-id: 0, thread-id: 0], CPU 15 [socket-id: 15, core-id: 0, thread-id: 0]
2019-06-04T08:50:27.937122Z qemu-kvm: warning: All CPU(s) up to maxcpus should be described in NUMA config, ability to start up with partial NUMA mappings is obsoleted and will be removed in future

Therefore, a proposed fix works: it sends the domain XML's NUMA configuration section as follows (where the upper bound of the range is the maximum vCPU value from the <vcpu> section of the XML):

<cpu match="exact">
    <model>SandyBridge</model>
    <topology cores="1" sockets="16" threads="1" />
    <numa>
        <cell cpus="0-15" id="0" memory="1048576" />
    </numa>
</cpu>


This change removes both warnings from the libvirt/QEMU log of the launched VM.
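
A minimal sketch of the effect of that fix for a single NUMA cell (illustrative only; the real change is in the Java engine, per the linked gerrit patches, and the function name here is made up):

    # Illustrative only: widen the cell's cpus attribute to cover every
    # possible vCPU ID, 0..maxVcpus-1, taken from the <vcpu> maximum rather
    # than from the current vCPU count.
    def numa_cell_cpus(max_vcpus: int) -> str:
        """Return a cpus attribute covering all vCPU IDs, e.g. '0-15' for 16."""
        return "0" if max_vcpus == 1 else f"0-{max_vcpus - 1}"

    max_vcpus = 16  # from <vcpu current='1'>16</vcpu>
    print(f'<cell cpus="{numa_cell_cpus(max_vcpus)}" id="0" memory="1048576"/>')
    # -> <cell cpus="0-15" id="0" memory="1048576"/>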

Comment 6 Polina 2020-01-22 14:27:29 UTC
Created attachment 1654601 [details]
logs

Hi, could you please look at the scenario described in https://polarion.engineering.redhat.com/polarion/redirect/project/RHEVM3/workitem?id=RHEVM-26904 . To reproduce the problem, run only Step 6 and Step 7.

This looks like invalid behavior. The VM fails on start with: ERROR: EVENT_ID: VM_DOWN_ERROR(119), VM vm is down with error. Exit message: internal error: CPU IDs in <numa> exceed the <vcpu> count.
It is not clear where the exceeding IDs come from. The XML in engine.log is at line 44460.
    ...
    <cpu match="exact">
    <model>Westmere</model>
    <topology cores="2" threads="4" sockets="16"/>
    <numa>
      <cell id="1" cpus="3-4,16-77" memory="524288"/>
      <cell id="0" cpus="0-2,78-138" memory="524288"/>
    </numa>
    </cpu>
    ...
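
With the topology above, sockets * cores * threads = 16 * 2 * 4 = 128, so valid vCPU IDs are 0-127, yet the cells reference IDs up to 138, which appears to be what libvirt rejects. A standalone sketch of that check (illustrative only, not engine or libvirt code; the helper name is made up):

    # Illustrative sanity check, not engine or libvirt code.
    def parse_cpu_list(spec):
        """Expand a libvirt cpus attribute like '3-4,16-77' into a set of IDs."""
        ids = set()
        for part in spec.split(","):
            lo, _, hi = part.partition("-")
            ids.update(range(int(lo), int(hi or lo) + 1))
        return ids

    vcpu_count = 16 * 2 * 4  # sockets * cores * threads = 128
    cells = {1: "3-4,16-77", 0: "0-2,78-138"}
    for cell_id, spec in cells.items():
        too_high = sorted(i for i in parse_cpu_list(spec) if i >= vcpu_count)
        if too_high:
            print(f"cell {cell_id}: CPU IDs beyond the vcpu count: {too_high}")
    # Cell 0 is reported: its IDs 128-138 exceed the 128-vCPU limit.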


If, after this failure, I cancel the NUMA configuration for this VM, it starts with the following valid XML:
   <numa>
      <cell id='0' cpus='0-127' memory='1048576' unit='KiB'/>
    </numa>
Then I can configure the two NUMA nodes and restart the VM, and the loaded XML is correct:

     <numa>
      <cell id='0' cpus='0-7,16-71' memory='524288' unit='KiB'/>
      <cell id='1' cpus='8-15,72-127' memory='524288' unit='KiB'/>
    </numa>

Logs attached.

Comment 7 Polina 2020-01-22 15:33:12 UTC
Hi Steven, could you please look at https://bugzilla.redhat.com/show_bug.cgi?id=1437559#c6.

Ryan, please let me know whether this needs a new BZ or whether this one should just be reassigned.

Comment 8 Steven Rosenberg 2020-01-22 15:41:46 UTC
(In reply to Polina from comment #7)
> Hi Steven, could you please look at
> https://bugzilla.redhat.com/show_bug.cgi?id=1437559#c6.
> 
> Ryan, please let me know whether this needs a new BZ or whether this one
> should just be reassigned.

This one is correct:

    <numa>
      <cell id='0' cpus='0-7,16-71' memory='524288' unit='KiB'/>
      <cell id='1' cpus='8-15,72-127' memory='524288' unit='KiB'/>
    </numa>

So are the other scenarios in the logs I saw, except for the one you mention:

    <numa>
      <cell id="1" cpus="3-4,16-77" memory="524288"/>
      <cell id="0" cpus="0-2,78-138" memory="524288"/>
    </numa>

I cannot access your link, but I would advise creating a separate issue for this with exact step-by-step instructions on how to simulate this scenario.

You can also link this issue to it.

Thank you.

Comment 10 Ryan Barry 2020-01-23 16:01:11 UTC
*** Bug 1794304 has been marked as a duplicate of this bug. ***

Comment 11 Polina 2020-03-29 15:24:36 UTC
Verified on http://bob-dr.lab.eng.brq.redhat.com/builds/4.4/rhv-4.4.0-27 according to the attached Polarion test case.

Comment 14 Sandro Bonazzola 2020-05-20 19:59:29 UTC
This bugzilla is included in the oVirt 4.4.0 release, published on May 20th 2020.

Since the problem described in this bug report should be
resolved in the oVirt 4.4.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

