Bug 1812316 - NumaPinningHelper is not huge pages aware, denies migration to suitable host
Summary: NumaPinningHelper is not huge pages aware, denies migration to suitable host
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.3.8
Hardware: x86_64
OS: Linux
Importance: unspecified high
Target Milestone: ovirt-4.4.3
Target Release: 4.4.3
Assignee: Lucia Jelinkova
QA Contact: Polina
URL:
Whiteboard:
Duplicates: 1720558
Depends On:
Blocks: 2024575
 
Reported: 2020-03-11 01:25 UTC by Germano Veit Michel
Modified: 2024-03-25 15:44 UTC
CC List: 6 users

Fixed In Version: rhv-4.4.3-3
Doc Type: Enhancement
Doc Text:
With this enhancement, when scheduling a Virtual Machine with pinned NUMA nodes, memory requirements are calculated correctly by taking into account the available memory as well as hugepages allocated on NUMA nodes.
Clone Of:
Environment:
Last Closed: 2020-11-24 13:09:21 UTC
oVirt Team: Virt
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 4898511 0 None None None 2020-03-12 05:04:01 UTC
Red Hat Product Errata RHSA-2020:5179 0 None None None 2020-11-24 13:09:46 UTC
oVirt gerrit 108190 0 master MERGED engine: Make NUMA work with hugepages 2021-02-10 10:26:04 UTC
oVirt gerrit 108191 0 master MERGED virt: report numa stats with hugepages 2021-02-10 10:26:04 UTC

Description Germano Veit Michel 2020-03-11 01:25:16 UTC
Description of problem:

A host cannot be selected as a migration destination for a High Performance Virtual Machine because the NUMA filter expects the host to have enough normal free memory for the VM on each host NUMA node. But the host already has static hugepages reserved and free to accommodate the VM.

This is very problematic if the host has large amounts of memory reserved for huge pages.
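
For illustration only, here is a minimal sketch (hypothetical class and method names, not the actual NumaPinningHelper code) of the difference between a per-node check that only looks at ordinary free memory and a hugepages-aware check that counts free hugepages of the VM's page size on each host NUMA node:

import java.util.Map;

// Sketch of a per-NUMA-node fit check; names are hypothetical.
public class NumaFitSketch {

    // Pre-fix behaviour: only ordinary free memory on the host node is considered.
    static boolean fitsNormalMemoryOnly(long vNodeMemMb, long hostNodeFreeMb) {
        return hostNodeFreeMb >= vNodeMemMb;
    }

    // Hugepages-aware check: for a hugepages-backed VM, the relevant capacity is
    // the number of free hugepages of the requested size on that host node.
    static boolean fitsWithHugePages(long vNodeMemMb, long hostNodeFreeMb,
                                     long hugePageSizeKb,
                                     Map<Long, Long> freeHugePagesBySize) {
        if (hugePageSizeKb <= 0) {
            return fitsNormalMemoryOnly(vNodeMemMb, hostNodeFreeMb);
        }
        long pageSizeMb = hugePageSizeKb / 1024;
        long pagesNeeded = (vNodeMemMb + pageSizeMb - 1) / pageSizeMb;
        long pagesFree = freeHugePagesBySize.getOrDefault(hugePageSizeKb, 0L);
        return pagesFree >= pagesNeeded;
    }
}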

Version-Release number of selected component (if applicable):
ovirt-engine-4.3.8.2-0.5.el7.noarch

How reproducible:
Always

Steps to Reproduce:
* The key to reproducing this is for the host to have less normal (non-hugepages) free memory on each NUMA node than the VM uses (as hugepages) on each vNUMA node. The easiest way to reproduce it is to reserve almost all host memory for hugepages, so there is little free normal memory on each node but plenty of hugepages. See one way to do it below:

1. The Virtual Machine has 4GB of RAM, divided into 2 vNUMA nodes

<vm_numa_nodes>
  <vm_numa_node href="/ovirt-engine/api/vms/8f54d8f6-3a16-49cd-980d-73ec684796c5/numanodes/42c70765-a82c-4488-97f1-bc1146c5b213" id="42c70765-a82c-4488-97f1-bc1146c5b213">
    <cpu>
      <cores>
        <core>
          <index>0</index>
        </core>
      </cores>
    </cpu>
    <index>0</index>
    <memory>2048</memory>
    <numa_node_pins>
      <numa_node_pin>
        <index>0</index>
      </numa_node_pin>
    </numa_node_pins>
    <vm href="/ovirt-engine/api/vms/8f54d8f6-3a16-49cd-980d-73ec684796c5" id="8f54d8f6-3a16-49cd-980d-73ec684796c5"/>
  </vm_numa_node>
  <vm_numa_node href="/ovirt-engine/api/vms/8f54d8f6-3a16-49cd-980d-73ec684796c5/numanodes/1fae16f3-0ad7-4b70-b65a-227633ee81ed" id="1fae16f3-0ad7-4b70-b65a-227633ee81ed">
    <cpu>
      <cores>
        <core>
          <index>1</index>
        </core>
      </cores>
    </cpu>
    <index>1</index>
    <memory>2048</memory>
    <numa_node_pins>
      <numa_node_pin>
        <index>1</index>
      </numa_node_pin>
    </numa_node_pins>
    <vm href="/ovirt-engine/api/vms/8f54d8f6-3a16-49cd-980d-73ec684796c5" id="8f54d8f6-3a16-49cd-980d-73ec684796c5"/>
  </vm_numa_node>
</vm_numa_nodes>

2. The VM is using 1G hugepages, so each of its vNUMA nodes uses 2 x 1G hugepages (the hugepages custom property sets the page size in KiB; 1048576 KiB = 1 GiB):

<custom_properties>
  <custom_property>
    <name>hugepages</name>
    <value>1048576</value>
  </custom_property>
</custom_properties>


3. The destination host has 10GB of total memory (about 5GB per NUMA node), with 6GB reserved for hugepages.

engine=# select numa_node_index,mem_total,cpu_count,mem_free,usage_mem_percent from numa_node_cpus_view where vds_id = '966a05c2-493c-447d-85fd-cedafc4680ed';
 numa_node_index | mem_total | cpu_count | mem_free | usage_mem_percent 
-----------------+-----------+-----------+----------+-------------------
               0 |      5119 |         2 |     2409 |                53
               1 |      5120 |         2 |      629 |                88

# grep HugePages_ /proc/meminfo 
HugePages_Total:       6
HugePages_Free:        6

# cat /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
2
# cat /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
4

The host is totally idle, not running anything. The low mem_free is due to hugepage reservation.

And there are 2+ free hugepages on each host NUMA node, enough to run the VM.

4. But the engine denies the migration. Looking at the code, I think the mem_free on node1 (629MB) is what trips the check in NumaPinningHelper.java. This is not right, as the host has free hugepages to accommodate the VM (see the worked example after the steps below).

2020-03-11 10:54:46,207+10 DEBUG [org.ovirt.engine.core.bll.scheduling.policyunits.NumaPolicyUnit] (default task-5) [802ba36c-a677-4c61-822b-46c94ceec426] Host 'host2.kvm' cannot accommodate memory of VM's pinned virtual NUMA nodes within host's physical NUMA nodes

2020-03-11 10:54:46,211+10 INFO  [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-5) [802ba36c-a677-4c61-822b-46c94ceec426] Candidate host 'host2.kvm' ('966a05c2-493c-447d-85fd-cedafc4680ed') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'NUMA' (correlation id: null)

5. Disabling the NUMA filter makes the scheduler no longer filter the host out.
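
To see why node1 trips the old check but should pass a hugepages-aware one, here is a small worked example using the numbers from steps 1-4 above (hypothetical names, illustration only):

// Worked example with the values reported above; names are hypothetical.
public class Node1Example {
    public static void main(String[] args) {
        long vNodeMemMb = 2048;        // memory of each pinned vNUMA node of the VM
        long node1FreeMb = 629;        // mem_free reported for host NUMA node 1
        long node1Free1gPages = 4;     // free 1G hugepages on host NUMA node 1

        // Normal-memory-only check (pre-fix): 629 < 2048, so the host is filtered out.
        System.out.println("normal memory check: " + (node1FreeMb >= vNodeMemMb));       // false

        // Hugepages-aware check: the VM needs 2 x 1G pages per vNUMA node, 4 are free.
        long pagesNeeded = vNodeMemMb / 1024;
        System.out.println("hugepages check:     " + (node1Free1gPages >= pagesNeeded)); // true
    }
}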

Actual results:
Scheduler filters out the destination host due to insufficient normal free memory, even though the VM uses reserved hugepages that are free on the host.

Expected results:
Migration is allowed
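
For reference, the per-node free-hugepage counters that a hugepages-aware check needs live in sysfs next to the nr_hugepages values shown in step 3. The actual per-node reporting is done by vdsm (see the linked gerrit change "virt: report numa stats with hugepages"); the snippet below only illustrates reading those counters, with hypothetical names, and is not vdsm code:

import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative only: read the per-node free hugepage counter from sysfs.
public class NodeHugePagesSketch {

    static long freeHugePages(int node, long pageSizeKb) throws Exception {
        Path p = Path.of(String.format(
                "/sys/devices/system/node/node%d/hugepages/hugepages-%dkB/free_hugepages",
                node, pageSizeKb));
        return Long.parseLong(Files.readString(p).trim());
    }

    public static void main(String[] args) throws Exception {
        // With the layout in this report, node 0 has 2 free 1G pages and node 1 has 4.
        System.out.println(freeHugePages(0, 1048576));
        System.out.println(freeHugePages(1, 1048576));
    }
}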

Comment 1 Lucia Jelinkova 2020-04-29 09:15:54 UTC
*** Bug 1720558 has been marked as a duplicate of this bug. ***

Comment 2 Arik 2020-08-13 13:57:21 UTC
Lucia, is there anything else that this bug is pending on?

Comment 3 Lucia Jelinkova 2020-08-17 12:02:21 UTC
Nothing that I am aware of.

Comment 5 Polina 2020-09-24 11:06:57 UTC
Verified on vdsm-4.40.29-1.el8ev.x86_64, ovirt-engine-4.4.3.2-0.19.el8ev.noarch,
according to the attached Polarion test cases:
https://polarion.engineering.redhat.com/polarion/redirect/project/RHEVM3/workitem?id=RHEVM-27430
https://polarion.engineering.redhat.com/polarion/redirect/project/RHEVM3/workitem?id=RHEVM-27431

Comment 7 Eli Marcus 2020-11-22 17:50:52 UTC
Hi Lucia, 
please review this doc text for the errata and release notes: 

With this enhancement, when scheduling a Virtual Machine with pinned NUMA nodes, memory requirements are calculated correctly by taking into account the available memory as well as allocated hugepages.

Comment 8 Lucia Jelinkova 2020-11-23 08:43:27 UTC
I'd maybe reword it a bit to: ...by taking into account hugepages allocated on NUMA nodes.

Comment 12 errata-xmlrpc 2020-11-24 13:09:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Low: Red Hat Virtualization security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5179

