Bug 1804046

Summary: Engine does not reduce scheduling memory when a VM with dynamic hugepages runs.
Product: Red Hat Enterprise Virtualization Manager
Component: ovirt-engine
Version: 4.3.8
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: Germano Veit Michel <gveitmic>
Assignee: Andrej Krejcir <akrejcir>
QA Contact: Polina <pagranat>
CC: ahadas, akrejcir, emarcus, klaas, mavital, mkalinin, pelauter, rdlugyhe
Target Milestone: ovirt-4.4.1
Fixed In Version: rhv-4.4.0-29
oVirt Team: Virt
Type: Bug
Last Closed: 2020-09-23 16:11:04 UTC
Doc Type: Bug Fix
Doc Text:
Previously, the RHV Manager did not reduce scheduling memory when a virtual machine with dynamic hugepages was running. Instead, the Manager treated the memory occupied by dynamic hugepages as schedulable memory. As a result, the Manager scheduled more virtual machines on a host than could fit on it. The current release fixes this issue: the Manager now correctly accounts for dynamic hugepages and the memory they occupy.

Description Germano Veit Michel 2020-02-18 05:31:20 UTC
Description of problem:

The engine does not subtract the memory of VMs backed by dynamic hugepages from committed memory when calculating host scheduling memory. The scheduling memory therefore does not go down as such VMs start, allowing more and more VMs to be scheduled on the host, which can lead to an out-of-memory condition.

Version-Release number of selected component (if applicable):
vdsm-4.30.40-1.el7ev.x86_64
ovirt-engine-4.3.8.2-0.4.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. Configure Dynamic Hugepages on the host (no static hugepages config on kernel cmdline)
   1.1. /etc/vdsm/vdsm.conf
        [performance]
        use_dynamic_hugepages = true
   1.2. Administration -> Configure -> Scheduling Policies -> Edit -> Disable "Huge Pages" Filter

2. Check Scheduling Memory of the Host
   <max_scheduling_memory>7963934720</max_scheduling_memory>

3. Run a VM with 2G of 1G hugepages

2020-02-18 15:23:18,687+1000 INFO  (vm/8f54d8f6) [virt.vm] (vmId='8f54d8f6-3a16-49cd-980d-73ec684796c5') Allocating 2 (1048576) hugepages (memsize 2097152) (vm:2270)
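
To confirm the dynamic allocation on the host, /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages should now read 2, since vdsm allocates the pages on demand when the VM starts (see the log line above).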

4. Check Scheduling Memory; it seems to have been reduced only by the VM's overhead, not by its actual memory (see below)
   <max_scheduling_memory>7775191040</max_scheduling_memory>
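   (For reference: 7963934720 - 7775191040 = 188743680 bytes = 180 MiB, which roughly matches the per-VM overhead; the 2 GiB = 2147483648 bytes of hugepage-backed guest memory was not subtracted at all.)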

Actual results:
Overcommit

Expected results:
Prevent overcommit

Additional info:

The Committed Memory used in the Scheduling Memory calculation does not include the hugepages of each VM.
The loop in getTotalRequiredMemoryInMb over all VMs on the host does not take them into account:
it calls the method below, which deliberately does not count hugepage-backed memory.

    /**
     * A convenience method that makes it easier to express the difference
     * between normal and huge pages backed VM in scheduler.
     *
     * @return The amount of non huge page memory needed for the VM
     */
    public static Integer getRequiredMemoryWithoutHugePages(VmBase vmBase) {
        if (isBackedByHugepages(vmBase)) {
            return 0;
        } else {
            return vmBase.getMemSizeMb();
        }
    }
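
As a sketch of the direction a fix could take (hypothetical method name and parameter, not the actual patch): dynamic hugepages are carved out of the host's normal memory pool when the VM starts, so only statically reserved hugepages should be excluded from the required memory, e.g.:

    /**
     * Hypothetical variant: a VM backed by dynamic hugepages must still
     * count against scheduling memory, because its pages are taken from
     * the host's normal memory pool at VM start. Only VMs backed by
     * statically reserved hugepages (kernel cmdline) can be skipped,
     * since that memory is already excluded from the host total.
     */
    public static Integer getRequiredMemoryForScheduling(VmBase vmBase, boolean useDynamicHugepages) {
        if (isBackedByHugepages(vmBase) && !useDynamicHugepages) {
            return 0;
        }
        return vmBase.getMemSizeMb();
    }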

Comment 1 Germano Veit Michel 2020-02-18 05:44:43 UTC
As this can get confusing...

BZ1804037 - Scheduling Memory calculation disregards huge-pages                                 ---> for not considering statically allocated hugepages at kernel cmdline when calculating scheduling memory
BZ1804046 - Engine does not reduce scheduling memory when a VM with dynamic hugepages runs      ---> for not considering VMs running with dynamic hugepages               when calculating scheduling memory

Comment 2 Germano Veit Michel 2020-02-18 05:48:06 UTC
See also Docs BZ1785507.

Comment 3 Ryan Barry 2020-02-19 02:14:25 UTC
Isn't this a dupe of the other bug you just opened? Seems like docs can cover it.

Andrej, thoughts?

Comment 4 Germano Veit Michel 2020-02-19 02:20:14 UTC
Ryan,

I'm not sure; why would this be a DUP?
And how can this be fixed by a Docs change? I'm confused...

Comment 5 Ryan Barry 2020-02-19 02:31:05 UTC
Ok, so it's more or less two scheduling bugs around hugepages which look very similar, and a docs bug about making hugepages less confusing. Andrej is the scheduler engineer, but I'm curious what the expected resolution for this set of bugs is from a customer perspective. From my POV, we'd document it and let a relatively tricky edge case get managed per use case.

"Engine does not reduce scheduling memory when a VM with dynamic hugepages runs."  vs "Scheduling Memory calculation disregards huge-pages"

Comment 6 Germano Veit Michel 2020-02-19 02:40:22 UTC
I agree it's a bit confusing, but IMHO they are different.

1) The Docs bug is about dynamic hugepages, which are involved only in BZ1804046. The other BZ happens with static hugepages.

2) The other 2 bugs are both about Free Scheduling Memory not being adjusted, but each has a different cause.

"Engine does not reduce scheduling memory when a VM with dynamic hugepages runs." --> HP VM runs and Scheduling Memory is not subtracted by VM memory
"Scheduling Memory calculation disregards huge-pages"                             --> Scheduling Memory is not subtracted by amount of static hugepages configured on the host

I don't think a Docs bug will solve these; it sounds like a code change is needed.

And if you come up with a patch that fixes both at once, I'm happy to close as DUP :)

Comment 9 Klaas Demter 2020-02-20 13:56:47 UTC
Is this all about the confusion around static/dynamic hugepages? getRequiredMemoryWithoutHugePages only makes sense for VMs that use static hugepages, right?

From a customer perspective: why even suggest static hugepages when you could have it all dynamic with zero manual grub configuration? I think I already suggested this in the docs bug: make dynamic hugepages the default again and make the hugepages feature easier to use. Then you can treat a hugepages VM like any other VM that needs memory.

Comment 20 Michal Skrivanek 2020-03-10 12:25:45 UTC
Changing SLA team to Virt; we're not tracking SLA separately anymore.

Comment 22 Polina 2020-04-26 18:44:34 UTC
Verifying on rhv-4.4.0-31.

Based on the tests below, the engine now does reduce the scheduling memory when a VM with dynamic hugepages runs, but there is a bug where the scheduling memory is not increased back after the VM is reconfigured without hugepages and restarted (please see Test3 below). This is the reason for changing the status to FailedQA.

Pre-condition:

In the GET https://{{host}}/ovirt-engine/api/hosts response:
<max_scheduling_memory>33205256192</max_scheduling_memory>
on host:
/etc/vdsm/vdsm.conf
        [performance]
        use_dynamic_hugepages = true

Test1.
On host: echo 2 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
Configure VM (size 2048) with hugepages=1048576 (custom properties)
Run the VM.

Result ok: max_scheduling_memory reduces accordingly
        <max_scheduling_memory>30952914944</max_scheduling_memory>

Test2.
echo 4 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages 
Configure VM (size 4096) with hugepages=1048576 (custom properties)
Result ok: max_scheduling_memory reduces accordingly
        <max_scheduling_memory>28801236992</max_scheduling_memory>
Shutdown the VM,
Result ok: max_scheduling_memory increases back to 33205256192
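
Arithmetic check of the deltas (consistent with VM size plus ~100 MiB per-VM overhead):
        Test1: 33205256192 - 30952914944 = 2252341248 bytes = 2148 MiB (2048 MiB VM + 100 MiB overhead)
        Test2: 33205256192 - 28801236992 = 4404019200 bytes = 4200 MiB (4096 MiB VM + 104 MiB overhead)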

Test3.
Configure the same VM (size 4096) with no hugepages. Restart the VM.
Result failed:
       max_scheduling_memory is reduced as if the hugepages were still configured: <max_scheduling_memory>28801236992</max_scheduling_memory> (??)
Expected: max_scheduling_memory should be reduced only by the size of the VM:
          <max_scheduling_memory>33205252096</max_scheduling_memory>, where 33205252096 = 33205256192 - 4096 (VM size)

Test4. 
echo 8 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
Configure VM1 (size 4096) with hugepages=1048576, VM2 (size 1024) with hugepages=1048576, VM3 (size 2048) with hugepages=1048576.
Run VM1: max_scheduling_memory reduces to 28801236992
Run VM2: max_scheduling_memory reduces to 27566014464
Run VM3: max_scheduling_memory reduces to 25254952960
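
Arithmetic check: each run reduces max_scheduling_memory by the VM's memory plus overhead:
        VM1: 33205256192 - 28801236992 = 4404019200 bytes = 4200 MiB (4096 + 104)
        VM2: 28801236992 - 27566014464 = 1235222528 bytes = 1178 MiB (1024 + 154)
        VM3: 27566014464 - 25254952960 = 2311061504 bytes = 2204 MiB (2048 + 156)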

Comment 23 Ryan Barry 2020-04-26 22:27:42 UTC
Too many scenarios here.

Let's get a new bug for Test#3 and verify the rest with a documented limitation, so we can at least try to get part of this to 4.3.z.

Comment 24 Andrej Krejcir 2020-04-27 09:24:13 UTC
Test3 looks to be working as expected; there is just a mistake with the units. The 'max_scheduling_memory' is displayed in bytes and the VM size is in MiB. So after starting the VM, the scheduling memory should be 33205256192 - 4096 * 1024 * 1024 = 28910288896 bytes (27571 MiB), minus some overhead, so the reported scheduling memory of 28801236992 bytes (27467 MiB) looks correct.

Comment 25 Polina 2020-04-27 14:09:37 UTC
So, I'm verifying based on the testing described in https://bugzilla.redhat.com/show_bug.cgi?id=1804046#c22

Filed a new bug for Test3: https://bugzilla.redhat.com/show_bug.cgi?id=1828290

Comment 26 Polina 2020-04-27 14:34:35 UTC
I filed 1828290 before I had read comment 24. After re-testing, it will be closed as not a bug. Sorry for the mess :)

Comment 27 RHEL Program Management 2020-06-15 20:47:59 UTC
The documentation text flag should only be set after 'doc text' field is provided. Please provide the documentation text and set the flag to '?' again.

Comment 34 errata-xmlrpc 2020-09-23 16:11:04 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Virtualization security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:3807