Bug 1804046
Summary: | Engine does not reduce scheduling memory when a VM with dynamic hugepages runs | |
---|---|---|---
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Germano Veit Michel <gveitmic>
Component: | ovirt-engine | Assignee: | Andrej Krejcir <akrejcir>
Status: | CLOSED ERRATA | QA Contact: | Polina <pagranat>
Severity: | high | Docs Contact: |
Priority: | high | |
Version: | 4.3.8 | CC: | ahadas, akrejcir, emarcus, klaas, mavital, mkalinin, pelauter, rdlugyhe
Target Milestone: | ovirt-4.4.1 | |
Target Release: | --- | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | rhv-4.4.0-29 | Doc Type: | Bug Fix
Doc Text: | Previously, the RHV Manager did not reduce scheduling memory when a virtual machine with dynamic hugepages was running. Instead, the Manager treated the memory occupied by dynamic hugepages as schedulable memory. As a result, the Manager scheduled more virtual machines on a host than could fit on it. The current release fixes this issue: the Manager now treats dynamic hugepages and the memory they occupy correctly. | |
Story Points: | --- | |
Clone Of: | | Environment: |
Last Closed: | 2020-09-23 16:11:04 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | Virt | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description
Germano Veit Michel
2020-02-18 05:31:20 UTC
As this can get confusing:

BZ1804037 - Scheduling Memory calculation disregards huge-pages ---> for not considering statically allocated hugepages at the kernel cmdline when calculating scheduling memory.
BZ1804046 - Engine does not reduce scheduling memory when a VM with dynamic hugepages runs ---> for not considering VMs running with dynamic hugepages when calculating scheduling memory.

Isn't this a dupe of the other bug you just opened? Seems like docs can cover it. Andrej, thoughts?

Ryan, I'm not sure. Why would this be a DUP? And how can this be fixed by a docs change? I'm confused...

OK, so it's more or less two scheduling bugs around hugepages which look very similar, plus a docs bug about making hugepages less confusing. Andrej is the scheduler engineer, but I'm curious what the expected resolution for this set of bugs is from a customer perspective. From my POV, we'd document it and let a relatively tricky edge case get managed per use case: "Engine does not reduce scheduling memory when a VM with dynamic hugepages runs" vs. "Scheduling Memory calculation disregards huge-pages".

I agree it's a bit confusing, but IMHO they are different:

1) The docs bug is about dynamic hugepages, which are involved only in BZ1804046. The other BZ happens with static hugepages.
2) The other two bugs are both about free scheduling memory not being adjusted, but each for a different reason:
   - "Engine does not reduce scheduling memory when a VM with dynamic hugepages runs" --> a hugepages VM runs and the scheduling memory is not reduced by the VM's memory.
   - "Scheduling Memory calculation disregards huge-pages" --> the scheduling memory is not reduced by the amount of static hugepages configured on the host.

I don't think a docs bug will solve these; it sounds like a code change is needed. And if you come up with a patch that fixes both at once, I'm happy to close as a DUP :)

Is this all about the confusion around static/dynamic hugepages?
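The two failure modes described above boil down to simple accounting. The sketch below is a minimal illustration, not engine code; the function name, the data shapes, and the example numbers are all hypothetical:

```python
MIB = 1024 * 1024  # bytes per MiB, used only for context; this sketch works in MiB

def scheduling_memory(host_mem_mib, static_hugepages_mib, running_vms):
    """Scheduling memory a correct engine would report, in MiB.

    Static hugepages reserved on the kernel cmdline are never schedulable
    for normal VMs (the BZ1804037 case), and every running VM, including
    one backed by dynamic hugepages, consumes its full memory size
    (the BZ1804046 case).
    """
    free = host_mem_mib - static_hugepages_mib
    for vm in running_vms:
        free -= vm["mem_mib"]  # subtract regardless of hugepage backing
    return free

# Hypothetical host with 32 GiB, no static hugepages, and one running
# 4 GiB VM backed by dynamic hugepages:
print(scheduling_memory(32768, 0, [{"mem_mib": 4096, "hugepages": True}]))  # 28672
# The buggy engine skipped the subtraction for hugepages VMs and would
# have kept reporting 32768 here.
```

The point of the sketch is that the fix is the same subtraction in both bugs; only the source of the unavailable memory differs.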
getRequiredMemoryWithoutHugePages only makes sense for VMs that use static hugepages, right?

From a customer perspective: why even suggest static hugepages when you could have it all dynamic, with zero manual grub configuration? I think I already suggested this in the docs bug: make dynamic hugepages the default again and make the hugepages feature easier to use. Then you can treat a hugepages VM like you would treat any other VM that needs memory.

Changing SLA team to Virt; we're not tracking SLA separately anymore.

Verifying on rhv-4.4.0-31. Based on the tests below, the engine now does reduce the scheduling memory when a VM with dynamic hugepages runs, but there is a bug with increasing the scheduling memory back after reconfiguring the VM to remove the hugepages and restarting it (please see Test 3 below). This is the reason for changing the status to FailedQA.

Pre-condition: a GET https://{{host}}/ovirt-engine/api/hosts response contains:
<max_scheduling_memory>33205256192</max_scheduling_memory>

On the host, /etc/vdsm/vdsm.conf contains:
[performance]
use_dynamic_hugepages = true

Test 1.
On the host: echo 2 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
Configure a VM (size 2048) with hugepages=1048576 (custom properties) and run it.
Result OK: max_scheduling_memory is reduced accordingly:
<max_scheduling_memory>30952914944</max_scheduling_memory>

Test 2.
echo 4 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
Configure a VM (size 4096) with hugepages=1048576 (custom properties).
Result OK: max_scheduling_memory is reduced accordingly:
<max_scheduling_memory>28801236992</max_scheduling_memory>
Shut the VM down. Result OK: max_scheduling_memory increases back to 33205256192.

Test 3.
Configure the same VM (size 4096) with no hugepages. Restart the VM.
Result failed: max_scheduling_memory is reduced as if the hugepages were still configured:
<max_scheduling_memory>28801236992</max_scheduling_memory> (??)
Expected: max_scheduling_memory should be reduced only by the size of the VM:
<max_scheduling_memory>33205252096</max_scheduling_memory>, where 33205252096 = 33205256192 - 4096 (VM size)

Test 4.
echo 8 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
Configure VM1 (size 4096) with hugepages=1048576, VM2 (size 1024) with hugepages=1048576, VM3 (size 2048) with hugepages=1048576.
Run VM1: max_scheduling_memory is reduced to 28801236992.
Run VM2: max_scheduling_memory is reduced to 27566014464.
Run VM3: max_scheduling_memory is reduced to 25254952960.

Too many scenarios here. Let's get a new bug for Test 3 and verify the rest with a documented limitation, so we can at least try to get part of this into 4.3.z.

Test 3 looks to be working as expected; there is just a mistake with the units. max_scheduling_memory is displayed in bytes and the VM size is in MiB. So after starting the VM, the scheduling memory should be 33205256192 - 4096 * 1024 * 1024 = 28910288896 bytes (27571 MiB), minus some overhead, so the reported scheduling memory of 28801236992 bytes (27467 MiB) looks correct.

So I'm verifying on the basis of the testing described in https://bugzilla.redhat.com/show_bug.cgi?id=1804046#c22

Filed a new bug for Test 3: https://bugzilla.redhat.com/show_bug.cgi?id=1828290
I filed 1828290 before I read comment 24. So, after re-testing, close it as not a bug. Sorry for the mess :)

The documentation text flag should only be set after the 'doc text' field is provided. Please provide the documentation text and set the flag to '?' again.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat Virtualization security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:3807
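The units correction above is easy to check with a few lines of arithmetic. This is purely illustrative, using the byte values quoted in the test log:

```python
MIB = 1024 * 1024

baseline = 33205256192      # max_scheduling_memory with the VM down, in bytes
vm_size_mib = 4096          # VM memory size as configured, in MiB

# Naively subtracting 4096 treats a MiB value as bytes; converting first
# gives the figure Andrej quotes:
expected = baseline - vm_size_mib * MIB
print(expected)             # 28910288896 bytes
print(expected // MIB)      # 27571 MiB

reported = 28801236992      # value the engine actually reported, in bytes
overhead_mib = (expected - reported) // MIB
print(overhead_mib)         # 104 MiB of per-VM overhead, consistent with
                            # the reported 27467 MiB (27571 - 104)
```

So the gap between the QA expectation (33205252096) and the reported value was a MiB-vs-bytes mix-up plus a modest fixed overhead, not a scheduling bug.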