Bug 2006625 - Engine generates VDS_HIGH_MEM_USE events for empty hosts that have most memory reserved by huge pages
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.4.8
Hardware: x86_64
OS: Linux
Priority: high
Severity: medium
Target Milestone: ovirt-4.5.1
Assignee: Lucia Jelinkova
QA Contact: Qin Yuan
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-09-22 04:23 UTC by Germano Veit Michel
Modified: 2022-08-03 07:17 UTC (History)

Fixed In Version: ovirt-engine-4.5.1.2
Doc Type: Bug Fix
Doc Text:
Previously, memory allocated by hugepages was included in the host memory usage calculation, resulting in high memory usage in the Administration Portal, even with no running VMs, and false VDS_HIGH_MEM_USE warnings in the logs. In this release, hugepages are not included in the memory usage. VDS_HIGH_MEM_USE warnings are logged only when normal (not hugepages) memory usage is above a defined threshold. Memory usage in the Administration Portal is calculated from the normal and hugepages used memory, not from allocated memory.
Clone Of:
Environment:
Last Closed: 2022-07-14 12:54:31 UTC
oVirt Team: Virt
Target Upstream Version:
Embargoed:


Links
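 System                 | ID                          | Private | Priority | Status | Summary                                     | Last Updated
------------------------+-----------------------------+---------+----------+--------+---------------------------------------------+-------------------------
 Github                 | oVirt ovirt-engine pull 412 | 0       | None     | open   | engine: Change the memory usage calculation | 2022-06-09 13:23:33 UTC
 Red Hat Issue Tracker  | RHV-43691                   | 0       | None     | None   | None                                        | 2021-09-22 04:24:00 UTC
 Red Hat Product Errata | RHSA-2022:5555              | 0       | None     | None   | None                                        | 2022-07-14 12:55:15 UTC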

Description Germano Veit Michel 2021-09-22 04:23:32 UTC
Description of problem:

The engine generates somewhat incorrect alerts if the HugePages reservation on the host exceeds log_max_memory_used_threshold, even if the host is completely empty and not running any VMs. Even though it's a bit debatable, ideally this is a false alert and should not be generated. See reproduction steps for details.

This can easily happen if the hypervisor is huge (e.g. 12TB in the customer case) and the user has reserved most of that memory as static hugepages for high performance VMs, with no other type of VMs running there. The VMs use the HPs only.

Version-Release number of selected component (if applicable):
ovirt-engine-4.4.8.5-0.4.el8ev.noarch
vdsm-4.40.80.6-1.el8ev.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Set the cluster memory threshold to 50% to make it easier to see

engine=# select name,log_max_memory_used_threshold from cluster;
  name   | log_max_memory_used_threshold 
---------+-------------------------------
 Default |                            50
(1 row)

2. On a host with 8GB total, reserve 5G (62.5%) with HugePages

# egrep '^HugePages_|^Mem' /proc/meminfo 
MemTotal:        8151820 kB
MemFree:         1999400 kB
MemAvailable:    2187876 kB
HugePages_Total:       5
HugePages_Free:        5
HugePages_Rsvd:        0
HugePages_Surp:        0

3. Observe engine logs, even without any VM running

2021-09-22 14:14:25,295+10 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-14) [a52499c] EVENT_ID: VDS_HIGH_MEM_USE(532), Used memory of host host2.kvm.local in cluster Default [70%] exceeded defined threshold [50%].

4. VDSM is reporting the 70% memUsed, but also the 5 free HPs.

[root@host2 ~]# vdsm-client Host getStats
{
    ...
    "hugepages": {
        "1048576": {
            "free_hugepages": 5,
            "nr_hugepages": 5,
            "nr_hugepages_mempolicy": 5,
            "nr_overcommit_hugepages": 0,
            "resv_hugepages": 0,
            "surplus_hugepages": 0,
            "vm.free_hugepages": 5
    ...
    "memFree": 2347,
    "memShared": 0,
    "memUsed": "71",
    ...
    "numaNodeMemFree": {
        "0": {
            "hugepages": {
                "1048576": {
                    "freePages": 5
                },
                "2048": {
                    "freePages": 0
                },
                "4": {
                    "freePages": 490799
                }
            },
            "memFree": "1917",
            "memPercent": 76
        }
    },
    ...
    "swapFree": 0,
    "swapTotal": 0,
    ...
    "vmActive": 0,
    "vmCount": 0,
    "vmMigrating": 0
}
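
The numbers from step 2 can be cross-checked with a short Python sketch. It is only an illustration, not the engine's or VDSM's actual calculation: the 1 GiB page size is an assumption taken from the "1048576" key in the getStats output (the egrep above does not print Hugepagesize), and using MemAvailable as the basis for "used" memory is also an assumption. The function and key names are hypothetical.

```python
# Values copied from the /proc/meminfo output in step 2.
MEMINFO = """\
MemTotal:        8151820 kB
MemFree:         1999400 kB
MemAvailable:    2187876 kB
HugePages_Total:       5
HugePages_Free:        5
"""

# Assumption: 1 GiB pages, taken from the "1048576" key in the getStats output.
HUGEPAGE_SIZE_KB = 1048576


def parse_meminfo(text):
    """Return a dict of field name -> integer value (kB or page count)."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if rest.strip():
            fields[name.strip()] = int(rest.split()[0])
    return fields


def memory_breakdown(info, hugepage_size_kb):
    """Split host memory usage into the hugepage pool and everything else."""
    total = info["MemTotal"]
    pool_kb = info["HugePages_Total"] * hugepage_size_kb
    pool_used_kb = (info["HugePages_Total"] - info["HugePages_Free"]) * hugepage_size_kb
    # MemAvailable does not count huge pages, so this figure includes the whole pool.
    used_incl_pool_kb = total - info["MemAvailable"]
    normal_used_kb = used_incl_pool_kb - pool_kb
    return {
        "hugepage_pool_pct": 100 * pool_kb / total,
        "used_incl_pool_pct": 100 * used_incl_pool_kb / total,
        "used_excl_free_hugepages_pct": 100 * (normal_used_kb + pool_used_kb) / total,
    }


result = memory_breakdown(parse_meminfo(MEMINFO), HUGEPAGE_SIZE_KB)
for key, value in result.items():
    print(f"{key}: {value:.1f}%")
```

With these inputs the pool is roughly 64% of MemTotal, usage including the pool is roughly 73% (close to the 71% VDSM reports), and usage with the free huge pages left out is roughly 9%, which is well below the 50% threshold.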


Actual results:
* A somewhat false alert is generated

Expected results:
* Don't generate this alert

Comment 1 Germano Veit Michel 2021-09-22 04:25:29 UTC
Due to memUsed = 71, the Admin Portal also shows the host with the graph bar at 71% and yellow. It is not really true...

Maybe somehow take into account the free huge pages?

Comment 22 Qin Yuan 2022-06-27 16:04:23 UTC
Verified with:
ovirt-engine-4.5.1.2-0.11.el8ev.noarch

Steps and results:
1. Set the cluster memory threshold to 50%

2. On a host (not running any VM) with 62G total memory, reserve 40G (64.5%) with HugePages
    # egrep '^HugePages_|^Mem' /proc/meminfo
    MemTotal:       65366332 kB
    MemFree:        20537144 kB
    MemAvailable:   21274804 kB
    HugePages_Total:      40
    HugePages_Free:       40
    HugePages_Rsvd:        0
    HugePages_Surp:        0
    
3. Check engine logs to see if there is no VDS_HIGH_MEM_USE warning:
    There is no VDS_HIGH_MEM_USE in engine.log
    
4. Create a VM with 16G memory, no hugepages, run the VM on the host, load memory:
    # free -m
                  total        used        free      shared  buff/cache   available
    Mem:          15798       15200         392           8         205         324
    Swap:             0           0           0

5. Check engine logs to see if there is a VDS_HIGH_MEM_USE warning:
    There is a VDS_HIGH_MEM_USE warning saying:
2022-06-27 17:59:01,492+03 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-15) [] EVENT_ID: VDS_HIGH_MEM_USE(532), Used memory of host host_mixed_1 in cluster golden_env_mixed_1 [28%] exceeded defined threshold [50%].

6. Check memory usage on UI to see if it's a total usage of normal memory and hugepages memory:
   The memory usage number is 28%
   
7. Create another VM with 40G memory, hugepages=1048576, run the VM also on the host, check memory usage on UI to see if it's a total usage of normal memory and hugepages memory:
   The memory usage number is 92%
   
According to the test results, the VDS_HIGH_MEM_USE warning and the memory usage on UI work as expected, with one exception: the usage number in the warning should be the usage of normal memory, not the total usage of normal memory and hugepages memory. Filed a separate bug to track the wrong usage number, see https://bugzilla.redhat.com/show_bug.cgi?id=2101503.

Move this bug to VERIFIED.
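
The jump from 28% to 92% between steps 6 and 7 can be sanity-checked with a small Python sketch. It assumes the second VM consumes all 40 reserved huge pages and that normal memory usage stays unchanged between the two steps; neither is stated explicitly above, and the function name is hypothetical.

```python
# Values copied from the verification steps above.
MEM_TOTAL_KB = 65366332      # MemTotal from step 2
HUGEPAGES_TOTAL = 40         # HugePages_Total from step 2
HUGEPAGE_SIZE_KB = 1048576   # hugepages=1048576 from step 7
USAGE_BEFORE_PCT = 28        # UI value in step 6 (normal memory only in use)
USAGE_AFTER_PCT = 92         # UI value in step 7 (huge pages now in use too)


def hugepages_share_pct(pages, page_size_kb, total_kb):
    """Share of total host memory taken by the given number of huge pages."""
    return 100 * pages * page_size_kb / total_kb


# Assumption: the 40G VM takes all 40 pages and nothing else changes.
share = hugepages_share_pct(HUGEPAGES_TOTAL, HUGEPAGE_SIZE_KB, MEM_TOTAL_KB)
predicted_after = USAGE_BEFORE_PCT + share

print(f"hugepages share of total memory: {share:.1f}%")
print(f"predicted usage after step 7:    {predicted_after:.1f}%")
print(f"reported usage after step 7:     {USAGE_AFTER_PCT}%")
```

The predicted value lands within one percentage point of the reported 92%, which is consistent with the UI number being the sum of used normal memory and used hugepages memory.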

Comment 26 errata-xmlrpc 2022-07-14 12:54:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: RHV Manager (ovirt-engine) [ovirt-4.5.1] security, bug fix and update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5555

Comment 27 meital avital 2022-08-03 07:17:48 UTC
Due to QE capacity, we are not going to cover this issue in our automation

