Bug 2006625

Summary: Engine generates VDS_HIGH_MEM_USE events for empty hosts that have most memory reserved by huge pages
Product: Red Hat Enterprise Virtualization Manager
Reporter: Germano Veit Michel <gveitmic>
Component: ovirt-engine
Assignee: Lucia Jelinkova <ljelinko>
Status: CLOSED ERRATA
QA Contact: Qin Yuan <qiyuan>
Severity: medium
Docs Contact:
Priority: high
Version: 4.4.8
CC: ahadas, apinnick, dfodor, emarcus, ljelinko, mavital, mburman, qiyuan, tgolembi
Target Milestone: ovirt-4.5.1
Keywords: ZStream
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: ovirt-engine-4.5.1.2
Doc Type: Bug Fix
Doc Text:
Previously, memory allocated by hugepages was included in the host memory usage calculation, resulting in high memory usage in the Administration Portal, even with no running VMs, and false VDS_HIGH_MEM_USE warnings in the logs. In this release, hugepages are not included in the memory usage. VDS_HIGH_MEM_USE warnings are logged only when normal (not hugepages) memory usage is above a defined threshold. Memory usage in the Administration Portal is calculated from the normal and hugepages used memory, not from allocated memory.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-07-14 12:54:31 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Germano Veit Michel 2021-09-22 04:23:32 UTC
Description of problem:

The engine generates a misleading alert if the HugePages reservation on the host exceeds log_max_memory_used_threshold, even if the host is completely empty and not running any VMs. Although it is debatable, this is arguably a false alert and ideally should not be generated. See the reproduction steps for details.

This happens easily when the hypervisor is very large (e.g. 12TB in the customer case) and the user has reserved most of that memory as static hugepages for high-performance VMs, with no other type of VM on the host. The VMs use only the hugepages.

Version-Release number of selected component (if applicable):
ovirt-engine-4.4.8.5-0.4.el8ev.noarch
vdsm-4.40.80.6-1.el8ev.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Set the cluster memory threshold to 50% to make it easier to see

engine=# select name,log_max_memory_used_threshold from cluster;
  name   | log_max_memory_used_threshold 
---------+-------------------------------
 Default |                            50
(1 row)

2. On a host with 8GB total, reserve 5G (62.5%) with HugePages

# egrep '^HugePages_|^Mem' /proc/meminfo 
MemTotal:        8151820 kB
MemFree:         1999400 kB
MemAvailable:    2187876 kB
HugePages_Total:       5
HugePages_Free:        5
HugePages_Rsvd:        0
HugePages_Surp:        0

3. Observe engine logs, even without any VM running

2021-09-22 14:14:25,295+10 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-14) [a52499c] EVENT_ID: VDS_HIGH_MEM_USE(532), Used memory of host host2.kvm.local in cluster Default [70%] exceeded defined threshold [50%].

4. VDSM is reporting the 70% memUsed, but also the 5 free HPs.

[root@host2 ~]# vdsm-client Host getStats
{
    ...
    "hugepages": {
        "1048576": {
            "free_hugepages": 5,
            "nr_hugepages": 5,
            "nr_hugepages_mempolicy": 5,
            "nr_overcommit_hugepages": 0,
            "resv_hugepages": 0,
            "surplus_hugepages": 0,
            "vm.free_hugepages": 5
    ...
    "memFree": 2347,
    "memShared": 0,
    "memUsed": "71",
    ...
    "numaNodeMemFree": {
        "0": {
            "hugepages": {
                "1048576": {
                    "freePages": 5
                },
                "2048": {
                    "freePages": 0
                },
                "4": {
                    "freePages": 490799
                }
            },
            "memFree": "1917",
            "memPercent": 76
        }
    },
    ...
    "swapFree": 0,
    "swapTotal": 0,
    ...
    "vmActive": 0,
    "vmCount": 0,
    "vmMigrating": 0
}
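
The reservation in step 2 can be checked against the MemTotal the kernel reports. The following is a minimal Python sketch, not part of the engine or VDSM: the helper names parse_meminfo and hugepage_share_percent are hypothetical, and the 1048576 kB page size is an assumption taken from the "hugepages" key in the getStats output, since the egrep output does not include the Hugepagesize line. Nominally 5G of 8G is 62.5%; measured against MemTotal the share comes out slightly higher, and either way it is above the 50% threshold.

```python
# Sketch: share of MemTotal held by the static hugepage pool,
# computed from /proc/meminfo-style text.

def parse_meminfo(text):
    """Return integer values keyed by field name (the 'kB' unit is dropped)."""
    fields = {}
    for line in text.strip().splitlines():
        name, _, rest = line.partition(":")
        fields[name.strip()] = int(rest.split()[0])
    return fields


def hugepage_share_percent(meminfo, hugepage_kb):
    """Percentage of MemTotal reserved by the hugepage pool."""
    reserved_kb = meminfo["HugePages_Total"] * hugepage_kb
    return 100.0 * reserved_kb / meminfo["MemTotal"]


# Values copied from step 2 of this report.
SAMPLE = """
MemTotal:        8151820 kB
MemFree:         1999400 kB
MemAvailable:    2187876 kB
HugePages_Total:       5
HugePages_Free:        5
HugePages_Rsvd:        0
HugePages_Surp:        0
"""

info = parse_meminfo(SAMPLE)
# Assumption: 1 GiB pages (1048576 kB), as listed by getStats.
share = hugepage_share_percent(info, hugepage_kb=1048576)
print(f"hugepage pool: {share:.1f}% of MemTotal")
```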


Actual results:
* A misleading (arguably false) alert is generated

Expected results:
* The alert is not generated

Comment 1 Germano Veit Michel 2021-09-22 04:25:29 UTC
Because memUsed = 71, the Admin Portal also shows the host's graph bar at 71% and in yellow, which does not reflect the actual usage.

Maybe somehow take into account the free huge pages?
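
One way to read that suggestion, as a rough Python sketch: subtract the memory held by free hugepages from the used figure before computing the percentage. This is only an illustration of the idea and not how the engine or VDSM computes anything. The function name used_percent_excluding_free_hugepages is hypothetical, the input shape mirrors the "memUsed" and "free_hugepages" fields of the getStats output in the description, and the total is assumed to be MemTotal from /proc/meminfo because the getStats excerpt elides it.

```python
# Sketch: discount free hugepages from a reported "memUsed" percentage.

def used_percent_excluding_free_hugepages(mem_used_percent, mem_total_kb, free_hugepages):
    """
    mem_used_percent: reported usage, as a number or numeric string (e.g. "71").
    mem_total_kb:     total host memory in kB.
    free_hugepages:   mapping of page size in kB -> number of free pages;
                      keys may be strings, as in the JSON output.
    """
    used_kb = mem_total_kb * float(mem_used_percent) / 100.0
    free_hp_kb = sum(int(size_kb) * count for size_kb, count in free_hugepages.items())
    # Clamp at zero in case rounding in the reported percentage undershoots.
    adjusted_kb = max(used_kb - free_hp_kb, 0.0)
    return 100.0 * adjusted_kb / mem_total_kb


# Figures from the description: memUsed "71", five free 1048576 kB pages,
# MemTotal 8151820 kB.
adjusted = used_percent_excluding_free_hugepages("71", 8151820, {"1048576": 5})
print(f"usage with free hugepages discounted: {adjusted:.1f}%")
```

With these inputs the adjusted figure falls well below the 50% threshold, which matches the expectation that an empty host should not raise the alert.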

Comment 22 Qin Yuan 2022-06-27 16:04:23 UTC
Verified with:
ovirt-engine-4.5.1.2-0.11.el8ev.noarch

Steps and results:
1. Set the cluster memory threshold to 50%

2. On a host (not running any VM) with 62G total memory, reserve 40G (64.5%) with HugePages
    # egrep '^HugePages_|^Mem' /proc/meminfo
    MemTotal:       65366332 kB
    MemFree:        20537144 kB
    MemAvailable:   21274804 kB
    HugePages_Total:      40
    HugePages_Free:       40
    HugePages_Rsvd:        0
    HugePages_Surp:        0
    
3. Check engine logs to see if there is no VDS_HIGH_MEM_USE warning:
    There is no VDS_HIGH_MEM_USE in engine.log
    
4. Create a VM with 16G memory, no hugepages, run the VM on the host, load memory:
    # free -m
                  total        used        free      shared  buff/cache   available
    Mem:          15798       15200         392           8         205         324
    Swap:             0           0           0

5. Check engine logs to see if there is a VDS_HIGH_MEM_USE warning:
    There is a VDS_HIGH_MEM_USE warning saying:
2022-06-27 17:59:01,492+03 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-15) [] EVENT_ID: VDS_HIGH_MEM_USE(532), Used memory of host host_mixed_1 in cluster golden_env_mixed_1 [28%] exceeded defined threshold [50%].

6. Check memory usage on UI to see if it's a total usage of normal memory and hugepages memory:
   The memory usage number is 28%
   
7. Create another VM with 40G memory, hugepages=1048576, run the VM also on the host, check memory usage on UI to see if it's a total usage of normal memory and hugepages memory:
   The memory usage number is 92%
   
According to the test results, the VDS_HIGH_MEM_USE warning and the memory usage in the UI work as expected, except that the usage number in the warning should be the usage of normal memory, not the total usage of normal and hugepages memory. A separate bug was filed to track the wrong usage number: https://bugzilla.redhat.com/show_bug.cgi?id=2101503.
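
The arithmetic behind that observation can be sketched as follows. This is an illustrative Python calculation, not the engine's code: normal_usage_percent is a hypothetical helper, the page size is assumed to be 1048576 kB (from hugepages=1048576 in step 7), and step 7 assumes the second VM occupies all 40 hugepages. It derives the normal-memory usage from the total percentage shown in the UI, which is the figure the threshold is expected to be compared against.

```python
# Sketch: normal (non-hugepage) memory usage derived from the total usage
# percentage shown in the UI during verification.

KB_PER_GIB = 1048576


def normal_usage_percent(total_kb, hugepages_total_kb, total_used_percent, hugepages_used_kb=0):
    """Usage of normal memory as a percentage of normal memory only."""
    normal_total_kb = total_kb - hugepages_total_kb
    total_used_kb = total_kb * total_used_percent / 100.0
    normal_used_kb = total_used_kb - hugepages_used_kb
    return 100.0 * normal_used_kb / normal_total_kb


total_kb = 65366332          # MemTotal from step 2
pool_kb = 40 * KB_PER_GIB    # 40 hugepages, assumed 1 GiB each

# Step 5/6: UI shows 28%, no hugepages in use.
step5 = normal_usage_percent(total_kb, pool_kb, total_used_percent=28)

# Step 7: UI shows 92%, assuming the whole hugepage pool is in use.
step7 = normal_usage_percent(total_kb, pool_kb, total_used_percent=92,
                             hugepages_used_kb=pool_kb)

print(f"step 5 normal usage: {step5:.1f}%")
print(f"step 7 normal usage: {step7:.1f}%")
```

Under these assumptions the normal-memory usage in step 5 is far above the 50% threshold even though the warning text prints 28%, which is consistent with the warning firing correctly while reporting the total-usage number.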

Move this bug to VERIFIED.

Comment 26 errata-xmlrpc 2022-07-14 12:54:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: RHV Manager (ovirt-engine) [ovirt-4.5.1] security, bug fix and update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5555

Comment 27 meital avital 2022-08-03 07:17:48 UTC
Due to limited QE capacity, we are not going to cover this issue in our automation.