Description of problem:
The engine generates misleading alerts if the HugePages reservation on the host exceeds log_max_memory_used_threshold, even if the host is completely empty and not running any VMs. Although this is debatable, ideally it is a false alert and should not be generated. See the reproduction steps for details.

This can happen easily if the hypervisor is huge (e.g. 12TB in the customer's case) and the user has reserved most of that memory as static hugepages for high performance VMs, with no other type of VMs on the host. The VMs use only the HPs.

Version-Release number of selected component (if applicable):
ovirt-engine-4.4.8.5-0.4.el8ev.noarch
vdsm-4.40.80.6-1.el8ev.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Set the cluster memory threshold to 50% to make it easier to see

    engine=# select name,log_max_memory_used_threshold from cluster;
      name   | log_max_memory_used_threshold
    ---------+-------------------------------
     Default |                            50
    (1 row)

2. On a host with 8GB total, reserve 5G (62.5%) with HugePages

    # egrep '^HugePages_|^Mem' /proc/meminfo
    MemTotal:        8151820 kB
    MemFree:         1999400 kB
    MemAvailable:    2187876 kB
    HugePages_Total:       5
    HugePages_Free:        5
    HugePages_Rsvd:        0
    HugePages_Surp:        0

3. Observe engine logs, even without any VM running

    2021-09-22 14:14:25,295+10 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-14) [a52499c] EVENT_ID: VDS_HIGH_MEM_USE(532), Used memory of host host2.kvm.local in cluster Default [70%] exceeded defined threshold [50%].

4. VDSM is reporting memUsed at around 70%, but also the 5 free HPs.

    [root@host2 ~]# vdsm-client Host getStats
    {
        ...
        "hugepages": {
            "1048576": {
                "free_hugepages": 5,
                "nr_hugepages": 5,
                "nr_hugepages_mempolicy": 5,
                "nr_overcommit_hugepages": 0,
                "resv_hugepages": 0,
                "surplus_hugepages": 0,
                "vm.free_hugepages": 5
        ...
        "memFree": 2347,
        "memShared": 0,
        "memUsed": "71",
        ...
        "numaNodeMemFree": {
            "0": {
                "hugepages": {
                    "1048576": {
                        "freePages": 5
                    },
                    "2048": {
                        "freePages": 0
                    },
                    "4": {
                        "freePages": 490799
                    }
                },
                "memFree": "1917",
                "memPercent": 76
            }
        },
        ...
        "swapFree": 0,
        "swapTotal": 0,
        ...
        "vmActive": 0,
        "vmCount": 0,
        "vmMigrating": 0
    }

Actual results:
* A misleading (arguably false) alert is generated

Expected results:
* Don't generate this alert
Because memUsed = 71, the Admin Portal also shows the host's memory bar at 71% and in yellow. That is not really accurate. Maybe the free huge pages could somehow be taken into account?
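
To illustrate the suggestion, here is a rough Python sketch of a usage calculation that does not count free hugepages as used memory. It is not the engine or VDSM code: the function name `used_percent` and its parameters are made up for this example, and it assumes memUsed is derived roughly as (total - free) / total. The inputs are the MemTotal, memFree and free_hugepages values from the description.

```python
def used_percent(mem_total_kb, mem_free_mb, free_hugepages=None):
    """Return host memory usage as a rounded percentage.

    free_hugepages (hypothetical parameter) maps hugepage size in kB to the
    number of free pages of that size. When given, memory held by free
    hugepages is not counted as used.
    """
    total_mb = mem_total_kb / 1024
    used_mb = total_mb - mem_free_mb
    if free_hugepages:
        free_hp_mb = sum(size_kb * pages
                         for size_kb, pages in free_hugepages.items()) / 1024
        used_mb -= free_hp_mb
    used_mb = max(used_mb, 0)
    return round(100 * used_mb / total_mb)


# Values from the reproduction host: MemTotal 8151820 kB, memFree 2347 MB,
# five free 1 GiB hugepages, cluster threshold 50%.
THRESHOLD = 50
naive = used_percent(8151820, 2347)
adjusted = used_percent(8151820, 2347, {1048576: 5})
print(naive, naive > THRESHOLD)        # hugepages counted as used
print(adjusted, adjusted > THRESHOLD)  # free hugepages excluded
```

With these inputs the first figure matches the reported memUsed of 71 and crosses the 50% threshold, while the adjusted figure stays well below it, which is the behaviour expected for an empty host.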
Verified with:
ovirt-engine-4.5.1.2-0.11.el8ev.noarch

Steps and results:
1. Set the cluster memory threshold to 50%

2. On a host (not running any VM) with 62G total memory, reserve 40G (64.5%) with HugePages

    # egrep '^HugePages_|^Mem' /proc/meminfo
    MemTotal:       65366332 kB
    MemFree:        20537144 kB
    MemAvailable:   21274804 kB
    HugePages_Total:      40
    HugePages_Free:       40
    HugePages_Rsvd:        0
    HugePages_Surp:        0

3. Check the engine logs to confirm there is no VDS_HIGH_MEM_USE warning:
There is no VDS_HIGH_MEM_USE in engine.log

4. Create a VM with 16G memory and no hugepages, run the VM on the host, and load its memory:

    # free -m
                  total        used        free      shared  buff/cache   available
    Mem:          15798       15200         392           8         205         324
    Swap:             0           0           0

5. Check the engine logs to see if there is a VDS_HIGH_MEM_USE warning:
There is a VDS_HIGH_MEM_USE warning saying:

    2022-06-27 17:59:01,492+03 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-15) [] EVENT_ID: VDS_HIGH_MEM_USE(532), Used memory of host host_mixed_1 in cluster golden_env_mixed_1 [28%] exceeded defined threshold [50%].

6. Check the memory usage on the UI to see if it is the total usage of normal memory and hugepages memory:
The memory usage number is 28%

7. Create another VM with 40G memory and hugepages=1048576, run it on the same host, and check the memory usage on the UI to see if it is the total usage of normal memory and hugepages memory:
The memory usage number is 92%

According to the test results, the VDS_HIGH_MEM_USE warning and the memory usage on the UI work as expected, except that the usage number in the warning should be the usage of normal memory only, not the total usage of normal memory and hugepages memory.
Filed a separate bug to track the wrong usage number, see https://bugzilla.redhat.com/show_bug.cgi?id=2101503.

Moving this bug to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: RHV Manager (ovirt-engine) [ovirt-4.5.1] security, bug fix and update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5555
Due to QE capacity, we are not going to cover this issue in our automation