Description of problem:

Since BZ1749630, vdsm adds SReclaimable to the reported free memory ('memFree'). However, the percentage calculation ('memUsed') does not take SReclaimable into account. Besides the inconsistency itself, this has a few side effects.

For example, from a host with 512G:

$ egrep 'mem[PFSUCA]' sos_commands/vdsm/vdsm-client_Host_getStats
    "memShared": 56714,
    "memUsed": "81",
    "memCommitted": 294912,
    "memAvailable": 257590,
    "memFree": 257334,

If memFree is ~256G on a host with 512G, memUsed should be ~50%, not 81%. The ~30% difference is SReclaimable (~160G):

MemFree:         99845680 kB
Buffers:           100560 kB
Cached:           5942972 kB
SReclaimable:   158048536 kB

Host sampling does not take SReclaimable into account when calculating memUsed:

lib/vdsm/virt/sampling.py (lines 172-174):

    freeOrCached = (meminfo['MemFree'] +
                    meminfo['Cached'] + meminfo['Buffers'])
    self.memUsed = 100 - int(100.0 * (freeOrCached) / meminfo['MemTotal'])

This has the following side effects:

1) The memory graph in the Hosts tab of the Admin Portal shows 81% usage instead of ~50%.

2) The engine raises memory-threshold warnings (audit_log, events) more readily, because it uses memUsed:

backend/manager/modules/vdsbroker/src/main/java/org/ovirt/engine/core/vdsbroker/monitoring/HostMonitoring.java (lines 325-331):

    private void checkVdsMemoryThresholdPercentage(Cluster cluster, VdsStatistics stat) {
        Integer maxUsedPercentageThreshold = cluster.getLogMaxMemoryUsedThreshold();

        if (stat.getUsageMemPercent() > maxUsedPercentageThreshold) {
            logMemoryAuditLog(vds, cluster, stat, AuditLogType.VDS_HIGH_MEM_USE, maxUsedPercentageThreshold);
        }
    }

3) The API host statistics also appear to derive mem.free and mem.used from the percentage reported by VDSM. This gives misleading mem.free and mem.used values and triggers monitoring warnings (e.g. Nagios):

backend/manager/modules/restapi/jaxrs/src/main/java/org/ovirt/engine/api/restapi/resource/HostStatisticalQuery.java (lines 41-48):

    public List<Statistic> getStatistics(VDS entity) {
        VdsStatistics s = entity.getStatisticsData();
        // if user queries host statistics before host installation completed, null values are possible (therefore added checks).
        long memTotal = entity.getPhysicalMemMb()==null ? 0 : entity.getPhysicalMemMb() * Mb;
        long memUsed = (s==null || s.getUsageMemPercent()==null) ? 0 : memTotal * s.getUsageMemPercent() / 100;
        List<Statistic> statistics = asList(setDatum(clone(MEM_TOTAL), memTotal),
                                            setDatum(clone(MEM_USED), memUsed),
                                            setDatum(clone(MEM_FREE), memTotal-memUsed),

See:

<statistic href="/ovirt-engine/api/hosts/5abf7bf8-d35f-4077-92c6-3cdc65f635a1/statistics/7816602b-c05c-3db7-a4da-3769f7ad8896" id="7816602b-c05c-3db7-a4da-3769f7ad8896">
  <name>memory.total</name>
  <description>Total memory</description>
  <kind>gauge</kind>
  <type>integer</type>
  <unit>bytes</unit>
  <values>
    <value>
      <datum>540307095552</datum>
    </value>
  </values>
  <host href="/ovirt-engine/api/hosts/5abf7bf8-d35f-4077-92c6-3cdc65f635a1" id="5abf7bf8-d35f-4077-92c6-3cdc65f635a1"/>
</statistic>
<statistic href="/ovirt-engine/api/hosts/5abf7bf8-d35f-4077-92c6-3cdc65f635a1/statistics/b7499508-c1c3-32f0-8174-c1783e57bb08" id="b7499508-c1c3-32f0-8174-c1783e57bb08">
  <name>memory.used</name>
  <description>Used memory</description>
  <kind>gauge</kind>
  <type>integer</type>
  <unit>bytes</unit>
  <values>
    <value>
      <datum>432245676441</datum>
    </value>
  </values>
  <host href="/ovirt-engine/api/hosts/5abf7bf8-d35f-4077-92c6-3cdc65f635a1" id="5abf7bf8-d35f-4077-92c6-3cdc65f635a1"/>
</statistic>
<statistic href="/ovirt-engine/api/hosts/5abf7bf8-d35f-4077-92c6-3cdc65f635a1/statistics/5a0fba9d-33d7-3cbf-addd-ba462040c946" id="5a0fba9d-33d7-3cbf-addd-ba462040c946">
  <name>memory.free</name>
  <description>Free memory</description>
  <kind>gauge</kind>
  <type>integer</type>
  <unit>bytes</unit>
  <values>
    <value>
      <datum>108061419111</datum>
    </value>
  </values>
  <host href="/ovirt-engine/api/hosts/5abf7bf8-d35f-4077-92c6-3cdc65f635a1" id="5abf7bf8-d35f-4077-92c6-3cdc65f635a1"/>
</statistic>

Note: The caches and buffers statistics were removed in BZ1751423 because they were always zero. In any case, they did not include SReclaimable.

Version-Release number of selected component (if applicable):
- 4.3.9 engine and vdsm-4.30.44 (customer)
- No relevant changes seen on master

How reproducible:
- Customer with high SReclaimable
- It is not easy to get SReclaimable high enough for the discrepancy to be as clear as above. It probably requires long uptime and sustained load.
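The discrepancy can be sketched in Python. This is a minimal illustration, not vdsm or engine code.

- `parse_meminfo`, `mem_used_percent` and `api_mem_stats` are hypothetical helpers.
- `mem_used_percent` mirrors the sampling.py formula quoted above. It can optionally count SReclaimable as free.
- `api_mem_stats` mirrors the integer arithmetic in HostStatisticalQuery.java.
- The customer's meminfo excerpt has no MemTotal. The value used here (527643648 kB) is an assumption derived from the API memory.total value (540307095552 bytes / 1024).

```python
"""Sketch: memUsed with and without SReclaimable (hypothetical helpers)."""


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    info = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, rest = line.partition(":")
        info[key.strip()] = int(rest.split()[0])
    return info


def mem_used_percent(meminfo, include_sreclaimable):
    """Mirror the sampling.py formula, optionally counting SReclaimable as free."""
    free_or_cached = meminfo["MemFree"] + meminfo["Cached"] + meminfo["Buffers"]
    if include_sreclaimable:
        free_or_cached += meminfo.get("SReclaimable", 0)
    return 100 - int(100.0 * free_or_cached / meminfo["MemTotal"])


def api_mem_stats(mem_total_bytes, usage_percent):
    """Mirror HostStatisticalQuery: used/free bytes derived from the percentage.

    Java's long division truncates toward zero; for these non-negative
    values that is the same as Python's floor division (//).
    """
    mem_used = mem_total_bytes * usage_percent // 100
    return mem_used, mem_total_bytes - mem_used


# Customer values from the report. MemTotal is an ASSUMPTION:
# API memory.total (540307095552 bytes) / 1024.
CUSTOMER_MEMINFO = """\
MemTotal:       527643648 kB
MemFree:         99845680 kB
Buffers:           100560 kB
Cached:           5942972 kB
SReclaimable:   158048536 kB
"""

meminfo = parse_meminfo(CUSTOMER_MEMINFO)
used_without = mem_used_percent(meminfo, include_sreclaimable=False)
used_with = mem_used_percent(meminfo, include_sreclaimable=True)
print(used_without, used_with)  # 80 50

mem_total = 540307095552
print(api_mem_stats(mem_total, used_without))  # (432245676441, 108061419111)
print(api_mem_stats(mem_total, used_with))     # (270153547776, 270153547776)
```

Without SReclaimable, the formula gives 80%. That is the percentage the API memory.used value above corresponds to: 432245676441 is 80% of 540307095552, truncated. Counting SReclaimable as free brings it down to 50%.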
Verified with: vdsm-4.40.50.4-1.el8ev.x86_64

Steps:
1. Create a large number of empty directories on the host to make SReclaimable large (directory entries and inodes are cached in reclaimable slab memory).
2. Check memUsed.

Results:
1. memUsed is correct.

meminfo:

[root@ocelot06 ~]# cat /proc/meminfo |grep -e MemTotal -e MemFree -e Buffers -e '^Cached' -e SReclaimable
MemTotal:       98597084 kB
MemFree:        64820576 kB
Buffers:            4540 kB
Cached:          3329852 kB
SReclaimable:   10779752 kB

memUsed including SReclaimable:

memUsed = 100 - int(100 * (MemFree + Buffers + Cached + SReclaimable) / MemTotal)
        = 100 - int(100 * (64820576 + 4540 + 3329852 + 10779752) / 98597084)
        = 100 - int(80.06)
        = 20

memUsed excluding SReclaimable:

memUsed = 100 - int(100 * (MemFree + Buffers + Cached) / MemTotal)
        = 100 - int(100 * (64820576 + 4540 + 3329852) / 98597084)
        = 100 - int(69.12)
        = 31

Actual memUsed reported by VDSM:

[root@ocelot06 ~]# vdsm-client Host getStats |grep -E 'mem[PFSUCA]'
    "memAvailable": 77306,
    "memCommitted": 0,
    "memFree": 77050,
    "memShared": 0,
    "memUsed": "20",

The reported memUsed (20) matches the calculation that includes SReclaimable.
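The verification arithmetic can also be checked with a short Python sketch. It is illustrative only.

- `parse_meminfo` and `mem_used_percent` are hypothetical helpers, not part of vdsm.
- The grepped getStats lines are wrapped into a small JSON object. This is an assumption for convenience: the real `vdsm-client Host getStats` output contains many more fields.
- getStats reports memUsed as a string, so it is converted to an int before comparing.

```python
import json

# Hypothetical helpers for illustration; not part of vdsm.


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    info = {}
    for line in text.splitlines():
        if line.strip():
            key, _, rest = line.partition(":")
            info[key.strip()] = int(rest.split()[0])
    return info


def mem_used_percent(meminfo, extra_free_fields=()):
    """100 - int(100 * free / MemTotal), free = MemFree + Buffers + Cached (+ extras)."""
    fields = ("MemFree", "Buffers", "Cached") + tuple(extra_free_fields)
    free = sum(meminfo[f] for f in fields)
    return 100 - int(100.0 * free / meminfo["MemTotal"])


# Values from ocelot06 in the verification above.
MEMINFO = """\
MemTotal:       98597084 kB
MemFree:        64820576 kB
Buffers:            4540 kB
Cached:          3329852 kB
SReclaimable:   10779752 kB
"""
# Excerpt of the getStats output above, wrapped as a JSON object (assumption).
GETSTATS = ('{"memAvailable": 77306, "memCommitted": 0, "memFree": 77050, '
            '"memShared": 0, "memUsed": "20"}')

meminfo = parse_meminfo(MEMINFO)
with_sr = mem_used_percent(meminfo, ("SReclaimable",))
without_sr = mem_used_percent(meminfo)
reported = int(json.loads(GETSTATS)["memUsed"])  # memUsed is a string in getStats

print(with_sr, without_sr, reported)  # 20 31 20
assert reported == with_sr and reported != without_sr
```

The final assertion encodes the verification criterion: the reported value must match the SReclaimable-aware formula and differ from the old one.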
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: RHV RHEL Host (ovirt-host) 4.4.z [ovirt-4.4.5] security, bug fix, enhancement), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:1184
Due to limited QE capacity, this issue will not be covered by our automation.