Description of problem:
Attempting to get host memory statistics via the API returns some 0 values. In particular:
<name>memory.buffers</name>
<name>memory.cached</name>

Looking at the code, this does not seem to be implemented:
https://github.com/oVirt/ovirt-engine/blob/ede62008318d924556bc9dfc5710d90e9519670d/backend/manager/modules/restapi/jaxrs/src/main/java/org/ovirt/engine/api/restapi/resource/HostStatisticalQuery.java#L49

The hypervisor currently has this:
# cat /proc/meminfo
MemTotal: 6107360 kB
MemFree: 4213272 kB
MemAvailable: 5253268 kB
Buffers: 2088 kB
Cached: 1256076 kB
SwapCached: 0 kB
Active: 992336 kB
Inactive: 562412 kB
Active(anon): 319080 kB
Inactive(anon): 24636 kB
Active(file): 673256 kB
Inactive(file): 537776 kB
Unevictable: 96720 kB
Mlocked: 96728 kB
SwapTotal: 4194300 kB
SwapFree: 4194300 kB
Dirty: 32 kB
Writeback: 0 kB
AnonPages: 393464 kB
Mapped: 71644 kB
Shmem: 25484 kB
Slab: 133084 kB
SReclaimable: 84672 kB
SUnreclaim: 48412 kB
KernelStack: 5872 kB
PageTables: 11300 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 7247980 kB
Committed_AS: 1879860 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 123108 kB
VmallocChunk: 34359535612 kB
HardwareCorrupted: 0 kB
AnonHugePages: 112640 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 184176 kB
DirectMap2M: 6107136 kB

Retrieving it from the API, some values are 0 where they shouldn't be:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<statistics> <statistic href="/ovirt-engine/api/hosts/7d74774b-6bb1-45df-a7be-f855e02a9dd5/statistics/7816602b-c05c-3db7-a4da-3769f7ad8896" id="7816602b-c05c-3db7-a4da-3769f7ad8896"> <name>memory.total</name> <description>Total memory</description> <kind>gauge</kind> <type>integer</type> <unit>bytes</unit> <values> <value> <datum>6253707264</datum> </value> </values> <host href="/ovirt-engine/api/hosts/7d74774b-6bb1-45df-a7be-f855e02a9dd5"
id="7d74774b-6bb1-45df-a7be-f855e02a9dd5"/> </statistic> <statistic href="/ovirt-engine/api/hosts/7d74774b-6bb1-45df-a7be-f855e02a9dd5/statistics/b7499508-c1c3-32f0-8174-c1783e57bb08" id="b7499508-c1c3-32f0-8174-c1783e57bb08"> <name>memory.used</name> <description>Used memory</description> <kind>gauge</kind> <type>integer</type> <unit>bytes</unit> <values> <value> <datum>687907799</datum> </value> </values> <host href="/ovirt-engine/api/hosts/7d74774b-6bb1-45df-a7be-f855e02a9dd5" id="7d74774b-6bb1-45df-a7be-f855e02a9dd5"/> </statistic> <statistic href="/ovirt-engine/api/hosts/7d74774b-6bb1-45df-a7be-f855e02a9dd5/statistics/5a0fba9d-33d7-3cbf-addd-ba462040c946" id="5a0fba9d-33d7-3cbf-addd-ba462040c946"> <name>memory.free</name> <description>Free memory</description> <kind>gauge</kind> <type>integer</type> <unit>bytes</unit> <values> <value> <datum>5565799465</datum> </value> </values> <host href="/ovirt-engine/api/hosts/7d74774b-6bb1-45df-a7be-f855e02a9dd5" id="7d74774b-6bb1-45df-a7be-f855e02a9dd5"/> </statistic> <statistic href="/ovirt-engine/api/hosts/7d74774b-6bb1-45df-a7be-f855e02a9dd5/statistics/ffc0e1fd-fa34-3f85-9862-8a841c1658bc" id="ffc0e1fd-fa34-3f85-9862-8a841c1658bc"> <name>memory.shared</name> <description>Shared memory</description> <kind>gauge</kind> <type>integer</type> <unit>bytes</unit> <values> <value> <datum>0</datum> <------------ HERE (Looking at the code, this seems to be memory shared by KSM and not shared as in 'free -m', update description?) 
</value> </values> <host href="/ovirt-engine/api/hosts/7d74774b-6bb1-45df-a7be-f855e02a9dd5" id="7d74774b-6bb1-45df-a7be-f855e02a9dd5"/> </statistic> <statistic href="/ovirt-engine/api/hosts/7d74774b-6bb1-45df-a7be-f855e02a9dd5/statistics/c81c86f0-bc61-3c78-a543-898b8339d03f" id="c81c86f0-bc61-3c78-a543-898b8339d03f"> <name>memory.buffers</name> <description>IO buffers</description> <kind>gauge</kind> <type>integer</type> <unit>bytes</unit> <values> <value> <datum>0</datum> <------------ HERE </value> </values> <host href="/ovirt-engine/api/hosts/7d74774b-6bb1-45df-a7be-f855e02a9dd5" id="7d74774b-6bb1-45df-a7be-f855e02a9dd5"/> </statistic> <statistic href="/ovirt-engine/api/hosts/7d74774b-6bb1-45df-a7be-f855e02a9dd5/statistics/1b6244ee-8dbd-365d-8762-482ddc05ee11" id="1b6244ee-8dbd-365d-8762-482ddc05ee11"> <name>memory.cached</name> <description>OS caches</description> <kind>gauge</kind> <type>integer</type> <unit>bytes</unit> <values> <value> <datum>0</datum> <------------ HERE </value> </values> <host href="/ovirt-engine/api/hosts/7d74774b-6bb1-45df-a7be-f855e02a9dd5" id="7d74774b-6bb1-45df-a7be-f855e02a9dd5"/> </statistic> <statistic href="/ovirt-engine/api/hosts/7d74774b-6bb1-45df-a7be-f855e02a9dd5/statistics/c43847d7-3bc1-3aaf-b92c-902e64bbdb5b" id="c43847d7-3bc1-3aaf-b92c-902e64bbdb5b"> <name>swap.total</name> <description>Total swap</description> <kind>gauge</kind> <type>integer</type> <unit>bytes</unit> <values> <value> <datum>4293918720</datum> </value> </values> <host href="/ovirt-engine/api/hosts/7d74774b-6bb1-45df-a7be-f855e02a9dd5" id="7d74774b-6bb1-45df-a7be-f855e02a9dd5"/> </statistic> <statistic href="/ovirt-engine/api/hosts/7d74774b-6bb1-45df-a7be-f855e02a9dd5/statistics/1a4c1c9b-f3cc-301e-82ce-47d4b9fb5a46" id="1a4c1c9b-f3cc-301e-82ce-47d4b9fb5a46"> <name>swap.free</name> <description>Free swap</description> <kind>gauge</kind> <type>integer</type> <unit>bytes</unit> <values> <value> <datum>4293918720</datum> </value> </values> <host 
href="/ovirt-engine/api/hosts/7d74774b-6bb1-45df-a7be-f855e02a9dd5" id="7d74774b-6bb1-45df-a7be-f855e02a9dd5"/> </statistic> <statistic href="/ovirt-engine/api/hosts/7d74774b-6bb1-45df-a7be-f855e02a9dd5/statistics/27686b4e-ba8d-3576-bc70-d68cbd8a2ba9" id="27686b4e-ba8d-3576-bc70-d68cbd8a2ba9"> <name>swap.used</name> <description>Used swap</description> <kind>gauge</kind> <type>integer</type> <unit>bytes</unit> <values> <value> <datum>0</datum> </value> </values> <host href="/ovirt-engine/api/hosts/7d74774b-6bb1-45df-a7be-f855e02a9dd5" id="7d74774b-6bb1-45df-a7be-f855e02a9dd5"/> </statistic> <statistic href="/ovirt-engine/api/hosts/7d74774b-6bb1-45df-a7be-f855e02a9dd5/statistics/ea00da15-de2d-3393-a7cb-810c4b19ed07" id="ea00da15-de2d-3393-a7cb-810c4b19ed07"> <name>swap.cached</name> <description>Swap also in memory</description> <kind>gauge</kind> <type>integer</type> <unit>bytes</unit> <values> <value> <datum>0</datum> <------------ HERE </value> </values> <host href="/ovirt-engine/api/hosts/7d74774b-6bb1-45df-a7be-f855e02a9dd5" id="7d74774b-6bb1-45df-a7be-f855e02a9dd5"/> </statistic> <statistic href="/ovirt-engine/api/hosts/7d74774b-6bb1-45df-a7be-f855e02a9dd5/statistics/f740b9ad-14a7-3f6c-9b80-efff44777169" id="f740b9ad-14a7-3f6c-9b80-efff44777169"> <name>ksm.cpu.current</name> <description>KSM CPU usage</description> <kind>gauge</kind> <type>decimal</type> <unit>percent</unit> <values> <value> <datum>0</datum> </value> </values> <host href="/ovirt-engine/api/hosts/7d74774b-6bb1-45df-a7be-f855e02a9dd5" id="7d74774b-6bb1-45df-a7be-f855e02a9dd5"/> </statistic> <statistic href="/ovirt-engine/api/hosts/7d74774b-6bb1-45df-a7be-f855e02a9dd5/statistics/a1fab379-66e2-3b1d-9914-81a9e79cb719" id="a1fab379-66e2-3b1d-9914-81a9e79cb719"> <name>cpu.current.user</name> <description>User CPU usage</description> <kind>gauge</kind> <type>decimal</type> <unit>percent</unit> <values> <value> <datum>0</datum> </value> </values> <host 
href="/ovirt-engine/api/hosts/7d74774b-6bb1-45df-a7be-f855e02a9dd5" id="7d74774b-6bb1-45df-a7be-f855e02a9dd5"/> </statistic> <statistic href="/ovirt-engine/api/hosts/7d74774b-6bb1-45df-a7be-f855e02a9dd5/statistics/a98c1e11-078c-3593-a57e-4b12c1ce9815" id="a98c1e11-078c-3593-a57e-4b12c1ce9815"> <name>cpu.current.system</name> <description>System CPU usage</description> <kind>gauge</kind> <type>decimal</type> <unit>percent</unit> <values> <value> <datum>0</datum> </value> </values> <host href="/ovirt-engine/api/hosts/7d74774b-6bb1-45df-a7be-f855e02a9dd5" id="7d74774b-6bb1-45df-a7be-f855e02a9dd5"/> </statistic> <statistic href="/ovirt-engine/api/hosts/7d74774b-6bb1-45df-a7be-f855e02a9dd5/statistics/4ae97794-f56d-3f05-a9e7-8798887cd1ac" id="4ae97794-f56d-3f05-a9e7-8798887cd1ac"> <name>cpu.current.idle</name> <description>Idle CPU usage</description> <kind>gauge</kind> <type>decimal</type> <unit>percent</unit> <values> <value> <datum>99</datum> </value> </values> <host href="/ovirt-engine/api/hosts/7d74774b-6bb1-45df-a7be-f855e02a9dd5" id="7d74774b-6bb1-45df-a7be-f855e02a9dd5"/> </statistic> <statistic href="/ovirt-engine/api/hosts/7d74774b-6bb1-45df-a7be-f855e02a9dd5/statistics/65860dae-c890-312e-9314-5c01f31225ab" id="65860dae-c890-312e-9314-5c01f31225ab"> <name>cpu.load.avg.5m</name> <description>CPU 5 minute load average</description> <kind>gauge</kind> <type>decimal</type> <unit>percent</unit> <values> <value> <datum>0.010</datum> </value> </values> <host href="/ovirt-engine/api/hosts/7d74774b-6bb1-45df-a7be-f855e02a9dd5" id="7d74774b-6bb1-45df-a7be-f855e02a9dd5"/> </statistic> <statistic href="/ovirt-engine/api/hosts/7d74774b-6bb1-45df-a7be-f855e02a9dd5/statistics/3ceb2072-a5a5-3b21-9bb2-e966471fd81c" id="3ceb2072-a5a5-3b21-9bb2-e966471fd81c"> <name>boot.time</name> <description>Boot time of the machine</description> <kind>gauge</kind> <type>integer</type> <unit>none</unit> <values> <value> <datum>1568086613</datum> </value> </values> <host 
href="/ovirt-engine/api/hosts/7d74774b-6bb1-45df-a7be-f855e02a9dd5" id="7d74774b-6bb1-45df-a7be-f855e02a9dd5"/> </statistic> <statistic id="6c5d91a5-6077-3f4e-8390-4023c6178729"> <name>hugepages.2048.free</name> <description>Amount of free huge pages of the given size</description> <kind>gauge</kind> <type>integer</type> <unit>none</unit> <values> <value> <datum>0</datum> </value> </values> </statistic> </statistics>

These values do not seem to be retrieved by VDSM either:
# vdsm-client Host getStats
{ "cpuStatistics": { "1": { "cpuUser": "0.87", "nodeIndex": 0, "cpuSys": "0.53", "cpuIdle": "98.60" }, "0": { "cpuUser": "0.13", "nodeIndex": 0, "cpuSys": "0.07", "cpuIdle": "99.80" }, "3": { "cpuUser": "0.33", "nodeIndex": 0, "cpuSys": "0.13", "cpuIdle": "99.54" }, "2": { "cpuUser": "0.07", "nodeIndex": 0, "cpuSys": "0.13", "cpuIdle": "99.80" } }, "numaNodeMemFree": { "0": { "memPercent": 32, "memFree": "4114" } }, "memShared": 0, "thpState": "always", "vmCount": 0, "memUsed": "11", "cpuSysVdsmd": "0.20", "cpuIdle": "99.47", "storageDomains": { "e839d116-dc89-467e-a458-178706b6d581": { "code": 0, "actual": true, "acquired": true, "delay": "0.00305974", "lastCheck": "9.1", "version": 5, "valid": true }, "5d08a04b-8682-4bc7-b559-cf551ba12ff6": { "code": 0, "actual": true, "acquired": true, "delay": "0.0012088", "lastCheck": "0.5", "version": 5, "valid": true }, "c0339a3b-3bc0-41d0-bc04-a199762bbcd2": { "code": 0, "actual": true, "acquired": true, "delay": "0.00100209", "lastCheck": "0.5", "version": 0, "valid": true }, "c0b37379-dd4e-43ea-ac73-42fcaa9eed34": { "code": 0, "actual": true, "acquired": true, "delay": "0.00151164", "lastCheck": "0.7", "version": 5, "valid": true } }, "incomingVmMigrations": 0, "network": { "ovirtmgmt": { "sampleTime": 1568248540.278779, "rxDropped": "0", "tx": "407692431", "rxErrors": "0", "duplex": "unknown", "txDropped": "0", "rx": "27097665898", "txErrors": "0", "state": "up", "speed": "1000", "name": "ovirtmgmt" }, "lo": { "sampleTime":
1568248540.278779, "rxDropped": "0", "tx": "28956733", "rxErrors": "0", "duplex": "unknown", "txDropped": "0", "rx": "28956733", "txErrors": "0", "state": "up", "speed": "1000", "name": "lo" }, "ovs-system": { "sampleTime": 1568248540.278779, "rxDropped": "0", "tx": "0", "rxErrors": "0", "duplex": "unknown", "txDropped": "0", "rx": "0", "txErrors": "0", "state": "down", "speed": "1000", "name": "ovs-system" }, ";vdsmdummy;": { "sampleTime": 1568248540.278779, "rxDropped": "0", "tx": "0", "rxErrors": "0", "duplex": "unknown", "txDropped": "0", "rx": "0", "txErrors": "0", "state": "down", "speed": "1000", "name": ";vdsmdummy;" }, "br-int": { "sampleTime": 1568248540.278779, "rxDropped": "1", "tx": "0", "rxErrors": "0", "duplex": "unknown", "txDropped": "0", "rx": "0", "txErrors": "0", "state": "down", "speed": "1000", "name": "br-int" }, "eth0": { "sampleTime": 1568248540.278779, "rxDropped": "0", "tx": "407703485", "rxErrors": "0", "duplex": "unknown", "txDropped": "0", "rx": "27165977856", "txErrors": "0", "state": "up", "speed": "1000", "name": "eth0" } }, "txDropped": "1", "anonHugePages": "110", "ksmPages": 100, "elapsedTime": "159562.82", "cpuLoad": "0.10", "netConfigDirty": "True", "diskStats": { "/var/log": { "free": "93771" }, "/tmp": { "free": "93771" }, "/var/run/vdsm/": { "free": "2957" } }, "memCommitted": 0, "ksmState": false, "vmMigrating": 0, "ksmCpu": 0, "memAvailable": 5581, "cpuUserVdsmd": "0.40", "haStats": { "active": false, "configured": false, "score": 0, "globalMaintenance": false, "localMaintenance": false }, "momStatus": "active", "multipathHealth": {}, "rxDropped": "0", "outgoingVmMigrations": 0, "swapTotal": 4095, "swapFree": 4095, "cpuSys": "0.18", "hugepages": { "2048": { "resv_hugepages": 0, "free_hugepages": 0, "nr_overcommit_hugepages": 0, "surplus_hugepages": 0, "vm.free_hugepages": 0, "nr_hugepages": 0, "nr_hugepages_mempolicy": 0 } }, "dateTime": "2019-09-12T00:35:40 GMT", "cpuUser": "0.35", "memFree": 5325, "bootTime": 
"1568086613", "vmActive": 0, "v2vJobs": {}, "ksmMergeAcrossNodes": true } So: 1) Are there plans to fix/implement the missing ones: memory.buffers memory.cached swap.cached 2) This one needs a better description to not be confused, as it shared in KSM context: memory.shared Version-Release number of selected component (if applicable): How reproducible: Always Steps to Reproduce: # curl -X GET -H 'All-content: true' -H "Accept: application/xml" -u admin@internal:password --cacert /etc/pki/ovirt-engine/apache-ca.pem https://rhv.example.com.com/ovirt-engine/api/hosts/7d74774b-6bb1-45df-a7be-f855e02a9dd5/statistics Actual results: Stats incomplete or ambiguous Expected results: Stats complete and clear.
Apparently this never worked. The API has always returned 0 for these statistics, going back to the first version of oVirt (see HostStatisticalQuery.java). The Engine has no reference to these statistics either, again going back to the first version of oVirt (VdsStatistics.java, VdsBrokerObjectsBuilder.java). VDSM has marked memory.buffers and memory.cached as deprecated (vdsm-api.yml). I found no reference to cached swap in my search. If these statistics are meaningful to users, they should be fetched by VDSM and propagated all the way up to the API. If not, they should be deprecated in the Engine, as they were in VDSM, and eventually removed. This seems like a PM decision.
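To spot such statistics in a response quickly, something like the following sketch might help. It assumes the XML has the shape shown in the description (with the hand-written "<------------ HERE" annotations removed, since they make the XML malformed); the function name and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET


def zero_valued_statistics(xml_text):
    """Return the names of statistics whose every <datum> is zero."""
    root = ET.fromstring(xml_text)
    names = []
    for stat in root.iter("statistic"):
        data = [d.text for d in stat.iter("datum")]
        # Skip statistics without values; flag those where all values are 0.
        if data and all(float(d) == 0 for d in data):
            names.append(stat.findtext("name"))
    return names


# Hypothetical, shortened sample in the same shape as the API response.
SAMPLE = """<statistics>
  <statistic><name>memory.total</name>
    <values><value><datum>6253707264</datum></value></values></statistic>
  <statistic><name>memory.buffers</name>
    <values><value><datum>0</datum></value></values></statistic>
  <statistic><name>memory.cached</name>
    <values><value><datum>0</datum></value></values></statistic>
</statistics>"""

print(zero_valued_statistics(SAMPLE))
```

A zero by itself does not prove a statistic is unimplemented (swap.used is legitimately 0 on this host), so the result is only a list of candidates to compare against the hypervisor.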
Well, I'm not a PM :-), so I can't say much about that. Please note the two deprecated items are in guest stats, not host stats. It seems the requested host stats items are indeed not handled by Vdsm at all. I'm not aware of any plans to add them, although we can add them if they are needed.
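For reference, a minimal sketch of how the three missing values could be read on the host, assuming they map to the Buffers, Cached and SwapCached fields of /proc/meminfo (that mapping is my assumption, not something the Engine or Vdsm code confirms). The helper name is hypothetical and the sample numbers are taken from the description.

```python
def meminfo_bytes(text, fields=("Buffers", "Cached", "SwapCached")):
    """Parse /proc/meminfo-style text and return the chosen fields in bytes."""
    result = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key in fields:
            parts = rest.split()
            value = int(parts[0])
            # /proc/meminfo reports most fields in kB (1024 bytes).
            if len(parts) > 1 and parts[1] == "kB":
                value *= 1024
            result[key] = value
    return result


# Excerpt of the /proc/meminfo output from the description.
SAMPLE = """MemTotal: 6107360 kB
Buffers: 2088 kB
Cached: 1256076 kB
SwapCached: 0 kB
"""

print(meminfo_bytes(SAMPLE))
```

On a real host the text would come from reading /proc/meminfo; the parsing is kept separate so it can be checked against a fixed sample.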
Got it. Thanks for confirming that VDSM does not handle these stats.
About: "memory.shared - Looking at the code, this seems to be memory shared by KSM and not shared as in 'free -m', update description?" Can you please give a link to the location in the code that you are referring to? Giving it another name is rather easy (e.g. memory.ksm_shared), but we need to be sure that fetching the KSM shared memory rather than the regular shared memory is done by design, meaning that it is not in itself a bug.
To summarize:

1) memory.buffers, memory.cached, swap.cached
These were never retrieved by VDSM or referenced in the Engine, and the REST API has always returned 0 for them. To return valid values, an RFE should be opened for VDSM (note that changes would be made only to versions 4.3/4.4). A PM decision is required as to whether these should be handled properly or dropped. Keep in mind that these metrics might be included in the Metrics Store, which would make reporting them by oVirt possibly redundant.

2) memory.shared
It seems that KSM shared memory (as opposed to regular shared memory) is fetched by design and not by mistake (https://github.com/oVirt/vdsm/blob/master/lib/vdsm/momIF.py#L113). So this metric should indeed be described better. I suggest changing the description and not the name, for backwards-compatibility reasons.
Martin, what would be the most accurate description of ksm-shared-memory?
The above metrics are reported as part of the RHV metrics store.
(In reply to Ori Liel from comment #6)
> Martin, what would be the most accurate description of ksm-shared-memory?

Not sure how to frame it correctly as I do not remember the exact value we are reporting. But the kernel docs say:

"""
The effectiveness of KSM and MADV_MERGEABLE is shown in /sys/kernel/mm/ksm/:

pages_shared - how many shared pages are being used
pages_sharing - how many more sites are sharing them i.e. how much saved

A high ratio of pages_sharing to pages_shared indicates good sharing, but a high ratio of pages_unshared to pages_sharing indicates wasted effort.
"""
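The ratio the kernel docs mention can be sketched as follows. This is only an illustration, not what Vdsm or MOM does: the function names are hypothetical, and the directory is a parameter so the reader can point it elsewhere when /sys/kernel/mm/ksm is not available.

```python
from pathlib import Path


def read_ksm_counters(ksm_dir="/sys/kernel/mm/ksm"):
    """Read the KSM page counters named in the kernel docs as integers."""
    counters = {}
    for name in ("pages_shared", "pages_sharing", "pages_unshared"):
        counters[name] = int(Path(ksm_dir, name).read_text().strip())
    return counters


def sharing_ratio(counters):
    """Ratio of pages_sharing to pages_shared; 0.0 when nothing is shared."""
    shared = counters["pages_shared"]
    return counters["pages_sharing"] / shared if shared else 0.0
```

On a host with KSM enabled, `sharing_ratio(read_ksm_counters())` would give the figure the docs describe as indicating good sharing when it is high.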
Opened an RFE for VDSM: https://bugzilla.redhat.com/1758067
OK, changing the title of the bug. We will improve the documentation of the shared memory metric in the API and remove the unimplemented memory metrics from the API, because they should be fetched from the metrics store and not from the engine.
About memory.shared, the value reported is:
stats['ksm_pages_sharing'] * PAGE_SIZE_BYTES
(see https://github.com/oVirt/vdsm/blob/master/lib/vdsm/momIF.py#L113)

Therefore I will change the description to: "The amount of memory, in bytes, shared among Virtual-Machines on this Host (KSM)"
small correction: memory.shared = "The amount of memory, in bytes, shared among Virtual-Machines on this Host by Kernel Same-page Merging (KSM)"
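As a worked example of the formula quoted from momIF.py, a minimal sketch: the function name is hypothetical, and taking the page size from os.sysconf is my assumption about what PAGE_SIZE_BYTES stands for.

```python
import os


def ksm_shared_bytes(ksm_pages_sharing, page_size=None):
    """Convert a KSM pages_sharing count into bytes."""
    if page_size is None:
        # Assumption: PAGE_SIZE_BYTES corresponds to the system page size.
        page_size = os.sysconf("SC_PAGE_SIZE")
    return ksm_pages_sharing * page_size


# With 4 KiB pages, 100 shared pages amount to 409600 bytes.
print(ksm_shared_bytes(100, page_size=4096))
```

With KSM inactive the page count is 0, so memory.shared is legitimately 0 in that case.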
WARN: Bug status (ON_QA) wasn't changed but the following should be fixed: [Found non-acked flags: '{}', ] For more info please contact: rhv-devops
Verified on: ovirt-engine-4.4.0-0.20.master.el7.noarch

Steps:
1. # curl -X GET -H 'All-content: true' -H "Accept: application/xml" -u admin@internal:<password> --insecure https://<engine-fqdn>/ovirt-engine/api/hosts/<host-id>/statistics

Results:
The memory.buffers, memory.cached and swap.cached fields are no longer returned; the memory.shared description is shown according to comment #17.
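The same check could be scripted roughly like this. It is only a sketch: the function name and sample document are hypothetical, and it works on XML text that has already been fetched (for example with the curl command above).

```python
import xml.etree.ElementTree as ET

# Statistics that should no longer appear after the fix.
REMOVED = {"memory.buffers", "memory.cached", "swap.cached"}


def check_statistics(xml_text):
    """Return (removed names still present, memory.shared description)."""
    root = ET.fromstring(xml_text)
    descriptions = {
        stat.findtext("name"): stat.findtext("description")
        for stat in root.iter("statistic")
    }
    leftover = sorted(REMOVED.intersection(descriptions))
    return leftover, descriptions.get("memory.shared")


# Hypothetical post-fix response, shortened to two statistics.
SAMPLE = """<statistics>
  <statistic><name>memory.total</name>
    <description>Total memory</description></statistic>
  <statistic><name>memory.shared</name>
    <description>The amount of memory, in bytes, shared among Virtual-Machines on this Host by Kernel Same-page Merging (KSM)</description></statistic>
</statistics>"""

leftover, shared_description = check_statistics(SAMPLE)
print(leftover, shared_description)
```

An empty first element means none of the removed statistics are still being returned.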
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: RHV Manager (ovirt-engine) 4.4 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:3247