Bug 1425951
Summary: | Memory utilization metrics fail to account for system cache | |
---|---|---|---
Product: | Red Hat CloudForms Management Engine | Reporter: | Alex Mayberry <amayberr>
Component: | C&U Capacity and Utilization | Assignee: | Richard Su <rwsu>
Status: | CLOSED CURRENTRELEASE | QA Contact: | Ido Ovadia <iovadia>
Severity: | medium | Docs Contact: |
Priority: | medium | |
Version: | 5.7.0 | CC: | amayberr, brant.evans, dscott, ikaur, iovadia, jhajyahy, jhardy, lsmola, maufart, obarenbo, rwsu, simaishi, tzumainn
Target Milestone: | GA | Keywords: | TestOnly
Target Release: | 5.9.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | c&u:openstack | |
Fixed In Version: | 5.9.0.1 | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | |
: | 1444174 | Environment: |
Last Closed: | 2018-03-06 15:17:06 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | Openstack | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1444174 | |
Description
Alex Mayberry
2017-02-22 20:43:33 UTC
In the case of the OSP9 provider, I see that SNMP-based metrics are gathered, which CFME could potentially reference to do the math and report on available memory.

https://docs.openstack.org/admin-guide/telemetry-measurements.html

hardware.memory.total    Gauge   KB   host ID   Pollster   Total physical memory size
hardware.memory.used     Gauge   KB   host ID   Pollster   Used physical memory size
hardware.memory.buffer   Gauge   KB   host ID   Pollster   Physical memory buffer size
hardware.memory.cached   Gauge   KB   host ID   Pollster   Cached physical memory size

Looking at something like this, as an example of what I had in mind.

# diff metrics_capture.rb-bkup-2017-02-22 metrics_capture.rb
4c4,5
< hardware.memory.total)
---
> hardware.memory.total
> hardware.memory.cached)
24c25
< stats['hardware.memory.total'] > 0 ? 100.0 / stats['hardware.memory.total'] * stats['hardware.memory.used'] : 0
---
> stats['hardware.memory.total'] > 0 ? 100.0 / stats['hardware.memory.total'] * (stats['hardware.memory.used'] - stats['hardware.memory.cached']) : 0

Ladislav, any thoughts on the appropriate way to express what we're looking for from the available metrics?

I see, the 'used' SNMP metric does indeed include buffers and cache, while these can be considered free on Linux machines.

@Mainn, Alex is providing code snippets from:
https://github.com/Ladas/manageiq/blob/ac0c964897481ab42cabc947f2c2dcb803da2d35/app/models/manageiq/providers/openstack/infra_manager/metrics_capture.rb#L3-L3
and
https://github.com/Ladas/manageiq/blob/ac0c964897481ab42cabc947f2c2dcb803da2d35/app/models/manageiq/providers/openstack/infra_manager/metrics_capture.rb#L24

@Alex, it seems that hardware.memory.buffer can also be considered free? So it should be

stats['hardware.memory.total'] > 0 ? 100.0 / stats['hardware.memory.total'] * (stats['hardware.memory.used'] - stats['hardware.memory.cached'] - stats['hardware.memory.buffer']) : 0

right?

My example was purely meant to illustrate my point. I'm not actually sure which values are being collected under those names. It was my assumption that the maintainer would see my point and determine which values to use. When I saw "hardware.memory.buffer" I assumed that value would be the total amount of RAM installed. If it is actually another type of cache, I wouldn't know offhand whether that particular chunk of memory is handled the same way that system cache is, i.e. cache is always "available" for use. If the buffer is memory space that is used by applications, it is *not* immediately "available", so it would be incorrect to remove that value from the total used.

You can check the actual SNMP OIDs here:
https://github.com/openstack/ceilometer/blob/ffc9ee99c10ede988769907fdb0594a512c890cd/ceilometer/hardware/pollsters/data/snmp.yaml#L76
https://github.com/openstack/ceilometer/blob/ffc9ee99c10ede988769907fdb0594a512c890cd/ceilometer/hardware/pollsters/data/snmp.yaml#L101
https://github.com/openstack/ceilometer/blob/ffc9ee99c10ede988769907fdb0594a512c890cd/ceilometer/hardware/pollsters/data/snmp.yaml#L109

Now, in the free man page http://man7.org/linux/man-pages/man1/free.1.html, used is defined as:

used == Used memory (calculated as total - free - buffers - cache)

Now I would assume that 1.3.6.1.4.1.2021.4.14.0 is the <Memory used by kernel buffers (Buffers in /proc/meminfo)>, but I can't find it in the SNMP docs, so I'm not 100% sure.

@Mainn, can you investigate and then change the computation accordingly?

https://review.openstack.org/#/c/157257/ describes how used memory is calculated.

hardware.memory.used = total memory - total avail (free) memory

So, as Alex observed, it would include cache memory. We will need to adjust how we calculate memory used in CloudForms.

Fix posted for review: https://github.com/ManageIQ/manageiq/pull/14470

Verified
========
5.9.0.22
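For readers following the thread, the adjusted calculation discussed in the comments above can be sketched roughly as follows. This is only an illustration, not the code from the linked pull request: the method name memory_used_percent and the subtract_buffer option are hypothetical, and stats is assumed to be a Hash keyed by the Ceilometer meter names, with all values in KB. Whether the buffer value should be subtracted was still an open question in the discussion, so it is left as a switch here.

```ruby
# Hypothetical helper for illustration; not taken from ManageIQ.
# stats is assumed to be a Hash keyed by Ceilometer meter name (values in KB).
def memory_used_percent(stats, subtract_buffer: true)
  total = stats['hardware.memory.total'].to_f
  return 0.0 unless total > 0

  # hardware.memory.used is total minus free, so it still includes cache
  # (and buffers). Subtract the parts treated as available.
  reclaimable = stats['hardware.memory.cached'].to_f
  reclaimable += stats['hardware.memory.buffer'].to_f if subtract_buffer

  used = stats['hardware.memory.used'].to_f - reclaimable
  used = 0.0 if used < 0 # guard against inconsistent samples

  100.0 * used / total
end

# Made-up sample values, purely for demonstration.
sample = {
  'hardware.memory.total'  => 8000,
  'hardware.memory.used'   => 6000,
  'hardware.memory.cached' => 2000,
  'hardware.memory.buffer' => 1000
}
puts memory_used_percent(sample)
puts memory_used_percent(sample, subtract_buffer: false)
```

With the made-up sample, the unadjusted formula would report 75% used, while subtracting cache (and optionally buffers) gives a noticeably lower figure, which is the discrepancy this bug describes.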