Bug 1804455

Summary: openshift-monitoring memory stats don't match output of 'free' command on a node
Product: OpenShift Container Platform
Reporter: Luke Stanton <lstanton>
Component: Monitoring
Assignee: Pawel Krupa <pkrupa>
Status: CLOSED ERRATA
QA Contact: Junqi Zhao <juzhao>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 3.11.0
CC: adeshpan, alegrand, anpicker, erooth, kakkoyun, lcosic, mloibl, pkrupa, surbania, ychoukse
Target Milestone: ---   
Target Release: 3.11.z   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2020-03-20 00:12:43 UTC
Type: Bug

Description Luke Stanton 2020-02-18 21:28:28 UTC
Description of problem:

When the output of some memory-related Prometheus rules for a node is compared against the buff/cache value reported by the 'free' command on the same node, the numbers don't match. For example:

#--- 'free' output ---#
$ free
            total      used       free   shared   buff/cache   available
Mem:     32764040   6135268    1233868     3704     25394904    25970860
Swap:     4194300    269492    3924808

#--- Prometheus buff/cache output ---#
(node_memory_Cached{job="node-exporter", instance="xx.xx.xx.xx:9100"} + node_memory_Buffers{job="node-exporter", instance="xx.xx.xx.xx:9100"})/1024

--> value returned is: 4483656

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
'free' buff/cache output: 25394904
       Prometheus output: 4483656
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
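
To put the mismatch above in perspective, here is a small arithmetic sketch. It assumes both figures are in KiB: 'free' prints KiB by default, and the Prometheus expression divides a byte value by 1024. The variable names are illustrative only.

```python
# Values copied from the comparison above; both are assumed to be in KiB.
free_buff_cache_kib = 25394904   # buff/cache column from 'free'
prometheus_kib = 4483656         # (node_memory_Cached + node_memory_Buffers) / 1024

# Absolute gap between the two readings.
gap_kib = free_buff_cache_kib - prometheus_kib
gap_gib = gap_kib / (1024 * 1024)

# Share of the 'free' figure that the Prometheus expression accounts for.
share = prometheus_kib / free_buff_cache_kib

if __name__ == "__main__":
    print(f"gap: {gap_kib} KiB (about {gap_gib:.2f} GiB)")
    print(f"Prometheus expression covers {share:.1%} of free's buff/cache")
```

A gap of this size is far too large to be explained by sampling the two values at slightly different times.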

How reproducible: Appears to be consistent


Steps to Reproduce:
Compare the above Prometheus rule output with the 'free' buff/cache value on the same node.


Actual results: Values don't match.


Expected results: Values should match or be very close.

Comment 6 Pawel Krupa 2020-03-02 12:29:35 UTC
*** Bug 1809092 has been marked as a duplicate of this bug. ***

Comment 9 Junqi Zhao 2020-03-09 10:47:54 UTC
Tested with cluster-monitoring-operator-v3.11.187-3. There is not much difference between the 'free' buff/cache value and node_memory_Cached + node_memory_Buffers + node_memory_SReclaimable; the query below puts it at about 2.13 MiB.
# free -b
              total        used        free      shared  buff/cache   available
Mem:     3973369856  2115792896   120659968     4231168  1736916992  1576497152
Swap:             0           0           0

(1736916992 - (node_memory_Cached{instance="10.0.150.172:9100"}) - (node_memory_Buffers{instance="10.0.150.172:9100"}) - (node_memory_SReclaimable{instance="10.0.150.172:9100"})) / 1024 /1024
Element 	Value
{endpoint="https",instance="10.0.150.172:9100",job="node-exporter",namespace="openshift-monitoring",pod="node-exporter-lxq64",service="node-exporter"}	2.12890625
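
The check above can also be reproduced on a node without Prometheus. The following is a minimal sketch, assuming that 'free' (procps-ng) derives its buff/cache column as Buffers + Cached + SReclaimable from /proc/meminfo. The helper names and the SAMPLE_MEMINFO figures are made up for illustration and are not taken from the nodes in this bug.

```python
# Sketch: recompute the 'free' buff/cache column from /proc/meminfo-style text.
# SAMPLE_MEMINFO holds made-up illustration data, not output from this bug's nodes.
SAMPLE_MEMINFO = """\
MemTotal:        8000000 kB
MemFree:         1000000 kB
Buffers:          100000 kB
Cached:          2000000 kB
Slab:             900000 kB
SReclaimable:     700000 kB
SUnreclaim:       200000 kB
"""


def parse_meminfo(text):
    """Return a {field name: integer value} dict from /proc/meminfo-style text."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep or not rest.strip():
            continue  # skip lines that are not 'Name: value' pairs
        values[name.strip()] = int(rest.split()[0])
    return values


def buff_cache_kib(meminfo):
    """buff/cache as 'free' is assumed to compute it (includes reclaimable slab)."""
    return meminfo["Buffers"] + meminfo["Cached"] + meminfo["SReclaimable"]


def cached_plus_buffers_kib(meminfo):
    """The sum used by the expression in the original description."""
    return meminfo["Buffers"] + meminfo["Cached"]


if __name__ == "__main__":
    info = parse_meminfo(SAMPLE_MEMINFO)
    print("Buffers + Cached + SReclaimable:", buff_cache_kib(info), "KiB")
    print("Buffers + Cached only:          ", cached_plus_buffers_kib(info), "KiB")
```

To run it against a real node, pass the contents of /proc/meminfo to parse_meminfo instead of the sample string. On a node with a large reclaimable slab, the two sums can differ by many GiB.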

Comment 11 errata-xmlrpc 2020-03-20 00:12:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0793