Bug 1656868 - Large difference between manually calculated Memory related values and Prometheus query output
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 3.11.0
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.1.0
Assignee: Frederic Branczyk
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks: 1701856
 
Reported: 2018-12-06 14:36 UTC by Shivkumar Ople
Modified: 2019-06-04 10:41 UTC (History)
6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1701856 (view as bug list)
Environment:
Last Closed: 2019-06-04 10:41:14 UTC
Target Upstream Version:




Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:0758 None None None 2019-06-04 10:41:20 UTC
Red Hat Knowledge Base (Solution) 3732681 Performance tune None Node metrics values and Prometheus values are observed as different in OCP 3.11 2019-05-28 16:24:06 UTC

Description Shivkumar Ople 2018-12-06 14:36:29 UTC
Description of problem:

Manually calculated available or free values from the free command do not match the Prometheus sum(:node_memory_MemFreeCachedBuffers:sum) value.


The Prometheus query sum(:node_memory_MemFreeCachedBuffers:sum) returns: 228516712448

And

The `free -b | grep Mem` command, run on every node of the cluster (3 master nodes, 3 infra nodes, 6 application nodes), returns:
	total	used	free	shared	buff/cache	available
Mem:	33567514624	4655554560	228970496	5259264	28682989568 27253944320
Mem:	33567469568	3680813056	4468191232	6115328	25418465280	29052235776
Mem:	33567469568	7705444352	14346620928	7659520	11515404288	25235783680
Mem:	33567469568	4605882368	15638114304	6225920	13323472896	28376805376
Mem:	33567469568	9781415936	1770549248	550543360	22015504384	22365310976
Mem:	33567469568	16412160000	393760768	11522048	16761548800	16270479360
Mem:	33567469568	5052407808	407941120	5537792	28107120640	26687836160
Mem:	33567514624	3753984000	874369024	6721536	28939161600	28735127552
Mem:	33567514624	4327591936	792666112	7442432	28447256576	28191858688
Mem:	33567514624	4794445824	910139392	5914624	27862929408	27696361472
Mem:	33567514624	6027689984	679137280	3465216	26860687360	25744715776
Mem:	33567506432	3273830400	487788544	4468736	29805887488	29746855936

SUM:	402809896960	74071220224	40998248448	620875776	287740428288	315357315072

The available and free values from the free command do not match the Prometheus sum(:node_memory_MemFreeCachedBuffers:sum) value.


Version-Release number of selected component (if applicable):


How reproducible:
Always


Steps to Reproduce:
1. Execute the sum(:node_memory_MemFreeCachedBuffers:sum) query in Prometheus and record the output
2. Calculate the free and available (i.e. free plus the relevant buff/cache) values from each node and sum them across the cluster
3. Compare the outputs from steps 1 and 2
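Step 2 can be sketched with a short shell helper, assuming the `free -b` output from every node has been collected into one file with one "Mem:" line per node (the filename, file layout, and demo numbers below are illustrative assumptions, not data from this bug):

```shell
#!/bin/sh
# Sum the "free" and "buff/cache" columns of collected `free -b` output
# into a cluster-wide total, for comparison with the Prometheus query.
sum_free_cached() {
    # free -b columns: Mem: total used free shared buff/cache available
    awk '/^Mem:/ { total += $4 + $6 } END { printf "%d\n", total }' "$1"
}

# Demo with two fabricated node lines (not the values from this bug):
cat > /tmp/all_nodes_free.txt <<'EOF'
Mem: 1000 300 400 10 300 650
Mem: 2000 500 1200 20 300 1700
EOF
sum_free_cached /tmp/all_nodes_free.txt   # prints 2200
```

The same helper applied to the real per-node table above gives the cluster-wide free+buff/cache total to compare against the query output.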

Actual results:
There is a large difference between the two values.


Expected results:
The two values should not differ significantly.


Additional info:

Comment 1 Shivkumar Ople 2018-12-06 14:51:01 UTC
Continuing from the "Description of problem" section in the previous comment, the differences between the outputs are:


===> Difference between sum(:node_memory_MemFreeCachedBuffers:sum) and the Mem free value (from the free command) is +95 GB

===> Difference between sum(:node_memory_MemFreeCachedBuffers:sum) and the Mem available value (from the free command) is -66 GB

Comment 2 Junqi Zhao 2018-12-07 07:01:29 UTC
(In reply to Shivkumar Ople from comment #1)
> Continuing to the "Description of problem" section from the previous comment
> following is the difference between the output:
> 
> 
> ===> Difference between sum(:node_memory_MemFreeCachedBuffers:sum) and Mem
> free value (from free command) is +95 GB
> 
> ===> Difference between sum(:node_memory_MemFreeCachedBuffers:sum) and Mem
> available value (from free command) is -66GB

What is the exact difference between sum(:node_memory_MemFreeCachedBuffers:sum) and the Mem free value (from the free command), 95 GB or -66 GB?

Comment 3 Shivkumar Ople 2018-12-07 09:31:13 UTC
The exact difference between sum(:node_memory_MemFreeCachedBuffers:sum) and Mem free value (from free command) is 95 GB

Comment 7 minden 2019-01-30 07:56:15 UTC
Hi there,

I am sorry for not getting back to this earlier. I am having difficulty reproducing this.

> Available or free values from free command do not match prometheus sum(:node_memory_MemFreeCachedBuffers:sum) value

"sum(:node_memory_MemFreeCachedBuffers:sum)" is the sum of the /free/, /cached/ and /buffered/ memory values from the free command. It is therefore not expected to match the /free/ value of the free command alone. You can find the recording rule definition here [1].
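For reference, the recording rule looks roughly like this (paraphrased from [1]; exact metric names and label matchers vary with the node_exporter version, e.g. node_memory_MemFree vs node_memory_MemFree_bytes):

```yaml
- record: ':node_memory_MemFreeCachedBuffers:sum'
  expr: |
    sum(
        node_memory_MemFree{job="node-exporter"}
      + node_memory_Cached{job="node-exporter"}
      + node_memory_Buffers{job="node-exporter"}
    )
```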

The /available/ value of the free command is not just the sum of /free/ and /cached/; it includes additional considerations. See this excerpt from `man free`:

>        available
>              Estimation of how much memory is available for starting new applications, without swapping. Unlike the data provided by
>              the  cache or free fields, this field takes into account page cache and also that not all reclaimable memory slabs will
>              be reclaimed due to items being in use (MemAvailable in /proc/meminfo, available on kernels 3.14, emulated  on  kernels
>              2.6.27+, otherwise the same as free)


To debug this further, would you mind posting updated `free` output here along with the values of the following node_exporter metrics:

- node_memory_MemFree_bytes{job="node-exporter"}
- node_memory_Cached_bytes{job="node-exporter"}
- node_memory_Buffers_bytes{job="node-exporter"}
- node_memory_MemTotal_bytes{job="node-exporter"}
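Once those per-node values are available, the recording-rule total can be recomputed by hand. A minimal shell sketch (the dump format and all numbers are fabricated for illustration, not data from this cluster):

```shell
#!/bin/sh
# Recompute sum(:node_memory_MemFreeCachedBuffers:sum) from a dump of
# per-node node-exporter samples. The "metric node value" file format
# here is an assumption for the sketch.
cat > /tmp/node_metrics.txt <<'EOF'
node_memory_MemFree_bytes node-1 4000
node_memory_Cached_bytes  node-1 2500
node_memory_Buffers_bytes node-1 500
node_memory_MemFree_bytes node-2 3000
node_memory_Cached_bytes  node-2 1000
node_memory_Buffers_bytes node-2 200
EOF

# Sum all three metrics across every node; for the real cluster this
# total is what the Prometheus query should return.
awk '{ total += $3 } END { printf "%d\n", total }' /tmp/node_metrics.txt   # prints 11200
```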

Thanks for the help.


[1] https://github.com/coreos/prometheus-operator/blob/master/contrib/kube-prometheus/manifests/prometheus-rules.yaml#L159

Comment 19 Frederic Branczyk 2019-02-26 17:11:51 UTC
PR opened to consistently configure the metrics used in 4.0 in https://github.com/coreos/prometheus-operator/pull/2438.

Unfortunately this is impossible to backport to 3.11. In 4.0 we introduced an entirely new component that uses metrics collected by Prometheus to serve the Kubernetes resource metrics API, instead of the Kubernetes metrics-server, which only aggregates the cgroup hierarchy. That aggregation causes the inaccuracy reported here, as there are processes outside the cgroup hierarchy that also use CPU, memory, etc.

Comment 20 Frederic Branczyk 2019-02-28 07:45:25 UTC
https://github.com/openshift/cluster-monitoring-operator/pull/272 merged, which pulled in the changes from https://github.com/coreos/prometheus-operator/pull/2438. The Grafana dashboards and `kubectl top` now use identical metrics, so there should be no more deviation. The metrics used for `kubectl top node` now also come from the node-exporter, as opposed to being only the sum of resources used by all containers. Moving to MODIFIED.

Comment 27 errata-xmlrpc 2019-06-04 10:41:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

