1739588 – [Grafana] Displaying confusing "Cache" memory usage

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Monitoring
Sub Component:
Version:	4.1.z
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Target Release:	4.2.0
Assignee:	Pawel Krupa
QA Contact:	Viacheslav
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2019-08-09 15:09 UTC by Will Gordon
Modified:	2019-08-20 09:53 UTC
CC List:	7 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-08-20 09:53:45 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift cluster-monitoring-operator pull 442	0	None	closed	Bug 1733830: Bump kubernetes-mixin	2021-02-18 05:06:25 UTC

Description Will Gordon 2019-08-09 15:09:28 UTC

Description of problem:
When viewing Memory Usage in Grafana, the "Memory Usage" column includes RSS and Cache memory, with no explanation of what these are. This confuses customers, and since Cache is not actual memory usage, it should not be displayed.

Version-Release number of selected component (if applicable):
4.1.z

How reproducible:
Always

Steps to Reproduce:
1. Log in to Grafana
2. View the "Kubernetes / Compute Resources / Pod" dashboard
3. Select something like "openshift-monitoring" for the namespace, and "prometheus-k8s-0" for the pod
4. Scroll down to the Memory Usage section

Actual results:
"Memory Usage" is displayed as a sum of "Memory Usage (RSS)" and "Memory Usage (Cache)".

Expected results:
There shouldn't be a "Memory Usage (RSS)" column and a "Memory Usage (Cache)" column. There should be only one column, "Memory Usage", displaying the data from "Memory Usage (RSS)".

Additional info:
This is very misleading to customers and is likely to cause support tickets from customers concerned about OpenShift components running on their cluster with a large memory footprint.

Comment 1 Pawel Krupa 2019-08-12 13:40:21 UTC

Counting "used" memory on Linux systems is not simple, and it cannot be presented on a single graph because there are a couple of types of "used" memory. Container metrics coming from cgroups make this problem even harder. We are currently unifying how CPU and memory metrics are presented across the whole stack and have decided to leave the RSS and Cache information in the table, as both are quite useful. However, we will change the data source for the "Memory Usage" column to use the same data as the k8s scheduler, which should provide better insight. Bear in mind that this metric is not RSS but WSS (Working Set Size).
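
To make the difference between the three figures concrete, here is a minimal Python sketch. It assumes the working set is derived the way cAdvisor is generally understood to derive container_memory_working_set_bytes, that is, cgroup memory usage minus inactive file cache, floored at zero. The CgroupMemory class, the function names, and the sample numbers are hypothetical and chosen only for illustration; they are not taken from this bug or from any dashboard.

```python
from dataclasses import dataclass

MiB = 1024 * 1024


@dataclass
class CgroupMemory:
    """Hypothetical snapshot of one container's cgroup v1 memory counters, in bytes."""
    usage: int          # memory.usage_in_bytes
    rss: int            # total_rss from memory.stat
    cache: int          # total_cache from memory.stat
    inactive_file: int  # total_inactive_file from memory.stat


def rss_plus_cache(mem: CgroupMemory) -> int:
    """The sum the reporter saw in the "Memory Usage" column."""
    return mem.rss + mem.cache


def working_set(mem: CgroupMemory) -> int:
    """Assumed WSS definition: usage minus inactive file cache, never negative."""
    return max(mem.usage - mem.inactive_file, 0)


# Invented numbers for a cache-heavy pod.
sample = CgroupMemory(
    usage=3000 * MiB,
    rss=1200 * MiB,
    cache=1800 * MiB,
    inactive_file=1500 * MiB,
)

if __name__ == "__main__":
    print("RSS only:     ", sample.rss // MiB, "MiB")
    print("RSS + Cache:  ", rss_plus_cache(sample) // MiB, "MiB")
    print("Working set:  ", working_set(sample) // MiB, "MiB")
```

With these invented numbers the working set lands between RSS alone and RSS plus Cache, because only the inactive part of the page cache is subtracted. That is one way to see why neither of the other two columns is a drop-in replacement for it.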

Comment 2 Pawel Krupa 2019-08-15 14:39:23 UTC

We had a larger discussion about this problem in https://github.com/kubernetes-monitoring/kubernetes-mixin/issues/227 and decided to use WSS for overall memory usage. The reasoning is explained in this comment: https://github.com/kubernetes-monitoring/kubernetes-mixin/issues/227#issuecomment-520835340

Comment 4 Viacheslav 2019-08-20 09:31:41 UTC

Fix it pls.

Comment 5 Pawel Krupa 2019-08-20 09:53:45 UTC

We did exactly as I said in one of my previous comments and now use WSS for "Memory Usage". We are NOT going to reduce the number of columns or switch to RSS for overall memory usage. For the reasons why, please read the discussion at https://github.com/kubernetes-monitoring/kubernetes-mixin/issues/227.

I am closing this as WONTFIX because the change does not satisfy the initial request.
