Bug 1739588 - [Grafana] Displaying confusing "Cache" memory usage
Summary: [Grafana] Displaying confusing "Cache" memory usage
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.1.z
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.2.0
Assignee: Pawel Krupa
QA Contact: Viacheslav
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-08-09 15:09 UTC by Will Gordon
Modified: 2019-08-20 09:53 UTC

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-08-20 09:53:45 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-monitoring-operator pull 442 | 0 | None | closed | Bug 1733830: Bump kubernetes-mixin | 2021-02-18 05:06:25 UTC

Description Will Gordon 2019-08-09 15:09:28 UTC
Description of problem:
When viewing Memory Usage in Grafana, the "Memory Usage" column includes RSS and Cache memory, with no explanation of what these are. This confuses the customer, and since Cache is not actual memory usage, it should not be displayed.

Version-Release number of selected component (if applicable):
4.1.z

How reproducible:
Always

Steps to Reproduce:
1. Log in to Grafana
2. View the "Kubernetes / Compute Resources / Pod" dashboard
3. Select something like "openshift-monitoring" for the namespace, and "prometheus-k8s-0" for the pod
4. Scroll down to the Memory Usage section

Actual results:
"Memory Usage" is displayed as a sum of "Memory Usage (RSS)" and "Memory Usage (Cache)".

Expected results:
There shouldn't be a "Memory Usage (RSS)" column and "Memory Usage (Cache)" column. There should only be 1 column, "Memory Usage", that is displaying data from "Memory Usage (RSS)".

Additional info:
This is very misleading to customers and is likely to cause support tickets from customers concerned about OpenShift components running on their cluster with a large memory footprint.

Comment 1 Pawel Krupa 2019-08-12 13:40:21 UTC
Counting "used" memory on Linux systems is not simple, and it cannot be presented on a single graph because there are a couple of types of "used" memory. Additionally, container metrics coming from cgroups make this problem even harder. We are currently unifying how CPU and memory metrics are presented across the whole stack and decided to leave the RSS and Cache information in the table, as those are quite useful. However, we will change the source of data responsible for the "Memory Usage" column to use the same data as the k8s scheduler, which should provide better insight. Bear in mind that this metric is not RSS, but WSS (Working Set Size).
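
To make the difference between RSS, Cache, and WSS concrete, here is a minimal sketch of a working-set calculation. It assumes the cAdvisor-style definition (total cgroup usage minus inactive file-backed pages) and cgroup v1 field names in memory.stat. The function names and the sample numbers are hypothetical and chosen only for illustration; they are not taken from the dashboard code.

```python
def parse_memory_stat(text: str) -> dict[str, int]:
    """Parse the 'key value' lines of a cgroup memory.stat file."""
    stats: dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def working_set_bytes(usage_bytes: int, stats: dict[str, int]) -> int:
    """Assumed WSS definition: usage minus inactive file cache, floored at 0."""
    # cgroup v1 exposes total_inactive_file; fall back to inactive_file.
    inactive_file = stats.get("total_inactive_file", stats.get("inactive_file", 0))
    return max(usage_bytes - inactive_file, 0)


# Hypothetical sample: 300 bytes RSS plus 600 bytes of page cache,
# of which 400 bytes are inactive (easily reclaimable).
SAMPLE_STAT = """\
cache 600
rss 300
total_inactive_file 400
total_active_file 200
"""

stats = parse_memory_stat(SAMPLE_STAT)
usage = stats["rss"] + stats["cache"]       # 900: RSS + Cache
print("rss:", stats["rss"])                 # RSS alone
print("wss:", working_set_bytes(usage, stats))
```

With these sample values the working set (500) sits between RSS alone (300) and RSS plus Cache (900), because only the inactive part of the cache is excluded.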

Comment 2 Pawel Krupa 2019-08-15 14:39:23 UTC
We had a larger discussion about this problem in https://github.com/kubernetes-monitoring/kubernetes-mixin/issues/227 and decided to use WSS for overall memory usage. More on why can be found in this comment https://github.com/kubernetes-monitoring/kubernetes-mixin/issues/227#issuecomment-520835340

Comment 4 Viacheslav 2019-08-20 09:31:41 UTC
Fix it pls.

Comment 5 Pawel Krupa 2019-08-20 09:53:45 UTC
We did exactly as I said in one of my previous comments, by using WSS for "Memory Usage". We are NOT going to reduce the number of columns or switch to RSS for overall memory usage. For the reasons why, please read the discussion at https://github.com/kubernetes-monitoring/kubernetes-mixin/issues/227.

I am closing this as WONTFIX, as the change won't satisfy the initial request.

