Bug 1754459

Summary: The system dashboards show significantly lower disk and memory usage than the hosts' dashboards
Product: OpenShift Container Platform
Reporter: Udi Kalifon <ukalifon>
Component: Console Metal3 Plugin
Assignee: Paul Gier <pgier>
Status: CLOSED NOTABUG
QA Contact: Udi Kalifon <ukalifon>
Severity: urgent
Priority: unspecified
Version: unspecified
CC: aos-bugs, hpokorny, lcosic
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2019-10-09 13:32:17 UTC
Type: Bug
Bug Depends On: 1759945, 1774653    
Bug Blocks:    
Attachments:
Data in the system dashboards and the host dashboard

Description Udi Kalifon 2019-09-23 10:15:54 UTC
Created attachment 1618158 [details]
Data in the system dashboards and the host dashboard

Description of problem:
In the system dashboards I see a memory usage of 1-2 GB, and a disk usage of 1-2 GB as well. At the same time, any baremetal host that I check shows 10-20 GB RAM used, and 20-30 GB disk. The aggregated results in the system dashboards can't possibly show less than any single host.


How reproducible:
100%


Steps to Reproduce:
1. Compare how much RAM and disk the baremetal hosts use with the total usage shown in the system dashboards


Additional info:
See attached screenshot

Comment 1 Honza Pokorny 2019-10-07 17:05:55 UTC
The two dashboards use different Prometheus exporters:

Overview dashboard: (sum(kube_node_status_capacity_memory_bytes) - sum(kube_node_status_allocatable_memory_bytes))[60m:5m]

Baremetal host dashboard: node_memory_Active_bytes

Comment 2 Honza Pokorny 2019-10-07 22:16:40 UTC
Lily, any ideas what could cause the large discrepancy?

Comment 3 Paul Gier 2019-10-09 09:04:35 UTC
I think the issue is that kube_node_status_allocatable_memory_bytes is the total memory that can be used for pods (including ones that are currently running). This value is determined when the kubelet first starts: it's the total node memory minus the memory reserved for the kubelet itself and for system processes (ssh, etc.). So the overview dashboard calculation listed by Honza ends up being:

  (node_capacity - (node_capacity - mem_reserved_for_system_stuff))

The node_capacity values cancel, and the value shown in the dashboard is actually how much memory we're holding for kubelet and system processes.
Some additional explanation is available here: https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/

I'm not sure if the disk usage difference is a similar issue.
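The cancellation described above can be sketched in a few lines of Python. All numbers below are hypothetical stand-ins for the Prometheus series kube_node_status_capacity_memory_bytes and kube_node_status_allocatable_memory_bytes; the point is only that the capacity terms cancel, leaving the reserved amount:

```python
# Illustrative sketch of the overview-dashboard calculation described above.
# The byte values are made up for illustration, not taken from the cluster.

GIB = 1024 ** 3

node_capacity = 32 * GIB           # total RAM on the node
reserved_for_system = 2 * GIB      # kubelet + system reservation
allocatable = node_capacity - reserved_for_system  # fixed at kubelet startup

# The overview dashboard computed: sum(capacity) - sum(allocatable)
overview_value = node_capacity - allocatable

# capacity - (capacity - reserved) == reserved: the capacity terms cancel,
# so the dashboard shows only the reserved memory, not actual usage.
assert overview_value == reserved_for_system
print(overview_value // GIB)  # 2 (GiB), regardless of real memory usage
```

Whatever the node's real memory usage is, this expression always evaluates to the fixed reservation, which matches the constant 1-2 GB figure reported in the bug.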

Comment 4 Lili Cosic 2019-10-09 13:23:32 UTC
I think this can be closed as it's an issue in our console and yours is actually correct. Thanks for the ping on this!