Bug 1754459

Summary: The system dashboards show significantly lower disk and memory usage than the hosts' dashboards
Product: OpenShift Container Platform
Reporter: Udi Kalifon <ukalifon>
Component: Console Metal3 Plugin
Assignee: Paul Gier <pgier>
Status: CLOSED NOTABUG
QA Contact: Udi Kalifon <ukalifon>
Severity: urgent
Priority: unspecified
Version: unspecified
CC: aos-bugs, hpokorny, lcosic
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2019-10-09 13:32:17 UTC
Type: Bug
Bug Depends On: 1759945, 1774653    
Bug Blocks:    
Attachments:
Data in the system dashboards and the host dashboard

Description Udi Kalifon 2019-09-23 10:15:54 UTC
Created attachment 1618158 [details]
Data in the system dashboards and the host dashboard

Description of problem:
In the system dashboards I see a memory usage of 1-2 GB, and a disk usage of 1-2 GB as well. At the same time, any baremetal host that I check shows 10-20 GB RAM used, and 20-30 GB disk. The aggregated results in the system dashboards can't possibly show less than any single host.


How reproducible:
100%


Steps to Reproduce:
1. Compare how much RAM and disk the baremetal hosts use with the total usage shown in the system dashboards


Additional info:
See attached screenshot

Comment 1 Honza Pokorny 2019-10-07 17:05:55 UTC
The two dashboards use different Prometheus exporters:

Overview dashboard: (sum(kube_node_status_capacity_memory_bytes) - sum(kube_node_status_allocatable_memory_bytes))[60m:5m]

Baremetal host dashboard: node_memory_Active_bytes

Comment 2 Honza Pokorny 2019-10-07 22:16:40 UTC
Lily, any ideas what could cause the large discrepancy?

Comment 3 Paul Gier 2019-10-09 09:04:35 UTC
I think the issue is that kube_node_status_allocatable_memory_bytes is the total memory that can be used for pods (including ones that are currently running). This value is determined when the kubelet first starts: it's the total node memory minus the memory reserved for the kubelet itself and for system processes (ssh, etc.). So the overview dashboard calculation listed by Honza ends up being:

  (node_capacity - (node_capacity - mem_reserved_for_system_stuff))

The node_capacity values cancel, and the value shown in the dashboard is actually how much memory we're holding for kubelet and system processes.
Some additional explanation is available here: https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/

I'm not sure if the disk usage difference is a similar issue.
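The cancellation described above can be sketched in a few lines of Python. All numbers below are hypothetical stand-ins for the Prometheus series kube_node_status_capacity_memory_bytes and kube_node_status_allocatable_memory_bytes; the point is only that the capacity terms cancel, leaving the reserved amount:

```python
# Illustrative sketch of the overview-dashboard calculation described above.
# The byte values are made up for illustration, not taken from the cluster.

GIB = 1024 ** 3

node_capacity = 32 * GIB           # total RAM on the node
reserved_for_system = 2 * GIB      # kubelet + system reservation
allocatable = node_capacity - reserved_for_system  # fixed at kubelet startup

# The overview dashboard computed: sum(capacity) - sum(allocatable)
overview_value = node_capacity - allocatable

# capacity - (capacity - reserved) == reserved: the capacity terms cancel,
# so the dashboard shows only the reserved memory, not actual usage.
assert overview_value == reserved_for_system
print(overview_value // GIB)  # 2 (GiB), regardless of real memory usage
```

Whatever the node's real memory usage is, this expression always evaluates to the fixed reservation, which matches the constant 1-2 GB figure reported in the bug.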

Comment 4 Lili Cosic 2019-10-09 13:23:32 UTC
I think this can be closed as it's an issue in our console and yours is actually correct. Thanks for the ping on this!