Description of problem:
In the Kubernetes/Compute Resources/Cluster dashboard, Memory Utilization shows a negative percentage and incorrect usage.

Version-Release number of selected component (if applicable):
Server Version: 4.8.0-0.nightly-s390x-2021-03-22-155743

How reproducible:
This is a large environment consisting of 3 masters and 10 workers.

Looking at the inspect values:
1 - sum(:node_memory_MemAvailable_bytes:sum{cluster=""}) / sum(kube_node_status_allocatable_memory_bytes{cluster=""})
sum of node_memory_MemAvailable_bytes = 611205861376
sum of kube_node_status_allocatable_memory_bytes = 387137921024

I experimented with some other variables to get to a proper value. I have included my screenshots and trials with and without a memory workload running to see how the current

Steps to Reproduce:
1. Compute memory utilization on an environment and check whether the value is correct and matches what is displayed in the dashboard.

Actual results:

Expected results:
correct utilization value

Additional info:
Created attachment 1766360 [details] Screen Shots of Dashboard and usage by nodes
I see that this query has now been changed to `1 - sum(:node_memory_MemAvailable_bytes:sum{cluster=""}) / sum(kube_node_status_allocatable{resource="memory",cluster=""})` (changed by https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/534). Not sure if this change would be expected to resolve this issue. Pawel, could you confirm? FWIW, I am not seeing negative values with my test cluster.
It seems this can happen when a large chunk of memory is reserved for other uses. In such a scenario, the memory available on the nodes is much higher than what the scheduler is allowed to allocate. That makes the right-hand part of the equation (`sum(:node_memory_MemAvailable_bytes:sum{cluster=""}) / sum(kube_node_status_allocatable{resource="memory",cluster=""})`) greater than one, so the overall result is negative. The PR https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/534 won't fix this, as we need a different way to track this, preferably one where we don't subtract metric values from 1.
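For illustration, the arithmetic can be checked against the sums quoted in the description. This is only a minimal Python sketch: the function name `utilisation` is hypothetical, and the two values are copied from the report rather than re-measured.

```python
def utilisation(mem_available_bytes: float, denominator_bytes: float) -> float:
    """Evaluate 1 - available / denominator, mirroring the dashboard expression."""
    return 1 - mem_available_bytes / denominator_bytes


# Sums quoted in the bug description (not re-measured).
mem_available = 611205861376  # sum of node_memory_MemAvailable_bytes
allocatable = 387137921024    # sum of kube_node_status_allocatable_memory_bytes

ratio = mem_available / allocatable
result = utilisation(mem_available, allocatable)

# The ratio exceeds 1, so subtracting it from 1 gives a negative value.
print(f"ratio = {ratio:.4f}, utilisation = {result:.1%}")
```

With these inputs the ratio is roughly 1.58, so the expression evaluates to about -58%.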
Checked with 4.9.0-0.nightly-2021-08-04-131508. In the Kubernetes/Compute Resources/Cluster dashboard, the "Memory Utilisation" expression is now `1 - sum(:node_memory_MemAvailable_bytes:sum{cluster=""}) / sum(node_memory_MemTotal_bytes{cluster=""})`. This guarantees that the value is never negative.
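A rough way to see why the new denominator avoids negative values, assuming MemAvailable never exceeds MemTotal on a node, is the Python sketch below. The function name `memory_utilisation` and the total used here are hypothetical and chosen only for illustration; they are not taken from the cluster in this report.

```python
def memory_utilisation(mem_available_bytes: float, mem_total_bytes: float) -> float:
    """Evaluate 1 - MemAvailable / MemTotal, as in the 4.9 expression."""
    if mem_total_bytes <= 0:
        raise ValueError("mem_total_bytes must be positive")
    return 1 - mem_available_bytes / mem_total_bytes


# Hypothetical cluster-wide total (1 TiB); not a value from the report.
mem_total = 1024 * 2**30

# Sweep available memory from 0 up to the full total in 10% steps.
samples = [memory_utilisation(mem_total * i / 10, mem_total) for i in range(11)]

# Every sample stays within [0, 1] under the stated assumption.
print(all(0.0 <= s <= 1.0 for s in samples))
```

The bound depends only on available memory being at most the total, which is why reserved memory no longer pushes the result below zero.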
@juzhao I was able to validate the fix on 4.9. This defect can be closed. Thank you!