
Bug 1633812

Summary: Memory Metrics reported on Pods not consistent
Product: OpenShift Container Platform
Component: Hawkular
Version: 3.9.0
Target Release: 3.9.z
Hardware: All
OS: Linux
Status: CLOSED DEFERRED
Severity: medium
Priority: unspecified
Type: Bug
Reporter: Bruno Andrade <bandrade>
Assignee: Ruben Vargas Palma <rvargasp>
QA Contact: Junqi Zhao <juzhao>
CC: aos-bugs
Last Closed: 2019-11-20 18:56:46 UTC
Bug Blocks: 1770143
Attachments: Metrics From UI (flags: none)

Description Bruno Andrade 2018-09-27 19:28:45 UTC
Description of problem:

I'm trying to find out how much memory my pod is consuming, and I see different values in the OCP GUI, in the output of oc adm top pods, and inside the pod.
Which value is the correct one to use, and why are they reported differently?


OCP GUI (see the attached screenshot): 3.3 GiB
oc adm top pods: roughly 3000 MiB
[qcambre@ocplogs ~]$ oc adm top pods
NAME            CPU(cores)   MEMORY(bytes)
websc-1-qchhj   1431m        3011Mi

And inside the pod:
sh-4.2$ cat /sys/fs/cgroup/memory/memory.stat
cache 838934528
rss 2664419328
rss_huge 2554331136
mapped_file 26480640
swap 0
pgpgin 7097370
pgpgout 6864457
pgfault 618247
pgmajfault 674
inactive_anon 0
active_anon 2664353792
inactive_file 355774464
active_file 483160064
unevictable 0
hierarchical_memory_limit 50460135424
hierarchical_memsw_limit 9223372036854771712
total_cache 838934528
total_rss 2664419328
total_rss_huge 2554331136
total_mapped_file 26480640
total_swap 0
total_pgpgin 7097370
total_pgpgout 6864457
total_pgfault 618247
total_pgmajfault 674
total_inactive_anon 0
total_active_anon 2664353792
total_inactive_file 355774464
total_active_file 483160064
total_unevictable 0
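
To compare the three readings, it helps to put them in the same unit first. The following is a minimal sketch that uses only the numbers quoted above; the variable names are illustrative, and treating rss + cache as the cgroup's total usage is an assumption made here, since memory.usage_in_bytes was not captured.

```shell
#!/usr/bin/env bash
# Values copied from the report above.
gui_gib="3.3"            # OCP web console reading, in GiB
top_mib=3011             # `oc adm top pods` reading, in Mi
cache_bytes=838934528    # memory.stat: cache
rss_bytes=2664419328     # memory.stat: rss

bytes_per_mib=$((1024 * 1024))

# Convert the fractional GiB value to whole MiB (awk handles the decimal input).
gui_mib=$(awk -v g="$gui_gib" 'BEGIN { printf "%d", g * 1024 }')

# Assumption: rss + cache approximates the cgroup's total memory usage.
rss_cache_mib=$(( (rss_bytes + cache_bytes) / bytes_per_mib ))
rss_mib=$(( rss_bytes / bytes_per_mib ))

echo "web console:      ${gui_mib} MiB"
echo "oc adm top pods:  ${top_mib} MiB"
echo "rss + cache:      ${rss_cache_mib} MiB"
echo "rss only:         ${rss_mib} MiB"
```

With these inputs the web console reading is about 3379 MiB, oc adm top pods shows 3011 MiB, rss + cache is 3341 MiB, and rss alone is 2540 MiB. The web console figure is therefore closest to rss + cache, while the oc adm top pods figure sits between rss and rss + cache.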

Can you please explain the differences? And what should we use to monitor the memory of pods properly?

Some research:
This issue explains in detail how metrics are calculated in the web console:
https://github.com/openshift/origin-web-console/issues/1315

Here is a script that reproduces what kubelet/cAdvisor does [1]; you can take memory_usage_in_bytes and convert it to megabytes.

[1]

#!/usr/bin/env bash

# This script reproduces what kubelet/cAdvisor does
# to calculate memory.available relative to root cgroup.
# The major change is that it excludes total_inactive_file memory.

# current memory usage
memory_capacity_in_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
memory_capacity_in_bytes=$((memory_capacity_in_kb * 1024))
memory_usage_in_bytes=$(cat /sys/fs/cgroup/memory/memory.usage_in_bytes)
memory_total_inactive_file=$(awk '$1 == "total_inactive_file" {print $2}' /sys/fs/cgroup/memory/memory.stat)

memory_working_set=$memory_usage_in_bytes
if [ "$memory_working_set" -lt "$memory_total_inactive_file" ];
then
    memory_working_set=0
else
    memory_working_set=$((memory_usage_in_bytes - memory_total_inactive_file))
fi

memory_available_in_bytes=$((memory_capacity_in_bytes - memory_working_set))
memory_available_in_kb=$((memory_available_in_bytes / 1024))
memory_available_in_mb=$((memory_available_in_kb / 1024))

echo "memory.capacity_in_bytes $memory_capacity_in_bytes"
echo "memory.usage_in_bytes $memory_usage_in_bytes"
echo "memory.total_inactive_file $memory_total_inactive_file"
echo "memory.working_set $memory_working_set"
echo "memory.available_in_bytes $memory_available_in_bytes"
echo "memory.available_in_kb $memory_available_in_kb"
echo "memory.available_in_mb $memory_available_in_mb"
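
The working-set rule from the script above can also be applied directly to the memory.stat values posted in this report. This is only a sketch under one stated assumption: memory.usage_in_bytes was not captured, so usage is approximated here as rss + cache.

```shell
#!/usr/bin/env bash
# Values copied from the memory.stat output in this report.
cache_bytes=838934528
rss_bytes=2664419328
total_inactive_file=355774464

# Assumption: approximate usage as rss + cache, because
# memory.usage_in_bytes itself was not posted.
usage_bytes=$((rss_bytes + cache_bytes))

# Same rule as the script above: clamp at zero,
# otherwise subtract the inactive file pages from usage.
if [ "$usage_bytes" -lt "$total_inactive_file" ]; then
    working_set_bytes=0
else
    working_set_bytes=$((usage_bytes - total_inactive_file))
fi

working_set_mib=$((working_set_bytes / 1024 / 1024))

echo "approx. usage:       $usage_bytes bytes"
echo "approx. working set: $working_set_bytes bytes (${working_set_mib} MiB)"
```

With the posted values this gives a working set of 3147579392 bytes, about 3001 MiB, which is close to the 3011Mi shown by oc adm top pods. The samples were not taken at the same instant, so an exact match is not expected; the result only suggests that oc adm top pods reports something close to the working set rather than rss or rss + cache.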

Regards.

This may be related to https://bugzilla.redhat.com/show_bug.cgi?id=1431667.

Version-Release number of selected component (if applicable):
OCP 3.9

Comment 1 Bruno Andrade 2018-09-27 19:33:10 UTC
Created attachment 1487880
Metrics From UI

Comment 2 Frederic Branczyk 2018-09-28 09:45:27 UTC
This is Hawkular-related; moving it to that component.

Comment 3 John Sanda 2018-09-28 13:46:00 UTC
See also https://bugzilla.redhat.com/show_bug.cgi?id=1600871#c4.

Comment 7 Stephen Cuppett 2019-11-20 18:56:46 UTC
OCP 3.6-3.10 is no longer on full support [1]. Marking CLOSED DEFERRED. If you have a customer case with a support exception or have reproduced on 3.11+, please reopen and include those details. When reopening, please set the Target Release to the appropriate version where needed.

[1]: https://access.redhat.com/support/policy/updates/openshift