Bug 1669410 - Memory usage is double counted for `oc adm top pod` command
Summary: Memory usage is double counted for `oc adm top pod` command
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.1.0
Assignee: Frederic Branczyk
QA Contact: Junqi Zhao
URL:
Whiteboard:
Duplicates: 1669718
Depends On:
Blocks: 1664187
 
Reported: 2019-01-25 07:36 UTC by Junqi Zhao
Modified: 2023-09-14 04:45 UTC
CC: 11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-04 10:42:15 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:0758 0 None None None 2019-06-04 10:42:22 UTC

Description Junqi Zhao 2019-01-25 07:36:54 UTC
Description of problem:
Memory usage is double counted for `oc adm top pod` command

$ oc -n openshift-kube-apiserver adm top pod openshift-kube-apiserver-ip-10-0-43-9.us-east-2.compute.internal
NAME                                                               CPU(cores)   MEMORY(bytes)   
openshift-kube-apiserver-ip-10-0-43-9.us-east-2.compute.internal   213m         912Mi         

search in prometheus UI
pod_name:container_memory_usage_bytes:sum{namespace="openshift-kube-apiserver",pod_name="openshift-kube-apiserver-ip-10-0-43-9.us-east-2.compute.internal"}

The result is 478658560 bytes, i.e. 478658560 / 1024 / 1024 = 456.484375Mi
Element	                                                                                    Value
pod_name:container_memory_usage_bytes:sum{namespace="openshift-kube-apiserver",pod_name="openshift-kube-apiserver-ip-10-0-43-9.us-east-2.compute.internal"}	478658560


The `oc adm top pod` command reports 912Mi, roughly double the value from Prometheus.


Version-Release number of selected component (if applicable):
$ oc version
oc v4.0.0-0.125.0

payload: registry.svc.ci.openshift.org/ocp/release@sha256:9185e93b4cf65abe8712b2e489226406c3ea9406da8051c8ae201a9159fa3db8


How reproducible:
Always

Steps to Reproduce:
1. Check the memory usage reported by `oc adm top pod`
2. Check the memory usage reported in the Prometheus UI
3. Compare the two results (see the sketch below)
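
A rough sketch of this comparison, assuming the default openshift-monitoring namespace and prometheus-k8s route, a token that is allowed to query Prometheus, and `jq` on the PATH (pod and namespace names are the ones from the description above):

$ NS=openshift-kube-apiserver
$ POD=openshift-kube-apiserver-ip-10-0-43-9.us-east-2.compute.internal
$ # memory as reported through the resource metrics API
$ oc -n $NS adm top pod $POD
$ # memory as recorded by Prometheus, converted from bytes to Mi
$ PROM=https://$(oc -n openshift-monitoring get route prometheus-k8s -o jsonpath='{.spec.host}')
$ curl -sk -H "Authorization: Bearer $(oc whoami -t)" \
    --data-urlencode "query=pod_name:container_memory_usage_bytes:sum{namespace='$NS',pod_name='$POD'} / 1024 / 1024" \
    "$PROM/api/v1/query" | jq -r '.data.result[0].value[1]'

If the first number is roughly twice the second, the bug is reproduced.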

Actual results:
Memory usage is double counted for `oc adm top pod` command

Expected results:
The two values should not differ significantly.

Additional info:
Similar issue: https://github.com/openshift/cluster-monitoring-operator/pull/153/files
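
For context, and as an assumption in line with the linked PR rather than a confirmed root cause: cAdvisor exposes a pod-level cgroup series with an empty container_name label whose value roughly equals the sum of the per-container series, so a query that sums container_memory_usage_bytes over all series of a pod ends up counting the memory twice. A sum that excludes that pod-level series would look like:

sum(container_memory_usage_bytes{namespace="openshift-kube-apiserver",pod_name="openshift-kube-apiserver-ip-10-0-43-9.us-east-2.compute.internal",container_name!=""})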

Comment 1 Seth Jennings 2019-04-01 18:45:42 UTC
I am getting a discrepancy but in the other direction.

$ oc adm top pod
NAME                                                     CPU(cores)   MEMORY(bytes)   
etcd-member-ip-10-0-130-219.us-west-1.compute.internal   73m          175Mi           
etcd-member-ip-10-0-137-248.us-west-1.compute.internal   51m          222Mi <----          
etcd-member-ip-10-0-152-13.us-west-1.compute.internal    36m          173Mi

Straight out of prometheus:
pod_name:container_memory_usage_bytes:sum{namespace="kube-system",pod_name="etcd-member-ip-10-0-137-248.us-west-1.compute.internal"}	335613952

Sending to Monitoring to take a look at the prometheus adapter that serves up the resource API and figure out why there is such a large delta.
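
`oc adm top pod` reads from the resource metrics API (metrics.k8s.io), which the prometheus adapter serves on this cluster. As a sketch for checking the adapter's raw per-container figures directly (assuming `jq` is available; pod and namespace are the ones above):

$ oc get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods/etcd-member-ip-10-0-137-248.us-west-1.compute.internal" \
    | jq '.containers[] | {name: .name, memory: .usage.memory}'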

Comment 2 Andrew Pickering 2019-04-02 05:39:10 UTC
Possibly a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1669718

Comment 3 Frederic Branczyk 2019-04-02 13:12:43 UTC
First PR to fix this is out: https://github.com/coreos/prometheus-operator/pull/2528

Comment 4 Andrew Pickering 2019-04-03 00:05:44 UTC
Fix PR has merged.

Comment 5 Frederic Branczyk 2019-04-03 09:19:32 UTC
Actually that was "just" the upstream change. The downstream change necessary is captured in: https://github.com/openshift/cluster-monitoring-operator/pull/303

Comment 7 Frederic Branczyk 2019-04-03 13:51:40 UTC
The patch that enables this has now landed in our downstream as well, so this can indeed be QE'd.

Comment 8 Junqi Zhao 2019-04-04 07:37:52 UTC
# oc -n openshift-kube-apiserver adm top pod kube-apiserver-ip-10-0-129-66.sa-east-1.compute.internal
NAME                                                       CPU(cores)   MEMORY(bytes)   
kube-apiserver-ip-10-0-129-66.sa-east-1.compute.internal   888m         839Mi           


From prometheus UI, search
pod_name:container_memory_usage_bytes:sum{pod_name='kube-apiserver-ip-10-0-129-66.sa-east-1.compute.internal',namespace='openshift-kube-apiserver'} 

result
Element	                                                                                                                                                Value
pod_name:container_memory_usage_bytes:sum{namespace="openshift-kube-apiserver",pod_name="kube-apiserver-ip-10-0-129-66.sa-east-1.compute.internal"}	970461184

970461184 / 1024 / 1024 = 925.50390625Mi

The issue is fixed; the difference between the `oc adm top pod` and Prometheus results is acceptable.
payload: 4.0.0-0.nightly-2019-04-04-030930

@Frederic
WDYT?

Comment 9 Frederic Branczyk 2019-04-04 07:50:17 UTC
Could you double check that against the `container_memory_working_set_bytes` metric instead of `pod_name:container_memory_usage_bytes:sum`, as that's what's really used by the adapter.

Comment 10 Junqi Zhao 2019-04-04 08:31:59 UTC
(In reply to Frederic Branczyk from comment #9)
> Could you double check that against the `container_memory_working_set_bytes`
> metric instead of `pod_name:container_memory_usage_bytes:sum`, as that's
> what's really used by the adapter.

The results are almost the same:

# oc -n openshift-kube-apiserver adm top pod kube-apiserver-ip-10-0-129-66.sa-east-1.compute.internal
NAME                                                       CPU(cores)   MEMORY(bytes)   
kube-apiserver-ip-10-0-129-66.sa-east-1.compute.internal   309m         871Mi    

sum(container_memory_working_set_bytes{pod_name='kube-apiserver-ip-10-0-129-66.sa-east-1.compute.internal',namespace='openshift-kube-apiserver'}) / 1024 / 1024 = 871.265625Mi

Comment 11 Frederic Branczyk 2019-04-04 08:33:25 UTC
Wonderful, looks solved to me :)

Comment 12 Junqi Zhao 2019-04-04 09:12:49 UTC
*** Bug 1669718 has been marked as a duplicate of this bug. ***

Comment 17 errata-xmlrpc 2019-06-04 10:42:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

Comment 18 Red Hat Bugzilla 2023-09-14 04:45:37 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

