Bug 1669410
Summary: | Memory usage is double counted for `oc adm top pod` command | | |
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Junqi Zhao <juzhao> |
Component: | Monitoring | Assignee: | Frederic Branczyk <fbranczy> |
Status: | CLOSED ERRATA | QA Contact: | Junqi Zhao <juzhao> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | | |
Version: | 4.1.0 | CC: | adeshpan, anpicker, aos-bugs, erooth, fbranczy, jokerman, mloibl, mmccomas, sponnaga, ssadhale, surbania |
Target Milestone: | --- | | |
Target Release: | 4.1.0 | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2019-06-04 10:42:15 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | | | |
Bug Blocks: | 1664187 | | |
Description Junqi Zhao 2019-01-25 07:36:54 UTC
I am getting a discrepancy, but in the other direction.

$ oc adm top pod
NAME                                                      CPU(cores)   MEMORY(bytes)
etcd-member-ip-10-0-130-219.us-west-1.compute.internal    73m          175Mi
etcd-member-ip-10-0-137-248.us-west-1.compute.internal    51m          222Mi   <----
etcd-member-ip-10-0-152-13.us-west-1.compute.internal     36m          173Mi

Straight out of prometheus:

pod_name:container_memory_usage_bytes:sum{namespace="kube-system",pod_name="etcd-member-ip-10-0-137-248.us-west-1.compute.internal"}   335613952

Sending to Monitoring to take a look at the prometheus adapter that serves up the resource API and figure out why there is such a large delta.

Possibly a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1669718

First PR to fix this is out: https://github.com/coreos/prometheus-operator/pull/2528

Fix PR has merged.

Actually that was "just" the upstream change. The downstream change necessary is captured in: https://github.com/openshift/cluster-monitoring-operator/pull/303

The patch that enables this in our downstream landed now as well, so this can indeed be QE'd.

# oc -n openshift-kube-apiserver adm top pod kube-apiserver-ip-10-0-129-66.sa-east-1.compute.internal
NAME                                                       CPU(cores)   MEMORY(bytes)
kube-apiserver-ip-10-0-129-66.sa-east-1.compute.internal   888m         839Mi

From the prometheus UI, searching

pod_name:container_memory_usage_bytes:sum{pod_name='kube-apiserver-ip-10-0-129-66.sa-east-1.compute.internal',namespace='openshift-kube-apiserver'}

gives this result:

Element                                                                                                                                               Value
pod_name:container_memory_usage_bytes:sum{namespace="openshift-kube-apiserver",pod_name="kube-apiserver-ip-10-0-129-66.sa-east-1.compute.internal"}  970461184

970461184 / 1024 / 1024 = 925.50390625Mi

Issue is fixed; the difference between `oc adm top pod` and the prometheus result is acceptable.

payload: 4.0.0-0.nightly-2019-04-04-030930

@Frederic WDYT?

Could you double check that against the `container_memory_working_set_bytes` metric instead of `pod_name:container_memory_usage_bytes:sum`, as that's what's really used by the adapter.

(In reply to Frederic Branczyk from comment #9)
> Could you double check that against the `container_memory_working_set_bytes`
> metric instead of `pod_name:container_memory_usage_bytes:sum`, as that's
> what's really used by the adapter.

The results are almost the same:

# oc -n openshift-kube-apiserver adm top pod kube-apiserver-ip-10-0-129-66.sa-east-1.compute.internal
NAME                                                       CPU(cores)   MEMORY(bytes)
kube-apiserver-ip-10-0-129-66.sa-east-1.compute.internal   309m         871Mi

sum(container_memory_working_set_bytes{pod_name='kube-apiserver-ip-10-0-129-66.sa-east-1.compute.internal',namespace='openshift-kube-apiserver'}) / 1024 / 1024 = 871.265625Mi

Wonderful, looks solved to me :)

*** Bug 1669718 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days
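For anyone re-verifying this on another cluster, the comparison done in the comments above can be repeated roughly as follows. This is a minimal sketch, not part of the original report: the namespace and pod names are the ones from this bug and are placeholders for your own, and it assumes `oc adm top pod` is backed by the metrics.k8s.io resource API served by the prometheus-adapter, as described in the thread.

# Placeholder names; substitute the pod you want to check.
NS=openshift-kube-apiserver
POD=kube-apiserver-ip-10-0-129-66.sa-east-1.compute.internal

# What `oc adm top pod` reports, taken straight from the resource metrics API:
oc -n "${NS}" adm top pod "${POD}"
oc get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/${NS}/pods/${POD}"

# The raw cAdvisor metric the adapter aggregates (container_memory_working_set_bytes,
# per Frederic's comment, not the pod_name:container_memory_usage_bytes:sum recording
# rule); run this in the Prometheus UI and divide the result by 1024*1024 to get MiB:
#   sum(container_memory_working_set_bytes{namespace="openshift-kube-apiserver",pod_name="kube-apiserver-ip-10-0-129-66.sa-east-1.compute.internal"})

The two numbers should be close but not identical, since the adapter and the ad-hoc query sample the metric at slightly different times; a small delta like the 871Mi vs 871.27Mi above is expected, while a roughly 2x delta would indicate the double counting this bug describes.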