Bug 1669410
| Summary: | Memory usage is double counted for `oc adm top pod` command | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Junqi Zhao <juzhao> |
| Component: | Monitoring | Assignee: | Frederic Branczyk <fbranczy> |
| Status: | CLOSED ERRATA | QA Contact: | Junqi Zhao <juzhao> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.1.0 | CC: | adeshpan, anpicker, aos-bugs, erooth, fbranczy, jokerman, mloibl, mmccomas, sponnaga, ssadhale, surbania |
| Target Milestone: | --- | | |
| Target Release: | 4.1.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-06-04 10:42:15 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1664187 | | |
I am getting a discrepancy but in the other direction.
$ oc adm top pod
NAME                                                     CPU(cores)   MEMORY(bytes)
etcd-member-ip-10-0-130-219.us-west-1.compute.internal   73m          175Mi
etcd-member-ip-10-0-137-248.us-west-1.compute.internal   51m          222Mi   <----
etcd-member-ip-10-0-152-13.us-west-1.compute.internal    36m          173Mi
Straight out of Prometheus:
pod_name:container_memory_usage_bytes:sum{namespace="kube-system",pod_name="etcd-member-ip-10-0-137-248.us-west-1.compute.internal"} 335613952
Sending to Monitoring to take a look at the Prometheus adapter that serves up the resource metrics API and figure out why there is such a large delta.
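As a side note, the value that `oc adm top pod` renders can be compared directly against what the adapter serves through the resource metrics API. A minimal sketch, reusing the namespace and pod name from the output above:

```
# PodMetrics object for the pod in question, served through the metrics.k8s.io
# API by the adapter; this is the value `oc adm top pod` displays.
oc get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods/etcd-member-ip-10-0-137-248.us-west-1.compute.internal"
```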
Possibly a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1669718

First PR to fix this is out: https://github.com/coreos/prometheus-operator/pull/2528

Fix PR has merged. Actually, that was "just" the upstream change. The downstream change necessary is captured in: https://github.com/openshift/cluster-monitoring-operator/pull/303

The patch that enables this in our downstream landed now as well, so this can indeed be QE'd.

# oc -n openshift-kube-apiserver adm top pod kube-apiserver-ip-10-0-129-66.sa-east-1.compute.internal
NAME                                                       CPU(cores)   MEMORY(bytes)
kube-apiserver-ip-10-0-129-66.sa-east-1.compute.internal   888m         839Mi
From the Prometheus UI, search for:
pod_name:container_memory_usage_bytes:sum{pod_name='kube-apiserver-ip-10-0-129-66.sa-east-1.compute.internal',namespace='openshift-kube-apiserver'}
Result:
Element                                                                                                                                Value
pod_name:container_memory_usage_bytes:sum{namespace="openshift-kube-apiserver",pod_name="kube-apiserver-ip-10-0-129-66.sa-east-1.compute.internal"}   970461184
970461184 / 1024 / 1024 = 925.50390625Mi
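For convenience, the byte-to-MiB conversion can also be done directly in the expression browser. A sketch reusing the labels from the query above:

```
# Recording-rule result converted from bytes to MiB in PromQL.
pod_name:container_memory_usage_bytes:sum{namespace="openshift-kube-apiserver",pod_name="kube-apiserver-ip-10-0-129-66.sa-east-1.compute.internal"} / 1024 / 1024
```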
The issue is fixed; the difference between `oc adm top pod` and the Prometheus result is acceptable.
payload: 4.0.0-0.nightly-2019-04-04-030930
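For context, the remaining gap is roughly 9%, which is in the range one would expect between the usage metric and the working-set metric (working set excludes reclaimable page cache). A back-of-the-envelope check based on the numbers above:

```
970461184 / 1024 / 1024 ≈ 925.5 Mi   (pod_name:container_memory_usage_bytes:sum)
925.5 - 839 = 86.5 Mi                (gap vs. `oc adm top pod`)
86.5 / 925.5 ≈ 9.3%
```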
@Frederic
WDYT?
Could you double check that against the `container_memory_working_set_bytes` metric instead of `pod_name:container_memory_usage_bytes:sum`, as that's what's really used by the adapter.

(In reply to Frederic Branczyk from comment #9)
> Could you double check that against the `container_memory_working_set_bytes`
> metric instead of `pod_name:container_memory_usage_bytes:sum`, as that's
> what's really used by the adapter.

The results are almost the same:

# oc -n openshift-kube-apiserver adm top pod kube-apiserver-ip-10-0-129-66.sa-east-1.compute.internal
NAME                                                       CPU(cores)   MEMORY(bytes)
kube-apiserver-ip-10-0-129-66.sa-east-1.compute.internal   309m         871Mi

sum(container_memory_working_set_bytes{pod_name='kube-apiserver-ip-10-0-129-66.sa-east-1.compute.internal',namespace='openshift-kube-apiserver'}) / 1024 / 1024 = 871.265625Mi

Wonderful, looks solved to me :)

*** Bug 1669718 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days
Description of problem:
Memory usage is double counted for `oc adm top pod` command

$ oc -n openshift-kube-apiserver adm top pod openshift-kube-apiserver-ip-10-0-43-9.us-east-2.compute.internal
NAME                                                               CPU(cores)   MEMORY(bytes)
openshift-kube-apiserver-ip-10-0-43-9.us-east-2.compute.internal   213m         912Mi

Search in the Prometheus UI:
pod_name:container_memory_usage_bytes:sum{namespace="openshift-kube-apiserver",pod_name="openshift-kube-apiserver-ip-10-0-43-9.us-east-2.compute.internal"}

The result is 478658560 bytes, that is 478658560 / 1024 / 1024 = 456.484375Mi

Element                                                                                                                                      Value
pod_name:container_memory_usage_bytes:sum{namespace="openshift-kube-apiserver",pod_name="openshift-kube-apiserver-ip-10-0-43-9.us-east-2.compute.internal"}   478658560

`oc adm top pod` reports 912Mi, which is roughly double the value from Prometheus.

Version-Release number of selected component (if applicable):
$ oc version
oc v4.0.0-0.125.0
payload: registry.svc.ci.openshift.org/ocp/release@sha256:9185e93b4cf65abe8712b2e489226406c3ea9406da8051c8ae201a9159fa3db8

How reproducible:
Always

Steps to Reproduce:
1. Check the memory usage reported by `oc adm top pod`
2. Check the memory usage reported in the Prometheus UI
3. Compare the two results

Actual results:
Memory usage is double counted for the `oc adm top pod` command

Expected results:
The two values should not have a large gap

Additional info:
Similar issue: https://github.com/openshift/cluster-monitoring-operator/pull/153/files
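A hedged sketch of one way to see where a doubled sum can come from: list the raw per-container series for the pod in the Prometheus UI. If an aggregation sums both the pod-level cgroup series (empty `container_name`) and the individual container series, the pod's memory is counted twice. The label names reflect the cAdvisor metrics of this release; this is an illustration, not the adapter's exact query.

```
# List every container_memory_usage_bytes series for the pod; the series with
# container_name="" is the pod-level cgroup and already includes the containers,
# so summing it together with the per-container series double counts memory.
container_memory_usage_bytes{namespace="openshift-kube-apiserver",pod_name="openshift-kube-apiserver-ip-10-0-43-9.us-east-2.compute.internal"}

# Summing only the real containers avoids the double count (sketch):
sum(container_memory_usage_bytes{namespace="openshift-kube-apiserver",pod_name="openshift-kube-apiserver-ip-10-0-43-9.us-east-2.compute.internal",container_name!="",container_name!="POD"})
```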