Bug 1962261 - Monitoring components requesting more memory than they use
Summary: Monitoring components requesting more memory than they use
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.8.0
Assignee: Filip Petkovski
QA Contact: Junqi Zhao
URL:
Whiteboard:
Duplicates: 1962305
Depends On:
Blocks:
 
Reported: 2021-05-19 15:46 UTC by Filip Petkovski
Modified: 2021-07-27 23:09 UTC
CC: 7 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 23:09:14 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-monitoring-operator pull 1158 0 None closed Bug 1962261: jsonnet: consolidate memory requests for all resources 2021-05-27 09:36:43 UTC
Github openshift cluster-monitoring-operator pull 1172 0 None closed Bug 1962261: further adjust memory usage 2021-05-27 09:36:41 UTC
Red Hat Product Errata RHSA-2021:2438 0 None None None 2021-07-27 23:09:36 UTC

Comment 2 Junqi Zhao 2021-05-20 07:23:48 UTC
Based on the attached picture and testing in our cluster, resources.requests.memory needs to be lowered for thanos-sidecar:
Container Name: thanos-sidecar
resources: map[requests:map[cpu:1m memory:100Mi]]

(max (kube_pod_container_resource_requests{resource="memory", namespace="openshift-monitoring",container="thanos-sidecar"})  - max (container_memory_usage_bytes{namespace="openshift-monitoring",container="thanos-sidecar"})) /1024 /1024 
{}
69.3671875

Comment 3 Filip Petkovski 2021-05-20 08:00:11 UTC
*** Bug 1962305 has been marked as a duplicate of this bug. ***

Comment 4 Filip Petkovski 2021-05-20 08:02:25 UTC
I have made further adjustments in a new PR: https://github.com/openshift/cluster-monitoring-operator/pull/1172

Comment 6 Junqi Zhao 2021-05-24 03:19:06 UTC
Tested with 4.8.0-0.nightly-2021-05-21-233425. resources.requests.memory needs to be increased for the prometheus-operator container; please move this back to ON_QA if that is also fine.
******************************** 
searched
sort(
  max by (container) (container_memory_usage_bytes{namespace="openshift-monitoring"} or on(container) container_memory_rss{namespace="openshift-monitoring"}) -
  max by (container) (kube_pod_container_resource_requests{resource="memory", namespace="openshift-monitoring"})) / 1024 /1024
result
{container="prometheus-operator"}   -27.0703125
{container="telemeter-client"}   -14.890625
{container="alertmanager"}   -13.015625
{container="cluster-monitoring-operator"}   -12.5859375
{container="openshift-state-metrics"}   -12.3671875
{container="grafana"}   -11.5
{container="node-exporter"}   -5.5078125
{container="kube-state-metrics"}   -4.28125
{container="kube-rbac-proxy-self"}   -2.484375
{container="prom-label-proxy"}   -1.14453125
{container="kube-rbac-proxy"}   -0.18359375
{container="kube-rbac-proxy-main"}   1.30078125
{container="thanos-sidecar"}   3.62109375
{container="kube-rbac-proxy-rules"}   3.703125
{container="prometheus-proxy"}   5.26953125
{container="oauth-proxy"}   6.1015625
{container="reload"}   7.16015625
{container="alertmanager-proxy"}   7.78515625
{container="kube-rbac-proxy-thanos"}   8.29296875
{container="prometheus-adapter"}   9.58203125
{container="grafana-proxy"}   9.8125
{container="config-reloader"}   10.67578125
{container="thanos-query"}   60.05859375
{container="prometheus"}   1112.93359375
********************************

Comment 7 Junqi Zhao 2021-05-24 03:19:35 UTC
# for i in $(kubectl -n openshift-monitoring get pod --no-headers | awk '{print $1}'); do echo $i; kubectl -n openshift-monitoring get pod $i -o go-template='{{range.spec.containers}}{{"Container Name: "}}{{.name}}{{"\r\nresources: "}}{{.resources}}{{"\n"}}{{end}}'; echo -e "\n"; done
alertmanager-main-0
Container Name: alertmanager
resources: map[requests:map[cpu:4m memory:40Mi]]
Container Name: config-reloader
resources: map[requests:map[cpu:1m memory:10Mi]]
Container Name: alertmanager-proxy
resources: map[requests:map[cpu:1m memory:20Mi]]
Container Name: kube-rbac-proxy
resources: map[requests:map[cpu:1m memory:15Mi]]
Container Name: prom-label-proxy
resources: map[requests:map[cpu:1m memory:20Mi]]


alertmanager-main-1
Container Name: alertmanager
resources: map[requests:map[cpu:4m memory:40Mi]]
Container Name: config-reloader
resources: map[requests:map[cpu:1m memory:10Mi]]
Container Name: alertmanager-proxy
resources: map[requests:map[cpu:1m memory:20Mi]]
Container Name: kube-rbac-proxy
resources: map[requests:map[cpu:1m memory:15Mi]]
Container Name: prom-label-proxy
resources: map[requests:map[cpu:1m memory:20Mi]]


alertmanager-main-2
Container Name: alertmanager
resources: map[requests:map[cpu:4m memory:40Mi]]
Container Name: config-reloader
resources: map[requests:map[cpu:1m memory:10Mi]]
Container Name: alertmanager-proxy
resources: map[requests:map[cpu:1m memory:20Mi]]
Container Name: kube-rbac-proxy
resources: map[requests:map[cpu:1m memory:15Mi]]
Container Name: prom-label-proxy
resources: map[requests:map[cpu:1m memory:20Mi]]


cluster-monitoring-operator-fdb9d949c-vkl5q
Container Name: kube-rbac-proxy
resources: map[requests:map[cpu:1m memory:20Mi]]
Container Name: cluster-monitoring-operator
resources: map[requests:map[cpu:10m memory:75Mi]]


grafana-7bb7f88d68-7ks6f
Container Name: grafana
resources: map[requests:map[cpu:4m memory:64Mi]]
Container Name: grafana-proxy
resources: map[requests:map[cpu:1m memory:20Mi]]


kube-state-metrics-69cc98557f-stb24
Container Name: kube-state-metrics
resources: map[requests:map[cpu:2m memory:80Mi]]
Container Name: kube-rbac-proxy-main
resources: map[requests:map[cpu:1m memory:15Mi]]
Container Name: kube-rbac-proxy-self
resources: map[requests:map[cpu:1m memory:15Mi]]


node-exporter-2v86g
Container Name: node-exporter
resources: map[requests:map[cpu:8m memory:32Mi]]
Container Name: kube-rbac-proxy
resources: map[requests:map[cpu:1m memory:15Mi]]


node-exporter-427b8
Container Name: node-exporter
resources: map[requests:map[cpu:8m memory:32Mi]]
Container Name: kube-rbac-proxy
resources: map[requests:map[cpu:1m memory:15Mi]]


node-exporter-5whz5
Container Name: node-exporter
resources: map[requests:map[cpu:8m memory:32Mi]]
Container Name: kube-rbac-proxy
resources: map[requests:map[cpu:1m memory:15Mi]]


node-exporter-9r2bz
Container Name: node-exporter
resources: map[requests:map[cpu:8m memory:32Mi]]
Container Name: kube-rbac-proxy
resources: map[requests:map[cpu:1m memory:15Mi]]


node-exporter-khtd6
Container Name: node-exporter
resources: map[requests:map[cpu:8m memory:32Mi]]
Container Name: kube-rbac-proxy
resources: map[requests:map[cpu:1m memory:15Mi]]


node-exporter-psxqr
Container Name: node-exporter
resources: map[requests:map[cpu:8m memory:32Mi]]
Container Name: kube-rbac-proxy
resources: map[requests:map[cpu:1m memory:15Mi]]


openshift-state-metrics-5f54b4ff58-w674d
Container Name: kube-rbac-proxy-main
resources: map[requests:map[cpu:1m memory:20Mi]]
Container Name: kube-rbac-proxy-self
resources: map[requests:map[cpu:1m memory:20Mi]]
Container Name: openshift-state-metrics
resources: map[requests:map[cpu:1m memory:32Mi]]


prometheus-adapter-6cb7687895-9bfn8
Container Name: prometheus-adapter
resources: map[requests:map[cpu:1m memory:40Mi]]


prometheus-adapter-6cb7687895-sppwt
Container Name: prometheus-adapter
resources: map[requests:map[cpu:1m memory:40Mi]]


prometheus-k8s-0
Container Name: prometheus
resources: map[requests:map[cpu:70m memory:1Gi]]
Container Name: config-reloader
resources: map[requests:map[cpu:1m memory:10Mi]]
Container Name: thanos-sidecar
resources: map[requests:map[cpu:1m memory:25Mi]]
Container Name: prometheus-proxy
resources: map[requests:map[cpu:1m memory:20Mi]]
Container Name: kube-rbac-proxy
resources: map[requests:map[cpu:1m memory:15Mi]]
Container Name: prom-label-proxy
resources: map[requests:map[cpu:1m memory:15Mi]]
Container Name: kube-rbac-proxy-thanos
resources: map[requests:map[cpu:1m memory:10Mi]]


prometheus-k8s-1
Container Name: prometheus
resources: map[requests:map[cpu:70m memory:1Gi]]
Container Name: config-reloader
resources: map[requests:map[cpu:1m memory:10Mi]]
Container Name: thanos-sidecar
resources: map[requests:map[cpu:1m memory:25Mi]]
Container Name: prometheus-proxy
resources: map[requests:map[cpu:1m memory:20Mi]]
Container Name: kube-rbac-proxy
resources: map[requests:map[cpu:1m memory:15Mi]]
Container Name: prom-label-proxy
resources: map[requests:map[cpu:1m memory:15Mi]]
Container Name: kube-rbac-proxy-thanos
resources: map[requests:map[cpu:1m memory:10Mi]]


prometheus-operator-fd77ffdd8-6brvp
Container Name: prometheus-operator
resources: map[requests:map[cpu:5m memory:150Mi]]
Container Name: kube-rbac-proxy
resources: map[requests:map[cpu:1m memory:15Mi]]


telemeter-client-5657ccddfb-74fhr
Container Name: telemeter-client
resources: map[requests:map[cpu:1m memory:40Mi]]
Container Name: reload
resources: map[requests:map[cpu:1m memory:10Mi]]
Container Name: kube-rbac-proxy
resources: map[requests:map[cpu:1m memory:20Mi]]


thanos-querier-db74d4959-74v2r
Container Name: thanos-query
resources: map[requests:map[cpu:10m memory:12Mi]]
Container Name: oauth-proxy
resources: map[requests:map[cpu:1m memory:20Mi]]
Container Name: kube-rbac-proxy
resources: map[requests:map[cpu:1m memory:15Mi]]
Container Name: prom-label-proxy
resources: map[requests:map[cpu:1m memory:15Mi]]
Container Name: kube-rbac-proxy-rules
resources: map[requests:map[cpu:1m memory:15Mi]]


thanos-querier-db74d4959-lbqbv
Container Name: thanos-query
resources: map[requests:map[cpu:10m memory:12Mi]]
Container Name: oauth-proxy
resources: map[requests:map[cpu:1m memory:20Mi]]
Container Name: kube-rbac-proxy
resources: map[requests:map[cpu:1m memory:15Mi]]
Container Name: prom-label-proxy
resources: map[requests:map[cpu:1m memory:15Mi]]
Container Name: kube-rbac-proxy-rules
resources: map[requests:map[cpu:1m memory:15Mi]]

Comment 8 Filip Petkovski 2021-05-25 06:29:09 UTC
Hi Junqi, 

The way to calculate the correct memory request is documented here: https://github.com/openshift/enhancements/blob/master/CONVENTIONS.md#resources-and-limits.

The guidelines say that the requested memory should be 10% higher than the 90th percentile of memory usage during a CI run.
I went ahead and opened a PR with the actual query that calculates the discrepancy: https://github.com/openshift/enhancements/pull/788/files

When I ran the query after the adjustments, the difference between requested and used memory was within a 20Mi range.
Do you think you can do your verification with this query as well?
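
For reference, a minimal PromQL sketch along those lines (not the exact query from the PR; the one-hour window and the container_memory_working_set_bytes metric are assumptions on my part) that computes 110% of the 90th-percentile memory usage per container, in MiB:

1.1 * max by (container) (
  quantile_over_time(0.9, container_memory_working_set_bytes{namespace="openshift-monitoring", container!=""}[1h])
) / 1024 / 1024

Comparing that result against kube_pod_container_resource_requests{resource="memory", namespace="openshift-monitoring"} per container shows how far each request deviates from the guideline.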

Comment 12 errata-xmlrpc 2021-07-27 23:09:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

