Bug 1962261
| Summary: | Monitoring components requesting more memory than they use | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Filip Petkovski <fpetkovs> |
| Component: | Monitoring | Assignee: | Filip Petkovski <fpetkovs> |
| Status: | CLOSED ERRATA | QA Contact: | Junqi Zhao <juzhao> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 4.8 | CC: | alegrand, anpicker, aos-bugs, dgrisonn, erooth, kakkoyun, pkrupa |
| Target Milestone: | --- | ||
| Target Release: | 4.8.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-07-27 23:09:14 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
*** Bug 1962305 has been marked as a duplicate of this bug. ***

I have made further adjustments in a new PR: https://github.com/openshift/cluster-monitoring-operator/pull/1172

Tested with 4.8.0-0.nightly-2021-05-21-233425. resources.requests.memory still needs to be raised for the prometheus-operator container; please change back to ON_QA if this is also fine.
********************************
Searched with:

sort(
  max by (container) (
    container_memory_usage_bytes{namespace="openshift-monitoring"}
    or on(container) container_memory_rss{namespace="openshift-monitoring"}
  )
  -
  max by (container) (
    kube_pod_container_resource_requests{resource="memory", namespace="openshift-monitoring"}
  )
) / 1024 / 1024

Result (usage minus request, in MiB):
{container="prometheus-operator"} -27.0703125
{container="telemeter-client"} -14.890625
{container="alertmanager"} -13.015625
{container="cluster-monitoring-operator"} -12.5859375
{container="openshift-state-metrics"} -12.3671875
{container="grafana"} -11.5
{container="node-exporter"} -5.5078125
{container="kube-state-metrics"} -4.28125
{container="kube-rbac-proxy-self"} -2.484375
{container="prom-label-proxy"} -1.14453125
{container="kube-rbac-proxy"} -0.18359375
{container="kube-rbac-proxy-main"} 1.30078125
{container="thanos-sidecar"} 3.62109375
{container="kube-rbac-proxy-rules"} 3.703125
{container="prometheus-proxy"} 5.26953125
{container="oauth-proxy"} 6.1015625
{container="reload"} 7.16015625
{container="alertmanager-proxy"} 7.78515625
{container="kube-rbac-proxy-thanos"} 8.29296875
{container="prometheus-adapter"} 9.58203125
{container="grafana-proxy"} 9.8125
{container="config-reloader"} 10.67578125
{container="thanos-query"} 60.05859375
{container="prometheus"} 1112.93359375
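The query above reports usage minus request converted to MiB, so a negative number means the container requests more memory than it actually uses. A minimal sketch of that arithmetic (the byte values here are hypothetical, not data from this cluster):

```python
MI = 1024 * 1024  # bytes per MiB

def overuse_mib(usage_bytes: float, request_bytes: float) -> float:
    """Usage minus request in MiB; negative means over-requested."""
    return (usage_bytes - request_bytes) / MI

# Hypothetical container: requests 40Mi but only uses 25Mi at peak.
print(overuse_mib(25 * MI, 40 * MI))  # -15.0
```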
********************************
# for i in $(kubectl -n openshift-monitoring get pod --no-headers | awk '{print $1}'); do echo $i; kubectl -n openshift-monitoring get pod $i -o go-template='{{range.spec.containers}}{{"Container Name: "}}{{.name}}{{"\r\nresources: "}}{{.resources}}{{"\n"}}{{end}}'; echo -e "\n"; done
alertmanager-main-0
Container Name: alertmanager
resources: map[requests:map[cpu:4m memory:40Mi]]
Container Name: config-reloader
resources: map[requests:map[cpu:1m memory:10Mi]]
Container Name: alertmanager-proxy
resources: map[requests:map[cpu:1m memory:20Mi]]
Container Name: kube-rbac-proxy
resources: map[requests:map[cpu:1m memory:15Mi]]
Container Name: prom-label-proxy
resources: map[requests:map[cpu:1m memory:20Mi]]
alertmanager-main-1
Container Name: alertmanager
resources: map[requests:map[cpu:4m memory:40Mi]]
Container Name: config-reloader
resources: map[requests:map[cpu:1m memory:10Mi]]
Container Name: alertmanager-proxy
resources: map[requests:map[cpu:1m memory:20Mi]]
Container Name: kube-rbac-proxy
resources: map[requests:map[cpu:1m memory:15Mi]]
Container Name: prom-label-proxy
resources: map[requests:map[cpu:1m memory:20Mi]]
alertmanager-main-2
Container Name: alertmanager
resources: map[requests:map[cpu:4m memory:40Mi]]
Container Name: config-reloader
resources: map[requests:map[cpu:1m memory:10Mi]]
Container Name: alertmanager-proxy
resources: map[requests:map[cpu:1m memory:20Mi]]
Container Name: kube-rbac-proxy
resources: map[requests:map[cpu:1m memory:15Mi]]
Container Name: prom-label-proxy
resources: map[requests:map[cpu:1m memory:20Mi]]
cluster-monitoring-operator-fdb9d949c-vkl5q
Container Name: kube-rbac-proxy
resources: map[requests:map[cpu:1m memory:20Mi]]
Container Name: cluster-monitoring-operator
resources: map[requests:map[cpu:10m memory:75Mi]]
grafana-7bb7f88d68-7ks6f
Container Name: grafana
resources: map[requests:map[cpu:4m memory:64Mi]]
Container Name: grafana-proxy
resources: map[requests:map[cpu:1m memory:20Mi]]
kube-state-metrics-69cc98557f-stb24
Container Name: kube-state-metrics
resources: map[requests:map[cpu:2m memory:80Mi]]
Container Name: kube-rbac-proxy-main
resources: map[requests:map[cpu:1m memory:15Mi]]
Container Name: kube-rbac-proxy-self
resources: map[requests:map[cpu:1m memory:15Mi]]
node-exporter-2v86g
Container Name: node-exporter
resources: map[requests:map[cpu:8m memory:32Mi]]
Container Name: kube-rbac-proxy
resources: map[requests:map[cpu:1m memory:15Mi]]
node-exporter-427b8
Container Name: node-exporter
resources: map[requests:map[cpu:8m memory:32Mi]]
Container Name: kube-rbac-proxy
resources: map[requests:map[cpu:1m memory:15Mi]]
node-exporter-5whz5
Container Name: node-exporter
resources: map[requests:map[cpu:8m memory:32Mi]]
Container Name: kube-rbac-proxy
resources: map[requests:map[cpu:1m memory:15Mi]]
node-exporter-9r2bz
Container Name: node-exporter
resources: map[requests:map[cpu:8m memory:32Mi]]
Container Name: kube-rbac-proxy
resources: map[requests:map[cpu:1m memory:15Mi]]
node-exporter-khtd6
Container Name: node-exporter
resources: map[requests:map[cpu:8m memory:32Mi]]
Container Name: kube-rbac-proxy
resources: map[requests:map[cpu:1m memory:15Mi]]
node-exporter-psxqr
Container Name: node-exporter
resources: map[requests:map[cpu:8m memory:32Mi]]
Container Name: kube-rbac-proxy
resources: map[requests:map[cpu:1m memory:15Mi]]
openshift-state-metrics-5f54b4ff58-w674d
Container Name: kube-rbac-proxy-main
resources: map[requests:map[cpu:1m memory:20Mi]]
Container Name: kube-rbac-proxy-self
resources: map[requests:map[cpu:1m memory:20Mi]]
Container Name: openshift-state-metrics
resources: map[requests:map[cpu:1m memory:32Mi]]
prometheus-adapter-6cb7687895-9bfn8
Container Name: prometheus-adapter
resources: map[requests:map[cpu:1m memory:40Mi]]
prometheus-adapter-6cb7687895-sppwt
Container Name: prometheus-adapter
resources: map[requests:map[cpu:1m memory:40Mi]]
prometheus-k8s-0
Container Name: prometheus
resources: map[requests:map[cpu:70m memory:1Gi]]
Container Name: config-reloader
resources: map[requests:map[cpu:1m memory:10Mi]]
Container Name: thanos-sidecar
resources: map[requests:map[cpu:1m memory:25Mi]]
Container Name: prometheus-proxy
resources: map[requests:map[cpu:1m memory:20Mi]]
Container Name: kube-rbac-proxy
resources: map[requests:map[cpu:1m memory:15Mi]]
Container Name: prom-label-proxy
resources: map[requests:map[cpu:1m memory:15Mi]]
Container Name: kube-rbac-proxy-thanos
resources: map[requests:map[cpu:1m memory:10Mi]]
prometheus-k8s-1
Container Name: prometheus
resources: map[requests:map[cpu:70m memory:1Gi]]
Container Name: config-reloader
resources: map[requests:map[cpu:1m memory:10Mi]]
Container Name: thanos-sidecar
resources: map[requests:map[cpu:1m memory:25Mi]]
Container Name: prometheus-proxy
resources: map[requests:map[cpu:1m memory:20Mi]]
Container Name: kube-rbac-proxy
resources: map[requests:map[cpu:1m memory:15Mi]]
Container Name: prom-label-proxy
resources: map[requests:map[cpu:1m memory:15Mi]]
Container Name: kube-rbac-proxy-thanos
resources: map[requests:map[cpu:1m memory:10Mi]]
prometheus-operator-fd77ffdd8-6brvp
Container Name: prometheus-operator
resources: map[requests:map[cpu:5m memory:150Mi]]
Container Name: kube-rbac-proxy
resources: map[requests:map[cpu:1m memory:15Mi]]
telemeter-client-5657ccddfb-74fhr
Container Name: telemeter-client
resources: map[requests:map[cpu:1m memory:40Mi]]
Container Name: reload
resources: map[requests:map[cpu:1m memory:10Mi]]
Container Name: kube-rbac-proxy
resources: map[requests:map[cpu:1m memory:20Mi]]
thanos-querier-db74d4959-74v2r
Container Name: thanos-query
resources: map[requests:map[cpu:10m memory:12Mi]]
Container Name: oauth-proxy
resources: map[requests:map[cpu:1m memory:20Mi]]
Container Name: kube-rbac-proxy
resources: map[requests:map[cpu:1m memory:15Mi]]
Container Name: prom-label-proxy
resources: map[requests:map[cpu:1m memory:15Mi]]
Container Name: kube-rbac-proxy-rules
resources: map[requests:map[cpu:1m memory:15Mi]]
thanos-querier-db74d4959-lbqbv
Container Name: thanos-query
resources: map[requests:map[cpu:10m memory:12Mi]]
Container Name: oauth-proxy
resources: map[requests:map[cpu:1m memory:20Mi]]
Container Name: kube-rbac-proxy
resources: map[requests:map[cpu:1m memory:15Mi]]
Container Name: prom-label-proxy
resources: map[requests:map[cpu:1m memory:15Mi]]
Container Name: kube-rbac-proxy-rules
resources: map[requests:map[cpu:1m memory:15Mi]]
Hi Junqi, the way to calculate the correct memory request is documented here: https://github.com/openshift/enhancements/blob/master/CONVENTIONS.md#resources-and-limits. The guidelines say that the requested memory should be 10% higher than the 90th percentile of memory usage during a CI run. I went ahead and made a PR with the actual query that calculates the discrepancy: https://github.com/openshift/enhancements/pull/788/files

When I ran the query after the adjustments, the difference between requested and used memory was within a 20Mi range. Could you run your verification with this query as well?

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438
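The CONVENTIONS.md rule cited above (set the memory request 10% above the 90th percentile of observed usage) can be sketched as follows; the nearest-rank percentile and the sample values are illustrative assumptions, not the exact query from the linked PR:

```python
import math

MI = 1024 * 1024  # bytes per MiB

def recommended_request_bytes(usage_samples):
    """1.1 * 90th percentile (nearest-rank) of the usage samples."""
    ordered = sorted(usage_samples)
    rank = math.ceil(0.9 * len(ordered)) - 1  # nearest-rank p90 index
    return 1.1 * ordered[rank]

# Hypothetical usage samples from a CI run, in bytes.
samples = [20 * MI, 22 * MI, 25 * MI, 24 * MI, 30 * MI,
           28 * MI, 26 * MI, 21 * MI, 23 * MI, 27 * MI]
print(recommended_request_bytes(samples) / MI)  # p90 = 28Mi -> ~30.8Mi
```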
From the attached picture and testing in our cluster, resources.requests.memory should be changed to a lower value for the thanos-sidecar container:

Container Name: thanos-sidecar
resources: map[requests:map[cpu:1m memory:100Mi]]

(max (kube_pod_container_resource_requests{resource="memory", namespace="openshift-monitoring",container="thanos-sidecar"})
 - max (container_memory_usage_bytes{namespace="openshift-monitoring",container="thanos-sidecar"})) / 1024 / 1024

{} 69.3671875
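As a quick sanity check of the reported 69.3671875 Mi gap: it equals the 100Mi request minus an implied peak usage of roughly 30.6Mi. A small sketch (the usage value below is back-derived from the reported gap, not a separate measurement):

```python
MI = 1024 * 1024  # bytes per MiB

request_bytes = 100 * MI           # thanos-sidecar memory request
usage_bytes = 30.6328125 * MI      # implied peak usage, back-derived

gap_mib = (request_bytes - usage_bytes) / MI
print(gap_mib)  # 69.3671875
```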