Bug 1711073 - Monitoring components running in BestEffort QoS
Summary: Monitoring components running in BestEffort QoS
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.2.0
Assignee: Pawel Krupa
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-05-16 21:19 UTC by Seth Jennings
Modified: 2020-12-08 04:38 UTC
CC List: 6 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-16 06:28:56 UTC
Target Upstream Version:
Embargoed:




Links
Github openshift/cluster-monitoring-operator pull 356 (closed): Bug 1711073: jsonnet: add resource requests (last updated 2020-12-08 04:37:39 UTC)
Github openshift/cluster-monitoring-operator pull 363 (closed): jsonnet: move resource requests assignment from Pods to Containers (last updated 2020-12-08 04:38:07 UTC)
Github openshift/cluster-monitoring-operator pull 369 (closed): jsonnet: add resource requests to prom-label-proxy (last updated 2020-12-08 04:37:40 UTC)
Red Hat Product Errata RHBA-2019:2922 (last updated 2019-10-16 06:29:11 UTC)

Description Seth Jennings 2019-05-16 21:19:51 UTC
The following pods run in the BestEffort QoS class because they set no resource requests:

openshift-monitoring/kube-state-metrics
openshift-monitoring/prometheus-adapter
openshift-monitoring/prometheus-k8s
openshift-monitoring/prometheus-operator

https://github.com/openshift/origin/pull/22787

This can cause eviction, OOMKilling, and CPU starvation.

Please add the following resource requests to the pods in this component (a sketch of the resulting container spec follows the lists below):

Memory:
kube-state-metrics  120Mi
prometheus-adapter  50Mi
prometheus-k8s      1Gi
prometheus-operator 100Mi

CPU:
prometheus-k8s 200m
all others 10m
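
For illustration, a minimal sketch of what one of the resulting container specs could look like with these requests in place. The Deployment skeleton, labels, and image below are placeholders; only the request values come from the lists above. Setting requests (with no limits) moves a pod from the BestEffort QoS class to Burstable.

# Hypothetical kube-state-metrics Deployment fragment with explicit requests.
# Requests without limits place the pod in the Burstable QoS class, so it is
# no longer first in line for eviction and gets a non-zero CPU share under
# contention.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-state-metrics
  namespace: openshift-monitoring
spec:
  selector:
    matchLabels:
      app: kube-state-metrics
  template:
    metadata:
      labels:
        app: kube-state-metrics
    spec:
      containers:
      - name: kube-state-metrics
        image: registry.example/kube-state-metrics:latest   # placeholder image
        resources:
          requests:
            cpu: 10m        # "all others 10m" from the list above
            memory: 120Mi   # kube-state-metrics value from the list above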

Comment 1 Frederic Branczyk 2019-05-17 07:38:22 UTC
At least for kube-state-metrics and prometheus-k8s, resource needs depend heavily on cluster size. Should we still go ahead with these values so we have something in place, and eventually fix the rest with autoscaling?

Comment 2 Seth Jennings 2019-05-17 13:38:07 UTC
Yes.  Literally any setting for requests is better than none at all.  The vertical pod autoscaler (VPA) can help with this later.
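
For context on the autoscaling idea, a rough sketch of a VerticalPodAutoscaler object targeting one of these workloads, assuming the upstream VPA CRD (autoscaling.k8s.io) and its controllers are available in the cluster; this is not part of the fix for this bug.

# Sketch only: would let the VPA adjust the Deployment's requests over time.
# Assumes the upstream VerticalPodAutoscaler CRD and controllers are installed.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: kube-state-metrics
  namespace: openshift-monitoring
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: kube-state-metrics
  updatePolicy:
    updateMode: "Auto"   # apply recommendations by evicting and recreating pods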

Comment 3 Frederic Branczyk 2019-05-17 13:44:51 UTC
Ack, I just wanted to clarify that. We'll take care of this. Thanks!

Comment 4 Seth Jennings 2019-05-17 20:21:39 UTC
What PR(s) fixed this?

Comment 5 Seth Jennings 2019-05-17 20:24:18 UTC
Nevermind, found it
https://github.com/openshift/cluster-monitoring-operator/pull/356

Comment 7 Junqi Zhao 2019-06-25 03:51:52 UTC
qosClass for all of these pods is now Burstable.
resources.requests.memory and resources.requests.cpu have been added for:

openshift-monitoring/kube-state-metrics
openshift-monitoring/prometheus-adapter
openshift-monitoring/prometheus-k8s
openshift-monitoring/prometheus-operator

payload: 4.2.0-0.nightly-2019-06-24-160709
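
For reference, the fields this verification looks at, shown as a sketch of the relevant portion of a pod manifest after the fix (illustrative values, not captured cluster output):

# Every container carries requests, so Kubernetes reports the pod as Burstable.
spec:
  containers:
  - name: kube-state-metrics   # illustrative container name
    resources:
      requests:
        cpu: 10m
        memory: 120Mi
status:
  qosClass: Burstable          # was BestEffort before requests were added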

Comment 9 errata-xmlrpc 2019-10-16 06:28:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

Comment 10 W. Trevor King 2020-12-08 04:38:59 UTC
Follow-up work on monitoring resource requests is tracked in bug 1905330.

