Bug 1711073 - Monitoring components running in BestEffort QoS
Summary: Monitoring components running in BestEffort QoS
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.2.0
Assignee: Pawel Krupa
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-05-16 21:19 UTC by Seth Jennings
Modified: 2020-12-08 04:38 UTC
CC List: 6 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-16 06:28:56 UTC
Target Upstream Version:
Embargoed:




Links
Github openshift/cluster-monitoring-operator pull 356 (closed): Bug 1711073: jsonnet: add resource requests (last updated 2020-12-08 04:37:39 UTC)
Github openshift/cluster-monitoring-operator pull 363 (closed): jsonnet: move resource requests assignment from Pods to Containers (last updated 2020-12-08 04:38:07 UTC)
Github openshift/cluster-monitoring-operator pull 369 (closed): jsonnet: add resource requests to prom-label-proxy (last updated 2020-12-08 04:37:40 UTC)
Red Hat Product Errata RHBA-2019:2922 (last updated 2019-10-16 06:29:11 UTC)

Description Seth Jennings 2019-05-16 21:19:51 UTC
The following pods run in the BestEffort QoS class because they set no resource requests:

openshift-monitoring/kube-state-metrics
openshift-monitoring/prometheus-adapter
openshift-monitoring/prometheus-k8s
openshift-monitoring/prometheus-operator

https://github.com/openshift/origin/pull/22787

This can cause eviction, OOMKilling, and CPU starvation.

Please add the following resource requests to the pods in this component (a sketch of the resulting container spec follows the lists below):

Memory:
kube-state-metrics  120Mi
prometheus-adapter  50Mi
prometheus-k8s      1Gi
prometheus-operator 100Mi

CPU:
prometheus-k8s 200m
all others 10m
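
For illustration, a minimal sketch of what one of the resulting container specs could look like with these requests in place. The Deployment skeleton, labels, and image below are placeholders; only the request values come from the lists above. Setting requests (with no limits) moves a pod from the BestEffort QoS class to Burstable.

# Hypothetical kube-state-metrics Deployment fragment with explicit requests.
# Requests without limits place the pod in the Burstable QoS class, so it is
# no longer first in line for eviction and gets a non-zero CPU share under
# contention.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-state-metrics
  namespace: openshift-monitoring
spec:
  selector:
    matchLabels:
      app: kube-state-metrics
  template:
    metadata:
      labels:
        app: kube-state-metrics
    spec:
      containers:
      - name: kube-state-metrics
        image: registry.example/kube-state-metrics:latest   # placeholder image
        resources:
          requests:
            cpu: 10m        # "all others 10m" from the list above
            memory: 120Mi   # kube-state-metrics value from the list above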

Comment 1 Frederic Branczyk 2019-05-17 07:38:22 UTC
At least for kube-state-metrics and prometheus-k8s, resource needs depend heavily on cluster size. Should we still go ahead with these values so we have something in place, and eventually fix the rest with autoscaling?

Comment 2 Seth Jennings 2019-05-17 13:38:07 UTC
Yes.  Literally any setting for requests is better than none at all.  The vertical pod autoscaler (VPA) can help with this later.
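
For context on the autoscaling idea, a rough sketch of a VerticalPodAutoscaler object targeting one of these workloads, assuming the upstream VPA CRD (autoscaling.k8s.io) and its controllers are available in the cluster; this is not part of the fix for this bug.

# Sketch only: would let the VPA adjust the Deployment's requests over time.
# Assumes the upstream VerticalPodAutoscaler CRD and controllers are installed.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: kube-state-metrics
  namespace: openshift-monitoring
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: kube-state-metrics
  updatePolicy:
    updateMode: "Auto"   # apply recommendations by evicting and recreating pods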

Comment 3 Frederic Branczyk 2019-05-17 13:44:51 UTC
Ack, I just wanted to clarify that. We'll take care of this. Thanks!

Comment 4 Seth Jennings 2019-05-17 20:21:39 UTC
What PR(s) fixed this?

Comment 5 Seth Jennings 2019-05-17 20:24:18 UTC
Nevermind, found it
https://github.com/openshift/cluster-monitoring-operator/pull/356

Comment 7 Junqi Zhao 2019-06-25 03:51:52 UTC
qosClass for all of these pods is now Burstable.
resources.requests.memory and resources.requests.cpu have been added for:

openshift-monitoring/kube-state-metrics
openshift-monitoring/prometheus-adapter
openshift-monitoring/prometheus-k8s
openshift-monitoring/prometheus-operator

payload: 4.2.0-0.nightly-2019-06-24-160709
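
For reference, the fields this verification looks at, shown as a sketch of the relevant portion of a pod manifest after the fix (illustrative values, not captured cluster output):

# Every container carries requests, so Kubernetes reports the pod as Burstable.
spec:
  containers:
  - name: kube-state-metrics   # illustrative container name
    resources:
      requests:
        cpu: 10m
        memory: 120Mi
status:
  qosClass: Burstable          # was BestEffort before requests were added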

Comment 9 errata-xmlrpc 2019-10-16 06:28:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

Comment 10 W. Trevor King 2020-12-08 04:38:59 UTC
Follow-up work on monitoring resource requests is tracked in bug 1905330.

