Bug 1711073

Summary: Monitoring components running in BestEffort QoS
Product: OpenShift Container Platform
Component: Monitoring
Version: 4.1.0
Target Release: 4.2.0
Hardware: Unspecified
OS: Unspecified
Severity: medium
Priority: unspecified
Status: CLOSED ERRATA
Reporter: Seth Jennings <sjenning>
Assignee: Pawel Krupa <pkrupa>
QA Contact: Junqi Zhao <juzhao>
CC: anpicker, erooth, mloibl, pkrupa, surbania, wking
Doc Type: No Doc Update
Type: Bug
Last Closed: 2019-10-16 06:28:56 UTC

Description Seth Jennings 2019-05-16 21:19:51 UTC
The following pods run in the BestEffort QoS class because they set no resource requests:

openshift-monitoring/kube-state-metrics
openshift-monitoring/prometheus-adapter
openshift-monitoring/prometheus-k8s
openshift-monitoring/prometheus-operator

https://github.com/openshift/origin/pull/22787

This can cause eviction, OOMKilling, and CPU starvation.

Please add the following resource requests to the pods in this component (an illustrative sketch follows the list):

Memory:
kube-state-metrics  120Mi
prometheus-adapter  50Mi
prometheus-k8s      1Gi
prometheus-operator 100Mi

CPU:
prometheus-k8s 200m
all others 10m
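
For illustration only, here is a minimal sketch of what the equivalent container resource requests could look like using the upstream Go API types, with the prometheus-k8s values as the example. This is an assumption about form, not the actual cluster-monitoring-operator change:

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	// Sketch: requests for the prometheus-k8s container taken from the list
	// above; the other components would get their own, smaller values.
	reqs := corev1.ResourceRequirements{
		Requests: corev1.ResourceList{
			corev1.ResourceMemory: resource.MustParse("1Gi"),
			corev1.ResourceCPU:    resource.MustParse("200m"),
		},
	}

	// Setting requests (without limits) on every container moves the pod out
	// of BestEffort and into the Burstable QoS class.
	fmt.Printf("memory=%s cpu=%s\n", reqs.Requests.Memory(), reqs.Requests.Cpu())
}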

Comment 1 Frederic Branczyk 2019-05-17 07:38:22 UTC
The resource usage of at least kube-state-metrics and prometheus-k8s depends heavily on cluster size. Should we still go ahead with these values so we have something, and eventually fix the rest with autoscaling?

Comment 2 Seth Jennings 2019-05-17 13:38:07 UTC
Yes.  Literally any setting for requests is better than none at all.  The vertical pod autoscaler (VPA) can help with this later.

Comment 3 Frederic Branczyk 2019-05-17 13:44:51 UTC
Ack, I just wanted to clarify that. We'll take care of this. Thanks!

Comment 4 Seth Jennings 2019-05-17 20:21:39 UTC
What PR(s) fixed this?

Comment 5 Seth Jennings 2019-05-17 20:24:18 UTC
Never mind, found it:
https://github.com/openshift/cluster-monitoring-operator/pull/356

Comment 7 Junqi Zhao 2019-06-25 03:51:52 UTC
qosClass for all pods is now Burstable;
resources.requests.memory and resources.requests.cpu have been added for

openshift-monitoring/kube-state-metrics
openshift-monitoring/prometheus-adapter
openshift-monitoring/prometheus-k8s
openshift-monitoring/prometheus-operator

payload: 4.2.0-0.nightly-2019-06-24-160709
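
For reference, the assigned QoS class can also be confirmed programmatically. The snippet below is a hedged client-go sketch (assuming a recent client-go and a kubeconfig at the default path), not necessarily how this verification was done; the same status.qosClass field is visible in the pod YAML:

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumption: running outside the cluster with the default kubeconfig.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// List the monitoring pods and print the QoS class the kubelet assigned.
	pods, err := clientset.CoreV1().Pods("openshift-monitoring").List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, p := range pods.Items {
		fmt.Printf("%s\t%s\n", p.Name, p.Status.QOSClass)
	}
}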

Comment 9 errata-xmlrpc 2019-10-16 06:28:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

Comment 10 W. Trevor King 2020-12-08 04:38:59 UTC
Follow-up work on monitoring resource requests is tracked in bug 1905330.