Bug 1711073

Summary: Monitoring components running in BestEffort QoS
Product: OpenShift Container Platform
Component: Monitoring
Version: 4.1.0
Target Release: 4.2.0
Hardware: Unspecified
OS: Unspecified
Severity: medium
Priority: unspecified
Status: CLOSED ERRATA
Reporter: Seth Jennings <sjenning>
Assignee: Pawel Krupa <pkrupa>
QA Contact: Junqi Zhao <juzhao>
CC: anpicker, erooth, mloibl, pkrupa, surbania, wking
Doc Type: No Doc Update
Type: Bug
Last Closed: 2019-10-16 06:28:56 UTC

Description Seth Jennings 2019-05-16 21:19:51 UTC
The following pods run in the BestEffort QoS class because they set no resource requests:

openshift-monitoring/kube-state-metrics
openshift-monitoring/prometheus-adapter
openshift-monitoring/prometheus-k8s
openshift-monitoring/prometheus-operator

https://github.com/openshift/origin/pull/22787

This can cause eviction, OOMKilling, and CPU starvation.

Please add the following resource requests to the pods in this component (an illustrative sketch follows the list):

Memory:
kube-state-metrics  120Mi
prometheus-adapter  50Mi
prometheus-k8s      1Gi
prometheus-operator 100Mi

CPU:
prometheus-k8s 200m
all others 10m
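
For illustration only, here is a minimal sketch of what the equivalent container resource requests could look like using the upstream Go API types, with the prometheus-k8s values as the example. This is an assumption about form, not the actual cluster-monitoring-operator change:

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	// Sketch: requests for the prometheus-k8s container taken from the list
	// above; the other components would get their own, smaller values.
	reqs := corev1.ResourceRequirements{
		Requests: corev1.ResourceList{
			corev1.ResourceMemory: resource.MustParse("1Gi"),
			corev1.ResourceCPU:    resource.MustParse("200m"),
		},
	}

	// Setting requests (without limits) on every container moves the pod out
	// of BestEffort and into the Burstable QoS class.
	fmt.Printf("memory=%s cpu=%s\n", reqs.Requests.Memory(), reqs.Requests.Cpu())
}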

Comment 1 Frederic Branczyk 2019-05-17 07:38:22 UTC
The resource usage of at least kube-state-metrics and prometheus-k8s depends heavily on cluster size. Should we still go ahead with these values so we have something, and eventually fix the rest with autoscaling?

Comment 2 Seth Jennings 2019-05-17 13:38:07 UTC
Yes.  Literally any setting for requests is better than none at all.  The vertical pod autoscaler (VPA) can help with this later.

Comment 3 Frederic Branczyk 2019-05-17 13:44:51 UTC
Ack, I just wanted to clarify that. We'll take care of this. Thanks!

Comment 4 Seth Jennings 2019-05-17 20:21:39 UTC
What PR(s) fixed this?

Comment 5 Seth Jennings 2019-05-17 20:24:18 UTC
Never mind, found it:
https://github.com/openshift/cluster-monitoring-operator/pull/356

Comment 7 Junqi Zhao 2019-06-25 03:51:52 UTC
qosClass for all pods is now Burstable;
resources.requests.memory and resources.requests.cpu have been added for

openshift-monitoring/kube-state-metrics
openshift-monitoring/prometheus-adapter
openshift-monitoring/prometheus-k8s
openshift-monitoring/prometheus-operator

payload: 4.2.0-0.nightly-2019-06-24-160709
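
For reference, the assigned QoS class can also be confirmed programmatically. The snippet below is a hedged client-go sketch (assuming a recent client-go and a kubeconfig at the default path), not necessarily how this verification was done; the same status.qosClass field is visible in the pod YAML:

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumption: running outside the cluster with the default kubeconfig.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// List the monitoring pods and print the QoS class the kubelet assigned.
	pods, err := clientset.CoreV1().Pods("openshift-monitoring").List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, p := range pods.Items {
		fmt.Printf("%s\t%s\n", p.Name, p.Status.QOSClass)
	}
}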

Comment 9 errata-xmlrpc 2019-10-16 06:28:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

Comment 10 W. Trevor King 2020-12-08 04:38:59 UTC
Follow-up work on monitoring resource requests is tracked in bug 1905330.