Bug 1683461 - Usage of CPU limits in pods that run on masters [NEEDINFO]
Summary: Usage of CPU limits in pods that run on masters
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.1.0
Assignee: Frederic Branczyk
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-02-26 22:09 UTC by Derek Carr
Modified: 2019-06-04 10:44 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-04 10:44:39 UTC
Target Upstream Version:
juzhao: needinfo? (fbranczy)


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:0758 None None None 2019-06-04 10:44:45 UTC

Description Derek Carr 2019-02-26 22:09:14 UTC
Description of problem:

The cluster-monitoring-operator components are using CPU limits.

CPU limits are not recommended for cluster-managed components because they introduce unnecessary latency. In general, we can rely on CFS sharing, enforced via CPU requests, to share CPU time properly.
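
For readers less familiar with the distinction, the following is a minimal sketch of how CPU requests and limits are commonly translated into CFS settings on cgroup v1. Requests become relative cpu.shares weights that only matter under contention, while limits become a hard quota per period, which is where the throttling latency comes from. The constants (1024 shares per CPU, a 100 ms period, a 1 ms minimum quota) are the usual kubelet defaults and are assumptions here, not values taken from this report; all function names are hypothetical.

```python
# Assumed defaults, not stated in this bug report: 1024 shares per CPU,
# a 100 ms CFS period, and a 1 ms minimum quota (typical kubelet values).
SHARES_PER_CPU = 1024
MILLI_CPU_PER_CPU = 1000
CFS_PERIOD_US = 100_000
MIN_SHARES = 2
MIN_QUOTA_US = 1_000


def milli_cpu_to_shares(milli_cpu):
    """Relative weight derived from a CPU request (hypothetical helper)."""
    shares = milli_cpu * SHARES_PER_CPU // MILLI_CPU_PER_CPU
    return max(shares, MIN_SHARES)


def milli_cpu_to_quota_us(milli_cpu, period_us=CFS_PERIOD_US):
    """Hard cap derived from a CPU limit; None means no limit, so no throttling."""
    if milli_cpu <= 0:
        return None
    quota = milli_cpu * period_us // MILLI_CPU_PER_CPU
    return max(quota, MIN_QUOTA_US)


def cpu_time_split(requests_milli):
    """Fraction of CPU each container gets when all are busy and none has a limit."""
    shares = {name: milli_cpu_to_shares(m) for name, m in requests_milli.items()}
    total = sum(shares.values())
    return {name: s / total for name, s in shares.items()}
```

With requests only, an idle neighbor's CPU time can be borrowed freely and the weights apply only when the node is saturated. With a limit, the container is paused for the rest of each period once its quota is spent, even if the node is otherwise idle.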

An e2e test in origin attempts to enforce this restriction: https://github.com/openshift/origin/pull/22095

Invalid control plane pods found with resource limits set 
openshift-monitoring/node-exporter-98hkp
openshift-monitoring/node-exporter-crszn
openshift-monitoring/node-exporter-qwwjq
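
The kind of check that produces a list like the one above can be sketched roughly as follows. This is not the origin e2e test itself (that lives in the linked pull request); it is a hypothetical illustration that walks pod manifests, represented as plain dictionaries in the usual Kubernetes shape, and reports every pod with a container that sets resource limits. The function name, the example pod names, and the limit values are all made up for illustration.

```python
def pods_with_limits(pods, resource=None):
    """Return sorted 'namespace/name' strings for pods that set limits.

    If `resource` is given (for example "cpu"), only that limit counts;
    otherwise any entry under resources.limits counts.
    """
    offenders = []
    for pod in pods:
        meta = pod.get("metadata", {})
        for container in pod.get("spec", {}).get("containers", []):
            limits = container.get("resources", {}).get("limits") or {}
            if (resource is None and limits) or (resource in limits):
                offenders.append(f"{meta.get('namespace')}/{meta.get('name')}")
                break  # one offending container is enough to flag the pod
    return sorted(offenders)


# Illustrative data only: names and values are invented, not from the cluster.
EXAMPLE_PODS = [
    {
        "metadata": {"namespace": "openshift-monitoring", "name": "example-with-limit"},
        "spec": {"containers": [
            {"name": "main", "resources": {"requests": {"cpu": "10m"},
                                           "limits": {"cpu": "200m"}}},
        ]},
    },
    {
        "metadata": {"namespace": "openshift-monitoring", "name": "example-without-limit"},
        "spec": {"containers": [
            {"name": "main", "resources": {"requests": {"cpu": "10m"}}},
        ]},
    },
]

if __name__ == "__main__":
    for name in pods_with_limits(EXAMPLE_PODS, resource="cpu"):
        print(name)
```

Passing resource="cpu" restricts the check to CPU limits, which matches the fix verified later in this report; leaving it unset flags any limit at all.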

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Frederic Branczyk 2019-02-27 14:20:00 UTC
These limits have just been removed in: https://github.com/openshift/cluster-monitoring-operator/pull/273. Moving to modified.

Comment 4 Junqi Zhao 2019-03-11 07:54:02 UTC
Removed resources.limits.cpu for grafana/node-exporter/telemeter-client

payload: 4.0.0-0.nightly-2019-03-06-074438

Comment 5 minden 2019-03-11 10:22:34 UTC
The Prometheus and Alertmanager sidecars are templated by the Prometheus Operator. The Prometheus Operator makes the resource limits of the sidecars configurable but does not allow disabling them entirely. I have opened https://github.com/coreos/prometheus-operator/issues/2472 to discuss the way forward with everyone. Once we have made progress there, we can propagate it through to the cluster monitoring stack.

Comment 8 errata-xmlrpc 2019-06-04 10:44:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

