Bug 1683461

Summary:	Usage of CPU limits in pods that run on masters
Product:	OpenShift Container Platform	Reporter:	Derek Carr <decarr>
Component:	Monitoring	Assignee:	Frederic Branczyk <fbranczy>
Status:	CLOSED ERRATA	QA Contact:	Junqi Zhao <juzhao>
Severity:	unspecified	Docs Contact:
Priority:	unspecified
Version:	4.1.0	CC:	fbranczy, mloibl, surbania
Target Milestone:	---
Target Release:	4.1.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2019-06-04 10:44:39 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Derek Carr 2019-02-26 22:09:14 UTC

Description of problem:

The cluster-monitoring-operator components set CPU limits.

CPU limits are not recommended for cluster-managed components because they introduce unnecessary latency. In general, we can rely on the CFS sharing enforced via CPU requests to share CPU time properly.
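
To make the requests-versus-limits distinction concrete, here is a minimal sketch of how a CPU request is typically turned into CFS shares (a relative weight) while a CPU limit becomes a CFS quota (a hard cap per period, which is where throttling latency comes from). The constants and function names are assumptions modeled on the usual cgroup v1 conversion (1024 shares per CPU, 100 ms period); they are not taken from this bug and may differ from what a given kubelet version does.

```python
# Sketch only: constants assume cgroup v1 defaults, not values from this bug.
SHARES_PER_CPU = 1024       # CFS shares granted per whole CPU requested
MILLI_CPU_PER_CPU = 1000    # 1000m == 1 CPU
MIN_SHARES = 2
MAX_SHARES = 262144
CFS_PERIOD_US = 100_000     # assumed default CFS period: 100 ms
MIN_QUOTA_US = 1_000


def milli_cpu_to_shares(milli_cpu: int) -> int:
    """Map a CPU request (in millicores) to a relative CFS weight."""
    if milli_cpu == 0:
        return MIN_SHARES
    shares = milli_cpu * SHARES_PER_CPU // MILLI_CPU_PER_CPU
    return max(MIN_SHARES, min(shares, MAX_SHARES))


def milli_cpu_to_quota(milli_cpu: int, period_us: int = CFS_PERIOD_US) -> int:
    """Map a CPU limit (in millicores) to a CFS quota in microseconds.

    A return value of 0 means "no limit set", so no quota is applied.
    """
    if milli_cpu == 0:
        return 0
    quota = milli_cpu * period_us // MILLI_CPU_PER_CPU
    return max(quota, MIN_QUOTA_US)


if __name__ == "__main__":
    # A 100m request only sets a weight; a 200m limit caps runtime per period.
    print("shares for 100m request:", milli_cpu_to_shares(100))
    print("quota (us) for 200m limit:", milli_cpu_to_quota(200))
```

With only a request set, the container competes for CPU by weight and can use idle cycles; with a limit set, it is paused once the quota for the current period is spent, even if the node is otherwise idle.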

An e2e test in origin is attempting to enforce that we maintain this restriction here: https://github.com/openshift/origin/pull/22095

Invalid control plane pods found with resource limits set:
openshift-monitoring/node-exporter-98hkp
openshift-monitoring/node-exporter-crszn
openshift-monitoring/node-exporter-qwwjq
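
The kind of check that produces a list like the one above can be sketched as follows. This is not the origin e2e test itself (that lives in the linked pull request); it is a hypothetical helper, `pods_with_limits`, that assumes pods are available as plain dictionaries shaped like Kubernetes Pod objects and that any entry under `resources.limits` counts as a violation.

```python
from typing import Any


def pods_with_limits(pods: list[dict[str, Any]]) -> list[str]:
    """Return "namespace/name" for every pod with a container that sets limits.

    Hypothetical helper: only regular containers are inspected, and any
    non-empty resources.limits mapping is treated as a violation.
    """
    offenders = []
    for pod in pods:
        containers = pod.get("spec", {}).get("containers", [])
        if any((c.get("resources") or {}).get("limits") for c in containers):
            meta = pod["metadata"]
            offenders.append(f'{meta["namespace"]}/{meta["name"]}')
    return sorted(offenders)


if __name__ == "__main__":
    # Made-up sample data for illustration; pod names are not from the bug.
    sample = [
        {
            "metadata": {"namespace": "openshift-monitoring", "name": "example-a"},
            "spec": {"containers": [{"resources": {"limits": {"cpu": "200m"}}}]},
        },
        {
            "metadata": {"namespace": "openshift-monitoring", "name": "example-b"},
            "spec": {"containers": [{"resources": {"requests": {"cpu": "10m"}}}]},
        },
    ]
    for name in pods_with_limits(sample):
        print(name)
```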

Comment 1 Frederic Branczyk 2019-02-27 14:20:00 UTC

These limits have just been removed in https://github.com/openshift/cluster-monitoring-operator/pull/273. Moving to MODIFIED.

Comment 4 Junqi Zhao 2019-03-11 07:54:02 UTC

Removed resources.limits.cpu for grafana, node-exporter, and telemeter-client.

payload: 4.0.0-0.nightly-2019-03-06-074438
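
For reference, the effect of dropping `resources.limits.cpu` from a pod spec can be sketched like this. The helper name `drop_cpu_limits` and the dictionary-based manifest are assumptions for illustration; the actual change was made in the operator's manifests, not with code like this.

```python
import copy
from typing import Any


def drop_cpu_limits(manifest: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a pod-like manifest without resources.limits.cpu.

    Hypothetical helper: requests and non-CPU limits are left untouched,
    and an emptied limits mapping is removed altogether.
    """
    result = copy.deepcopy(manifest)
    for container in result.get("spec", {}).get("containers", []):
        resources = container.get("resources") or {}
        limits = resources.get("limits") or {}
        limits.pop("cpu", None)
        if not limits:
            resources.pop("limits", None)
    return result


if __name__ == "__main__":
    # Made-up resource values for illustration only.
    before = {
        "spec": {
            "containers": [
                {
                    "name": "node-exporter",
                    "resources": {
                        "requests": {"cpu": "10m"},
                        "limits": {"cpu": "200m", "memory": "50Mi"},
                    },
                }
            ]
        }
    }
    print(drop_cpu_limits(before)["spec"]["containers"][0]["resources"])
```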

Comment 5 minden 2019-03-11 10:22:34 UTC

The Prometheus and Alertmanager sidecars are templated by the Prometheus Operator. The Prometheus Operator makes the sidecar resource limits configurable but does not allow disabling them entirely. I have opened https://github.com/coreos/prometheus-operator/issues/2472 to discuss the way forward with everyone. Once we have made progress there, we can propagate it through to the cluster monitoring stack.

Comment 8 errata-xmlrpc 2019-06-04 10:44:39 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

Comment 9 Red Hat Bugzilla 2023-09-14 05:24:32 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days