Bug 1683461

Summary: Usage of CPU limits in pods that run on masters
Product: OpenShift Container Platform
Reporter: Derek Carr <decarr>
Component: Monitoring
Assignee: Frederic Branczyk <fbranczy>
Status: CLOSED ERRATA
QA Contact: Junqi Zhao <juzhao>
Severity: unspecified
Priority: unspecified
Version: 4.1.0
CC: fbranczy, mloibl, surbania
Target Milestone: ---
Target Release: 4.1.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2019-06-04 10:44:39 UTC
Type: Bug

Description Derek Carr 2019-02-26 22:09:14 UTC
Description of problem:

The cluster-monitoring-operator components are using CPU limits.

Using CPU limits is not recommended for cluster-managed components because it introduces unnecessary latency. In general, we can depend on CFS sharing, enforced via CPU requests, to get proper sharing of CPU time.
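
To illustrate the difference between requests and limits, here is a minimal sketch of how CPU values are commonly translated into CFS settings on cgroup v1. The constants (1024 shares per CPU, a 100 ms period, a minimum of 2 shares and 1000 us quota) are assumptions based on the usual kubelet defaults, and the function names are hypothetical, not taken from this bug or from any library.

```python
# Sketch only: constants are assumed kubelet/cgroup v1 defaults, not from this bug.
from typing import Optional

SHARES_PER_CPU = 1024      # CFS weight assigned to one full CPU
MIN_SHARES = 2             # smallest weight the kernel accepts
CFS_PERIOD_US = 100_000    # assumed default CFS period (100 ms)
MIN_QUOTA_US = 1_000       # assumed smallest quota


def milli_cpu_to_shares(milli_cpu: int) -> int:
    """CPU request (millicores) -> relative CFS weight (cpu.shares)."""
    if milli_cpu <= 0:
        return MIN_SHARES
    return max(MIN_SHARES, milli_cpu * SHARES_PER_CPU // 1000)


def milli_cpu_to_quota_us(milli_cpu: int,
                          period_us: int = CFS_PERIOD_US) -> Optional[int]:
    """CPU limit (millicores) -> hard cap in microseconds per period.

    Returns None when no limit is set, meaning no quota is enforced.
    """
    if milli_cpu <= 0:
        return None
    return max(MIN_QUOTA_US, milli_cpu * period_us // 1000)


if __name__ == "__main__":
    # A 250m request only sets a weight; idle CPU can still be used.
    print("shares for 250m request:", milli_cpu_to_shares(250))
    # A 250m limit caps the container at 25 ms of CPU time per 100 ms period,
    # after which it is throttled until the next period starts.
    print("quota for 250m limit (us):", milli_cpu_to_quota_us(250))
```

Under these assumptions, a request only affects how CPU time is divided under contention, while a limit throttles the container once the quota is spent even when the node has idle CPU, which is where the added latency comes from.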

An e2e test in origin attempts to enforce this restriction: https://github.com/openshift/origin/pull/22095

Invalid control plane pods found with resource limits set 
openshift-monitoring/node-exporter-98hkp
openshift-monitoring/node-exporter-crszn
openshift-monitoring/node-exporter-qwwjq
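
A check of this kind can be sketched as follows. This is not the origin e2e test itself, only a hypothetical illustration that scans pod objects shaped like the items returned by `oc get pods -o json` and reports those with a CPU limit on any container. The function name and the example pods are made up for the sketch.

```python
# Sketch only: not the actual origin e2e test; names and sample data are hypothetical.
from typing import Any, Dict, List


def pods_with_cpu_limits(pods: List[Dict[str, Any]]) -> List[str]:
    """Return "namespace/name" for every pod with a CPU limit on any container."""
    offenders = []
    for pod in pods:
        meta = pod.get("metadata", {})
        spec = pod.get("spec", {})
        containers = spec.get("containers", []) + spec.get("initContainers", [])
        for container in containers:
            limits = (container.get("resources") or {}).get("limits") or {}
            if "cpu" in limits:
                offenders.append(f"{meta.get('namespace')}/{meta.get('name')}")
                break  # one offending container is enough to flag the pod
    return offenders


# Hypothetical example data, not taken from a real cluster.
example_pods = [
    {
        "metadata": {"namespace": "example-ns", "name": "with-limit"},
        "spec": {"containers": [
            {"name": "app", "resources": {
                "requests": {"cpu": "10m"},
                "limits": {"cpu": "200m"},
            }},
        ]},
    },
    {
        "metadata": {"namespace": "example-ns", "name": "requests-only"},
        "spec": {"containers": [
            {"name": "app", "resources": {"requests": {"cpu": "10m"}}},
        ]},
    },
]

if __name__ == "__main__":
    for name in pods_with_cpu_limits(example_pods):
        print(name)
```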


Comment 1 Frederic Branczyk 2019-02-27 14:20:00 UTC
These limits have just been removed in https://github.com/openshift/cluster-monitoring-operator/pull/273. Moving to MODIFIED.

Comment 4 Junqi Zhao 2019-03-11 07:54:02 UTC
Removed resources.limits.cpu for grafana, node-exporter, and telemeter-client.

payload: 4.0.0-0.nightly-2019-03-06-074438

Comment 5 minden 2019-03-11 10:22:34 UTC
The Prometheus and Alertmanager sidecars are templated by the Prometheus Operator. The Prometheus Operator makes the resource limits of the sidecars configurable but does not allow disabling them entirely. I have opened https://github.com/coreos/prometheus-operator/issues/2472 to discuss the way forward with everyone. Once we have made progress there, we can propagate it through to the cluster monitoring stack.

Comment 8 errata-xmlrpc 2019-06-04 10:44:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

Comment 9 Red Hat Bugzilla 2023-09-14 05:24:32 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days