Description of problem:
The cluster-monitoring-operator components are using CPU limits. CPU limits are not recommended for cluster-managed components because they introduce unnecessary latency (the container is throttled once its CFS quota is used up). In general, we can rely on the CFS sharing enforced via CPU requests to get a proper share of CPU time.

An e2e test in origin attempts to enforce this restriction: https://github.com/openshift/origin/pull/22095

Invalid control plane pods found with resource limits set:
openshift-monitoring/node-exporter-98hkp
openshift-monitoring/node-exporter-crszn
openshift-monitoring/node-exporter-qwwjq

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
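To make the latency argument concrete: on cgroup v1, the kubelet turns a CPU request into a relative `cpu.shares` weight and a CPU limit into a hard CFS quota per scheduling period. The Go program that follows is a simplified sketch of that conversion, assuming the usual kubelet constants (1024 shares per CPU, a 100ms CFS period, a 1ms minimum quota); it is not code from this bug or from the kubelet itself. With a 100m limit, the container gets only 10ms of CPU per 100ms period and is then throttled even if the node is idle, whereas shares only come into play when CPUs are contended.

```go
package main

import "fmt"

// Assumed constants mirroring the kubelet's cgroup v1 conversion (simplified).
const (
	sharesPerCPU  = 1024   // cpu.shares granted per full CPU
	milliCPUToCPU = 1000   // millicores per CPU
	minShares     = 2      // smallest cpu.shares value the kernel accepts
	cfsPeriodUs   = 100000 // default CFS period: 100ms, in microseconds
	minQuotaUs    = 1000   // smallest CFS quota: 1ms, in microseconds
)

// milliCPUToShares converts a CPU request (millicores) into a cpu.shares weight.
// Shares are relative: they only decide who wins when CPUs are contended.
func milliCPUToShares(milliCPU int64) int64 {
	if milliCPU == 0 {
		return minShares
	}
	shares := milliCPU * sharesPerCPU / milliCPUToCPU
	if shares < minShares {
		return minShares
	}
	return shares
}

// milliCPUToQuota converts a CPU limit (millicores) into cpu.cfs_quota_us for
// the given period. In this sketch, 0 stands for "no limit, no quota written".
// A quota is absolute: once it is spent, the container waits for the next period.
func milliCPUToQuota(milliCPU, periodUs int64) int64 {
	if milliCPU == 0 {
		return 0
	}
	quota := milliCPU * periodUs / milliCPUToCPU
	if quota < minQuotaUs {
		return minQuotaUs
	}
	return quota
}

func main() {
	// Request 100m: a weight of 102 shares relative to other cgroups.
	fmt.Println("shares:", milliCPUToShares(100))
	// Limit 100m: at most 10000us (10ms) of CPU time per 100ms period.
	fmt.Println("quota_us:", milliCPUToQuota(100, cfsPeriodUs))
}
```

Dropping the limit while keeping the request preserves the fair-share weight and removes only the hard ceiling, which is the behavior the bug asks for.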
These limits have just been removed in: https://github.com/openshift/cluster-monitoring-operator/pull/273. Moving to modified.
resources.limits.cpu has been removed for grafana, node-exporter, and telemeter-client. Payload: 4.0.0-0.nightly-2019-03-06-074438
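A check along the lines of the verification above can be sketched as a scan over pods that reports every pod with at least one container setting `resources.limits.cpu`. The Go program that follows is a hypothetical illustration, not the origin e2e test's implementation: the `Pod`, `Container`, and `Resources` types are trimmed-down stand-ins for the real Kubernetes API types, the `grafana-example` pod name is made up, and the resource values are placeholders. It checks CPU limits only, matching the scope of this report.

```go
package main

import "fmt"

// Hypothetical, trimmed-down stand-ins for the Kubernetes Pod types; the real
// types (k8s.io/api/core/v1) carry many more fields and use resource.Quantity.
type Resources struct {
	Requests map[string]string
	Limits   map[string]string
}

type Container struct {
	Name      string
	Resources Resources
}

type Pod struct {
	Namespace  string
	Name       string
	Containers []Container
}

// podsWithCPULimits returns "namespace/name" for every pod that has at least
// one container with a CPU limit set. Each pod is reported at most once.
func podsWithCPULimits(pods []Pod) []string {
	var offenders []string
	for _, p := range pods {
		for _, c := range p.Containers {
			// Indexing a nil map is safe in Go and simply reports "not found".
			if _, ok := c.Resources.Limits["cpu"]; ok {
				offenders = append(offenders, p.Namespace+"/"+p.Name)
				break
			}
		}
	}
	return offenders
}

func main() {
	pods := []Pod{
		// Placeholder values: a CPU request plus a CPU limit (flagged).
		{Namespace: "openshift-monitoring", Name: "node-exporter-98hkp", Containers: []Container{
			{Name: "node-exporter", Resources: Resources{
				Requests: map[string]string{"cpu": "10m"},
				Limits:   map[string]string{"cpu": "100m"},
			}},
		}},
		// Hypothetical pod with a CPU request and no CPU limit (not flagged).
		{Namespace: "openshift-monitoring", Name: "grafana-example", Containers: []Container{
			{Name: "grafana", Resources: Resources{
				Requests: map[string]string{"cpu": "100m"},
			}},
		}},
	}
	fmt.Println(podsWithCPULimits(pods))
}
```

After the fix, the node-exporter entry would carry only its request and the scan would return an empty list.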
The Prometheus and Alertmanager sidecars are templated by the Prometheus Operator. The Prometheus Operator makes the sidecar resource limits configurable but does not allow disabling them entirely. I have opened https://github.com/coreos/prometheus-operator/issues/2472 to discuss the way forward with everyone. Once we have made progress there, we can propagate the change to the cluster monitoring stack.
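The gap described above, where a limit is configurable but cannot be turned off, typically comes from treating an empty value as "use the default". The Go program that follows is a hypothetical sketch of that general pattern and of one common alternative (a pointer, so "unset" and "explicitly disabled" are distinct); it is not the Prometheus Operator's code, and the functions and the `defaultSidecarCPULimit` value are made up for illustration. Which approach the linked issue settles on is not stated here.

```go
package main

import "fmt"

// Hypothetical default; the Prometheus Operator's real default is not stated in this report.
const defaultSidecarCPULimit = "100m"

// limitFromFlag: an empty value falls back to the default, so a caller has no
// way to say "no limit at all". This mirrors the limitation described above.
func limitFromFlag(v string) string {
	if v == "" {
		return defaultSidecarCPULimit
	}
	return v
}

// limitFromOptionalFlag distinguishes three cases:
//   nil        -> not configured: use the default
//   ""         -> explicitly disabled: set no CPU limit (ok == false)
//   any other  -> use the configured value
func limitFromOptionalFlag(v *string) (limit string, ok bool) {
	if v == nil {
		return defaultSidecarCPULimit, true
	}
	if *v == "" {
		return "", false
	}
	return *v, true
}

func main() {
	disabled := ""
	custom := "200m"
	fmt.Println(limitFromFlag(""))               // the default is the only outcome for ""
	fmt.Println(limitFromOptionalFlag(nil))       // default applied
	fmt.Println(limitFromOptionalFlag(&disabled)) // no CPU limit
	fmt.Println(limitFromOptionalFlag(&custom))   // configured value
}
```

The pointer variant keeps existing behavior for users who never set the option while giving cluster-managed deployments a way to opt out of the limit.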
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days