Bug 1572587 - prometheus pods getting oomkilled @ 100 node scale
Summary: prometheus pods getting oomkilled @ 100 node scale
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 3.10.0
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 3.10.0
Assignee: Dan Mace
QA Contact: Mike Fiedler
URL:
Whiteboard: aos-scalability-310
Depends On:
Blocks:
 
Reported: 2018-04-27 11:25 UTC by Jeremy Eder
Modified: 2018-12-20 21:46 UTC
CC: 8 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-12-20 21:12:30 UTC
Target Upstream Version:


Attachments


Links
System: Github
Summary: openshift/openshift-ansible pull 8514
URL: https://github.com/openshift/openshift-ansible/pull/8514
Last Updated: 2020-06-30 06:14:23 UTC

Comment 1 Simon Pasquier 2018-04-27 12:51:45 UTC
Depending on the actual memory limits, you may or may not be affected by this, but other people have reported memory leaks with Prometheus 2.2.1 [1]. There's an open PR [2] that seems to fix the problem, but it may have surfaced other issues.

[1] https://github.com/prometheus/prometheus/issues/4095
[2] https://github.com/prometheus/prometheus/pull/4013

Comment 2 Jeremy Eder 2018-04-27 16:09:08 UTC
# oc edit cm -n openshift-monitoring cluster-monitoring-config

Add the resources line shown below to the prometheusK8s section:

    prometheusK8s:                                                                                                                                                          
      baseImage: quay.io/prometheus/prometheus                                                                                                                              
      resources: {}

Wait patiently; in my case it took about 5 minutes for both prometheus-k8s-N pods to stop and restart.
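
For reference, here is a minimal sketch of what the edited ConfigMap might look like as a whole. The config.yaml data key and exact layout are my assumptions about how cluster-monitoring-operator reads its configuration in this release, so treat it as illustrative rather than authoritative:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        prometheusK8s:
          baseImage: quay.io/prometheus/prometheus
          # An empty resources block drops the default requests/limits,
          # so the pods are no longer killed against a too-small memory limit.
          resources: {}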

# oc get pod -n openshift-monitoring prometheus-k8s-1 -o yaml

Now you will see:

    name: prometheus                                                                                                                                                        
    resources:                                                                                                                                                              
      requests:                                                                                                                                                             
        memory: 2Gi     

The RSS of these Prometheus processes is currently about 2.2G, with each using 0.2 cores (scale lab environment, 100-node cluster, 600 pods).
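
For a quick spot check of live usage, something like the following should work, assuming cluster metrics are available to oc adm top in this environment:

    # oc adm top pod -n openshift-monitoring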

I think we need to disable the limits in-product, or at least bump them to something like 30G (number taken from starter clusters, see attached image).
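
If we bump the limits rather than remove them, the override in cluster-monitoring-config would look roughly like this (the 2Gi request matches what the operator sets today; 30Gi is only the ballpark figure from the starter clusters, not a tested value):

    prometheusK8s:
      resources:
        requests:
          memory: 2Gi
        limits:
          memory: 30Gi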

Comment 5 Dan Mace 2018-05-21 17:16:31 UTC
(In reply to Dan Mace from comment #4)
> https://github.com/openshift/openshift-ansible/pull/8442

New upstream fix: https://github.com/openshift/cluster-monitoring-operator/pull/19

This will also require an openshift-ansible PR to pull in a new cluster-monitoring-operator release, which I'll link here.

Comment 6 Dan Mace 2018-05-23 14:48:59 UTC
The fix for this is ready; I'm still trying to get a new cluster-monitoring-operator release pushed so I can open a new openshift-ansible PR.

Comment 10 Mike Fiedler 2018-06-05 18:57:45 UTC
Moving back to ASSIGNED based on comment 9.

Comment 11 Dan Mace 2018-06-06 17:02:00 UTC
This can be tested with the release of https://github.com/openshift/openshift-ansible/pull/8591

Comment 12 Wei Sun 2018-06-08 02:01:29 UTC
The PR has been merged into openshift-ansible-3.10.0-0.63.0. Please check.

Comment 13 Mike Fiedler 2018-06-11 15:46:22 UTC
Verified on 3.10.0-0.64.0. prometheus-operator is now using a PV/PVC for persistence, and resource limits have been removed from the deployments/statefulsets/daemonsets.
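
For anyone re-verifying, a couple of spot checks along these lines should confirm it (object names assumed from the default openshift-monitoring deployment and may differ by release):

    # oc get statefulset prometheus-k8s -n openshift-monitoring -o jsonpath='{.spec.template.spec.containers[*].resources}'
    # oc get pvc -n openshift-monitoring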

