Bug 1918683 - prometheus faces inexplicable OOM
Summary: prometheus faces inexplicable OOM
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Simon Pasquier
QA Contact: Junqi Zhao
URL:
Whiteboard:
Duplicates: 1922035 1929875
Depends On:
Blocks:
 
Reported: 2021-01-21 11:20 UTC by Pablo Alonso Rodriguez
Modified: 2024-03-25 17:56 UTC
CC List: 25 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-06-02 06:27:31 UTC
Target Upstream Version:
Embargoed:


Attachments
prometheus memory usage surge in a short time (102.60 KB, image/png)
2021-01-27 12:58 UTC, Junqi Zhao
Prometheus series and memory metrics during upgrade (379.32 KB, application/x-xz)
2021-02-11 08:41 UTC, Simon Pasquier

Description Pablo Alonso Rodriguez 2021-01-21 11:20:24 UTC
Description of problem:

Prometheus is consistently crashing with OOM, reaching absurdly high RAM consumption. We have even increased the node RAM to 96GB and the prometheus memory limit to 80GB, and it still OOMed.

The cluster size does not justify this memory consumption (more information to be added in comments). We have also tried wiping the data several times, but the degradation returned shortly afterwards, which should also help rule out high cluster activity as a relevant factor.
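
For reference, the 80GB limit was presumably raised through the cluster-monitoring-config ConfigMap; a minimal sketch only, with illustrative values (adjust if the limit was set differently):

# illustrative only - assumes the limit is set via the standard cluster-monitoring-config
cat <<'EOF' | oc apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      resources:
        limits:
          memory: 80Gi
EOF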

Version-Release number of selected component (if applicable):

4.5.16

How reproducible:

Always

Steps to Reproduce:
1. Let the Prometheus pod run

Actual results:

OOM

Expected results:

Prometheus working

Additional info:

This issue was found due to a failure in "oc adm top nodes", but there are no issues in either the kube-apiserver or the prometheus-adapter.
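
A quick way to double-check that side of the metrics pipeline is something like the following (a sketch; resource names assume the default openshift-monitoring deployment):

# check the metrics API and the adapter behind it (default resource names assumed)
oc adm top nodes
oc get apiservice v1beta1.metrics.k8s.io
oc -n openshift-monitoring get deploy prometheus-adapter
oc -n openshift-monitoring logs deploy/prometheus-adapter --tail=50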

Comment 4 Simon Pasquier 2021-01-25 13:00:31 UTC
I've looked at the gathered data in supportshell and unfortunately I didn't find any obvious cause for the issue. Also, I can't decompress the rar files attached to the BZ: on my laptop, the operation never ends and consumes all free space.

It might very well be that Prometheus is crashlooping because it consumes too much memory at startup due to WAL replay. But that being said and given the size of the cluster, it shouldn't need 80G in steady state. Can you remove the WAL directory once more and graph the following metrics over a couple of hours (ideally at least 4h)?
* prometheus_tsdb_head_series
* sum by(pod) (rate(prometheus_tsdb_head_samples_appended_total[5m]))
* container_memory_working_set_bytes{namespace="openshift-monitoring",container=""}
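
In case it helps with the graphing, the same expressions can be pulled over the HTTP API; a minimal sketch, assuming the default thanos-querier route in openshift-monitoring and a logged-in oc session (adjust names as needed):

# assumes the thanos-querier route in openshift-monitoring (adjust if different)
TOKEN=$(oc whoami -t)
HOST=$(oc -n openshift-monitoring get route thanos-querier -o jsonpath='{.spec.host}')
# query_range over the last 4h at 60s resolution; repeat for the other two expressions
curl -sk -H "Authorization: Bearer $TOKEN" "https://$HOST/api/v1/query_range" \
  --data-urlencode 'query=prometheus_tsdb_head_series' \
  --data-urlencode "start=$(date -d '-4 hours' +%s)" \
  --data-urlencode "end=$(date +%s)" \
  --data-urlencode 'step=60'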

Comment 5 Ilan Green 2021-01-25 13:20:33 UTC
(In reply to Simon Pasquier from comment #4)
> I've looked at the gathered data in supportshell and unfortunately I didn't
> find any obvious cause to the issue. Also I can't decompress the rar files
> attached to the BZ: on my laptop, the operation never ends and consumes all
> free space.
> 
> It might very well be that Prometheus is crashlooping because it consumes
> too much memory at startup due to WAL replay. But that being said and given
> the size of the cluster, it shouldn't need 80G in steady state. Can you
> remove the WAL directory once more and graph the following metrics over a
> couple of hours (ideally at least 4h)?
> * prometheus_tsdb_head_series
> * sum by(pod) (rate(prometheus_tsdb_head_samples_appended_total[5m]))
> * container_memory_working_set_bytes{namespace="openshift-monitoring",container=""}

Customer has even recreated the PVCs, as well as trying to remove the WAL directory.
Is the WAL data kept somewhere else?
Or possibly we are missing something.

The rar collectl archives are extracted on supportshell - they just show the prometheus memory growth, which happens within minutes.

Comment 6 Pablo Alonso Rodriguez 2021-01-25 13:29:27 UTC
Ilan, the WAL data is part of the PVC contents. I was about to write that we have already tried doing so (both removing only the WAL and the whole PV), but Prometheus was only running for a few minutes, so I guess that is not enough time to gather the metrics you mentioned, or is it?
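
For completeness, wiping just the WAL in place looked roughly like this (a sketch, assuming the default prometheus-k8s-0/1 pod names and the TSDB mounted at /prometheus):

# assumes the default pod name and the /prometheus mount point
oc -n openshift-monitoring exec prometheus-k8s-0 -c prometheus -- rm -rf /prometheus/wal
# restart the pod so it comes back with a clean WAL
oc -n openshift-monitoring delete pod prometheus-k8s-0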

Regarding collectl, I am re-uploading the archives as tar.xz (I actually had no issues extracting them, but just in case it works better for you).

Comment 10 Junqi Zhao 2021-01-27 12:58:04 UTC
Created attachment 1751251 [details]
prometheus memory usage surge in a short time

After upgrading from 4.6.13 to 4.7.0-0.nightly-2021-01-22-134922, memory usage for Prometheus increased within a short time.

Comment 11 Simon Pasquier 2021-02-01 13:04:15 UTC
*** Bug 1922035 has been marked as a duplicate of this bug. ***

Comment 29 Simon Pasquier 2021-02-11 08:41:10 UTC
Created attachment 1756338 [details]
Prometheus series and memory metrics during upgrade

Regarding the memory rise reported in attachment 1751251 [details], it is not as bad as it seems (though there is definitely an increase in memory usage during and after the upgrade). I've done a 4.6 -> 4.7 upgrade and the peak isn't so high if you look at the raw data. My assumption is that when the Prometheus container is restarted, it doesn't have time to mark its metrics as stale, which means that for about 5 minutes the sum() operation adds the memory of the running pod to the memory of the old pod.
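
To illustrate the double-counting in PromQL terms (the label selectors here are illustrative):

# Summed across pods, a not-yet-stale series from the old container
# can be added to the new one for roughly 5 minutes after the restart:
sum(container_memory_working_set_bytes{namespace="openshift-monitoring", pod=~"prometheus-k8s-.*", container=""})
# Graphing the raw per-pod series instead makes the overlap visible:
container_memory_working_set_bytes{namespace="openshift-monitoring", pod=~"prometheus-k8s-.*", container=""}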

Comment 30 Lili Cosic 2021-02-18 09:37:50 UTC
*** Bug 1929875 has been marked as a duplicate of this bug. ***

Comment 68 Red Hat Bugzilla 2023-09-15 01:31:58 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days

