Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1918683

Summary: prometheus faces inexplicable OOM
Product: OpenShift Container Platform
Reporter: Pablo Alonso Rodriguez <palonsor>
Component: Monitoring
Assignee: Simon Pasquier <spasquie>
Status: CLOSED NOTABUG
QA Contact: Junqi Zhao <juzhao>
Severity: high
Docs Contact:
Priority: high
Version: 4.5
CC: adeshpan, akhaire, alegrand, anpicker, dahernan, ddelcian, dtarabor, erich, erooth, hongyli, igreen, jkaur, kakkoyun, kiyyappa, lcosic, mhernon, ocasalsa, pchavan, pkrupa, rugouvei, spasquie, ssadhale, ssonigra, vlours, wking
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-06-02 06:27:31 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments (description / flags):
* prometheus memory usage surge in a short time (none)
* Prometheus series and memory metrics during upgrade (none)

Description Pablo Alonso Rodriguez 2021-01-21 11:20:24 UTC
Description of problem:

Prometheus is consistently crashing with OOM, reaching absurdly high RAM consumption. We have even increased node RAM to 96 GB and the Prometheus memory limit to 80 GB, and it still OOMed.

The cluster size does not justify this memory consumption (more information to be added in comments). We have also tried wiping the data several times, but the degradation returned shortly afterwards, which should help rule out high cluster activity as a relevant factor.

Version-Release number of selected component (if applicable):

4.5.16

How reproducible:

Always

Steps to Reproduce:
1. Let prometheus pod run

Actual results:

OOM

Expected results:

Prometheus working

Additional info:

This issue was found due to a failure in "oc adm top nodes", but there are no issues in either the kube-apiserver or the prometheus-adapter.

Comment 4 Simon Pasquier 2021-01-25 13:00:31 UTC
I've looked at the gathered data in supportshell and unfortunately I didn't find any obvious cause to the issue. Also I can't decompress the rar files attached to the BZ: on my laptop, the operation never ends and consumes all free space.

It might very well be that Prometheus is crashlooping because it consumes too much memory at startup due to WAL replay. But that being said and given the size of the cluster, it shouldn't need 80G in steady state. Can you remove the WAL directory once more and graph the following metrics over a couple of hours (ideally at least 4h)?
* prometheus_tsdb_head_series
* sum by(pod) (rate(prometheus_tsdb_head_samples_appended_total[5m]))
* container_memory_working_set_bytes{namespace="openshift-monitoring",container=""}
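As a minimal sketch of how the three expressions above could be collected for graphing, the following builds Prometheus `/api/v1/query_range` URLs for them; the route hostname is an invented placeholder (a real fetch against the openshift-monitoring Prometheus would also need a bearer token), not something taken from this bug:

```python
from urllib.parse import urlencode

# PromQL expressions from the comment above.
QUERIES = [
    'prometheus_tsdb_head_series',
    'sum by(pod) (rate(prometheus_tsdb_head_samples_appended_total[5m]))',
    'container_memory_working_set_bytes{namespace="openshift-monitoring",container=""}',
]

def query_range_url(base_url, expr, start, end, step="30s"):
    """Build a /api/v1/query_range URL for one expression.

    start and end are UNIX timestamps; step controls the resolution
    of the returned matrix.
    """
    params = urlencode({"query": expr, "start": start, "end": end, "step": step})
    return f"{base_url}/api/v1/query_range?{params}"

if __name__ == "__main__":
    # Hypothetical route; replace with the cluster's actual Prometheus URL.
    base = "https://prometheus-k8s-openshift-monitoring.apps.example.com"
    for q in QUERIES:
        print(query_range_url(base, q, start=1611576000, end=1611590400))
```

A 4-hour window at a 30s step keeps the response small enough to plot directly.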

Comment 5 Ilan Green 2021-01-25 13:20:33 UTC
(In reply to Simon Pasquier from comment #4)
> I've looked at the gathered data in supportshell and unfortunately I didn't
> find any obvious cause to the issue. Also I can't decompress the rar files
> attached to the BZ: on my laptop, the operation never ends and consumes all
> free space.
> 
> It might very well be that Prometheus is crashlooping because it consumes
> too much memory at startup due to WAL replay. But that being said and given
> the size of the cluster, it shouldn't need 80G in steady state. Can you
> remove the WAL directory once more and graph the following metrics over a
> couple of hours (ideally at least 4h)?
> * prometheus_tsdb_head_series
> * sum by(pod) (rate(prometheus_tsdb_head_samples_appended_total[5m]))
> *
> container_memory_working_set_bytes{namespace="openshift-monitoring",
> container=""}

The customer has even recreated the PVCs, as well as trying to remove the WAL directory.
Is the WAL data kept somewhere else?
Or possibly we are missing something.

The rar collectl files are extracted on supportshell - they just show the Prometheus memory growth, which happens within minutes.

Comment 6 Pablo Alonso Rodriguez 2021-01-25 13:29:27 UTC
Ilan, the WAL data is part of the PVC contents. I was about to write that we have already tried that (both removing only the WAL and the whole PV), and Prometheus was only running for a few minutes, so I guess that is not enough time to gather the metrics you mentioned, or is it?

Regarding collectl, I am re-uploading in tar.xz (actually, I had no issues extracting them, but just in case it works better for you).

Comment 10 Junqi Zhao 2021-01-27 12:58:04 UTC
Created attachment 1751251 [details]
prometheus memory usage surge in a short time

Upgraded from 4.6.13 to 4.7.0-0.nightly-2021-01-22-134922; Prometheus memory usage increased sharply within a short time.

Comment 11 Simon Pasquier 2021-02-01 13:04:15 UTC
*** Bug 1922035 has been marked as a duplicate of this bug. ***

Comment 29 Simon Pasquier 2021-02-11 08:41:10 UTC
Created attachment 1756338 [details]
Prometheus series and memory metrics during upgrade

Regarding the memory rise reported in attachment 1751251 [details], it is not as bad as it seems (though there is definitely an increase in memory usage during and after the upgrade). I've done a 4.6 -> 4.7 upgrade and the peak isn't as high if you look at the raw data. My assumption is that when the Prometheus container is restarted, it doesn't have time to mark its metrics as stale, which means that for about 5 minutes the sum() operation adds the memory of the running pod plus the memory of the old pod.
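The staleness effect described above can be sketched numerically (all numbers and series names below are invented for illustration): an instant sum() picks the latest sample per series within Prometheus' 5-minute lookback window, so for a few minutes after a restart the old series still contributes alongside the new one:

```python
# Toy model of Prometheus instant-vector evaluation: without a staleness
# marker, the old pod's last sample stays selectable for up to the
# 5-minute lookback window, so sum() briefly double-counts memory.

STALENESS_WINDOW = 300  # seconds; Prometheus' default lookback delta

def summed_memory(samples, eval_ts):
    """Sum the latest in-window sample per series at evaluation time."""
    latest = {}
    for series, ts, mem_bytes in samples:
        if eval_ts - STALENESS_WINDOW <= ts <= eval_ts:
            if series not in latest or ts > latest[series][0]:
                latest[series] = (ts, mem_bytes)
    return sum(mem for _, mem in latest.values())

samples = [
    ("old-prometheus-pod", 990, 8e9),   # last sample before restart
    ("new-prometheus-pod", 1020, 9e9),  # first sample after restart
    ("new-prometheus-pod", 1320, 9e9),  # steady-state scrape
]

# Just after the restart both series are inside the window: apparent spike.
spike = summed_memory(samples, eval_ts=1050)    # 8e9 + 9e9 = 1.7e10
# Once the old sample ages out of the 5m window, only the new pod counts.
steady = summed_memory(samples, eval_ts=1321)   # 9e9
```

This matches the observation that the raw per-pod data shows no real peak: the spike only appears in the aggregated sum() view.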

Comment 30 Lili Cosic 2021-02-18 09:37:50 UTC
*** Bug 1929875 has been marked as a duplicate of this bug. ***

Comment 68 Red Hat Bugzilla 2023-09-15 01:31:58 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days