
Bug 1922035

Summary: [Prometheus] memory leak
Product: OpenShift Container Platform
Reporter: Vincent Lours <vlours>
Component: Monitoring
Assignee: Simon Pasquier <spasquie>
Status: CLOSED DUPLICATE
QA Contact: Junqi Zhao <juzhao>
Severity: medium
Priority: high
Version: 4.6
CC: alegrand, anpicker, dgrisonn, erooth, kakkoyun, lcosic, mloibl, pkrupa, spasquie, surbania
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2021-02-01 13:04:59 UTC
Type: Bug

Description Vincent Lours 2021-01-29 04:10:43 UTC
Description of problem:
During the upgrade from 4.5.16 to 4.6.12, Prometheus used a large amount of memory on a node and was OOM killed several times.

Version-Release number of selected component (if applicable):
OCP 4.6.12

How reproducible:
Unknown. It may be related to an update of a shared ConfigMap that was not propagated to an existing pod.

Actual results:
Multiple pods in the openshift-monitoring namespace were killed several times.
However, after moving the Prometheus pods to different servers, the issue seems to be fixed.


Expected results:
No pod should be OOM killed during the upgrade process.


Additional info:

Comment 1 Vincent Lours 2021-01-29 04:13:00 UTC
Prometheus memory usage on the server:

~~~
 510698 root      20   0 1722952  56640      0 S 128.2   0.7   3:48.41 openshift-sdn-n
 513725 nfsnobo+  20   0   10.5g   6.2g      0 R  69.8  79.4   3:41.82 prometheus
 519648 root      20   0 1819332  11612     48 R  57.1   0.1   1:05.85 coredns
~~~
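For anyone triaging similar output, here is a minimal Python sketch that parses the rows above and flags memory-heavy processes. It assumes the default `top` column layout (PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND), which the paste does not show, and that unsuffixed memory fields are in KiB. The names `parse_top_rows`, `memory_hogs`, `estimate_total_gib` and the 50% threshold are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass

# Rows copied from the `top` output in this comment (the header line was not pasted).
TOP_ROWS = """\
 510698 root      20   0 1722952  56640      0 S 128.2   0.7   3:48.41 openshift-sdn-n
 513725 nfsnobo+  20   0   10.5g   6.2g      0 R  69.8  79.4   3:41.82 prometheus
 519648 root      20   0 1819332  11612     48 R  57.1   0.1   1:05.85 coredns
"""

# Multipliers to KiB for the suffixes top uses when it scales memory fields.
UNIT_KIB = {"k": 1, "m": 1024, "g": 1024**2, "t": 1024**3}


def to_kib(field):
    """Convert a memory field (plain KiB, or suffixed such as '6.2g') to KiB."""
    suffix = field[-1].lower()
    if suffix in UNIT_KIB:
        return float(field[:-1]) * UNIT_KIB[suffix]
    return float(field)


@dataclass
class Proc:
    pid: int
    user: str
    res_kib: float
    cpu_pct: float
    mem_pct: float
    command: str


def parse_top_rows(text):
    """Parse process rows, assuming the default 12-column top layout."""
    procs = []
    for line in text.splitlines():
        parts = line.split(None, 11)
        if len(parts) < 12 or not parts[0].isdigit():
            continue  # skip headers, blank lines, and anything unexpected
        procs.append(Proc(
            pid=int(parts[0]),
            user=parts[1],
            res_kib=to_kib(parts[5]),   # RES column
            cpu_pct=float(parts[8]),    # %CPU column
            mem_pct=float(parts[9]),    # %MEM column
            command=parts[11],
        ))
    return procs


def memory_hogs(procs, threshold_pct=50.0):
    """Return processes whose %MEM is at or above the (arbitrary) threshold."""
    return [p for p in procs if p.mem_pct >= threshold_pct]


def estimate_total_gib(proc):
    """Rough node memory estimate from RES / %MEM; limited by top's rounding."""
    return (proc.res_kib / 1024**2) / (proc.mem_pct / 100.0)


if __name__ == "__main__":
    for p in memory_hogs(parse_top_rows(TOP_ROWS)):
        print(f"{p.command} (pid {p.pid}): {p.mem_pct}% MEM, "
              f"~{estimate_total_gib(p):.1f} GiB total on the node (rough estimate)")
```

With these rows only `prometheus` crosses the threshold. Dividing its resident size by its %MEM suggests the node has somewhere around 8 GiB of RAM, though that figure is an inference from rounded values and not something stated in the report.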

Comment 8 Damien Grisonnet 2021-01-29 09:10:59 UTC
This might be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1918683, leaving it up to @spasquie to decide.

Comment 11 Vincent Lours 2021-02-01 04:01:52 UTC
@spasquie,

Actually, the must-gather archives have already been downloaded and extracted into the folder '02855464_n' by the command `pyyank -t 02855464`.

I had to manually extract the file 'sbx-must-gather.tar.gz' from the protected zip file.

~~~
[vlours@supportshell ~]$ ls -l /cases/02855464_n/
total 4
drwxrwxrwx+ 3 vlours vlours 4096 Jan 28 19:51 0010-sbx-must-gather.tar.zip
drwxrwxrwx+ 3 yank   yank     59 Feb  1 03:19 0020-sbx-post-upgrade.tar.gz
drwxrwxrwx+ 3 yank   yank     59 Jan 28 14:55 sbx-must-gather.tar.gz
~~~

- The `0010-sbx-must-gather.tar.zip` folder contains the must-gather taken while the Prometheus pods were running on the Infra nodes.
- The `0020-sbx-post-upgrade.tar.gz` folder contains the latest must-gather, taken after the upgrade had been completed.