Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1922035

Summary:	[Prometheus] memory leak
Product:	OpenShift Container Platform
Component:	Monitoring
Version:	4.6
Status:	CLOSED DUPLICATE
Severity:	medium
Priority:	high
Reporter:	Vincent Lours <vlours>
Assignee:	Simon Pasquier <spasquie>
QA Contact:	Junqi Zhao <juzhao>
CC:	alegrand, anpicker, dgrisonn, erooth, kakkoyun, lcosic, mloibl, pkrupa, spasquie, surbania
Hardware:	Unspecified
OS:	Unspecified
Type:	Bug
Last Closed:	2021-02-01 13:04:59 UTC

Description Vincent Lours 2021-01-29 04:10:43 UTC

Description of problem:
During the upgrade from 4.5.16 to 4.6.12, Prometheus used a lot of memory on a node and was OOM-killed several times.

Version-Release number of selected component (if applicable):
OCP 4.6.12

How reproducible:
It may be related to an update of a shared ConfigMap that has not been propagated to an existing pod.

Actual results:
Multiple pods in openshift-monitoring were killed several times.
However, after moving the Prometheus pods to different servers, the issue seems to be fixed.
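
One way to confirm which containers were OOM-killed is to inspect the pod statuses. The snippet below is only a sketch: it assumes the pod list was exported as JSON (for example with `oc get pods -n openshift-monitoring -o json`) and relies on the standard Kubernetes field `status.containerStatuses[].lastState.terminated.reason`. The sample data is hypothetical and is not taken from the affected cluster.

```python
import json


def oom_killed_containers(pod_list: dict) -> list:
    """Return (pod, container, restart count) for every container whose
    last termination reason was OOMKilled."""
    hits = []
    for pod in pod_list.get("items", []):
        statuses = pod.get("status", {}).get("containerStatuses", [])
        for cs in statuses:
            # lastState is empty ({}) when the container has never restarted.
            terminated = cs.get("lastState", {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled":
                hits.append(
                    (pod["metadata"]["name"], cs["name"], cs.get("restartCount", 0))
                )
    return hits


# Hypothetical sample shaped like `oc get pods -o json` output (assumption,
# not data from this bug).
SAMPLE = json.loads("""
{
  "items": [
    {
      "metadata": {"name": "prometheus-k8s-0"},
      "status": {
        "containerStatuses": [
          {
            "name": "prometheus",
            "restartCount": 3,
            "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}
          }
        ]
      }
    },
    {
      "metadata": {"name": "alertmanager-main-0"},
      "status": {
        "containerStatuses": [
          {"name": "alertmanager", "restartCount": 0, "lastState": {}}
        ]
      }
    }
  ]
}
""")

result = oom_killed_containers(SAMPLE)
print(result)
```

With real data, the same function could be run on the pod list from each must-gather to compare restarts before and after the pods were moved.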


Expected results:
No pod should be OOM-killed during the upgrade process.


Additional info:

Comment 1 Vincent Lours 2021-01-29 04:13:00 UTC

Prometheus memory usage on the server (top output; columns are PID, USER, PR, NI, VIRT, RES, SHR, S, %CPU, %MEM, TIME+, COMMAND):

~~~
 510698 root      20   0 1722952  56640      0 S 128.2   0.7   3:48.41 openshift-sdn-n
 513725 nfsnobo+  20   0   10.5g   6.2g      0 R  69.8  79.4   3:41.82 prometheus
 519648 root      20   0 1819332  11612     48 R  57.1   0.1   1:05.85 coredns
~~~
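
As a rough sanity check on the figures above, the resident size and the %MEM value of the prometheus process can be combined to estimate the node's total RAM. This is a minimal sketch that assumes top's default column order and its default task memory unit (KiB, with `m`/`g`/`t` suffixes for larger values); the helper names are illustrative only.

```python
# Rows copied from the top output above; default top column order is assumed:
# PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
TOP_ROWS = """\
 510698 root      20   0 1722952  56640      0 S 128.2   0.7   3:48.41 openshift-sdn-n
 513725 nfsnobo+  20   0   10.5g   6.2g      0 R  69.8  79.4   3:41.82 prometheus
 519648 root      20   0 1819332  11612     48 R  57.1   0.1   1:05.85 coredns
"""

# Multipliers to convert a suffixed value to KiB (assumed top scaling).
UNITS_KIB = {"m": 1024, "g": 1024**2, "t": 1024**3}
KIB_PER_GIB = 1024**2


def to_kib(field: str) -> float:
    """Convert a top memory field (plain KiB or m/g/t suffixed) to KiB."""
    suffix = field[-1].lower()
    if suffix in UNITS_KIB:
        return float(field[:-1]) * UNITS_KIB[suffix]
    return float(field)


def parse_row(line: str) -> dict:
    """Extract the fields of interest from one top task row."""
    cols = line.split()
    return {
        "pid": int(cols[0]),
        "res_kib": to_kib(cols[5]),
        "mem_pct": float(cols[9]),
        "command": cols[11],
    }


rows = [parse_row(line) for line in TOP_ROWS.splitlines() if line.strip()]
prom = next(r for r in rows if r["command"] == "prometheus")

# RES / (%MEM / 100) approximates the node's total physical memory.
est_total_gib = prom["res_kib"] / (prom["mem_pct"] / 100) / KIB_PER_GIB

print(f"prometheus RES: {prom['res_kib'] / KIB_PER_GIB:.1f} GiB ({prom['mem_pct']}% of RAM)")
print(f"estimated node RAM: {est_total_gib:.1f} GiB")
```

Under these assumptions the estimate comes out at about 7.8 GiB, which would mean Prometheus alone was holding roughly four fifths of a node with about 8 GiB of RAM.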

Comment 8 Damien Grisonnet 2021-01-29 09:10:59 UTC

This might be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1918683, leaving it up to @spasquie to decide.

Comment 11 Vincent Lours 2021-02-01 04:01:52 UTC

@spasquie,

Actually, the must-gathers have already been downloaded and extracted into the folder '02855464_n' by the command `pyyank -t 02855464`.

I had to manually extract the file 'sbx-must-gather.tar.gz' from the protected zip file.

~~~
[vlours@supportshell ~]$ ls -l /cases/02855464_n/
total 4
drwxrwxrwx+ 3 vlours vlours 4096 Jan 28 19:51 0010-sbx-must-gather.tar.zip
drwxrwxrwx+ 3 yank   yank     59 Feb  1 03:19 0020-sbx-post-upgrade.tar.gz
drwxrwxrwx+ 3 yank   yank     59 Jan 28 14:55 sbx-must-gather.tar.gz
~~~

- The `0010-sbx-must-gather.tar.zip` folder contains the must-gather taken while the Prometheus pods were running on the Infra nodes.
- The `0020-sbx-post-upgrade.tar.gz` folder contains the latest must-gather, taken after the upgrade had completed.