Bug 2053308 - multicluster-operators-hub-subscription OOMKilled
Summary: multicluster-operators-hub-subscription OOMKilled
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Advanced Cluster Management for Kubernetes
Classification: Red Hat
Component: App Lifecycle
Version: rhacm-2.4.z
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: rhacm-2.4.3
Assignee: Roke Jung
QA Contact: Rafat Islam
Docs Contact: bswope@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-02-10 22:21 UTC by Yerzhan Beisembayev
Modified: 2022-05-03 16:44 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-05-03 16:44:03 UTC
Target Upstream Version:
Embargoed:
rislam: rhacm-2.4?
bot-tracker-sync: rhacm-2.4.z+


Attachments
Memory usage stats (57.30 KB, image/png)
2022-02-10 22:25 UTC, Yerzhan Beisembayev
multicluster-operators-hub-subscription pod logs (2.77 MB, application/gzip)
2022-02-11 15:59 UTC, Yerzhan Beisembayev


Links
System ID Private Priority Status Summary Last Updated
Github stolostron backlog issues 19861 0 None None None 2022-02-11 01:38:39 UTC
Red Hat Product Errata RHSA-2022:1681 0 None None None 2022-05-03 16:44:35 UTC

Description Yerzhan Beisembayev 2022-02-10 22:21:42 UTC
Description of the problem:
The multicluster-operators-hub-subscription pod is OOMKilled once it reaches its memory limit.
With the memory limit set to 2G, the pod is killed about an hour after startup.
With the memory limit set to 6G, it takes about 5 hours.
Memory usage grows steadily until the pod is killed.

Release version:
ACM 2.4.1

Operator snapshot version:

OCP version:
v4.8.29 (ARO)

Browser Info:

Steps to reproduce:
1. Configure ACM similar to the following (numbers from the customer environment):
- 3 managed clusters + local cluster
- 5 Git channels
- 2 HelmRepo channels
- 30 applications across 12 namespaces
- 23 placement rules across 7 namespaces
- 81 subscriptions across 12 namespaces
All apps are Helm charts of varying complexity (including subcharts, third-party charts, etc.), about 25 distinct Helm charts in total.

2. The out-of-the-box (OOB) memory limit for multicluster-operators-hub-subscription is 2G.
3. Observe the memory consumption of the multicluster-operators-hub-subscription pod over several hours: it grows constantly until the pod is OOMKilled.
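
One way to make step 3 less subjective is to record memory readings at known times and fit a trend to them. The sketch below is only an illustration: it assumes readings are collected by hand as Kubernetes-style memory quantities (for example "1843Mi"), and the function names `parse_memory` and `growth_per_hour` are hypothetical helpers, not part of ACM or any CLI. A slope that stays clearly positive over several hours suggests a leak; a slope near zero suggests steady consumption.

```python
import re

# Binary suffixes used by Kubernetes memory quantities (e.g. "1843Mi").
_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def parse_memory(quantity: str) -> int:
    """Convert a quantity such as '512Mi' or '2Gi' to bytes (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi)?", quantity.strip())
    if match is None:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    number, unit = match.groups()
    # A missing suffix means the value is already in bytes.
    return int(number) * _UNITS.get(unit, 1)


def growth_per_hour(samples):
    """Least-squares slope, in bytes per hour, of (hours_since_start, quantity) samples."""
    xs = [hours for hours, _ in samples]
    ys = [parse_memory(quantity) for _, quantity in samples]
    n = len(samples)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("need samples taken at two or more distinct times")
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cov / var_x


# Made-up readings for illustration only; these are not values from this bug.
example = [(0, "512Mi"), (1, "1024Mi"), (2, "1536Mi")]
slope_mib = growth_per_hour(example) / 2**20
print(f"estimated growth: {slope_mib:.0f} MiB/hour")
```

Dividing the remaining headroom (limit minus current usage) by the fitted slope gives a rough estimate of the time left before the next OOMKill.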

Actual results:
The multicluster-operators-hub-subscription pod is killed about every hour.

Expected results:
Memory consumption is steady.

Additional info:
The screenshot shows the pod's memory usage over the course of 6 hours.
The memory limit is currently set to 6G; the pod ran for 5 hr 10 min before it was killed.
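
For a rough sense of scale, the two observations above can be turned into average growth rates. This is a back-of-the-envelope sketch, not a measurement: it assumes the pod starts near zero memory usage, treats "2G" and "6G" as GiB, and takes "about an hour" as exactly one hour. The function name is hypothetical.

```python
def average_growth_gib_per_hour(limit_gib: float, hours_to_oom: float) -> float:
    """Average growth rate if usage rises from roughly zero to the limit."""
    return limit_gib / hours_to_oom


# 2G limit: killed about an hour after startup (assumed to be exactly 1 hour).
rate_2g = average_growth_gib_per_hour(2, 1.0)

# 6G limit: killed after 5 hr 10 min.
rate_6g = average_growth_gib_per_hour(6, 5 + 10 / 60)

print(f"2G limit: ~{rate_2g:.2f} GiB/hour")
print(f"6G limit: ~{rate_6g:.2f} GiB/hour")
```

Both figures are upper bounds, since the pod's baseline usage after startup is not zero. The two rates do not match, so the growth may not be strictly linear, or the "about an hour" figure may be approximate.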

Comment 1 Yerzhan Beisembayev 2022-02-10 22:25:23 UTC
Created attachment 1860486
Memory usage stats

Comment 2 Yerzhan Beisembayev 2022-02-11 15:59:04 UTC
Created attachment 1860632
multicluster-operators-hub-subscription pod logs

Comment 3 Roke Jung 2022-03-02 15:20:32 UTC
We have identified the source of the memory leak. The fix will be in versions >= 2.4.3.

Comment 4 Rafat Islam 2022-04-19 21:54:15 UTC
Verified on 2.4.3RC3 on Firefox.

Created more than 30 subscription-based applications, 6 Helm-based applications, 10 channels, and 9 placement rules across various namespaces. In the console's memory consumption chart, usage spiked momentarily and then returned to a steady level.

Comment 10 errata-xmlrpc 2022-05-03 16:44:03 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Advanced Cluster Management 2.4.4 security updates and bug fixes), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1681

