Bug 2053308

Summary: multicluster-operators-hub-subscription OOMKilled
Product: Red Hat Advanced Cluster Management for Kubernetes
Reporter: Yerzhan Beisembayev <ybeisemb>
Component: App Lifecycle
Assignee: Roke Jung <rjung>
Status: CLOSED ERRATA
QA Contact: Rafat Islam <rislam>
Severity: high
Docs Contact: bswope <bswope>
Priority: unspecified
Version: rhacm-2.4.z
CC: crizzo, rjung, xiangli, yuhe
Target Milestone: ---
Flags: rislam: rhacm-2.4?, bot-tracker-sync: rhacm-2.4.z+
Target Release: rhacm-2.4.3
Hardware: x86_64   
OS: Unspecified   
Last Closed: 2022-05-03 16:44:03 UTC
Type: Bug
Attachments:
- Memory usage stats (flags: none)
- multicluster-operators-hub-subscription pod logs (flags: none)

Description Yerzhan Beisembayev 2022-02-10 22:21:42 UTC
Description of the problem:
The multicluster-operators-hub-subscription pod is OOMKilled once it reaches its memory limit.
With the memory limit set to 2G, it is killed about an hour after startup.
With the memory limit set to 6G, it takes about 5 hours.
Memory usage grows steadily over time until the pod is killed.

Release version:
ACM 2.4.1

Operator snapshot version:

OCP version:
v4.8.29 (ARO)

Browser Info:

Steps to reproduce:
1. Configure ACM similar to the following (numbers from the customer environment):
- 3 managed clusters + local cluster
- 5 Git channels
- 2 HelmRepo channels
- 30 applications across 12 namespaces
- 23 placement rules across 7 namespaces
- 81 subscriptions across 12 namespaces
All apps are Helm charts of varying complexity (including subcharts, third-party charts, etc.), about 25 distinct Helm charts in total.

2. Leave the memory limit for multicluster-operators-hub-subscription at its out-of-the-box (OOB) value of 2G.
3. Observe the memory consumption of the multicluster-operators-hub-subscription pod over several hours. It grows continuously until the pod is OOMKilled.
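
One way to collect the readings for step 3 is to poll `kubectl top pod` at a fixed interval. The following is a minimal sketch, not a tool used in this report: the `open-cluster-management` namespace, the 60-second interval, and all helper names are assumptions, and it relies on pod metrics being available to `kubectl top`.

```python
import subprocess
import time

# Assumptions (not taken from this report): namespace and polling interval.
NAMESPACE = "open-cluster-management"
POD_PREFIX = "multicluster-operators-hub-subscription"
INTERVAL_SECONDS = 60

_UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def memory_to_bytes(value):
    """Convert a quantity such as '1530Mi' to bytes."""
    for suffix, factor in _UNITS.items():
        if value.endswith(suffix):
            return int(value[: -len(suffix)]) * factor
    return int(value)  # no suffix: treat as a plain byte count


def parse_top_output(text, prefix=POD_PREFIX):
    """Return {pod_name: memory_bytes} for pods whose name starts with prefix.

    Expects the output of `kubectl top pod --no-headers`, where each line
    holds NAME, CPU, and MEMORY separated by whitespace.
    """
    usage = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0].startswith(prefix):
            usage[fields[0]] = memory_to_bytes(fields[2])
    return usage


def poll(samples):
    """Print one timestamped reading per matching pod, `samples` times."""
    for _ in range(samples):
        out = subprocess.run(
            ["kubectl", "top", "pod", "-n", NAMESPACE, "--no-headers"],
            check=True, capture_output=True, text=True,
        ).stdout
        for pod, mem in parse_top_output(out).items():
            print(f"{time.strftime('%H:%M:%S')} {pod} {mem / 1024**2:.0f} MiB")
        time.sleep(INTERVAL_SECONDS)
```

With the assumed interval, calling `poll(360)` would record readings for roughly six hours, which matches the observation window described in this report.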

Actual results:
The multicluster-operators-hub-subscription pod is OOMKilled about once an hour.

Expected results:
Memory consumption is steady.

Additional info:
The attached screenshot shows the pod's memory usage over the course of 6 hours.
The memory limit is currently set to 6G; the pod ran for 5 hrs 10 min before it was killed.
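
As a rough back-of-the-envelope reading of the two observations in this report, an average growth rate can be estimated by dividing each limit by the time it took to reach it. This sketch assumes "G" means GiB, ignores the pod's baseline usage at startup, and treats "about an hour" as exactly 60 minutes, so the results are only indicative.

```python
def average_growth_gib_per_hour(limit_gib, minutes_to_oom):
    """Average growth rate implied by reaching `limit_gib` in `minutes_to_oom`."""
    return limit_gib / (minutes_to_oom / 60)


# Figures from the report: 2G limit, about 1 hour; 6G limit, 5 hrs 10 min.
rate_2g = average_growth_gib_per_hour(2, 60)
rate_6g = average_growth_gib_per_hour(6, 5 * 60 + 10)

print(f"2G limit: {rate_2g:.2f} GiB/h")
print(f"6G limit: {rate_6g:.2f} GiB/h")
```

Both estimates land between 1 and 2 GiB per hour. The difference between them is within what the approximate timings and the ignored baseline could explain, so it should not be read as evidence about the shape of the growth curve.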

Comment 1 Yerzhan Beisembayev 2022-02-10 22:25:23 UTC
Created attachment 1860486
Memory usage stats

Comment 2 Yerzhan Beisembayev 2022-02-11 15:59:04 UTC
Created attachment 1860632
multicluster-operators-hub-subscription pod logs

Comment 3 Roke Jung 2022-03-02 15:20:32 UTC
We have identified the source of the memory leak. The fix will be in 2.4.3 and later.

Comment 4 Rafat Islam 2022-04-19 21:54:15 UTC
Verified on 2.4.3RC3 using Firefox.

Created more than 30 subscription-based applications, 6 Helm-based applications, 10 channels, and 9 placement rules across various namespaces. In the console chart for memory consumption, usage spiked momentarily and then returned to a steady level.
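
The "spike, then steady" observation can also be checked numerically rather than by eye. A minimal sketch, assuming memory samples have been collected as (seconds, bytes) pairs: it fits a least-squares line and compares the slope against a threshold. The function names, the sample data, and the 50 MiB/h threshold are all hypothetical and were not part of the verification described here.

```python
def slope_bytes_per_hour(samples):
    """Least-squares slope of memory over time for (seconds, bytes) samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("samples must span more than one timestamp")
    return cov / var * 3600


def looks_like_leak(samples, threshold_mib_per_hour=50):
    """True if the fitted growth rate exceeds the (hypothetical) threshold."""
    return slope_bytes_per_hour(samples) > threshold_mib_per_hour * 1024**2


MIB = 1024**2
# Made-up illustrative data: one sample every 10 minutes for an hour.
leaking = [(i * 600, (500 + 300 * i) * MIB) for i in range(7)]
steady = [
    (i * 600, m * MIB)
    for i, m in enumerate([500, 510, 900, 505, 500, 508, 502])
]

print(looks_like_leak(leaking))
print(looks_like_leak(steady))
```

A single spike can skew a least-squares fit, especially near the end of the window, so a longer sampling period or a median-based filter would make the check more robust.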

Comment 10 errata-xmlrpc 2022-05-03 16:44:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Advanced Cluster Management 2.4.4 security updates and bug fixes), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1681