Bug 2014043

Summary: Publish/promote of CV with puppet environments consume GBs of memory for scaled EnvironmentClass counts
Product: Red Hat Satellite
Reporter: Pavel Moravec <pmoravec>
Component: Puppet
Assignee: satellite6-bugs <satellite6-bugs>
Status: NEW
QA Contact: Vladimír Sedmík <vsedmik>
Severity: medium
Priority: unspecified
Version: 6.8.0
CC: jsherril, lstejska, mfalz, mhulan, momran, nalfassi, rlavi, satellite6-bugs
Target Milestone: Unspecified
Keywords: Performance, Triaged
Target Release: Unused
Flags: pmoravec: needinfo? (mfalz)
Hardware: Unspecified
OS: Unspecified
Type: Bug

Description Pavel Moravec 2021-10-14 11:12:14 UTC
Description of problem:
User story:
Publishing or promoting a CV with puppet modules consumes up to 30 GB of memory in the sidekiq process.

The root cause is in this call flow:
- there is an Actions::Katello::Foreman::ContentUpdate dynflow step in the Finalize phase
- it calls ::Katello::Foreman.update_puppet_environment(content_view, environment)
- which calls PuppetClassImporter.new(..).update_environment
- which executes "changed = self.changes" in Foreman
- and the method itself (see https://github.com/theforeman/foreman/blob/2.5-stable/app/services/puppet_class_importer.rb#L19) is what consumes so much memory

(The statement above that the 'changes' method itself causes the high memory usage was confirmed by running a rake script at a customer site, where a single PuppetClassImporter.new(..).changes call _alone_ bumped memory usage by >4 GB.)

Further internal testing revealed that the scaling factor driving the high memory usage is EnvironmentClass.count. For 120k EnvironmentClass records, a rake script executing the "changes" method consumed 900 MB of memory (400 MB for rake itself, 500 MB for the method). For 260k EnvironmentClass records, usage was 1552 MB (400 MB rake, 1152 MB the method). Linearly extrapolating to the 1M EnvironmentClass records that the customer has, we get ~4 GB of memory per *one* execution of the method.
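
For reference, a measurement like this can be scripted roughly as follows. This is a minimal sketch, not the exact rake script mentioned above; it assumes it is pasted into a foreman-rake console on the Satellite box, and constructing PuppetClassImporter with default arguments is an assumption that may need adjusting (e.g. passing a smart proxy) on a real system:

def rss_kb
  # resident set size of the current process, in kB (Linux-specific)
  File.read("/proc/#{Process.pid}/status")[/VmRSS:\s+(\d+)/, 1].to_i
end

puts "EnvironmentClass.count = #{EnvironmentClass.count}"

before = rss_kb
changed = PuppetClassImporter.new.changes   # the suspected expensive call
after = rss_kb

puts "RSS before: #{before / 1024} MB, after: #{after / 1024} MB, delta: #{(after - before) / 1024} MB"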

Because of the way Ruby does (not) return freed memory to the OS, repeated calls to the method from different PuppetClassImporter instances consume further GBs of memory every time, until the usage (probably) stabilises at around 30 GB per *one* sidekiq process.
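
As a toy illustration of that retention behaviour (generic MRI behaviour, unrelated to the importer itself; it reuses the rss_kb helper from the sketch above), RSS typically stays near its high-water mark even after the allocated objects become garbage:

3.times do |round|
  junk = Array.new(3_000_000) { |n| "puppetclass-#{n}" }   # allocate a few hundred MB of strings
  junk = nil
  GC.start
  puts "after round #{round + 1}: RSS = #{rss_kb / 1024} MB"   # usually stays near the peak
end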

Please optimise the Foreman method (it is used in 6.10 as well, which we also support) to be more memory efficient for large EnvironmentClass counts.
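
For illustration only (this is not a proposed patch, and the internals of puppet_class_importer.rb are not reproduced here), the usual Rails pattern for keeping memory flat while walking a large table such as environment_classes is to iterate in batches and pluck only the needed columns instead of materialising every ActiveRecord object at once; the column names below are assumptions:

EnvironmentClass.in_batches(of: 1_000) do |batch|
  # only load the id columns actually needed for the comparison
  batch.pluck(:environment_id, :puppetclass_id, :puppetclass_lookup_key_id).each do |env_id, class_id, key_id|
    # diff these ids against the proxy-reported classes here, without keeping
    # all ~1M records instantiated in memory at the same time
  end
end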


Version-Release number of selected component (if applicable):
Satellite 6.8 (6.9 is affected the same way, and 6.10 will be as well)


How reproducible:
100%


Steps to Reproduce:
1. Import many puppet modules (I will provide a script for it)
2. Have any repo enabled
3. Create many CVs with many CV versions, each with many puppet modules associated.
4. Monitor memory usage of the sidekiq processes during the CV publish/promote (see the watcher sketch after this list)
5. Monitor # of puppet env. classes:
su - postgres -c "psql foreman -c \"select count(*) from environment_classes;\""
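
For step 4, a hypothetical watcher like the one below (not shipped with the product; relies on pgrep and /proc, so Linux only) can be left running to sample the sidekiq RSS once per minute during the publish/promote:

loop do
  `pgrep -f sidekiq`.split.map(&:to_i).each do |pid|
    rss_kb = File.read("/proc/#{pid}/status")[/VmRSS:\s+(\d+)/, 1].to_i
    puts "#{Time.now} pid=#{pid} rss=#{rss_kb / 1024} MB"
  end
  sleep 60
end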


Actual results:
Memory usage grows linearly with EnvironmentClass.count (i.e. count(*) from environment_classes). For 1M classes, memory grows by ~4 GB during a single CV publish/promote.


Expected results:
something more sustainable :)


Additional info:
Some stats on the impact of the number of environment classes on the RSS usage of the 'changes' method (RSS, in kB, of the rake process running the method):

RSS (kB)  EnvironmentClass.count
878920 119903
1059592 129122
1034964 138341
1029156 147560
897756 156779
1010816 165998
1233568 175217
1187076 184436
1244400 193655
1244380 202874
1300772 212093
1269580 221312
1286400 230531
1316340 239750
1472676 248969
1551976 258188
1569808 267407
1578724 276626
.. (test continues) ..

Comment 2 Pavel Moravec 2022-02-01 12:34:43 UTC
Hit again for the same customer.

Comment 3 Melanie Falz 2022-07-06 08:47:22 UTC
The same customer hit the issue again and asked for a fix, is it possible to give an outlook on this?

Comment 4 Pavel Moravec 2022-07-06 10:11:50 UTC
(In reply to Melanie Falz from comment #3)
> The same customer hit the issue again and asked for a fix, is it possible to
> give an outlook on this?

A needinfo on satellite6-bugs is not monitored by anybody, so it causes more harm than good. But I am quite uncertain whom in particular to ask, so I am raising needinfo on a few people - please cancel it once one of you provides an answer.

Comment 6 Ron Lavi 2022-07-25 06:06:41 UTC
Hello, sorry for the inconvenience.
This behavior seems to be the same since Satellite 6.0, so it's not a regression.
We are adding it to our backlog and will look into it when we have more capacity.