Bug 1920072 - Puppetserver is overloaded after upgrade from Satellite 6.7 to Satellite 6.8 [NEEDINFO]
Summary: Puppetserver is overloaded after upgrade from Satellite 6.7 to Satellite 6.8
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Installer
Version: 6.8.0
Hardware: All
OS: All
Priority: high
Severity: high
Target Milestone: 6.10.0
Assignee: satellite6-bugs
QA Contact: Devendra Singh
URL:
Whiteboard:
Depends On:
Blocks: 1541321
 
Reported: 2021-01-25 16:39 UTC by Anand Agrawal
Modified: 2021-11-16 14:10 UTC (History)
16 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1952603
Environment:
Last Closed: 2021-11-16 14:09:54 UTC
Target Upstream Version:
aagrawal: needinfo? (ikaur)


Attachments


Links
System ID Private Priority Status Summary Last Updated
Foreman Issue Tracker 32269 0 High New Puppetserver 6+ really benefits from increased ReservedCodeCacheSize 2021-04-07 17:46:57 UTC
Red Hat Knowledge Base (Solution) 5800701 0 None None None 2021-02-15 18:31:12 UTC
Red Hat Product Errata RHSA-2021:4702 0 None None None 2021-11-16 14:10:05 UTC

Description Anand Agrawal 2021-01-25 16:39:32 UTC
Description of problem:

After Satellite upgrade from Satellite 6.7 to 6.8 (puppet 6), puppetserver is consuming a lot of memory. 

Increasing the timeout does not help, as processing time keeps increasing over time.

Puppetserver needs an increased heap size to perform garbage collection properly, but increasing it also doesn't help.

JAVA_ARGS="-Xms4G -Xmx8G -XX:+PrintGCDetails -Xloggc:/var/log/puppetlabs/puppetserver/gc.log -Djruby.logger.class=com.puppetlabs.jruby_utils.jruby.Slf4jLogger"

Version-Release number of selected component (if applicable):
6.8

How reproducible:


Steps to Reproduce:
1. 
2.
3.

Actual results:

2020-12-21T15:06:24 0f5358cf [E] Puppet is taking too long to respond, please try again later.
2020-12-21T15:06:24 0f5358cf [W] Error details for Puppet is taking too long to respond, please try again later.: <Exception>: Puppet is taking too long to respond, please try again later.
2020-12-21T15:06:24 0f5358cf [W] Puppet is taking too long to respond, please try again later.: <Exception>: Puppet is taking too long to respond, please try again later.

Expected results:

Puppetserver responds promptly and the Satellite services remain lightweight.

Additional info:

Comment 1 Ewoud Kohl van Wijngaarden 2021-01-28 16:22:08 UTC
This appears to be related to JRuby 9k which Puppetserver 6 enabled by default (Puppetserver 5 had it optional). The ReservedCodeCache JVM parameter is now relevant and https://puppet.com/docs/puppetserver/6.12.2/tuning_guide.html#potential-java-args-settings states:

> If you’re working outside of lab environment, increase ReservedCodeCache to 512m under normal load. If you’re working with 6-12 JRuby instances (or a max-requests-per-instance value significantly less than 100k), run with a ReservedCodeCache of 1G. Twelve or more JRuby instances in a single server might require 2G or more.

Ideally the installer would tune this automatically (and provide an easy parameter to set it).

For now users can set this via:

  --puppet-server-jvm-extra-args "-XX:ReservedCodeCacheSize=512m"

Of course 512m should be adjusted accordingly.
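Putting the two pieces together, here is a minimal shell sketch: the size thresholds come from the tuning guide quoted above, and the script only prints the installer invocation it would run. The instance count argument is a hypothetical input; the installer itself does no such calculation today.

```shell
#!/bin/sh
# Sketch: choose a ReservedCodeCacheSize per the Puppet tuning guide
# (512m under normal load, 1g for 6-12 JRubies, 2g for 12 or more),
# then print the satellite-installer invocation that would apply it.
instances="${1:-4}"   # example: current puppet-server-max-active-instances

if [ "$instances" -ge 12 ]; then
  cache=2g
elif [ "$instances" -ge 6 ]; then
  cache=1g
else
  cache=512m
fi

echo "satellite-installer --puppet-server-jvm-extra-args \"-XX:ReservedCodeCacheSize=${cache}\""
```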

Comment 2 Anand Agrawal 2021-01-28 16:39:01 UTC
(In reply to Ewoud Kohl van Wijngaarden from comment #1)
> This appears to be related to JRuby 9k which Puppetserver 6 enabled by
> default (Puppetserver 5 had it optional). The ReservedCodeCache JVM
> parameter is now relevant and
> https://puppet.com/docs/puppetserver/6.12.2/tuning_guide.html#potential-java-
> args-settings states:
> 
> > If you’re working outside of lab environment, increase ReservedCodeCache to 512m under normal load. If you’re working with 6-12 JRuby instances (or a max-requests-per-instance value significantly less than 100k), run with a ReservedCodeCache of 1G. Twelve or more JRuby instances in a single server might require 2G or more.
> 
> Ideally the installer would tune this automatically (and provide an easy
> parameter to set it).
> 
> For now users can set this via:
> 
>   --puppet-server-jvm-extra-args "-XX:ReservedCodeCacheSize=512m"
> 
> Of course 512m should be adjusted accordingly.

Do you have any suggestion for the number of puppet-server-max-active-instances ?

Comment 3 Peter Vreman 2021-01-28 17:14:41 UTC
The syslog also reports the issue. Note that the CodeCache size is only logged once, while clients are checking in:

Example of a start and then the next puppet run of 50 clients starting around 18:00:
----
Jan 25 17:38:34 li-lc-2224 systemd[1]: Started puppetserver Service.
Jan 25 18:09:40 li-lc-2224 puppetserver[8219]: OpenJDK 64-Bit Server VM warning: CodeCache is full. Compiler has been disabled.
Jan 25 18:09:40 li-lc-2224 puppetserver[8219]: OpenJDK 64-Bit Server VM warning: Try increasing the code cache size using -XX:ReservedCodeCacheSize=
Jan 25 18:09:40 li-lc-2224 puppetserver[8219]: CodeCache: size=245760Kb used=243542Kb max_used=243594Kb free=2217Kb
Jan 25 18:09:40 li-lc-2224 puppetserver[8219]: bounds [0x00007f364d000000, 0x00007f365c000000, 0x00007f365c000000]
Jan 25 18:09:40 li-lc-2224 puppetserver[8219]: total_blobs=48833 nmethods=48081 adapters=655
Jan 25 18:09:40 li-lc-2224 puppetserver[8219]: compilation: disabled (not enough contiguous free space left)
----


To diagnose the usage you can use the debugging parameter also:

'-XX:+PrintCodeCacheOnCompilation'

This will print a line on every compilation, so syslog records each code cache size change instead of only reporting after the application stops.
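Either way, the warning lines are easy to extract from syslog. A self-contained sketch (the sample log is inlined here for illustration; on a live system, point grep at /var/log/messages instead):

```shell
#!/bin/sh
# Sketch: filter the puppetserver CodeCache messages out of a syslog file.
log=$(mktemp)
cat > "$log" <<'EOF'
Jan 25 17:38:34 host systemd[1]: Started puppetserver Service.
Jan 25 18:09:40 host puppetserver[8219]: OpenJDK 64-Bit Server VM warning: CodeCache is full. Compiler has been disabled.
Jan 25 18:09:40 host puppetserver[8219]: CodeCache: size=245760Kb used=243542Kb max_used=243594Kb free=2217Kb
EOF

# Show the matching lines, then count them.
grep -E 'CodeCache( is full|: size=)' "$log"
matches=$(grep -cE 'CodeCache( is full|: size=)' "$log")
```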



It would help to have a dedicated satellite-installer parameter for the ReservedCodeCacheSize, and maybe also to increase the default to 512M, or even 1G or 2G as Puppet Labs recommends if you have 6-12 JRubies.

Insights can detect the issue: the puppetserver syslog warning 'CodeCache is full. Compiler has been disabled' is an easy trigger for generating an Incident Advisory with a recommendation to fix it.

It would be nice if Insights and/or satellite-installer could also provide tuning recommendations for the JRuby instances, max heap and code cache.

Comment 4 Ewoud Kohl van Wijngaarden 2021-01-28 17:45:45 UTC
> Do you have any suggestion for the number of puppet-server-max-active-instances ?

Not really. Puppet's tuning guide describes the considerations very well. For small Puppet deployments, a low number is better because it consumes less memory; larger installations need a higher number. Given how diverse Satellite customers are, it's rather difficult to pick a magic number that always works.
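For a starting point, Puppet Server's own documented default for max-active-instances is the number of CPUs minus one, clamped to the range 1..4. A sketch of that heuristic (busier Satellites may well need more, at roughly 512MB of heap per instance per the tuning guide):

```shell
#!/bin/sh
# Sketch of Puppet Server's documented default heuristic for
# max-active-instances: num CPUs minus one, clamped to 1..4.
cpus=$(getconf _NPROCESSORS_ONLN)
instances=$((cpus - 1))
[ "$instances" -lt 1 ] && instances=1
[ "$instances" -gt 4 ] && instances=4
echo "suggested --puppet-server-max-active-instances: $instances"
```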

> It would help to have a dedicated satellite-installer parameter for the ReservedCodeCacheSize, and maybe also increase the default from to 512M or even 1G or 2G as PuppetLabs also recommends if you have 6-12 Jrubies.

That is indeed what I meant with:

> Ideally the installer would tune this automatically (and provide an easy parameter to set it).

512m would be my safe default, but as you suggest I may instead default to 'undef' and compute a value in the code based on other parameters.

While Insights rules are not my area, this does sound like something that can easily be caught by going through logs and notifying users.

Comment 5 Peter Vreman 2021-01-28 18:04:01 UTC
A KB article or doc section can help provide user guidance. Puppet Labs also has such a table at https://puppet.com/docs/pe/2019.7/tuning_standard.html#tuning_standard.
The Red Hat table and docs can use similar values, but should name the satellite-installer parameters to be configured.

Comment 6 Peter Vreman 2021-01-28 18:14:31 UTC
I also noticed that puppetserver is still running on Java 8.
Would there not be a memory/performance improvement when using Java 11?

Comment 7 Ewoud Kohl van Wijngaarden 2021-01-28 18:33:09 UTC
(In reply to Peter Vreman from comment #6)
> I also noticed that puppetserver is still running on java 8.
> Is there maybe not a memory/performance improvement when using java 11?

Yes it is using Java 8 and there are improvements when Java 11 is used. However, there are some complications. AFAIK the Candlepin version as shipped in Satellite today doesn't support Java 11 (in development it has been updated and a future release will ship it).

Then the puppetserver package has a hard dependency on java-1.8.0-openjdk (or -headless, I don't recall). This isn't great, but sadly there isn't a generic provides that could be used:

# rpm -q --provides java-11-openjdk-headless.x86_64
/usr/bin/jjs
config(java-11-openjdk-headless) = 1:11.0.10.0.9-0.el7_9
java-11-headless = 1:11.0.10.0.9-0.el7_9
java-11-openjdk-headless = 1:11.0.10.0.9-0.el7_9
java-11-openjdk-headless(x86-64) = 1:11.0.10.0.9-0.el7_9
jre-11-headless = 1:11.0.10.0.9-0.el7_9
jre-11-openjdk-headless = 1:11.0.10.0.9-0.el7_9
libjava.so()(64bit)
libjsig.so()(64bit)
libjvm.so()(64bit)
libjvm.so(SUNWprivate_1.1)(64bit)
libverify.so()(64bit)

As you can see, we would need to change the dependency explicitly. Today our packaging exactly reflects upstream packaging and we've been hesitant to change things.

Since Java 11 can provide /usr/bin/java via alternatives, I don't understand why there is no generic provides, but I haven't reached out to the Red Hat OpenJDK packagers to ask (due to a lack of time).

While I haven't tested it, it should be possible to use version 11 with:

--puppet-server-jvm-java-bin /usr/lib/jvm/jre-11/bin/java

I think Puppet Enterprise does use Java 11, so if it starts up I'd expect it to work from there on, but at least for me it's uncharted territory.
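Before trying the untested invocation above, it is worth checking that the binary actually exists at that path. A small sketch (the path comes from the comment above; adjust for your platform, and nothing here runs the installer):

```shell
#!/bin/sh
# Sketch: verify the Java 11 binary exists before pointing the installer at it.
JAVA11=/usr/lib/jvm/jre-11/bin/java
if [ -x "$JAVA11" ]; then
  msg="would run: satellite-installer --puppet-server-jvm-java-bin $JAVA11"
else
  msg="Java 11 not found at $JAVA11; install java-11-openjdk-headless first"
fi
echo "$msg"
```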

Comment 8 Pablo Hess 2021-02-01 17:13:00 UTC
New question: what about enabling JRuby instance recycling by using the `max-requests-per-instance` setting for Puppet Server? It might have an impact here.

The docs at https://puppet.com/docs/puppetserver/6.12.2/tuning_guide.html say this about this parameter:
-------------------
If you’re working outside of lab environment, increase ReservedCodeCache to 512m under normal load. If you’re working with 6-12 JRuby instances (or a max-requests-per-instance value significantly less than 100k), run with a ReservedCodeCache of 1G. Twelve or more JRuby instances in a single server might require 2G or more.
-------------------




On the other hand, these notes from https://puppet.com/docs/puppet/6.20/server/puppet_server_metrics_performance.html mention a potentially more drastic consequence of enabling recycling under multithreaded mode -- which Puppet 6 Server runs under:
-------------------
If you can't identify the source of a memory leak, setting the max-requests-per-instance setting in puppetserver.conf to something other than the default of 0 limits the number of requests a JRuby handles during its lifetime and enables automatic JRuby flushing. Enabling this setting reduces overall performance, but if you enable it and no longer see signs of persistent memory leaks, check your module code for inefficiencies or memory-consuming bugs.

    Note: In multithreaded mode, the max-requests-per-instance setting refers to the sum total number of requests processed by the single JRuby instance, across all of its threads. While that single JRuby is being flushed, all requests will suspend until the instance becomes available again.
-------------------


Can you please confirm the `max-requests-per-instance` option is discouraged on Puppet 6 Server in Satellite 6.8?
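For reference, a sketch of where that setting lives, in puppetserver.conf's HOCON format (the file path follows the upstream layout; the 100k value is illustrative, taken from the docs quoted above):

```
# /etc/puppetlabs/puppetserver/conf.d/puppetserver.conf (excerpt)
jruby-puppet: {
    # 0 (the default) disables recycling; a non-zero value flushes each
    # JRuby instance after it has handled this many requests.
    max-requests-per-instance: 100000
}
```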

Comment 16 Tomer Brisker 2021-02-22 09:21:25 UTC
Moving to the installer component, as it seems this issue will be resolved by installer tuning parameters for the puppet server and is not related to the Puppet integration within Satellite itself.

Comment 20 Ewoud Kohl van Wijngaarden 2021-04-07 14:34:03 UTC
In addition to the mentioned option, it was also pointed out in the upstream community that we enable metrics and the profiler by default while Puppet disables it out of the box due to the performance overhead. Today both are managed by a single parameter, but disabling it is broken. In Foreman 2.5 this will be split into two separate parameters, both defaulting to false. See https://github.com/theforeman/puppet-puppet/issues/780 for the issue report and https://github.com/theforeman/puppet-puppet/pull/781 for the fix.

Comment 21 Ewoud Kohl van Wijngaarden 2021-04-07 17:46:56 UTC
Created redmine issue https://projects.theforeman.org/issues/32269 from this bug

Comment 24 Anand Agrawal 2021-04-28 13:20:27 UTC
@ikaur 

Can you please review from performance point of view before release.

Regards,
Anand

Comment 27 Devendra Singh 2021-06-15 07:52:07 UTC
Verified on 6.10 Snap4.

Comment 30 errata-xmlrpc 2021-11-16 14:09:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Satellite 6.10 Release), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4702

