Bug 2139418

Summary: MQTT ReX mode makes it too easy to DDOS Satellite
Product: Red Hat Satellite Reporter: Jan Hutař <jhutar>
Component: Remote Execution Assignee: satellite6-bugs <satellite6-bugs>
Status: CLOSED ERRATA QA Contact: Peter Ondrejka <pondrejk>
Severity: high Docs Contact:
Priority: high    
Version: 6.12.0 CC: ahumbe, aruzicka, pmendezh
Target Milestone: 6.13.0 Keywords: Performance, Security, Triaged
Target Release: Unused   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-05-03 13:22:26 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Jan Hutař 2022-11-02 13:25:02 UTC
Description of problem:
With Satellite in MQTT ReX mode and 3k clients, I cannot run a ReX job on all of them in one go, as was possible before.


Version-Release number of selected component (if applicable):
satellite-6.12.0-4.el8sat.noarch


How reproducible:
always


Steps to Reproduce:
1. Configure Sat with `satellite-installer --foreman-proxy-plugin-remote-execution-script-mode=pull-mqtt` (in my case I also used "--tuning=medium --foreman-foreman-service-puma-threads-min=16 --foreman-foreman-service-puma-threads-max=16 --foreman-foreman-service-puma-workers=8 --foreman-db-pool=128 --foreman-proxy-plugin-remote-execution-script-install-key=true")
2. Run a "Command" ReX job on 3000 hosts with:
       subscription-manager refresh
       yum -y install insights-client
       insights-client --register


Actual results:
Log proxy.log has a bunch of SSL errors:

2022-11-02T09:04:25  [E] <OpenSSL::SSL::SSLError> SSL_accept SYSCALL returned=5 errno=0 state=TLSv1.3 early data
        /usr/share/ruby/webrick/server.rb:299:in `accept'
        /usr/share/ruby/webrick/server.rb:299:in `block (2 levels) in start_thread'
        /usr/share/ruby/webrick/utils.rb:263:in `timeout'
        /usr/share/ruby/webrick/server.rb:297:in `block in start_thread'

The client wasn't able to retrieve the job:

Nov 02 09:04:25 f09-h26-b02-5039ms-container100.red.ddns.perf.redhat.com yggdrasild[647]: [yggdrasild] 2022/11/02 09:04:25 cannot get detached message content: cannot download from URL: Get "https://f09-h26-b01-5039ms.rdu2.scalelab.redhat.com:9090/ssh/jobs/88e7dae2-3998-4366-b9e5-aa4216e0030f": net/http: TLS handshake timeout

=> ReX job got stuck after 879 succeeded tasks, with 2121 pending.


Expected results:
We need to either document the maximum number of hosts that one capsule with MQTT can handle, or provide a way to limit concurrency when running a job on the hosts.

FYI as of now, default tuning is documented to be able to handle 5k hosts.
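
To illustrate what "limiting concurrency" could mean here, below is a minimal Python sketch of rate-limited dispatch. It is only an illustration under assumptions, not how Satellite or the capsule does it: `dispatch_rate_limited`, `notify` and `max_per_second` are hypothetical names, and `notify` stands in for whatever tells a host that a job is waiting for it.

```python
import time
from typing import Callable, Iterable


def dispatch_rate_limited(
    hosts: Iterable[str],
    notify: Callable[[str], None],
    max_per_second: int,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Notify hosts in batches of at most max_per_second, pausing between batches.

    Hypothetical helper: notify(host) represents telling one host to fetch
    its job. Returns the number of one-second pauses that were taken.
    """
    if max_per_second < 1:
        raise ValueError("max_per_second must be at least 1")

    pauses = 0
    sent_in_batch = 0
    for host in hosts:
        if sent_in_batch == max_per_second:
            # The current batch is full: wait before starting the next one,
            # so the hosts do not all try to download their job at once.
            sleep(1.0)
            pauses += 1
            sent_in_batch = 0
        notify(host)
        sent_in_batch += 1
    return pauses
```

With 3000 hosts and `max_per_second=100`, this would spread the notifications over roughly half a minute instead of sending all of them at once, which is the kind of throttling that might keep the clients from hitting the TLS handshake timeout together.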


Additional info:
Thanks aruzicka for investigating this!

When I switched to SSH mode with `satellite-installer --foreman-proxy-plugin-remote-execution-script-mode=ssh` and re-ran the job (after cancelling the sub-tasks from the previous one), 2991 succeeded and only 9 failed.

I do not mark this as a regression, because IIRC MQTT is optional.

Comment 5 Peter Ondrejka 2023-03-01 14:17:42 UTC
Verified on Sat 6.13 snap 12, the mqtt_rate_limit parameter has been added and is applied as expected.

Comment 8 errata-xmlrpc 2023-05-03 13:22:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Satellite 6.13 Release), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:2097