Bug 2238325

Summary: MaxRequestsPerChild from tuning triggers sporadic silent response for clients using HTTP/2
Product: Red Hat Satellite
Component: Installation
Version: 6.13.0
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: Pavel Moravec <pmoravec>
Assignee: Ewoud Kohl van Wijngaarden <ekohlvan>
QA Contact: Griffin Sullivan <gsulliva>
CC: ahumbe, egolov, ehelms, ekohlvan, gsulliva, momran, osousa, pcreech, pmendezh, rlavi
Keywords: Triaged
Target Milestone: 6.15.0
Target Release: Unused
Hardware: x86_64
OS: Linux
Fixed In Version: foreman-installer-3.9.0-0
Last Closed: 2024-04-23 17:14:24 UTC
Type: Bug

Description Pavel Moravec 2023-09-11 13:24:27 UTC

Description of problem:
Satellite ships an httpd version affected by the bug described in https://github.com/apache/httpd/pull/281: clients using HTTP/2 connections can get no response from httpd when MaxRequestsPerChild is set and the threshold has just been reached.

That is dangerous for two reasons:
1) Investigating the cause is very tricky, as clients *randomly* won't get any response and the httpd logs show nothing relevant. Enabling httpd debug logging is basically the only way to confirm it.
2) We recommend using MaxRequestsPerChild both in the performance guide (https://access.redhat.com/documentation/en-us/red_hat_satellite/6.13/html/tuning_performance_of_red_hat_satellite/configuring_project_for_performance_performance-tuning#tuning_apache_httpd_child_processes_performance-tuning) and in the tuning profiles:

# grep maxrequestsperchild /usr/share/foreman-installer/config/foreman.hiera/tuning/sizes/*yaml
/usr/share/foreman-installer/config/foreman.hiera/tuning/sizes/extra-extra-large.yaml:apache::mod::event::maxrequestsperchild: 4000
/usr/share/foreman-installer/config/foreman.hiera/tuning/sizes/extra-large.yaml:apache::mod::event::maxrequestsperchild: 4000
/usr/share/foreman-installer/config/foreman.hiera/tuning/sizes/large.yaml:apache::mod::event::maxrequestsperchild: 4000
/usr/share/foreman-installer/config/foreman.hiera/tuning/sizes/medium.yaml:apache::mod::event::maxrequestsperchild: 4000
#

So the bug can be hit by any customer with HTTP/2 clients, especially by automation, which would then fail at seemingly random moments.
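
One way to check whether a given host has a non-zero recycling limit configured is to scan the httpd configuration for either spelling of the directive (the old MaxRequestsPerChild or the 2.4 name MaxConnectionsPerChild). This is only a sketch: the function name find_recycling_limits is made up for illustration, it assumes the configuration lives under /etc/httpd, and it relies on GNU grep's --include option.

```shell
#!/bin/sh
# Hypothetical helper: list non-zero MaxRequestsPerChild / MaxConnectionsPerChild
# settings found in *.conf files under the given directory (default: /etc/httpd).
find_recycling_limits() {
  conf_dir=${1:-/etc/httpd}
  # -r: recurse, -i: directive names are case-insensitive, -E: extended regex.
  # Commented-out lines do not match because the pattern is anchored at line
  # start, and a value of 0 (no limit) does not match [1-9].
  grep -riE --include='*.conf' \
    '^[[:space:]]*Max(Requests|Connections)PerChild[[:space:]]+[1-9][0-9]*' \
    "$conf_dir"
}
```

Calling find_recycling_limits with no argument prints each matching file and line; no output (and a non-zero exit status) suggests no limit is configured.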



Version-Release number of selected component (if applicable):
Sat6.13
- httpd-2.4.37-56.module+el8.8.0+18758+b3a9c8da.6.x86_64


How reproducible:
100%


Steps to Reproduce:
1. Apply one of the tuning profiles, or follow the tuning guide directly, so that MaxRequestsPerChild is set in /etc/httpd/conf.modules.d/event.conf. For the sake of testing, manually lower the value from 4000 to e.g. 10 or 100, then restart the httpd service.
2. Run random API requests (or even login page requests) using HTTP/2 protocol, like:

while true; do
  cnt=0
  while true; do
    cnt=$((cnt+1))
    if [ $((cnt%1000)) -eq 0 ]; then
      echo "running $cnt-th iteration"
    fi
    if [[ $(curl -o /dev/null -s -k --http2 https://localhost/ -w '%{size_download}') == 0 ]]; then
      echo "no response received in $cnt-th iteration"
      break
    fi
  done
  sleep 1
done

(you can use any URI there, e.g. https://localhost:443/api/v2/status or https://localhost:443/katello/api/v2/organizations/1/ )

The --http2 option is crucial.
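
For unattended runs, a bounded variant of the loop above that reports a count instead of looping forever may be easier to work with. The sketch below reuses the same curl options; the function names fetch_size and count_empty_responses are illustrative and not part of any tool.

```shell
#!/bin/sh
# Print the number of bytes downloaded for one HTTP/2 request, using the same
# curl options as the reproducer above.
fetch_size() {
  curl -o /dev/null -s -k --http2 "$1" -w '%{size_download}'
}

# Hypothetical helper: send TOTAL requests to URL and print how many of them
# came back with zero bytes.
count_empty_responses() {
  url=$1
  total=$2
  empty=0
  i=0
  while [ "$i" -lt "$total" ]; do
    i=$((i + 1))
    size=$(fetch_size "$url")
    # Treat an empty string (curl printed nothing) like a zero-byte reply.
    if [ "${size:-0}" = 0 ]; then
      empty=$((empty + 1))
    fi
  done
  echo "$empty"
}
```

For example, count_empty_responses https://localhost/ 1000 prints the number of empty responses seen in 1000 requests; on a fixed or untuned system the expected result is 0.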


Actual results:
2. On average, one request in every MaxRequestsPerChild requests gets no response. For example, with the value set to 100:

no response received in 127-th iteration
no response received in 26-th iteration
no response received in 153-th iteration
no response received in 82-th iteration
no response received in 67-th iteration
no response received in 166-th iteration
no response received in 86-th iteration
no response received in 119-th iteration
no response received in 24-th iteration
no response received in 191-th iteration
no response received in 9-th iteration
no response received in 177-th iteration
no response received in 47-th iteration
no response received in 190-th iteration
no response received in 9-th iteration
no response received in 144-th iteration
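
The "on average" claim can be checked against this sample by averaging the iteration counts. A minimal sketch, assuming the script output was saved to a file; the function name mean_failure_iteration and the file name failures.log are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read reproducer output on stdin and print the mean
# iteration count at which "no response received" was reported.
mean_failure_iteration() {
  awk '/^no response received in / {
         n = $5                 # e.g. "127-th"
         sub(/-th$/, "", n)     # strip the suffix, leaving "127"
         sum += n
         count++
       }
       END { if (count) printf "%.1f\n", sum / count }'
}
```

Running mean_failure_iteration < failures.log on the 16 lines above prints 101.1 (1617 / 16 = 101.0625), which is close to the configured value of 100.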


Expected results:
The script doesn't print a "no response received" error.


Additional info:

Comment 1 Ewoud Kohl van Wijngaarden 2023-09-27 11:09:16 UTC

Thanks for uncovering this. I get the impression that just raising the MaxRequestsPerChild value doesn't solve it; it only makes the failures less frequent. Should we push for RHEL to include the Apache bugfix? In the meantime, we can raise the default value in our installer so it's 4000 everywhere.

Comment 2 Pavel Moravec 2023-09-27 11:14:18 UTC

(In reply to Ewoud Kohl van Wijngaarden from comment #1)
> Thanks for uncovering this. I get the impression that just raising the
> MaxRequestsPerChild value doesn't solve it; it only makes the failures
> less frequent.

Indeed, that is my understanding as well.

> Should we push for RHEL to include the Apache bugfix? In the meantime, we
> can raise the default value in our installer so it's 4000 everywhere.

I think pushing for a RHEL fix is the right long-term approach, since the bug is in the httpd component. Let me know if I should help with raising that BZ (i.e. by preparing a standalone reproducer outside Satellite).

I don't know whether a better short- to mid-term solution exists.

Comment 3 Ewoud Kohl van Wijngaarden 2023-09-27 11:50:02 UTC

Looking deeper at the docs we can see that for Apache there's https://httpd.apache.org/docs/2.2/mod/mpm_common.html#MaxRequestsPerChild but that's Apache 2.2, defaulting to 10000. Actually in 2.4 it was renamed to MaxConnectionsPerChild: https://httpd.apache.org/docs/2.4/mod/mpm_common.html#maxconnectionsperchild (which interprets the old MaxRequestsPerChild setting), defaulting to 0.

So our default tuning doesn't limit it at all and doesn't recycle workers, but large installations do. So why do we even set this today? Back in the day with mod_wsgi and mod_passenger it was possible to have memory leaks in application code so it made sense, but now we're a pure reverse proxy with minimal code. I'd trust Apache to not leak memory and propose we drop the setting, relying on the default.

Small detail: since https://github.com/puppetlabs/puppetlabs-apache/commit/cedd45b63be89ea54bd2a596e6cd3a3f60d4faf8 the parameter no longer exists, so setting apache::mod::event::maxrequestsperchild in Hiera does nothing. Foreman 3.8 includes puppetlabs-apache >= 9.0, and I hadn't noticed this before.

Can you perform testing with the value set to 0 and see if it still happens?
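
To run that test, the directive has to be changed by hand in the event MPM configuration. A rough sketch of doing so with awk follows; the function name set_child_limit is made up, and the path in the usage note is the one from the reproduction steps. A later installer run may overwrite a manual edit like this, so treat it as a temporary test aid only.

```shell
#!/bin/sh
# Hypothetical helper: set MaxRequestsPerChild / MaxConnectionsPerChild to VALUE
# in FILE, leaving every other line untouched.
set_child_limit() {
  file=$1
  value=$2
  tmp=$(mktemp) || return 1
  awk -v v="$value" '
    # Directive names are case-insensitive, so compare in lower case.
    tolower($1) == "maxrequestsperchild" || tolower($1) == "maxconnectionsperchild" {
      sub(/[0-9]+/, v)   # the directive name has no digits, so this hits the value
    }
    { print }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps the original file mode and owner
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

For example: set_child_limit /etc/httpd/conf.modules.d/event.conf 0, followed by a restart of the httpd service.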

Comment 4 Ewoud Kohl van Wijngaarden 2023-09-27 11:55:30 UTC

I opened https://projects.theforeman.org/issues/36784 to drop it from the newer versions (since it's pointless). If testing shows it's better to drop the tuning, we should cherry pick it further back.

Comment 5 Pavel Moravec 2023-09-27 17:36:08 UTC

> Can you perform testing with the value set to 0 and see if it still happens?

I ran two sets of tests:

1) MaxRequestsPerChild set to zero
2) MaxRequestsPerChild setting removed entirely (which should imply zero, but let's double-check)

In both cases, I ran more than 160k iterations (individual curl requests) without an issue. So either setting prevents the bug.

Comment 7 Ewoud Kohl van Wijngaarden 2023-10-03 10:33:03 UTC

Thanks for testing. The default value is 0, so both of those test cases should behave the same, but it's still good to have confirmation.

The event MPM is default since Foreman 3.3 (https://projects.theforeman.org/issues/20889), so this bug affects users who chose a tuning profile on Satellite 6.12+.

Moving to POST since https://github.com/theforeman/foreman-installer/commit/4462c6d4fc34cfdfe73d31c63cbf39eb979f73e6 was merged.

Comment 11 Griffin Sullivan 2023-10-18 17:08:13 UTC

FailedQA on 6.14 snap 19

The changes are not present in the snap.


# grep maxrequestsperchild /usr/share/foreman-installer/config/foreman.hiera/tuning/sizes/*yaml
/usr/share/foreman-installer/config/foreman.hiera/tuning/sizes/extra-extra-large.yaml:apache::mod::event::maxrequestsperchild: 4000
/usr/share/foreman-installer/config/foreman.hiera/tuning/sizes/extra-large.yaml:apache::mod::event::maxrequestsperchild: 4000
/usr/share/foreman-installer/config/foreman.hiera/tuning/sizes/large.yaml:apache::mod::event::maxrequestsperchild: 4000
/usr/share/foreman-installer/config/foreman.hiera/tuning/sizes/medium.yaml:apache::mod::event::maxrequestsperchild: 4000

Comment 12 Bryan Kearney 2023-10-18 20:02:49 UTC

Upstream bug assigned to ekohlvan

Comment 17 Brad Buckingham 2023-10-30 11:29:29 UTC

Bulk setting Target Milestone = 6.15.0 where sat-6.15.0+ is set.

Comment 19 Griffin Sullivan 2023-12-13 15:54:17 UTC

Verified in 6.15.0 snap 2.1

maxrequestsperchild is not set in the tuning profiles.
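
The same verification can be scripted. A minimal sketch, assuming the tuning profiles live in the directory shown in the grep output earlier in this report; the function name profiles_setting_key is illustrative:

```shell
#!/bin/sh
# Hypothetical helper: print the tuning profiles that still set
# maxrequestsperchild, and return non-zero if any are found.
profiles_setting_key() {
  dir=${1:-/usr/share/foreman-installer/config/foreman.hiera/tuning/sizes}
  # grep -l prints only the names of matching files and exits 0 on a match,
  # so the result is inverted here: a match means the fix is missing.
  if grep -l 'maxrequestsperchild' "$dir"/*yaml; then
    return 1
  fi
  return 0
}
```

With the fix in place, profiles_setting_key prints nothing and returns 0. Note that the sketch also returns 0 when the directory contains no yaml files at all, so it is worth confirming the path first.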

Comment 22 errata-xmlrpc 2024-04-23 17:14:24 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Satellite 6.15.0 release), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:2010

Comment 23 Red Hat Bugzilla 2024-08-22 04:25:16 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days