Bug 2152709
| Summary: | Concurrent registration of hosts behind a Capsule fail with error 502 | ||
|---|---|---|---|
| Product: | Red Hat Satellite | Reporter: | Pablo Mendez Hernandez <pmendezh> |
| Component: | Installation | Assignee: | Ewoud Kohl van Wijngaarden <ekohlvan> |
| Status: | CLOSED ERRATA | QA Contact: | Satellite QE Team <sat-qe-bz-list> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 6.13.0 | CC: | ahumbe, egolov, ehelms, ekohlvan, jhutar, rlavi |
| Target Milestone: | 6.15.0 | Keywords: | Patch, Performance, Triaged |
| Target Release: | Unused | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | foreman-installer-3.9.0-0 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2024-04-23 17:12:54 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
We can set https://httpd.apache.org/docs/current/mod/mod_proxy.html#proxytimeout via the Hiera key apache::mod::proxy::proxy_timeout. Would this be sufficient for your needs, so that we can close out the bug?

I (and, I'd say, a lot of customers) would prefer to tackle every possible timeout individually instead of using such a big hammer, so I'd like to keep it open.

Hi Pablo, is this something that could be captured within the tuning guide for now? Thanks!

Hi Brad, I'll make sure to create an issue to include the KCS document in the Tuning Guide.

Some initial testing of https://github.com/theforeman/puppet-foreman_proxy_content/pull/442 shows that using HTTP/2 in the Capsule -> Satellite communication makes it much more reliable.

Upstream bug assigned to ekohlvan

Moving this bug to POST for triage into Satellite since the upstream issue https://projects.theforeman.org/issues/36854 has been resolved.

Bulk setting Target Milestone = 6.15.0 where sat-6.15.0+ is set.

Due to the improvement in the number of successful registrations in our testing, shown in https://github.com/theforeman/puppet-foreman_proxy_content/pull/442#issuecomment-1776993907, I'm setting VERIFIED on the patch.

There have been multiple changes through our most recent releases targeted at helping to reduce the occurrence of this issue. The nature of this issue is that, at certain scales or under certain loads, it might still be possible to trigger this particular error code. We have decided to close this as done in Satellite 6.15 given, in part, the breadth of the attached issues. If you encounter this issue on a supported version of Satellite, please open a new bug that is specific to your particular issue and that targets the workload you are performing when it occurs. Please include as much detail as possible about what you were trying to do, how many hosts or actions were involved, CPU count, memory, the version of Satellite, and details that help to indicate how to reproduce it. This will help us to investigate the issue and not the symptom.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Satellite 6.15.0 release), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2024:2010
Description of problem:

When performing concurrent registration of hosts against a capsule, some of the registrations fail with 502 errors (the number of failures increases with the concurrency).

Content host registration log:

~~~
#
# Running registration
#
This system is currently not registered.
All local data removed
Updating Subscription Management repositories.
Unable to read consumer identity
This system is not registered with an entitlement server. You can use subscription-manager to register.
No match for argument: katello-ca-consumer*
No packages marked for removal.
Dependencies resolved.
Nothing to do.
Complete!
Updating Subscription Management repositories.
Unable to read consumer identity
This system is not registered with an entitlement server. You can use subscription-manager to register.
Error: There are no enabled repositories in "/etc/yum.repos.d", "/etc/yum/repos.d", "/etc/distro.repos.d".
Remote server error. Please check the connection details, or see /var/log/rhsm/rhsm.log for more information.
~~~

The content host will appear in the 'Content Hosts' GUI in Satellite but won't appear as correctly registered:

~~~
# subscription-manager identity
This system is not yet registered. Try 'subscription-manager register --help' for more information.
~~~

Its /var/log/rhsm/rhsm.log will show the following:

~~~
...
2022-12-12 17:31:21,741 [ERROR] subscription-manager:410:MainThread @connection.py:847 - Response: 502
2022-12-12 17:31:21,742 [ERROR] subscription-manager:410:MainThread @connection.py:848 - JSON parsing error: Expecting value: line 1 column 1 (char 0)
2022-12-12 17:31:21,742 [ERROR] subscription-manager:410:MainThread @managercli.py:229 - Error during registration: Server error attempting a POST to /rhsm/consumers?owner=Default_Organization&activation_keys=ActivationKey returned status 502
2022-12-12 17:31:21,742 [ERROR] subscription-manager:410:MainThread @managercli.py:230 - Server error attempting a POST to /rhsm/consumers?owner=Default_Organization&activation_keys=ActivationKey returned status 502
Traceback (most recent call last):
  File "/usr/lib64/python3.6/site-packages/subscription_manager/managercli.py", line 1992, in _do_command
    service_level=self.options.service_level,
  File "/usr/lib64/python3.6/site-packages/rhsmlib/services/register.py", line 111, in register
    jwt_token=jwt_token
  File "/usr/lib64/python3.6/site-packages/rhsm/connection.py", line 1154, in registerConsumer
    return self.conn.request_post(url, params, headers=headers)
  File "/usr/lib64/python3.6/site-packages/rhsm/connection.py", line 937, in request_post
    return self._request("POST", method, params, headers=headers)
  File "/usr/lib64/python3.6/site-packages/rhsm/connection.py", line 967, in _request
    info=info, headers=headers, cert_key_pairs=cert_key_pairs)
  File "/usr/lib64/python3.6/site-packages/rhsm/connection.py", line 819, in _request
    self.validateResponse(result, request_type, handler)
  File "/usr/lib64/python3.6/site-packages/rhsm/connection.py", line 899, in validateResponse
    handler=handler)
rhsm.connection.RemoteServerException: Server error attempting a POST to /rhsm/consumers?owner=Default_Organization&activation_keys=ActivationKey returned status 502
~~~

And the capsule against which it tries to register will show this:

~~~
# grep -r -w -e 172.21.178.201 -e containerhost-5-container201 /var/log/httpd/rhsm-pulpcore-https-*
/var/log/httpd/rhsm-pulpcore-https-443_access_ssl.log:172.21.178.201 - - [12/Dec/2022:17:30:18 +0000] "GET /rhsm/ HTTP/1.1" 200 2344 "-" "RHSM/1.0 (cmd=subscription-manager) subscription-manager/1.28.32-1.el8"
/var/log/httpd/rhsm-pulpcore-https-443_access_ssl.log:172.21.178.201 - - [12/Dec/2022:17:30:20 +0000] "POST /rhsm/consumers?owner=Default_Organization&activation_keys=ActivationKey HTTP/1.1" 502 341 "-" "RHSM/1.0 (cmd=subscription-manager) subscription-manager/1.28.32-1.el8"
/var/log/httpd/rhsm-pulpcore-https-443_error_ssl.log:[Mon Dec 12 17:31:21.739577 2022] [proxy_http:error] [pid 55482:tid 140662162200320] (70007)The timeout specified has expired: [client 172.21.178.201:55864] AH01102: error reading status line from remote server satellite.blue.ddns.perf.redhat.com:443
/var/log/httpd/rhsm-pulpcore-https-443_error_ssl.log:[Mon Dec 12 17:31:21.739654 2022] [proxy:error] [pid 55482:tid 140662162200320] [client 172.21.178.201:55864] AH00898: Error reading from remote server returned by /rhsm/consumers
~~~

Version-Release number of selected component (if applicable):
Satellite 6.12 and 6.13

How reproducible:
Pretty reliably under concurrent registration tests.

Steps to Reproduce:
1. Run concurrent registrations against a Satellite capsule server.

Actual results:
Content host is not registered.

Expected results:
Content host is registered.
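When triaging a run like this, it can help to tally the 502 responses per request path in the Capsule access log. The following is a minimal sketch, not part of the original report: it assumes Apache combined-format lines like the two in the access log excerpt in the description, and the names `REQUEST_RE`, `count_status`, and `SAMPLE` are hypothetical.

```python
import re
from collections import Counter

# Matches the request line and status code of an Apache combined-format entry,
# e.g. "POST /rhsm/consumers?owner=... HTTP/1.1" 502
REQUEST_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<target>\S+) HTTP/[\d.]+" (?P<status>\d{3}) '
)


def count_status(lines, status="502"):
    """Count responses with the given status, keyed by (method, path)."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match and match.group("status") == status:
            # Drop the query string so all registrations share one key.
            path = match.group("target").split("?", 1)[0]
            counts[(match.group("method"), path)] += 1
    return counts


# Sample input: the two access-log lines quoted in this report.
SAMPLE = [
    '172.21.178.201 - - [12/Dec/2022:17:30:18 +0000] "GET /rhsm/ HTTP/1.1" 200 2344 '
    '"-" "RHSM/1.0 (cmd=subscription-manager) subscription-manager/1.28.32-1.el8"',
    '172.21.178.201 - - [12/Dec/2022:17:30:20 +0000] '
    '"POST /rhsm/consumers?owner=Default_Organization&activation_keys=ActivationKey HTTP/1.1" '
    '502 341 "-" "RHSM/1.0 (cmd=subscription-manager) subscription-manager/1.28.32-1.el8"',
]

if __name__ == "__main__":
    for (method, path), n in count_status(SAMPLE).items():
        print(f"{n} x 502 for {method} {path}")
```

On a real Capsule, the lines would come from /var/log/httpd/rhsm-pulpcore-https-443_access_ssl.log instead of `SAMPLE`; comparing the count across runs with different concurrency levels gives a rough measure of how the failure rate scales.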
Additional info:

Applying this diff has solved the issue for us (it raises the proxy timeout for /rhsm to 10 minutes):

    # diff -u /etc/httpd/conf.d/10-rhsm-pulpcore-https-443.conf.orig /etc/httpd/conf.d/10-rhsm-pulpcore-https-443.conf
    --- /etc/httpd/conf.d/10-rhsm-pulpcore-https-443.conf.orig 2022-12-12 15:17:12.607759842 +0000
    +++ /etc/httpd/conf.d/10-rhsm-pulpcore-https-443.conf 2022-12-12 18:49:26.549730320 +0000
    @@ -96,7 +96,7 @@
       ## Proxy rules
       ProxyRequests Off
       ProxyPreserveHost Off
    -  ProxyPass /rhsm https://satellite.blue.ddns.perf.redhat.com/rhsm disablereuse=on retry=0
    +  ProxyPass /rhsm https://satellite.blue.ddns.perf.redhat.com/rhsm disablereuse=on retry=0 timeout=600
       ProxyPassReverse /rhsm https://satellite.blue.ddns.perf.redhat.com/rhsm
       ProxyPass /redhat_access https://satellite.blue.ddns.perf.redhat.com/redhat_access disablereuse=on retry=0
       ProxyPassReverse /redhat_access https://satellite.blue.ddns.perf.redhat.com/redhat_access

In the past this could be set in custom-hiera.yaml, as documented in https://access.redhat.com/solutions/3412321#comment-2326081, but that no longer seems to be possible, so I think applying the diff would make a good default.
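To check which proxied paths actually carry an explicit timeout after such a change, the vhost file can be scanned for `ProxyPass` directives. This is only a sketch under assumptions: the function name `proxypass_timeouts` and the `SAMPLE` text are hypothetical, the sample mirrors the patched lines from the diff, and the parser handles only simple `key=value` parameters with whole-second timeouts.

```python
import re

# Matches "ProxyPass <path> <url> [key=value ...]".
# ProxyPassReverse does not match because whitespace must follow "ProxyPass".
PROXYPASS_RE = re.compile(
    r"^\s*ProxyPass\s+(?P<path>\S+)\s+(?P<url>\S+)(?P<params>.*)$"
)


def proxypass_timeouts(conf_text):
    """Map each ProxyPass path to its timeout in seconds, or None if unset."""
    result = {}
    for line in conf_text.splitlines():
        match = PROXYPASS_RE.match(line)
        if not match:
            continue
        # Collect the trailing key=value parameters of the directive.
        params = dict(
            item.split("=", 1)
            for item in match.group("params").split()
            if "=" in item
        )
        timeout = params.get("timeout")
        result[match.group("path")] = int(timeout) if timeout is not None else None
    return result


# Sample input modeled on the patched vhost configuration in the diff.
SAMPLE = """\
  ProxyRequests Off
  ProxyPreserveHost Off
  ProxyPass /rhsm https://satellite.blue.ddns.perf.redhat.com/rhsm disablereuse=on retry=0 timeout=600
  ProxyPassReverse /rhsm https://satellite.blue.ddns.perf.redhat.com/rhsm
  ProxyPass /redhat_access https://satellite.blue.ddns.perf.redhat.com/redhat_access disablereuse=on retry=0
"""

if __name__ == "__main__":
    for path, timeout in proxypass_timeouts(SAMPLE).items():
        print(path, "timeout:", timeout if timeout is not None else "not set")
```

A path reported as "not set" has no per-directive timeout and relies on whatever server-wide timeout settings are in effect.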