Red Hat Satellite engineering is moving the tracking of its product development work on Satellite to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "Satellite project" in Red Hat Jira and file new tickets there. Individual Bugzilla bugs will be migrated starting at the end of May. If you cannot log in to RH Jira, please consult article #7032570. Failing that, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "Satellite project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/SAT-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that the bug has been migrated.
Bug 2152709 - Concurrent registration of hosts behind a Capsule fail with error 502
Summary: Concurrent registration of hosts behind a Capsule fail with error 502
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Installation
Version: 6.13.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: 6.15.0
Assignee: Ewoud Kohl van Wijngaarden
QA Contact: Satellite QE Team
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-12-12 18:54 UTC by Pablo Mendez Hernandez
Modified: 2024-04-23 17:12 UTC
CC List: 6 users

Fixed In Version: foreman-installer-3.9.0-0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-04-23 17:12:54 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Foreman Issue Tracker | 36854 | 0 | Normal | Ready For Testing | Use HTTP/2 on content proxies to connect to Foreman | 2023-10-23 13:07:06 UTC
Red Hat Issue Tracker | SAT-14859 | 0 | None | None | None | 2023-01-11 10:02:25 UTC
Red Hat Issue Tracker | SAT-16051 | 0 | None | None | None | 2023-02-20 16:02:05 UTC
Red Hat Product Errata | RHSA-2024:2010 | 0 | None | None | None | 2024-04-23 17:12:55 UTC

Description Pablo Mendez Hernandez 2022-12-12 18:54:55 UTC
Description of problem:

When performing concurrent registrations of hosts against a Capsule, some of them fail with HTTP 502 errors (the number of failures grows as the concurrency increases).

Content hosts registration log:

~~~
#
# Running registration
#
This system is currently not registered.
All local data removed
Updating Subscription Management repositories.
Unable to read consumer identity

This system is not registered with an entitlement server. You can use subscription-manager to register.

No match for argument: katello-ca-consumer*
No packages marked for removal.
Dependencies resolved.
Nothing to do.
Complete!
Updating Subscription Management repositories.
Unable to read consumer identity

This system is not registered with an entitlement server. You can use subscription-manager to register.

Error: There are no enabled repositories in \"/etc/yum.repos.d\", \"/etc/yum/repos.d\", \"/etc/distro.repos.d\".
Remote server error. Please check the connection details, or see /var/log/rhsm/rhsm.log for more information.
~~~

The content host appears in the 'Content Hosts' GUI in Satellite, but it is not correctly registered:

~~~
# subscription-manager identity
This system is not yet registered. Try 'subscription-manager register --help' for more information.
~~~

Its /var/log/rhsm/rhsm.log will show the following:

~~~
...
2022-12-12 17:31:21,741 [ERROR] subscription-manager:410:MainThread @connection.py:847 - Response: 502
2022-12-12 17:31:21,742 [ERROR] subscription-manager:410:MainThread @connection.py:848 - JSON parsing error: Expecting value: line 1 column 1 (char 0)
2022-12-12 17:31:21,742 [ERROR] subscription-manager:410:MainThread @managercli.py:229 - Error during registration: Server error attempting a POST to /rhsm/consumers?owner=Default_Organization&activation_keys=ActivationKey returned status 502
2022-12-12 17:31:21,742 [ERROR] subscription-manager:410:MainThread @managercli.py:230 - Server error attempting a POST to /rhsm/consumers?owner=Default_Organization&activation_keys=ActivationKey returned status 502
Traceback (most recent call last):
  File "/usr/lib64/python3.6/site-packages/subscription_manager/managercli.py", line 1992, in _do_command
    service_level=self.options.service_level,
  File "/usr/lib64/python3.6/site-packages/rhsmlib/services/register.py", line 111, in register
    jwt_token=jwt_token
  File "/usr/lib64/python3.6/site-packages/rhsm/connection.py", line 1154, in registerConsumer
    return self.conn.request_post(url, params, headers=headers)
  File "/usr/lib64/python3.6/site-packages/rhsm/connection.py", line 937, in request_post
    return self._request("POST", method, params, headers=headers)
  File "/usr/lib64/python3.6/site-packages/rhsm/connection.py", line 967, in _request
    info=info, headers=headers, cert_key_pairs=cert_key_pairs)
  File "/usr/lib64/python3.6/site-packages/rhsm/connection.py", line 819, in _request
    self.validateResponse(result, request_type, handler)
  File "/usr/lib64/python3.6/site-packages/rhsm/connection.py", line 899, in validateResponse
    handler=handler)
rhsm.connection.RemoteServerException: Server error attempting a POST to /rhsm/consumers?owner=Default_Organization&activation_keys=ActivationKey returned status 502
~~~

And the Capsule it tries to register against shows this:

~~~
# grep -r -w -e 172.21.178.201 -e containerhost-5-container201 /var/log/httpd/rhsm-pulpcore-https-*
/var/log/httpd/rhsm-pulpcore-https-443_access_ssl.log:172.21.178.201 - - [12/Dec/2022:17:30:18 +0000] "GET /rhsm/ HTTP/1.1" 200 2344 "-" "RHSM/1.0 (cmd=subscription-manager) subscription-manager/1.28.32-1.el8"
/var/log/httpd/rhsm-pulpcore-https-443_access_ssl.log:172.21.178.201 - - [12/Dec/2022:17:30:20 +0000] "POST /rhsm/consumers?owner=Default_Organization&activation_keys=ActivationKey HTTP/1.1" 502 341 "-" "RHSM/1.0 (cmd=subscription-manager) subscription-manager/1.28.32-1.el8"
/var/log/httpd/rhsm-pulpcore-https-443_error_ssl.log:[Mon Dec 12 17:31:21.739577 2022] [proxy_http:error] [pid 55482:tid 140662162200320] (70007)The timeout specified has expired: [client 172.21.178.201:55864] AH01102: error reading status line from remote server satellite.blue.ddns.perf.redhat.com:443
/var/log/httpd/rhsm-pulpcore-https-443_error_ssl.log:[Mon Dec 12 17:31:21.739654 2022] [proxy:error] [pid 55482:tid 140662162200320] [client 172.21.178.201:55864] AH00898: Error reading from remote server returned by /rhsm/consumers
~~~
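The two Capsule log entries above can be correlated to see how long the proxied POST waited before httpd gave up. The sketch below is a minimal, hypothetical helper (`parse_access`, `parse_error` and `seconds_until_proxy_error` are illustrative names, not part of Satellite). It assumes the Apache combined access-log format, the Apache 2.4 error-log timestamp format, IPv4 client addresses, and that error-log times are in UTC, matching the `+0000` offset in the access log; it expects lines as they appear in the log files, without grep's file-name prefix.

```python
import re
from datetime import datetime, timezone

# Apache combined log format: client ident user [time] "METHOD path PROTO" status ...
ACCESS_RE = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)
# Apache 2.4 error log: [Mon Dec 12 17:31:21.739577 2022] [module:level] ... [client IP:port] ...
# The client group only handles IPv4 addresses.
ERROR_RE = re.compile(
    r'^\[(?P<ts>[^\]]+)\] \[(?P<module>[^\]]+)\].*\[client (?P<client>[^\]:]+):\d+\]'
)


def parse_access(line):
    """Return (client, start_time, method, path, status), or None if the line doesn't match."""
    m = ACCESS_RE.match(line)
    if m is None:
        return None
    # %t in the access log is the time the request was received.
    start = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
    return m["client"], start, m["method"], m["path"], int(m["status"])


def parse_error(line):
    """Return (client, time, module), or None. Error-log times have no offset: UTC is ASSUMED."""
    m = ERROR_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m["ts"], "%a %b %d %H:%M:%S.%f %Y").replace(tzinfo=timezone.utc)
    return m["client"], ts, m["module"]


def seconds_until_proxy_error(access_line, error_line):
    """Seconds between a request arriving and the proxy error logged for the same client."""
    access = parse_access(access_line)
    error = parse_error(error_line)
    if access is None or error is None or access[0] != error[0]:
        raise ValueError("lines do not describe the same client request")
    return (error[1] - access[1]).total_seconds()
```

Fed the POST line from the access log and the `AH01102` line from the error log, this returns about 61.7 seconds (the access-log timestamp only has one-second resolution). That is consistent with a backend timeout of roughly 60 seconds; Apache's documented default for `Timeout` is 60 seconds, but the effective value on this Capsule is not shown in the report.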


Version-Release number of selected component (if applicable):

Satellite 6.12 and 6.13


How reproducible:

Pretty reliably under concurrent registration tests.


Steps to Reproduce:
1. Run concurrent registrations against a Satellite capsule server.
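The report does not include the test harness, so the following is only a hypothetical sketch of step 1: it calls a registration function for many hosts with a bounded number in flight and tallies the failures. `run_concurrent` is an illustrative name, and `register_in_container` ASSUMES each test host is a podman container and that `subscription-manager` exits non-zero when registration fails; the organization and activation key are taken from the logs above.

```python
import subprocess
from collections import Counter
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_concurrent(register, hosts, concurrency):
    """Call register(host) for every host, keeping at most `concurrency` calls in flight.

    Returns (counts, failed): counts maps "ok" or the exception class name to a
    tally, and failed lists the hosts whose call raised.
    """
    counts = Counter()
    failed = []
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = {pool.submit(register, host): host for host in hosts}
        for future in as_completed(futures):
            exc = future.exception()
            if exc is None:
                counts["ok"] += 1
            else:
                counts[type(exc).__name__] += 1
                failed.append(futures[future])
    return counts, failed


def register_in_container(name):
    """HYPOTHETICAL register callable: assumes `name` is a podman container and that
    subscription-manager exits non-zero on failure (check=True then raises)."""
    subprocess.run(
        ["podman", "exec", name, "subscription-manager", "register",
         "--org", "Default_Organization", "--activationkey", "ActivationKey"],
        check=True, capture_output=True,
    )
```

Raising `concurrency` while keeping the host list fixed is one way to observe the reported trend of more 502 failures at higher concurrency.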


Actual results:

Content host is not registered.


Expected results:

Content host is registered.


Additional info:

Applying this diff, which raises the timeout for the /rhsm proxy to 600 seconds (10 minutes), solved the issue for us:

~~~
# diff -u /etc/httpd/conf.d/10-rhsm-pulpcore-https-443.conf.orig /etc/httpd/conf.d/10-rhsm-pulpcore-https-443.conf
--- /etc/httpd/conf.d/10-rhsm-pulpcore-https-443.conf.orig	2022-12-12 15:17:12.607759842 +0000
+++ /etc/httpd/conf.d/10-rhsm-pulpcore-https-443.conf	2022-12-12 18:49:26.549730320 +0000
@@ -96,7 +96,7 @@
   ## Proxy rules
   ProxyRequests Off
   ProxyPreserveHost Off
-  ProxyPass /rhsm https://satellite.blue.ddns.perf.redhat.com/rhsm disablereuse=on retry=0
+  ProxyPass /rhsm https://satellite.blue.ddns.perf.redhat.com/rhsm disablereuse=on retry=0 timeout=600
   ProxyPassReverse /rhsm https://satellite.blue.ddns.perf.redhat.com/rhsm
   ProxyPass /redhat_access https://satellite.blue.ddns.perf.redhat.com/redhat_access disablereuse=on retry=0
   ProxyPassReverse /redhat_access https://satellite.blue.ddns.perf.redhat.com/redhat_access
~~~


This was possible to set in custom-hiera.yaml in the past, as documented in https://access.redhat.com/solutions/3412321#comment-2326081, but it no longer seems to be possible, so I think applying the diff would make a good default.
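To reproduce the reporter's change in a lab, the edit in the diff can be scripted. This is a hypothetical helper (`set_proxypass_timeout` is an illustrative name), not a supported procedure: the file is generated by foreman-installer, so a hand edit is likely to be reverted on the next installer run, which is why a built-in default is being requested. It only touches `ProxyPass` lines for the given path, leaving `ProxyPassReverse` and other paths alone.

```python
def set_proxypass_timeout(conf_text, path="/rhsm", seconds=600):
    """Return conf_text with timeout=<seconds> on every `ProxyPass <path> ...` line.

    An existing timeout= parameter on those lines is replaced, so the function is
    idempotent. Other directives, including ProxyPassReverse, are left untouched.
    """
    out = []
    for line in conf_text.splitlines(keepends=True):
        body = line.lstrip()
        fields = body.split()
        if len(fields) >= 3 and fields[0] == "ProxyPass" and fields[1] == path:
            indent = line[: len(line) - len(body)]
            eol = "\n" if line.endswith("\n") else ""
            # Keep the path, the backend URL and every other key=value parameter.
            kept = [f for f in fields if not f.startswith("timeout=")]
            line = indent + " ".join(kept + [f"timeout={seconds}"]) + eol
        out.append(line)
    return "".join(out)
```

Applied to the original `ProxyPass /rhsm` line, this yields the `+` line of the diff above; running it twice gives the same result.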

Comment 3 Ewoud Kohl van Wijngaarden 2023-01-30 13:19:36 UTC
We can set https://httpd.apache.org/docs/current/mod/mod_proxy.html#proxytimeout via the Hiera key apache::mod::proxy::proxy_timeout. Would this be sufficient for your needs and able to close out the bug?
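The two options differ in scope. Per the mod_proxy documentation, a `timeout=` parameter on a `ProxyPass` line applies only to that mapping and defaults to `ProxyTimeout`, which in turn defaults to the core `Timeout` directive (60 seconds by default). The Hiera key above sets `ProxyTimeout` through the Apache Puppet module, so it would raise the limit for every proxied path rather than just /rhsm. A small sketch of that fallback order, with illustrative names:

```python
from typing import Optional

APACHE_DEFAULT_TIMEOUT = 60  # documented default of httpd's core Timeout directive


def effective_backend_timeout(worker_timeout: Optional[int] = None,
                              proxy_timeout: Optional[int] = None,
                              server_timeout: int = APACHE_DEFAULT_TIMEOUT) -> int:
    """Seconds httpd waits on the backend for a proxied request, following the
    documented fallbacks: ProxyPass timeout= -> ProxyTimeout -> Timeout."""
    if worker_timeout is not None:
        return worker_timeout   # per-mapping value, as in the reporter's diff
    if proxy_timeout is not None:
        return proxy_timeout    # global value, e.g. from the Hiera key
    return server_timeout
```

With neither value set, this gives 60 seconds; the reporter's diff corresponds to `worker_timeout=600`, and the Hiera approach to `proxy_timeout`.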

Comment 4 Pablo Mendez Hernandez 2023-02-02 14:47:06 UTC
I'd prefer (and I'd say a lot of customers would too) to tackle each possible timeout individually instead of using such a big hammer, so I'd like to keep this open.

Comment 5 Brad Buckingham 2023-02-16 16:40:34 UTC
Hi Pablo,

Is this something that could be captured within the tuning guide for now? 

Thanks!

Comment 7 Pablo Mendez Hernandez 2023-02-20 15:48:49 UTC
Hi Brad,

I'll make sure to create an issue to include the content of the KCS article in the Tuning Guide.

Comment 12 Ewoud Kohl van Wijngaarden 2023-10-23 13:07:07 UTC
Some initial testing of https://github.com/theforeman/puppet-foreman_proxy_content/pull/442 shows that using HTTP/2 in the Capsule -> Satellite communication makes it much more reliable.

Comment 13 Bryan Kearney 2023-10-23 16:02:11 UTC
Upstream bug assigned to ekohlvan

Comment 14 Bryan Kearney 2023-10-23 16:02:13 UTC
Upstream bug assigned to ekohlvan

Comment 15 Bryan Kearney 2023-10-24 16:02:14 UTC
Moving this bug to POST for triage into Satellite since the upstream issue https://projects.theforeman.org/issues/36854 has been resolved.

Comment 16 Brad Buckingham 2023-10-30 11:29:29 UTC
Bulk setting Target Milestone = 6.15.0 where sat-6.15.0+ is set.

Comment 17 Pablo Mendez Hernandez 2024-01-08 14:53:38 UTC
Given the improvement in registration numbers in our testing, shown in https://github.com/theforeman/puppet-foreman_proxy_content/pull/442#issuecomment-1776993907, I'm setting the patch to VERIFIED.

Comment 19 Eric Helms 2024-04-09 12:22:34 UTC
There have been multiple changes across our most recent releases aimed at reducing the occurrence of this issue. By its nature, this error code may still be triggered at certain scales or under certain loads. We have decided to close this as done in Satellite 6.15, in part given the breadth of the attached issues.

If you encounter this issue on a supported version of Satellite, please open a new bug that is specific to your particular issue and that targets the workload you are performing when it occurs. Please include as much detail as possible about what you were trying to do, how many hosts or actions were involved, the CPU count, memory, the version of Satellite, and any details that help indicate how to reproduce it. This will help us investigate the issue rather than the symptom.

Comment 21 errata-xmlrpc 2024-04-23 17:12:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Satellite 6.15.0 release), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:2010

