1135557 – Registering large number of clients in parallel results in some hosts showing up in Sat6 and subscription-manager reporting errors



Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Satellite
Classification:	Red Hat
Component:	Registration
Sub Component:
Version:	6.0.3
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	Unspecified
Assignee:	satellite6-bugs
QA Contact:	Katello QA List
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	sat61-release-notes

Reported:	2014-08-29 15:37 UTC by Alex Krzos
Modified:	2019-09-26 13:50 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2016-09-22 14:33:48 UTC
Target Upstream Version:
Embargoed:

Attachments
RHSM log files. (7.04 KB, application/octet-stream) 2014-08-29 15:37 UTC, Alex Krzos

Description Alex Krzos 2014-08-29 15:37:00 UTC

Created attachment 932724
RHSM log files.

Description of problem:
Satellite 6 and subscription-manager client show inconsistencies when registering many clients in parallel (50, 100, 150, 200) via an activation key.

When 100 clients register at the same time, Satellite 6 reports 100 new content hosts in the UI and in the candlepin database (table cp_consumers); however, the rhsm output on the clients shows errors and timeouts.

Version-Release number of selected component (if applicable):
RHEL 6.5 - 2.6.32-431.23.3
Satellite 6.0.3 (GA-Snap4)
candlepin-0.9.19-1.el6_5.noarch
katello-1.5.0-28.el6sat.noarch
pulp-server-2.4.0-0.30.beta.el6sat.noarch
qpid-cpp-server-0.22-42.el6.x86_64
foreman-1.6.0.38-1.el6sat.noarch
puppet-server-3.6.2-1.el6sat.noarch
elasticsearch-0.90.10-4.el6sat.noarch

How reproducible:
The errors in the RHSM log file are consistently reproducible. The number of clients that actually complete registration varies from run to run.

Steps to Reproduce:
1. Spawn 100 clients that have Red Hat subscription-manager installed
2. Have all 100 clients attempt to register at the same time
3. grep for errors in rhsm.log and/or run subscription-manager status to view the status of each client
4. View the number of registered clients on Sat6
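
Steps 1 and 2 could be driven by a small harness along these lines. This is only a sketch: register_command, register_all and the register callable are hypothetical names, the organization and activation key values are placeholders, and how the command actually reaches each client (ssh, an agent, a container exec) is left to the caller.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def register_command(org, activation_key):
    """Build the activation-key registration command run on each client."""
    return [
        "subscription-manager", "register",
        "--org", org,
        "--activationkey", activation_key,
    ]


def register_all(hosts, register):
    """Call register(host) for every host at roughly the same moment.

    `register` is a caller-supplied (hypothetical) function that runs the
    registration on one client and returns its output as a string.
    Returns a dict mapping host -> output.
    """
    if not hosts:
        return {}
    # One thread per host; the barrier holds each thread until all are ready.
    barrier = threading.Barrier(len(hosts))

    def worker(host):
        barrier.wait()
        return register(host)

    with ThreadPoolExecutor(max_workers=len(hosts)) as pool:
        outputs = list(pool.map(worker, hosts))
    return dict(zip(hosts, outputs))
```

The barrier is a design choice to make the requests land as close together as possible; without it, thread start-up staggers the registrations slightly.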

Actual results:
Satellite 6 shows 100 clients added in both the Web UI and the cp_consumers table.
Depending on the run, anywhere from ~30 clients to all 100 show successful subscription-manager output.

Expected results:
On each of the clients:
The system has been registered with ID: .....
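
The mismatch between client-side and server-side results can be tallied with a helper like the one below. It is a sketch under stated assumptions: summarize_registrations is a hypothetical name, the per-client output is assumed to have been collected already, and server_count is assumed to be the number of hosts created by this run (for example, the growth in cp_consumers rows), obtained separately.

```python
SUCCESS_MARKER = "The system has been registered with ID:"


def summarize_registrations(outputs, server_count):
    """Compare client-side results with the server-side host count.

    outputs: dict mapping client name -> text printed by subscription-manager.
    server_count: number of new content hosts the Satellite reports
    (assumption: counted by the caller, for this run only).
    """
    succeeded = sorted(h for h, text in outputs.items() if SUCCESS_MARKER in text)
    failed = sorted(h for h in outputs if h not in succeeded)
    return {
        "clients_ok": len(succeeded),
        "clients_failed": failed,
        # Hosts the server recorded even though the client saw an error.
        "server_only": max(server_count - len(succeeded), 0),
    }
```

A non-zero server_only value corresponds to the behavior reported here: the server has the host, but the client did not see a successful registration.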

Additional info:

See the attached log files for the various rhsm error output.

tomcat6 has a config of 150 threads in Satellite6

httpd default timeouts:
/etc/httpd/conf/httpd.conf:Timeout 120
/etc/httpd/conf/httpd.conf:KeepAliveTimeout 15
/etc/httpd/conf.d/05-foreman-ssl.conf:PassengerStartTimeout 600
/etc/httpd/conf.d/05-foreman.conf:PassengerStartTimeout 600
/etc/httpd/conf.d/ssl.conf:  SSLSessionCacheTimeout 300

Comment 1 RHEL Program Management 2014-08-29 15:52:55 UTC

Since this issue was entered in Red Hat Bugzilla, the release flag has been
set to ? to ensure that it is properly evaluated for this release.

Comment 3 Alex Krzos 2014-09-09 12:56:11 UTC

Adding email comments so this information is not lost:

I've re-run with a raised timeout (600) and a raised thread count (300 instead of 150) and ran a katello-service restart, but found no consistent difference in the number of clients showing as registered in Sat6 or in the number of 503s I'm getting in rhsm.log.

I am seeing a large number of errors logged to foreman-ssl_error_ssl.log

[Wed Aug 27 11:18:51 2014] [error] [client 172.16.10.12] (104)Connection reset by peer: ap_content_length_filter: apr_bucket_read() failed

Comment 4 Alex Krzos 2014-09-09 14:47:45 UTC

Looking at passenger-status while running the test I see requests in queue shoot up to 100 and stay pegged at that.

# passenger-status
Version : 4.0.18
Date    : Tue Sep 09 10:46:23 -0400 2014
Instance: 12359
----------- General information -----------
Max pool size : 6
Processes     : 2
Requests in top-level queue : 0

----------- Application groups -----------
/usr/share/foreman#default:
  App root: /usr/share/foreman
  (spawning new process...)
  Requests in queue: 100
  * PID: 16407   Sessions: 1       Processed: 402     Uptime: 21m 11s
    CPU: 2%      Memory  : 248M    Last used: 1s ago

/etc/puppet/rack#default:
  App root: /etc/puppet/rack
  Requests in queue: 0
  * PID: 12715   Sessions: 0       Processed: 5       Uptime: 28m 4s
    CPU: 0%      Memory  : 82M     Last used: 28m 0s ago
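
To watch the queue depth over the course of a test run, the per-application "Requests in queue" values can be pulled out of the passenger-status text. The following is a minimal sketch; queued_requests is a hypothetical helper, and it assumes the output layout shown above for version 4.0.18 (other Passenger versions may format this differently).

```python
import re


def queued_requests(status_text):
    """Return {application group: requests in queue} from passenger-status text."""
    queues = {}
    group = None
    for line in status_text.splitlines():
        stripped = line.strip()
        # Group headers are unindented, contain '#', and end with a colon,
        # e.g. "/usr/share/foreman#default:".
        if line and not line[0].isspace() and stripped.endswith(":") and "#" in stripped:
            group = stripped[:-1]
            continue
        # Matches the per-group line only, not "Requests in top-level queue".
        match = re.match(r"Requests in queue\s*:\s*(\d+)", stripped)
        if match and group is not None:
            queues[group] = int(match.group(1))
    return queues
```

Feeding it the text captured from repeated passenger-status runs gives a simple time series of how long the foreman queue stays at its peak.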

Comment 5 David O'Brien 2015-08-06 03:23:14 UTC

Is there any sort of workaround or other comment I can add to the rel note for this, to help the customer understand what's going on or how to avoid the issue?

thanks

Comment 7 David O'Brien 2015-08-19 02:37:48 UTC

Pls see comment #5

thanks

Comment 8 David O'Brien 2015-08-25 23:15:39 UTC

Brad, or anyone...

Can you add to this?

Comment 9 Brad Buckingham 2015-08-26 13:52:48 UTC

At this time, there is no known workaround; however, perhaps we can recommend the following:

If a failure is observed while performing a 'subscription-manager register' using an activation key, perform the following:
- View /var/log/rhsm/rhsm.log on the client
- Look for the error that occurred during registration
- If the error is an SSLTimeoutError, ask the Satellite 6 administrator to confirm whether the client has been registered.  This can be confirmed by locating the client on the Hosts -> Content Hosts page.
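
The client-side part of this check could be scripted roughly as follows. This is a sketch, not a supported tool: find_registration_errors and needs_server_check are hypothetical names, and the exact format of rhsm.log lines is assumed only to contain the error name as plain text.

```python
def find_registration_errors(log_text, markers=("SSLTimeoutError",)):
    """Return log lines that mention any of the given error markers."""
    return [
        line for line in log_text.splitlines()
        if any(marker in line for marker in markers)
    ]


def needs_server_check(log_text):
    """True when the log shows an SSLTimeoutError.

    In that case the host may still exist on the Satellite, so the
    administrator should look for it under Hosts -> Content Hosts.
    """
    return bool(find_registration_errors(log_text))
```

On a client, log_text would be the contents of /var/log/rhsm/rhsm.log; passing other markers (for example "503") reuses the same scan for other errors.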

Comment 12 David O'Brien 2016-04-18 00:48:59 UTC

Reset docs contact <> daobrien

Comment 16 Chris Duryee 2016-09-22 14:33:48 UTC

I believe this works given the tunings in the 6.2 performance doc. Marking as closed/currentrelease.
