Bug 1135557 - Registering large number of clients in parallel results in some hosts showing up in Sat6 and subscription-manager reporting errors
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Registration
Version: 6.0.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: Unspecified
Assignee: satellite6-bugs
QA Contact: Katello QA List
URL:
Whiteboard:
Depends On:
Blocks: sat61-release-notes
Reported: 2014-08-29 15:37 UTC by Alex Krzos
Modified: 2019-09-26 13:50 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-09-22 14:33:48 UTC
Target Upstream Version:
Embargoed:


Attachments:
RHSM log files. (7.04 KB, application/octet-stream), 2014-08-29 15:37 UTC, Alex Krzos

Description Alex Krzos 2014-08-29 15:37:00 UTC
Created attachment 932724
RHSM log files.

Description of problem:
Satellite 6 and subscription-manager client show inconsistencies when registering many clients in parallel (50, 100, 150, 200) via an activation key.

With 100 clients registering at the same time, Satellite 6 reports 100 new content hosts in the UI and in the candlepin database (table cp_consumers); however, the rhsm output on the clients shows errors and timeouts.

Version-Release number of selected component (if applicable):
RHEL 6.5 - 2.6.32-431.23.3
Satellite 6.0.3 (GA-Snap4)
candlepin-0.9.19-1.el6_5.noarch
katello-1.5.0-28.el6sat.noarch
pulp-server-2.4.0-0.30.beta.el6sat.noarch
qpid-cpp-server-0.22-42.el6.x86_64
foreman-1.6.0.38-1.el6sat.noarch
puppet-server-3.6.2-1.el6sat.noarch
elasticsearch-0.90.10-4.el6sat.noarch

How reproducible:
The errors in the RHSM log file reproduce consistently. The number of clients that actually complete registration varies from run to run.

Steps to Reproduce:
1. Spawn 100 clients that have Red Hat subscription-manager installed
2. Have all 100 clients attempt to register at the same time
3. grep for errors in rhsm.log and/or run subscription-manager status to view the status of each client
4. View the number of registered clients on Sat6
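
The first two steps can be driven from one machine. The following is only a minimal sketch, not the script used for this report: it assumes passwordless ssh to each client, and the host names, organization label, and activation key name passed in are hypothetical placeholders. The success marker is the message listed under "Expected results".

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_register_cmd(host, org, activation_key):
    """Build the ssh command that registers one client.

    host, org and activation_key are placeholders supplied by the caller.
    """
    return [
        "ssh", host,
        "subscription-manager", "register",
        f"--org={org}",
        f"--activationkey={activation_key}",
    ]


def run_cmd(cmd):
    """Run one command and return (exit code, combined stdout/stderr)."""
    proc = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    return proc.returncode, proc.stdout


def register_all(hosts, org, activation_key, runner=run_cmd):
    """Start registration on all hosts at roughly the same time.

    One worker thread per host, so no host waits for another to finish.
    Returns a dict mapping host -> (exit code, output).
    """
    cmds = [build_register_cmd(h, org, activation_key) for h in hosts]
    with ThreadPoolExecutor(max_workers=max(1, len(hosts))) as pool:
        results = list(pool.map(runner, cmds))
    return dict(zip(hosts, results))


def count_successes(results):
    """Count clients whose output contains the expected success message."""
    marker = "The system has been registered with ID:"
    return sum(
        1 for code, out in results.values() if code == 0 and marker in out
    )
```

Comparing count_successes() against the number of content hosts shown on the Satellite gives the discrepancy described in this report. The runner argument exists so the driver can be exercised without real clients.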

Actual results:
Satellite 6 shows 100 clients added in both the Web UI and the cp_consumers table.
Depending on the run, as few as ~30 clients or as many as all 100 show successful subscription-manager output.

Expected results:
On each of the clients:
The system has been registered with ID: .....

Additional info:

See the attached rhsm log files for the various errors.

tomcat6 is configured with 150 threads in Satellite 6

httpd default timeouts:
/etc/httpd/conf/httpd.conf:Timeout 120
/etc/httpd/conf/httpd.conf:KeepAliveTimeout 15
/etc/httpd/conf.d/05-foreman-ssl.conf:PassengerStartTimeout 600
/etc/httpd/conf.d/05-foreman.conf:PassengerStartTimeout 600
/etc/httpd/conf.d/ssl.conf:  SSLSessionCacheTimeout 300

Comment 1 RHEL Program Management 2014-08-29 15:52:55 UTC
Since this issue was entered in Red Hat Bugzilla, the release flag has been
set to ? to ensure that it is properly evaluated for this release.

Comment 3 Alex Krzos 2014-09-09 12:56:11 UTC
Adding email comments so this information is not lost:

I've re-run with a raised timeout (600) and a raised thread count (300 instead of 150) and ran a katello-service restart, but found no consistent difference in the number of clients showing as registered in Sat6 or in the number of 503s I'm getting in rhsm.log.

I am seeing a large number of errors logged to foreman-ssl_error_ssl.log:

[Wed Aug 27 11:18:51 2014] [error] [client 172.16.10.12] (104)Connection reset by peer: ap_content_length_filter: apr_bucket_read() failed

Comment 4 Alex Krzos 2014-09-09 14:47:45 UTC
Looking at passenger-status while running the test I see requests in queue shoot up to 100 and stay pegged at that.

# passenger-status
Version : 4.0.18
Date    : Tue Sep 09 10:46:23 -0400 2014
Instance: 12359
----------- General information -----------
Max pool size : 6
Processes     : 2
Requests in top-level queue : 0

----------- Application groups -----------
/usr/share/foreman#default:
  App root: /usr/share/foreman
  (spawning new process...)
  Requests in queue: 100
  * PID: 16407   Sessions: 1       Processed: 402     Uptime: 21m 11s
    CPU: 2%      Memory  : 248M    Last used: 1s ago

/etc/puppet/rack#default:
  App root: /etc/puppet/rack
  Requests in queue: 0
  * PID: 12715   Sessions: 0       Processed: 5       Uptime: 28m 4s
    CPU: 0%      Memory  : 82M     Last used: 28m 0s ago

Comment 5 David O'Brien 2015-08-06 03:23:14 UTC
Is there any sort of workaround or other comment I can add to the rel note for this, to help the customer understand what's going on or how to avoid the issue?

thanks

Comment 7 David O'Brien 2015-08-19 02:37:48 UTC
Pls see comment #5

thanks

Comment 8 David O'Brien 2015-08-25 23:15:39 UTC
Brad, or anyone...

Can you add to this?

Comment 9 Brad Buckingham 2015-08-26 13:52:48 UTC
At this time, there is no known workaround; however, perhaps we can recommend the following:

If a failure is observed while performing a 'subscription-manager register' using an activation key, perform the following:
- View /var/log/rhsm/rhsm.log on the client
- Look for the error that occurred during registration
- If the error is an SSLTimeoutError, ask the Satellite 6 administrator to confirm whether the client has been registered.  This can be confirmed by locating the client on the Hosts -> Content Hosts page.
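
The log check in the steps above could be scripted on the client. This is a minimal sketch that only does a plain substring search for SSLTimeoutError; it assumes nothing about the rhsm.log line format, and the function names are made up for illustration.

```python
from pathlib import Path

RHSM_LOG = "/var/log/rhsm/rhsm.log"  # client log path named in the steps above


def timeout_lines(log_text, needle="SSLTimeoutError"):
    """Return the log lines that mention the timeout error."""
    return [line for line in log_text.splitlines() if needle in line]


def registration_timed_out(path=RHSM_LOG):
    """True if the client log mentions SSLTimeoutError at least once."""
    text = Path(path).read_text(errors="replace")
    return bool(timeout_lines(text))
```

A True result only indicates that the client should be looked up on the Hosts -> Content Hosts page; it does not show whether the registration completed on the Satellite side.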

Comment 12 David O'Brien 2016-04-18 00:48:59 UTC
Reset docs contact <> daobrien

Comment 16 Chris Duryee 2016-09-22 14:33:48 UTC
I believe this works given the tunings in the 6.2 performance doc. Marking as closed/currentrelease.

