Bug 1135557 - Registering large number of clients in parallel results in some hosts showing up in Sat6 and subscription-manager reporting errors
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Satellite 6
Classification: Red Hat
Component: Registration
Version: 6.0.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: Unspecified
Assignee: satellite6-bugs
QA Contact: Katello QA List
URL:
Whiteboard:
Depends On:
Blocks: sat61-release-notes
 
Reported: 2014-08-29 15:37 UTC by Alex Krzos
Modified: 2016-09-22 14:33 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-09-22 14:33:48 UTC


Attachments
RHSM log files. (7.04 KB, application/octet-stream)
2014-08-29 15:37 UTC, Alex Krzos

Description Alex Krzos 2014-08-29 15:37:00 UTC
Created attachment 932724
RHSM log files.

Description of problem:
Satellite 6 and the subscription-manager client show inconsistencies when many clients (50, 100, 150, 200) register in parallel via an activation key.

When 100 clients register at the same time, Satellite 6 reports 100 new content hosts in the UI and in the candlepin database (table cp_consumers); however, the rhsm output on the clients shows errors and timeouts.

Version-Release number of selected component (if applicable):
RHEL 6.5 - 2.6.32-431.23.3
Satellite 6.0.3 (GA-Snap4)
candlepin-0.9.19-1.el6_5.noarch
katello-1.5.0-28.el6sat.noarch
pulp-server-2.4.0-0.30.beta.el6sat.noarch
qpid-cpp-server-0.22-42.el6.x86_64
foreman-1.6.0.38-1.el6sat.noarch
puppet-server-3.6.2-1.el6sat.noarch
elasticsearch-0.90.10-4.el6sat.noarch

How reproducible:
The errors in the RHSM log file reproduce consistently. The number of clients that actually complete registration varies from run to run.

Steps to Reproduce:
1. Spawn 100 clients that have Red Hat subscription-manager installed
2. Have all 100 clients attempt to register at the same time
3. Grep for errors in rhsm.log and/or run subscription-manager status to view the status of each client
4. View the number of registered clients on Sat6
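
Steps 1 and 2 can be approximated from a single control host. The following is a minimal sketch, not the reporter's actual test harness: it assumes the clients are reachable over passwordless ssh as root, and the host names, organization label (Default_Organization) and activation key name (ak-test) are hypothetical placeholders. The runner is injectable so the fan-out logic can be exercised without real clients.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical placeholders; substitute values from your own Satellite.
ORG = "Default_Organization"
ACTIVATION_KEY = "ak-test"


def ssh_register(host):
    """Run subscription-manager register on one client over ssh (assumed access)."""
    cmd = [
        "ssh", f"root@{host}",
        "subscription-manager", "register",
        f"--org={ORG}", f"--activationkey={ACTIVATION_KEY}",
    ]
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


def register_in_parallel(hosts, runner=ssh_register):
    """Start one registration per host at roughly the same time.

    Returns a dict mapping host -> whatever the runner returned
    (exit code and combined output for the default ssh runner).
    """
    if not hosts:
        return {}
    # One worker per host so that no registration waits behind another.
    with ThreadPoolExecutor(max_workers=len(hosts)) as pool:
        results = list(pool.map(runner, hosts))
    return dict(zip(hosts, results))
```

For a 100-client run this might be called as register_in_parallel([f"client{i:03d}.example.com" for i in range(100)]), where the host names are again made up.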

Actual results:
Satellite 6 shows 100 clients added in both the Web UI and the cp_consumers table.
Depending on the run, the number of clients showing successful subscription-manager output ranges from as low as ~30 to as high as all 100.

Expected results:
On each of the clients:
The system has been registered with ID: .....
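
To compare the client-side view against the 100 hosts that Satellite reports, the output collected from each client can be tallied against the expected line above. A minimal sketch, assuming each client's subscription-manager output has already been gathered into a dict keyed by host name; only the success string comes from this report, and the function name is illustrative.

```python
SUCCESS_MARKER = "The system has been registered with ID:"


def tally_registrations(outputs):
    """Split hosts into those whose output contains the success line and the rest.

    outputs: dict mapping host name -> captured subscription-manager output.
    Returns (sorted list of successful hosts, sorted list of other hosts).
    """
    ok = {host for host, text in outputs.items() if SUCCESS_MARKER in text}
    failed = set(outputs) - ok
    return sorted(ok), sorted(failed)
```

A host in the second list is not necessarily missing from Satellite; per this report it may still appear in the Web UI and in cp_consumers.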

Additional info:

See the attached rhsm log files for the various error output.

tomcat6 is configured with 150 threads in Satellite 6.

httpd default timeouts:
/etc/httpd/conf/httpd.conf:Timeout 120
/etc/httpd/conf/httpd.conf:KeepAliveTimeout 15
/etc/httpd/conf.d/05-foreman-ssl.conf:PassengerStartTimeout 600
/etc/httpd/conf.d/05-foreman.conf:PassengerStartTimeout 600
/etc/httpd/conf.d/ssl.conf:  SSLSessionCacheTimeout 300

Comment 1 RHEL Product and Program Management 2014-08-29 15:52:55 UTC
Since this issue was entered in Red Hat Bugzilla, the release flag has been
set to ? to ensure that it is properly evaluated for this release.

Comment 3 Alex Krzos 2014-09-09 12:56:11 UTC
Adding email comments so this information is not lost:

I've re-run with a raised timeout (600) and a raised thread count (300 instead of 150) and ran a katello-service restart, but found no consistent difference in either the number of clients showing as registered in Sat6 or the number of 503s I'm getting in rhsm.log.

I am seeing a large number of errors logged to foreman-ssl_error_ssl.log:

[Wed Aug 27 11:18:51 2014] [error] [client 172.16.10.12] (104)Connection reset by peer: ap_content_length_filter: apr_bucket_read() failed

Comment 4 Alex Krzos 2014-09-09 14:47:45 UTC
Looking at passenger-status while running the test, I see the requests in queue shoot up to 100 and stay pegged there.

# passenger-status
Version : 4.0.18
Date    : Tue Sep 09 10:46:23 -0400 2014
Instance: 12359
----------- General information -----------
Max pool size : 6
Processes     : 2
Requests in top-level queue : 0

----------- Application groups -----------
/usr/share/foreman#default:
  App root: /usr/share/foreman
  (spawning new process...)
  Requests in queue: 100
  * PID: 16407   Sessions: 1       Processed: 402     Uptime: 21m 11s
    CPU: 2%      Memory  : 248M    Last used: 1s ago

/etc/puppet/rack#default:
  App root: /etc/puppet/rack
  Requests in queue: 0
  * PID: 12715   Sessions: 0       Processed: 5       Uptime: 28m 4s
    CPU: 0%      Memory  : 82M     Last used: 28m 0s ago
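
The queue depth can also be sampled programmatically while the test runs. The sketch below parses text in the layout shown above and returns the "Requests in queue" figure for each application group. It is written against this one sample from Passenger 4.0.18; other Passenger versions may format the output differently, and the function name is illustrative.

```python
import re

# An application group header is an unindented line ending in ":",
# e.g. "/usr/share/foreman#default:".
GROUP_RE = re.compile(r"^(\S.*):\s*$")
# The per-group queue line is indented under its group header.
QUEUE_RE = re.compile(r"^\s+Requests in queue:\s*(\d+)\s*$")


def queue_depths(status_text):
    """Map each application group name to its 'Requests in queue' value."""
    depths = {}
    group = None
    for line in status_text.splitlines():
        m = GROUP_RE.match(line)
        if m:
            group = m.group(1)
            continue
        m = QUEUE_RE.match(line)
        if m and group is not None:
            depths[group] = int(m.group(1))
    return depths
```

Feeding it the captured output of passenger-status at a fixed interval would show whether the foreman queue ever drains during a registration burst.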

Comment 5 David O'Brien 2015-08-06 03:23:14 UTC
Is there any sort of workaround or other comment I can add to the rel note for this, to help the customer understand what's going on or how to avoid the issue?

thanks

Comment 7 David O'Brien 2015-08-19 02:37:48 UTC
Pls see comment #5

thanks

Comment 8 David O'Brien 2015-08-25 23:15:39 UTC
Brad, or anyone...

Can you add to this?

Comment 9 Brad Buckingham 2015-08-26 13:52:48 UTC
At this time, there is no known workaround; however, perhaps we can recommend the following:

If a failure is observed while performing a 'subscription-manager register' using an activation key, perform the following:
- View /var/log/rhsm/rhsm.log on the client
- Look for the error that occurred during registration
- If the error is an SSLTimeoutError, ask the Satellite 6 administrator to confirm whether the client has been registered.  This can be confirmed by locating the client on the Hosts -> Content Hosts page.
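
The client-side part of the steps above can be sketched as follows. This only searches the log for the error name mentioned in this comment; it makes no assumption about the rest of the rhsm.log line format, and the function names are illustrative. A match does not by itself mean the registration failed: per this bug, the host may still have been created on the Satellite, so confirmation on the Content Hosts page is still needed.

```python
from pathlib import Path

RHSM_LOG = Path("/var/log/rhsm/rhsm.log")


def find_ssl_timeouts(log_text):
    """Return the log lines that mention SSLTimeoutError, in file order."""
    return [line for line in log_text.splitlines() if "SSLTimeoutError" in line]


def check_client_log(path=RHSM_LOG):
    """Read the client's rhsm.log and return any SSLTimeoutError lines."""
    text = Path(path).read_text(errors="replace")
    return find_ssl_timeouts(text)
```

An empty list from check_client_log() means no SSLTimeoutError was logged; any other registration error would still need to be looked for separately.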

Comment 12 David O'Brien 2016-04-18 00:48:59 UTC
Reset docs contact <> daobrien

Comment 16 Chris Duryee 2016-09-22 14:33:48 UTC
I believe this works given the tunings in the 6.2 performance doc. Marking as closed/currentrelease.

