Created attachment 932724: RHSM log files.
Description of problem:
Satellite 6 and the subscription-manager client report inconsistent results when many clients (50, 100, 150, or 200) register in parallel using an activation key.
When 100 clients register at the same time, Satellite 6 reports 100 new content hosts, both in the web UI and in the Candlepin database (table cp_consumers); however, the RHSM output on the clients shows errors and timeouts.
Version-Release number of selected component (if applicable):
RHEL 6.5 - 2.6.32-431.23.3
Satellite 6.0.3 (GA-Snap4)
The errors in the RHSM log file reproduce consistently. The number of clients that actually complete registration varies between runs.
Steps to Reproduce:
1. Spawn 100 clients with Red Hat subscription-manager installed.
2. Have all 100 clients attempt to register at the same time.
3. grep for errors in rhsm.log and/or run subscription-manager status to view the status of each client.
4. View the number of registered clients on Satellite 6.
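Steps 1 and 2 can be driven from a single control host. The following is a minimal Python sketch, assuming the clients are reachable over passwordless ssh and already have subscription-manager installed; the host names, organization label, and activation key are hypothetical placeholders, and the helper names are illustrative only. It starts every registration at once and reports how many exited with status 0, a number that can then be compared with the count Satellite 6 shows in step 4.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def register_command(host: str, org: str, activation_key: str) -> List[str]:
    """Build an ssh command that registers one (hypothetical) client host."""
    return [
        "ssh", host,
        "subscription-manager", "register",
        f"--org={org}",
        f"--activationkey={activation_key}",
    ]


def run_in_parallel(commands: Sequence[Sequence[str]], timeout: float = 600.0) -> List[int]:
    """Start every command at once and return the exit codes in input order.

    A command that exceeds `timeout` seconds is killed and reported as -1,
    so it is counted as a failure.
    """
    def run_once(cmd: Sequence[str]) -> int:
        try:
            return subprocess.run(list(cmd), capture_output=True, timeout=timeout).returncode
        except subprocess.TimeoutExpired:
            return -1

    # One worker per command so that all registrations start together.
    with ThreadPoolExecutor(max_workers=max(1, len(commands))) as pool:
        return list(pool.map(run_once, commands))


def summarize(codes: Sequence[int]) -> str:
    """Report how many registrations exited with status 0."""
    ok = sum(1 for code in codes if code == 0)
    return f"{ok}/{len(codes)} registrations reported success"
```

For example, `summarize(run_in_parallel([register_command(f"client{i:03d}", "ACME", "ak-example") for i in range(100)]))` would drive 100 hypothetical hosts. A thread per client is used because each worker only waits on an ssh process, so the threads cost little and all registrations hit the server at nearly the same moment.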
Satellite 6 shows 100 clients added in both the web UI and the cp_consumers table.
However, depending on the run, anywhere from ~30 to all 100 clients show successful subscription-manager output.
On each of the clients:
The system has been registered with ID: .....
See the attached RHSM log files for the various error outputs.
On Satellite 6, tomcat6 is configured with 150 threads.
Default httpd timeout:
/etc/httpd/conf.d/ssl.conf: SSLSessionCacheTimeout 300
Adding email comments so this information is not lost:
I've re-run the test with a raised timeout (600) and a raised thread count (300 instead of 150), followed by a katello-service restart, but found no consistent difference either in the number of clients showing as registered in Satellite 6 or in the number of 503s I'm getting in rhsm.log.
I am seeing a large number of errors like the following logged to foreman-ssl_error_ssl.log:
[Wed Aug 27 11:18:51 2014] [error] [client 172.16.10.12] (104)Connection reset by peer: ap_content_length_filter: apr_bucket_read() failed
Watching passenger-status while running the test, I see the number of requests in queue shoot up to 100 and stay pegged there:
Version : 4.0.18
Date    : Tue Sep 09 10:46:23 -0400 2014

----------- General information -----------
Max pool size : 6
Processes     : 2
Requests in top-level queue : 0

----------- Application groups -----------
  App root: /usr/share/foreman
  (spawning new process...)
  Requests in queue: 100
  * PID: 16407   Sessions: 1   Processed: 402   Uptime: 21m 11s
    CPU: 2%      Memory  : 248M   Last used: 1s ago

  App root: /etc/puppet/rack
  Requests in queue: 0
  * PID: 12715   Sessions: 0   Processed: 5   Uptime: 28m 4s
    CPU: 0%      Memory  : 82M    Last used: 28m 0s ago
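To track the queue depth over a whole test run rather than by eye, passenger-status can be sampled periodically. A minimal Python sketch, assuming passenger-status is on the PATH of the Satellite host (it typically needs root) and that its output keeps the "App root:" and "Requests in queue:" labels shown in the 4.0.18 output above; other Passenger versions may format it differently. The function names are hypothetical.

```python
import re
import subprocess
import time
from typing import Dict, List, Optional

# Labels as they appear in the passenger-status 4.0.18 output above;
# leading whitespace is allowed because the real output is indented.
APP_ROOT = re.compile(r"^\s*App root:\s*(\S+)")
QUEUE = re.compile(r"^\s*Requests in queue:\s*(\d+)")


def queue_depths(status_text: str) -> Dict[str, int]:
    """Map each application root to its 'Requests in queue' value."""
    depths: Dict[str, int] = {}
    current: Optional[str] = None
    for line in status_text.splitlines():
        root = APP_ROOT.match(line)
        if root:
            current = root.group(1)
            continue
        queued = QUEUE.match(line)
        # 'Requests in top-level queue' does not match QUEUE, so only the
        # per-application queue is recorded.
        if queued and current is not None:
            depths[current] = int(queued.group(1))
    return depths


def sample(samples: int, interval: float = 1.0) -> List[Dict[str, int]]:
    """Run passenger-status `samples` times, `interval` seconds apart."""
    history: List[Dict[str, int]] = []
    for _ in range(samples):
        out = subprocess.run(["passenger-status"], capture_output=True, text=True).stdout
        history.append(queue_depths(out))
        time.sleep(interval)
    return history
```

A /usr/share/foreman entry that stays at 100 across consecutive samples, as reported here, indicates that requests are waiting for a free Passenger application process (Max pool size is 6 in this output).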
Is there any sort of workaround or other comment I can add to the release note for this, to help the customer understand what's going on or how to avoid the issue?
Please see comment #5.
Brad, or anyone: can you add to this?
At this time, there is no known workaround; however, perhaps we can recommend the following:
If a failure is observed while performing a 'subscription-manager register' using an activation key, perform the following:
- View /var/log/rhsm/rhsm.log on the client
- Look for the error that occurred during registration
- If the error is an SSLTimeoutError, ask the Satellite 6 administrator to confirm whether the client has been registered. This can be confirmed by locating the client on the Hosts -> Content Hosts page.
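The log check in these steps can be scripted on the client. A minimal Python sketch, assuming that the relevant failures appear in /var/log/rhsm/rhsm.log as lines containing the text SSLTimeoutError or the status code 503; the exact log wording may differ between subscription-manager versions, matching a bare 503 is deliberately crude, and the function names are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path
from typing import Iterable

RHSM_LOG = Path("/var/log/rhsm/rhsm.log")

# Assumed markers; adjust them to the wording your rhsm.log actually uses.
PATTERNS = {
    "ssl_timeout": re.compile(r"SSLTimeoutError"),
    "http_503": re.compile(r"\b503\b"),
}


def classify(lines: Iterable[str]) -> Counter:
    """Count how many log lines match each assumed error marker."""
    counts: Counter = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def needs_admin_check(counts: Counter) -> bool:
    """True when an SSLTimeoutError was logged, i.e. the case in which the
    Satellite 6 administrator should confirm the registration on the
    Hosts -> Content Hosts page."""
    return counts["ssl_timeout"] > 0


def scan_log(path: Path = RHSM_LOG) -> Counter:
    """Classify every line of the client's rhsm.log."""
    with path.open(errors="replace") as log:
        return classify(log)
```

On a client, `needs_admin_check(scan_log())` reproduces the manual decision in the last step; the 503 count is collected as well because the 503s in rhsm.log are what the earlier re-runs were measured against.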
Reset docs contact <> daobrien
I believe this works given the tuning recommendations in the Satellite 6.2 performance document. Marking as CLOSED/CURRENTRELEASE.