Created attachment 932724
RHSM log files.

Description of problem:
Satellite 6 and the subscription-manager client show inconsistent results when many clients (50, 100, 150, 200) register in parallel via an activation key. When 100 clients register at the same time, Satellite 6 reports 100 new content hosts in the UI and in the candlepin database (table cp_consumers); however, the rhsm output on the clients shows errors and timeouts.

Version-Release number of selected component (if applicable):
RHEL 6.5 - 2.6.32-431.23.3
Satellite 6.0.3 (GA-Snap4)
candlepin-0.9.19-1.el6_5.noarch
katello-1.5.0-28.el6sat.noarch
pulp-server-2.4.0-0.30.beta.el6sat.noarch
qpid-cpp-server-0.22-42.el6.x86_64
foreman-1.6.0.38-1.el6sat.noarch
puppet-server-3.6.2-1.el6sat.noarch
elasticsearch-0.90.10-4.el6sat.noarch

How reproducible:
The errors in the RHSM log file reproduce consistently. The number of clients that actually complete registration varies from run to run.

Steps to Reproduce:
1. Spawn 100 clients that have Red Hat subscription-manager installed
2. Have all 100 clients attempt to register at the same time
3. grep for errors in rhsm.log and/or run subscription-manager status to view the status of each client
4. View the number of registered clients on sat6

Actual results:
Satellite 6 shows 100 clients added in both the Web UI and the cp_consumers table.
Depending on the run, as few as ~30 clients or as many as all 100 show successful subscription-manager output.

Expected results:
On each of the clients:
The system has been registered with ID: .....

Additional info:
See the attached log files for the various errors in the rhsm logs.
tomcat6 is configured with 150 threads in Satellite 6.
http default timeouts:
/etc/httpd/conf/httpd.conf:Timeout 120
/etc/httpd/conf/httpd.conf:KeepAliveTimeout 15
/etc/httpd/conf.d/05-foreman-ssl.conf:PassengerStartTimeout 600
/etc/httpd/conf.d/05-foreman.conf:PassengerStartTimeout 600
/etc/httpd/conf.d/ssl.conf: SSLSessionCacheTimeout 300
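One way to tally the client-side outcome of a run is to check each client's captured registration output for the success message quoted under Expected results. This is a minimal Python sketch, not the harness used for this report; the function name and the dict of per-client output are assumptions made for illustration.

```python
# Success text quoted in the "Expected results" section of this report.
SUCCESS_MARKER = "The system has been registered with ID:"


def tally_registrations(outputs):
    """Split clients by whether their output contains the success message.

    outputs: dict mapping a client name to the text captured from
    'subscription-manager register' on that client (hypothetical layout).
    Returns (registered, failed) as two sorted lists of client names.
    """
    registered = sorted(
        name for name, text in outputs.items() if SUCCESS_MARKER in text
    )
    failed = sorted(name for name in outputs if name not in registered)
    return registered, failed
```

Comparing the length of the registered list against the content host count on the Satellite makes the mismatch described above visible for each run.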
Since this issue was entered in Red Hat Bugzilla, the release flag has been set to ? to ensure that it is properly evaluated for this release.
Adding email comments so this information is not lost:

I've re-run with a raised timeout (600) and a raised thread count (300 instead of 150) and ran a katello-service restart, but found no consistent difference in the number of clients showing as registered in Sat6 or in the number of 503s I'm getting in rhsm.log.

I am seeing a large number of errors logged to foreman-ssl_error_ssl.log:
[Wed Aug 27 11:18:51 2014] [error] [client 172.16.10.12] (104)Connection reset by peer: ap_content_length_filter: apr_bucket_read() failed
Looking at passenger-status while running the test, I see requests in queue shoot up to 100 and stay pegged at that.

# passenger-status
Version : 4.0.18
Date    : Tue Sep 09 10:46:23 -0400 2014
Instance: 12359
----------- General information -----------
Max pool size : 6
Processes     : 2
Requests in top-level queue : 0

----------- Application groups -----------
/usr/share/foreman#default:
  App root: /usr/share/foreman
  (spawning new process...)
  Requests in queue: 100
  * PID: 16407   Sessions: 1   Processed: 402   Uptime: 21m 11s
    CPU: 2%   Memory : 248M   Last used: 1s ago

/etc/puppet/rack#default:
  App root: /etc/puppet/rack
  Requests in queue: 0
  * PID: 12715   Sessions: 0   Processed: 5   Uptime: 28m 4s
    CPU: 0%   Memory : 82M   Last used: 28m 0s ago
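To track the queue over the course of a test run instead of reading it by eye, the per-application "Requests in queue" values can be pulled out of the passenger-status text. The sketch below is a rough Python parser written against the 4.0.18 output shown in this comment; the function name is hypothetical, and other Passenger versions may lay the output out differently.

```python
import re


def queue_depths(status_text):
    """Return {application group: requests in queue} from passenger-status text.

    Assumes the multi-line layout shown in this comment: a group header such
    as '/usr/share/foreman#default:' followed by a 'Requests in queue: N' line.
    """
    depths = {}
    group = None
    for raw in status_text.splitlines():
        line = raw.strip()
        # Group headers end with ':' and carry a '#' (e.g. '#default').
        if line.endswith(":") and "#" in line and not line.startswith("*"):
            group = line[:-1]
            continue
        # The top-level queue line reads 'Requests in top-level queue : N'
        # and is deliberately not matched by this pattern.
        match = re.match(r"Requests in queue:\s*(\d+)$", line)
        if match and group is not None:
            depths[group] = int(match.group(1))
    return depths
```

For the output above this would give 100 for /usr/share/foreman#default and 0 for /etc/puppet/rack#default. Sampling it every few seconds during a registration burst would show how long the foreman queue stays at its ceiling.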
Is there any sort of workaround or other comment I can add to the rel note for this, to help the customer understand what's going on or how to avoid the issue? thanks
Pls see comment #5 thanks
Brad, or anyone... Can you add to this?
At this time, there is no known workaround; however, perhaps we can recommend the following:

If a failure is observed while performing a 'subscription-manager register' using an activation key:
- View /var/log/rhsm/rhsm.log on the client
- Look for the error that occurred during registration
- If the error is an SSLTimeoutError, ask the Satellite 6 administrator to confirm whether the client has been registered. This can be confirmed by locating the client on the Hosts -> Content Hosts page.
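The log check in these steps could be scripted on the client. The Python sketch below does a plain substring search for SSLTimeoutError in rhsm.log; the function name and the markers parameter are assumptions for illustration, and since it does not parse the log format, a match only means the line is worth reading.

```python
def registration_errors(log_path="/var/log/rhsm/rhsm.log",
                        markers=("SSLTimeoutError",)):
    """Return {marker: [matching lines]} for a client's rhsm.log.

    A simple substring scan. Extra markers can be passed in by the caller,
    but short ones are likely to produce false matches.
    """
    hits = {marker: [] for marker in markers}
    # errors="replace" keeps the scan going if the log has undecodable bytes.
    with open(log_path, errors="replace") as log:
        for line in log:
            for marker in markers:
                if marker in line:
                    hits[marker].append(line.rstrip("\n"))
    return hits
```

If the SSLTimeoutError list is non-empty, the next step is still the manual one described above: have the Satellite 6 administrator look for the client under Hosts -> Content Hosts.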
Reset docs contact <> daobrien
I believe this works given the tunings in the 6.2 performance doc. Marking as closed/currentrelease.