Description of problem:
As the number of subscribed systems increases, the time to register/subscribe a new system increases dramatically: going from an unloaded system (0 systems) to one with 1000 systems showed a 10-fold increase. The tests were run with a timeout of 120 seconds. When the request did time out, Candlepin still appeared to be processing it (the API call that was made is simply proxied to Candlepin). The test used the same API calls that a subscription-manager client would make when binding to a specific product id.

Version-Release number of selected component (if applicable):
katello-1.1.12-14.el6cf.noarch
katello-certs-tools-1.1.8-1.el6cf.noarch
katello-qpid-broker-key-pair-1.0-1.noarch
katello-selinux-1.1.1-1.el6cf.noarch
katello-candlepin-cert-key-pair-1.0-1.noarch
katello-glue-pulp-1.1.12-14.el6cf.noarch
katello-all-1.1.12-14.el6cf.noarch
katello-cli-common-1.1.8-7.el6cf.noarch
katello-common-1.1.12-14.el6cf.noarch
katello-qpid-client-key-pair-1.0-1.noarch
katello-cli-1.1.8-7.el6cf.noarch
katello-glue-candlepin-1.1.12-14.el6cf.noarch
katello-configure-1.1.9-7.el6cf.noarch

How reproducible:
Consistent

Steps to Reproduce (API calls; a scripted version is sketched below):
1. GET /organizations/ACME_Corporation
2. POST /consumers/ (with facts)
3. PUT /consumers/:uuid/packages
4. POST /consumers/:uuid/entitlements (with product param)

Actual results:
Test system is 2 sockets, 4 cores, 2 threads per core, 24G RAM. Step 4 took about 3 seconds on an unloaded system and 30+ seconds on a system with 1000 existing systems.
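For reference, a minimal Python sketch of the four reproduction calls above, with timing around the bind step that degrades. The base URL, credentials, consumer name, facts, and product id are placeholders, not the values from the actual test run:

#!/usr/bin/env python
# Hedged reproduction sketch -- not the original test harness.
# Hostname, credentials, and product id below are assumptions.
import time
import requests

BASE = "https://katello.example.com/katello/api"  # assumed API root
AUTH = ("admin", "admin")                         # placeholder credentials
ORG = "ACME_Corporation"
PRODUCT = "69"                                    # placeholder product id

s = requests.Session()
s.auth = AUTH
s.verify = False  # self-signed cert on a test box; never in production

# Step 1: look up the organization.
s.get("%s/organizations/%s" % (BASE, ORG)).raise_for_status()

# Step 2: register a consumer with some facts.
facts = {"cpu.cpu_socket(s)": "2", "memory.memtotal": "24G"}
r = s.post("%s/consumers" % BASE,
           params={"owner": ORG},
           json={"name": "perf-client", "facts": facts})
r.raise_for_status()
uuid = r.json()["uuid"]

# Step 3: upload the installed-package profile (empty here for brevity).
s.put("%s/consumers/%s/packages" % (BASE, uuid), json=[]).raise_for_status()

# Step 4: bind to a product -- this is the call that slows down as the
# number of existing systems grows.
start = time.time()
s.post("%s/consumers/%s/entitlements" % (BASE, uuid),
       params={"product": PRODUCT}).raise_for_status()
print("bind took %.1fs" % (time.time() - start))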
What type of manifest did you have? One subscription with a quantity of 1000+, or lots of small ones?
That may be where it turns a little strange. The manifest I initially loaded only had a couple of subscriptions (3 at most), but now there are 999 (or maybe more). While I think there is still some validity to this, I may need to re-run these tests with a known-good manifest.
If the subscriptions in the manifest use virt_limit, one pool is created for each system bind (covering 4 guests on that specific host). This could explain the explosion in pools. These pools should not, however, affect performance in recent versions of Candlepin (should be fixed since around candlepin-0.7.9-1).
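As a quick way to test that theory, something like the following could count the pools Candlepin holds for the owner. The /owners/{key}/pools path and the sourceEntitlement field reflect a typical Candlepin deployment of this era and should be treated as assumptions here:

# Hedged sketch: check for virt_limit pool explosion.
import requests

CANDLEPIN = "https://katello.example.com/candlepin"  # assumed Candlepin root
AUTH = ("admin", "admin")                            # placeholder credentials

r = requests.get("%s/owners/ACME_Corporation/pools" % CANDLEPIN,
                 auth=AUTH, verify=False)
r.raise_for_status()
pools = r.json()
print("total pools: %d" % len(pools))

# Sub-pools created per host bind carry a source entitlement reference;
# a large count here would support the one-pool-per-bind theory.
derived = [p for p in pools if p.get("sourceEntitlement")]
print("host-derived sub-pools: %d" % len(derived))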
The version of candlepin in this case is candlepin-0.7.8.1-1.el6cf.noarch. I can re-run the tests with a later candlepin to see how it fares.
Upstream has been updated to candlepin-0.7.19. This contains the fix for this issue. When candlepin-0.7.19 is moved to a CFSE branch, this should be moved to ON_QA.
Moving all POST bugs to ON_QA since we have delivered a puddle containing fixes for these bugs.
VERIFIED. Og Maciel registered and subscribed 1000 clients with no errors or delays.

Packages tested:
* candlepin-0.8.19-1.el6sam.noarch
* candlepin-scl-1-5.el6_4.noarch
* candlepin-scl-quartz-2.1.5-5.el6_4.noarch
* candlepin-scl-rhino-1.7R3-1.el6_4.noarch
* candlepin-scl-runtime-1-5.el6_4.noarch
* candlepin-selinux-0.8.19-1.el6sam.noarch
* candlepin-tomcat6-0.8.19-1.el6sam.noarch
* elasticsearch-0.19.9-8.el6sat.noarch
* katello-candlepin-cert-key-pair-1.0-1.noarch
* katello-certs-tools-1.4.2-2.el6sat.noarch
* katello-cli-1.4.3-5.el6sat.noarch
* katello-cli-common-1.4.3-5.el6sat.noarch
* katello-common-1.4.3-6.el6sam_splice.noarch
* katello-configure-1.4.4-2.el6sat.noarch
* katello-glue-candlepin-1.4.3-6.el6sam_splice.noarch
* katello-glue-elasticsearch-1.4.3-6.el6sam_splice.noarch
* katello-headpin-1.4.3-6.el6sam_splice.noarch
* katello-headpin-all-1.4.3-6.el6sam_splice.noarch
* katello-selinux-1.4.4-2.el6sat.noarch
* thumbslug-0.0.32-1.el6sam.noarch
* thumbslug-selinux-0.0.32-1.el6sam.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2013-1390.html