Description of problem:
With 150+ users logged into, and deploying instances from, a single Conductor instance, page access is badly delayed (up to 7 or 8 minutes) and requests fail with "Response message: Service Temporarily Unavailable". Is there a documented limit on how many users can or should be logged into one Conductor instance concurrently?

Steps to Reproduce:
1. Add 200 users to the aeolus database.
2. Create a mock provider account in Conductor.
3. Build and push one image to the mock provider.
4. Using JMeter, have 170 users concurrently log in, access the image, create a blueprint, and launch that image (with a hardware profile) to the mock provider.
5. Observe that some instances (99) launch successfully.
6. Observe that the other instances fail to launch.
7. Throughout the test, "Response message: Service Temporarily Unavailable" messages appear when users access the conductor/pools, conductor/deployments, conductor/deployables, conductor/images and conductor/users pages. The number of these messages varies, but rises from 25 on conductor/users to over 40 on conductor/pools by the end of the test. See the attachment with the test results output.

Versions tested:
rpm -qa | grep aeolus
rubygem-aeolus-cli-0.3.0-14.el6.noarch
aeolus-configure-2.5.0-18.el6.noarch
aeolus-all-0.8.0-41.el6.noarch
rubygem-aeolus-image-0.3.0-12.el6.noarch
aeolus-conductor-0.8.0-41.el6.noarch
aeolus-conductor-daemons-0.8.0-41.el6.noarch
aeolus-conductor-doc-0.8.0-41.el6.noarch
Created attachment 569514: Test Results Table
Just to be sure, can you tell me which jmeter test you used? Was it aeolus-performance-testing/jmeter/create-deployment-and-launch?
I used a mix of tests and scripts:
- To add the 200 users, I used https://github.com/aeolusproject/aeolus-performance-testing/blob/master/jmeter/scripts/configure-and-create-users.sh
- To build and push the one image to mock, I used https://github.com/aeolusproject/aeolus-performance-testing/blob/master/jmeter/build-and-push/fedora15.jmx (only one thread needed)
- To log in the 170 users and deploy the instances concurrently, I used the attached .jmx test plan.
All these JMeter test plans still need to go on GitHub; in the meantime, I'm attaching this one to the BZ.
Created attachment 570697: jmeter testplan
Patch created: https://fedorahosted.org/pipermail/aeolus-devel/2012-March/009670.html
I don't claim it fixes everything (a comprehensive review feels out of scope before 1.1, and would probably involve revisiting things like permissions and caching), but it does fix some pretty big issues.
Patches pushed to master:
commit d951bad80e60b4aaee3db859210f9e2a8601f571 BZ 802571 refactor provider_account sort-by-priority to correctly sort nils, regardless of db
commit 9f2ee9f1cf28be988b02ab3d650d919856fff70d BZ 802571 don't use deployment.as_json unnecessarily
commit 7b3f91e5d8bdbc2e9db0ec6d051fad9a1f647873 BZ 802571 don't query provider multiple times
commit bc9ef2f278a7d96ce3b7c02e072f141a04c89d87 BZ 802571 added eager loading and other minor efficiency fixes
Added fix (https://fedorahosted.org/pipermail/aeolus-devel/2012-March/009742.html), which brings the total number of patches needed to five:
commit d951bad80e60b4aaee3db859210f9e2a8601f571 BZ 802571 refactor provider_account sort-by-priority to correctly sort nils, regardless of db
commit 9f2ee9f1cf28be988b02ab3d650d919856fff70d BZ 802571 don't use deployment.as_json unnecessarily
commit 7b3f91e5d8bdbc2e9db0ec6d051fad9a1f647873 BZ 802571 don't query provider multiple times
commit bc9ef2f278a7d96ce3b7c02e072f141a04c89d87 BZ 802571 added eager loading and other minor efficiency fixes
commit 1c35c34898731c016e95bc158d2d9ee977f81235 BZ 802571 fix to previous patch so that list_for_user works
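The first commit title refers to sorting provider accounts by priority so that nil priorities land in the same place whatever the database. Databases disagree here: PostgreSQL puts NULLs last in an ascending ORDER BY, while MySQL and SQLite put them first. Below is a minimal Ruby sketch of one database-independent approach, sorting in application code. The `ProviderAccount` struct, its `name` and `priority` fields, ascending order, and placing nils last are all assumptions for illustration; this is not the code from commit d951bad80e60b4aaee3db859210f9e2a8601f571.

```ruby
# Hypothetical stand-in for a provider account record; only the
# attributes needed for sorting are modelled here.
ProviderAccount = Struct.new(:name, :priority)

# Sort accounts by ascending priority, placing accounts whose priority is
# nil after all accounts that have one. Ties are broken by name so the
# result is deterministic.
def sort_by_priority(accounts)
  accounts.sort_by do |account|
    # Arrays compare element-wise: every [0, ...] key sorts before every
    # [1, ...] key. The 0 placeholder keeps nil out of the comparison.
    if account.priority.nil?
      [1, 0, account.name]
    else
      [0, account.priority, account.name]
    end
  end
end

accounts = [
  ProviderAccount.new("mock-c", nil),
  ProviderAccount.new("mock-a", 20),
  ProviderAccount.new("mock-b", 10),
  ProviderAccount.new("mock-d", nil)
]

sorted = sort_by_priority(accounts)
puts sorted.map(&:name).join(", ")
```

Running it prints `mock-b, mock-a, mock-c, mock-d`. Sorting in Ruby gives identical results on every database at the cost of loading the rows first; an explicit `NULLS LAST` clause is the SQL-side alternative, but not every database supports it.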
The latest run of these tests produced just 22 "Service Temporarily Unavailable" messages, all during calls to /conductor/users.
$ grep "Service Temporarily Unavailable" resultsTable200.csv | wc -l
22
This will probably depend heavily on the system hosting Conductor and on how many thin servers are running. I'm OK with closing this for now, as it is a marked improvement over previous runs of the same test script.
Verified.
aeolus-all-0.13.22-1.el6cf.noarch
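The grep in the last comment gives a single total. To see which pages the errors come from, as in the per-page counts in the original report, a small Ruby sketch can tally matching rows by Conductor path. It assumes each matching row of resultsTable200.csv contains the request path (such as /conductor/users) somewhere on the line; the real column layout of that file is not shown in this bug, and the sample rows below are made up.

```ruby
# Count "Service Temporarily Unavailable" rows per Conductor page.
# Assumption: each matching row contains the request path
# (e.g. "/conductor/users") somewhere on the line.
MESSAGE = "Service Temporarily Unavailable"
PAGE_PATTERN = %r{/conductor/[a-z_]+}

# Made-up rows for illustration only; they do not reflect the real file format.
SAMPLE_ROWS = [
  "1,503,/conductor/users,503,Service Temporarily Unavailable,false",
  "2,120,/conductor/pools,200,OK,true",
  "3,502,/conductor/users,503,Service Temporarily Unavailable,false"
].freeze

# Returns a Hash mapping each page path to its number of matching rows.
def unavailable_by_page(lines)
  counts = Hash.new(0)
  lines.each do |line|
    next unless line.include?(MESSAGE)
    page = line[PAGE_PATTERN] || "(no /conductor path)"
    counts[page] += 1
  end
  counts
end

# Read the results file named on the command line, or fall back to the sample rows.
rows = ARGV[0] ? File.foreach(ARGV[0]) : SAMPLE_ROWS
unavailable_by_page(rows).sort_by { |_page, n| -n }.each do |page, n|
  puts "#{n}\t#{page}"
end
```

Saved as, say, tally_unavailable.rb, it would be run as `ruby tally_unavailable.rb resultsTable200.csv`, printing one count per page, highest first.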
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2012-1516.html