Bug 802571 - Scalability Testing: 150 + users logged into, and deploying instances from, one conductor instance results in "Response code: 503" - Response message: Service Temporarily Unavailable
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: CloudForms Cloud Engine
Classification: Retired
Component: aeolus-conductor
Version: 1.0.0
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Assignee: Tzu-Mainn Chen
QA Contact: wes hayutin
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2012-03-12 22:37 UTC by Ronelle Landy
Modified: 2012-12-04 14:58 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
During scale testing with 150+ users logged in to Conductor and deploying instances, some requests returned the response message "Service Temporarily Unavailable." This bug fix reduces the number of database queries and adds eager loading so that fewer requests return this error.
Clone Of:
Environment:
Last Closed: 2012-12-04 14:58:35 UTC
Embargoed:


Attachments
Test Results Table (238.72 KB, text/csv), 2012-03-12 22:40 UTC, Ronelle Landy
jmeter testplan (102.58 KB, application/octet-stream), 2012-03-16 20:20 UTC, Ronelle Landy


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2012:1516 0 normal SHIPPED_LIVE CloudForms Cloud Engine 1.1 update 2012-12-04 19:51:45 UTC

Description Ronelle Landy 2012-03-12 22:37:08 UTC
Description of problem:

Having 150+ users logged in to, and deploying instances from, one conductor instance results in long delays in loading pages (up to 7 or 8 minutes) and "Response message: Service Temporarily Unavailable" errors.

Is there a documented limit to how many users can/should be logged into one conductor instance concurrently?


Steps to Reproduce:
1. Add 200 users to the aeolus database
2. Create a mock provider account in conductor
3. Build and push one image to the mock provider
4. (Using jmeter) 170 users concurrently log in, access the image, create a
blueprint and launch that image (with hardware profile) to the mock provider
5. See that some instances (99) launch successfully
6. Other instances fail to launch
7. Throughout the test, "Response message: Service Temporarily Unavailable" messages appear when users access the conductor/pools, conductor/deployments, conductor/deployables, conductor/images and conductor/users pages. The number of these messages varies, increasing from 25 on conductor/users to over 40 on conductor/pools by the end of the test.

See attachment with test results output.


Versions tested:

rpm -qa |grep aeolus
rubygem-aeolus-cli-0.3.0-14.el6.noarch
aeolus-configure-2.5.0-18.el6.noarch
aeolus-all-0.8.0-41.el6.noarch
rubygem-aeolus-image-0.3.0-12.el6.noarch
aeolus-conductor-0.8.0-41.el6.noarch
aeolus-conductor-daemons-0.8.0-41.el6.noarch
aeolus-conductor-doc-0.8.0-41.el6.noarch

Comment 1 Ronelle Landy 2012-03-12 22:40:17 UTC
Created attachment 569514 [details]
Test Results Table

Comment 2 Tzu-Mainn Chen 2012-03-16 19:20:10 UTC
Just to be sure, can you tell me which jmeter test you used?  Was it aeolus-performance-testing/jmeter/create-deployment-and-launch?

Comment 3 Ronelle Landy 2012-03-16 20:16:50 UTC
I used a mix of tests and scripts:

 - To add the 200 users, I used https://github.com/aeolusproject/aeolus-performance-testing/blob/master/jmeter/scripts/configure-and-create-users.sh

 - To build and push the 1 image to mock, I used: https://github.com/aeolusproject/aeolus-performance-testing/blob/master/jmeter/build-and-push/fedora15.jmx
 (only one thread needed)

 - To log in the 170 users and deploy the instances concurrently, I used the attached .jmx testplan. I still need to put all of these jmeter testplans on GitHub, but in the meantime I'm attaching this one to the BZ.

Comment 4 Ronelle Landy 2012-03-16 20:20:18 UTC
Created attachment 570697 [details]
jmeter testplan

Comment 5 Tzu-Mainn Chen 2012-03-23 13:54:56 UTC
Patch created:

https://fedorahosted.org/pipermail/aeolus-devel/2012-March/009670.html

I don't claim it fixes everything - a comprehensive review feels out of scope pre-1.1, and would probably involve things like revisiting permissions and caching - but it does fix some pretty big issues.

Comment 6 Tzu-Mainn Chen 2012-03-26 16:03:22 UTC
Patches pushed to master:

commit d951bad80e60b4aaee3db859210f9e2a8601f571
BZ 802571 refactor provider_account sort-by-priority to correctly sort nils, regardless of db

commit 9f2ee9f1cf28be988b02ab3d650d919856fff70d
BZ 802571 don't use deployment.as_json unnecessarily

commit 7b3f91e5d8bdbc2e9db0ec6d051fad9a1f647873
BZ 802571 don't query provider multiple times

commit bc9ef2f278a7d96ce3b7c02e072f141a04c89d87
BZ 802571 added eager loading and other minor efficiency fixes
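
For readers unfamiliar with the first commit's topic: databases disagree on where NULL lands in an ascending sort (PostgreSQL puts NULLs last by default, while MySQL and SQLite put them first), so sorting in application code gives the same order everywhere. The following is only a minimal Python sketch of that idea, not the actual Conductor patch. The `ProviderAccount` class, the `sort_by_priority` function, and the choice to place missing priorities last are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProviderAccount:
    # Hypothetical stand-in for a provider account record.
    label: str
    priority: Optional[int]


def sort_by_priority(accounts):
    """Sort ascending by priority; accounts with no priority go last.

    The sort key is a tuple, so None is never compared with an int:
    accounts with a priority get (False, priority), the rest (True, 0).
    """
    return sorted(
        accounts,
        key=lambda a: (a.priority is None,
                       a.priority if a.priority is not None else 0),
    )


accounts = [
    ProviderAccount("mock-c", None),
    ProviderAccount("mock-a", 2),
    ProviderAccount("mock-b", 1),
]
print([a.label for a in sort_by_priority(accounts)])
# prints ['mock-b', 'mock-a', 'mock-c']
```

Because the ordering is decided by the tuple key rather than by the database's ORDER BY, the result does not depend on which database backs the application.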

Comment 7 Tzu-Mainn Chen 2012-03-28 21:02:34 UTC
Added fix (https://fedorahosted.org/pipermail/aeolus-devel/2012-March/009742.html), which brings the total number of patches needed to five:

commit d951bad80e60b4aaee3db859210f9e2a8601f571
BZ 802571 refactor provider_account sort-by-priority to correctly sort nils, regardless of db

commit 9f2ee9f1cf28be988b02ab3d650d919856fff70d
BZ 802571 don't use deployment.as_json unnecessarily

commit 7b3f91e5d8bdbc2e9db0ec6d051fad9a1f647873
BZ 802571 don't query provider multiple times

commit bc9ef2f278a7d96ce3b7c02e072f141a04c89d87
BZ 802571 added eager loading and other minor efficiency fixes

commit 1c35c34898731c016e95bc158d2d9ee977f81235
BZ 802571 fix to previous patch so that list_for_user works

Comment 10 Brad P. Crochet 2012-11-02 17:59:49 UTC
Latest run of these tests resulted in just 22 instances of "Service Temporarily Unavailable", all during a call to /conductor/users.

$ grep "Service Temporarily Unavailable" resultsTable200.csv | wc -l
22

This will probably be highly dependent on the system that is hosting Conductor, and how many thin servers are running.

I'm ok with closing this for now, as it is a marked improvement over previous runs of this same test script.

Verified. aeolus-all-0.13.22-1.el6cf.noarch
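
To see which pages the errors come from, rather than just the total that grep reports, the results file could be tallied per request. The sketch below is an assumption-laden example: it presumes the CSV has a header row with JMeter-style `label` and `responseMessage` columns, which may not match the attached results table, and `count_unavailable` is a hypothetical helper name. The sample data is invented purely to exercise the function.

```python
import csv
import io
from collections import Counter


def count_unavailable(csv_file, message="Service Temporarily Unavailable"):
    """Count rows whose responseMessage equals `message`, grouped by label.

    Assumes a header row containing 'label' and 'responseMessage' columns.
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        if row.get("responseMessage") == message:
            counts[row.get("label", "unknown")] += 1
    return counts


# Invented sample rows, only to show the shape of the output.
sample = io.StringIO(
    "label,responseCode,responseMessage\n"
    "/conductor/users,503,Service Temporarily Unavailable\n"
    "/conductor/pools,200,OK\n"
    "/conductor/users,503,Service Temporarily Unavailable\n"
)
print(count_unavailable(sample).most_common())
# prints [('/conductor/users', 2)]
```

For a real file, pass an open file object instead, for example `open("resultsTable200.csv", newline="")`. Unlike the grep count, this matches the message field exactly, so the two totals could differ if the string also appears in other columns.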

Comment 12 errata-xmlrpc 2012-12-04 14:58:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2012-1516.html

