Bug 802571 - Scalability Testing: 150 + users logged into, and deploying instances from, one conductor instance results in "Response code: 503" - Response message: Service Temporarily Unavailable
Status: CLOSED ERRATA
Product: CloudForms Cloud Engine
Classification: Red Hat
Component: aeolus-conductor
Version: 1.0.0
Hardware: Unspecified Linux
Priority: unspecified Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Tzu-Mainn Chen
QA Contact: wes hayutin
Keywords: Triaged
Depends On:
Blocks:
Reported: 2012-03-12 18:37 EDT by Ronelle Landy
Modified: 2012-12-04 09:58 EST (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
During scale testing with 150+ users logging in to Conductor and deploying instances, some page requests returned the response message "Service Temporarily Unavailable." This bug fix reduces the number of queries and adds eager loading so that fewer requests return this error.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-12-04 09:58:35 EST
Attachments
Test Results Table (238.72 KB, text/csv)
2012-03-12 18:40 EDT, Ronelle Landy
jmeter testplan (102.58 KB, application/octet-stream)
2012-03-16 16:20 EDT, Ronelle Landy
Description Ronelle Landy 2012-03-12 18:37:08 EDT
Description of problem:

150+ users logged in to, and deploying instances from, one conductor instance results in long delays in accessing pages (up to 7 or 8 minutes) and "Response message: Service Temporarily Unavailable" messages.

Is there a documented limit to how many users can/should be logged into one conductor instance concurrently?


Steps to Reproduce:
1. Add 200 users to the aeolus database
2. Create a mock provider account in conductor
3. Build and push one image to the mock provider
4. (Using jmeter) 170 users concurrently log in, access the image, create a
blueprint and launch that image (with hardware profile) to the mock provider
5. See that some instances (99) launch successfully
6. Other instances fail to launch
7. Throughout the test there are "Response message: Service Temporarily Unavailable" messages when users access the conductor/pools, conductor/deployments, conductor/deployables, conductor/images and conductor/users pages. The number of these messages varies but increases from 25 "Service Temporarily Unavailable" messages in conductor/users to over 40 in conductor/pools at the end of the test.

See attachment with test results output.


Versions tested:

rpm -qa |grep aeolus
rubygem-aeolus-cli-0.3.0-14.el6.noarch
aeolus-configure-2.5.0-18.el6.noarch
aeolus-all-0.8.0-41.el6.noarch
rubygem-aeolus-image-0.3.0-12.el6.noarch
aeolus-conductor-0.8.0-41.el6.noarch
aeolus-conductor-daemons-0.8.0-41.el6.noarch
aeolus-conductor-doc-0.8.0-41.el6.noarch
Comment 1 Ronelle Landy 2012-03-12 18:40:17 EDT
Created attachment 569514
Test Results Table
Comment 2 Tzu-Mainn Chen 2012-03-16 15:20:10 EDT
Just to be sure, can you tell me which jmeter test you used?  Was it aeolus-performance-testing/jmeter/create-deployment-and-launch?
Comment 3 Ronelle Landy 2012-03-16 16:16:50 EDT
I used a mix of tests and scripts:

 - To add the 200 users, I used https://github.com/aeolusproject/aeolus-performance-testing/blob/master/jmeter/scripts/configure-and-create-users.sh

 - To build and push the 1 image to mock, I used: https://github.com/aeolusproject/aeolus-performance-testing/blob/master/jmeter/build-and-push/fedora15.jmx
 (only one thread needed)

 - To log in the 170 users and deploy the instances concurrently, I used the attached .jmx testplan. I still need to place all these jmeter testplans on github, but in the meantime I'm attaching it to the BZ.
Comment 4 Ronelle Landy 2012-03-16 16:20:18 EDT
Created attachment 570697
jmeter testplan
Comment 5 Tzu-Mainn Chen 2012-03-23 09:54:56 EDT
Patch created:

https://fedorahosted.org/pipermail/aeolus-devel/2012-March/009670.html

I don't claim it fixes everything - a comprehensive review feels out of scope pre-1.1, and would probably involve things like revisiting permissions and caching - but it does fix some pretty big issues.
Comment 6 Tzu-Mainn Chen 2012-03-26 12:03:22 EDT
Patches pushed to master:

commit d951bad80e60b4aaee3db859210f9e2a8601f571
BZ 802571 refactor provider_account sort-by-priority to correctly sort nils, regardless of db

commit 9f2ee9f1cf28be988b02ab3d650d919856fff70d
BZ 802571 don't use deployment.as_json unnecessarily

commit 7b3f91e5d8bdbc2e9db0ec6d051fad9a1f647873
BZ 802571 don't query provider multiple times

commit bc9ef2f278a7d96ce3b7c02e072f141a04c89d87
BZ 802571 added eager loading and other minor efficiency fixes
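
For readers unfamiliar with the term, the "eager loading" mentioned in the last commit can be illustrated with a small sketch. This is not the Conductor patch (Conductor is a Rails application); it is a standalone Python illustration using an in-memory SQLite database. The table names, columns and the CountingConnection helper are all hypothetical. It compares loading child rows one parent at a time (one query per deployment, plus one for the list) against fetching all children in a single extra query.

```python
import sqlite3


def build_db():
    """Create a hypothetical schema: 3 deployments with 2 instances each."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE deployments (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE instances (
            id INTEGER PRIMARY KEY, deployment_id INTEGER, name TEXT
        );
        """
    )
    for d in range(1, 4):
        conn.execute(
            "INSERT INTO deployments (id, name) VALUES (?, ?)",
            (d, f"deployment-{d}"),
        )
        for i in range(2):
            conn.execute(
                "INSERT INTO instances (deployment_id, name) VALUES (?, ?)",
                (d, f"instance-{d}-{i}"),
            )
    return conn


class CountingConnection:
    """Hypothetical helper: wraps a connection and counts SELECTs issued."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def select(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def load_lazy(db):
    """One query for the deployments, then one more per deployment (N+1)."""
    result = {}
    for dep_id, name in db.select("SELECT id, name FROM deployments ORDER BY id"):
        rows = db.select(
            "SELECT name FROM instances WHERE deployment_id = ? ORDER BY id",
            (dep_id,),
        )
        result[name] = [row[0] for row in rows]
    return result


def load_eager(db):
    """One query for the deployments, one for all of their instances."""
    deployments = db.select("SELECT id, name FROM deployments ORDER BY id")
    ids = [dep_id for dep_id, _ in deployments]
    if not ids:
        return {}
    placeholders = ",".join("?" for _ in ids)
    rows = db.select(
        "SELECT deployment_id, name FROM instances "
        f"WHERE deployment_id IN ({placeholders}) ORDER BY id",
        ids,
    )
    by_deployment = {dep_id: [] for dep_id in ids}
    for dep_id, instance_name in rows:
        by_deployment[dep_id].append(instance_name)
    return {name: by_deployment[dep_id] for dep_id, name in deployments}
```

With the three sample deployments, load_lazy issues 4 queries and load_eager issues 2 while returning the same data; the gap grows linearly with the number of parent rows, which is why it matters on list pages under many concurrent users.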
Comment 7 Tzu-Mainn Chen 2012-03-28 17:02:34 EDT
Added fix (https://fedorahosted.org/pipermail/aeolus-devel/2012-March/009742.html), which brings the total number of patches needed to five:

commit d951bad80e60b4aaee3db859210f9e2a8601f571
BZ 802571 refactor provider_account sort-by-priority to correctly sort nils,
regardless of db

commit 9f2ee9f1cf28be988b02ab3d650d919856fff70d
BZ 802571 don't use deployment.as_json unnecessarily

commit 7b3f91e5d8bdbc2e9db0ec6d051fad9a1f647873
BZ 802571 don't query provider multiple times

commit bc9ef2f278a7d96ce3b7c02e072f141a04c89d87
BZ 802571 added eager loading and other minor efficiency fixes

commit 1c35c34898731c016e95bc158d2d9ee977f81235
BZ 802571 fix to previous patch so that list_for_user works
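
As background on the first commit in the list above: databases disagree on where NULL values land under ORDER BY (PostgreSQL puts them last in ascending order by default, while MySQL and SQLite put them first), so sorting in application code is one way to get the same order regardless of db. The sketch below is a Python illustration only, not the actual Ruby patch. The ProviderAccount class and its fields are hypothetical, and placing accounts without a priority last is an assumption; the bug does not say which end the patch chose.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProviderAccount:
    """Hypothetical stand-in for a provider account record."""
    name: str
    priority: Optional[int]  # None plays the role of a nil/NULL priority


def sort_by_priority(accounts: List[ProviderAccount]) -> List[ProviderAccount]:
    """Sort ascending by priority, with missing priorities last (assumed)."""
    # The key is a tuple: False sorts before True, so accounts that have a
    # priority come first; the second element orders them by value. The 0
    # used for None is never compared against a real priority.
    return sorted(
        accounts,
        key=lambda a: (a.priority is None, a.priority if a.priority is not None else 0),
    )
```

Because Python's sort is stable, accounts without a priority keep their original relative order at the end of the list.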
Comment 10 Brad P. Crochet 2012-11-02 13:59:49 EDT
Latest run of these tests resulted in just 22 instances of "Service Temporarily Unavailable", all during a call to /conductor/users.

$ grep "Service Temporarily Unavailable" resultsTable200.csv | wc -l
22
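
The grep count above gives a single total. To break it down per page, a short script along these lines could be used. It is a sketch under assumptions: it expects the CSV to have a header row with `label` and `responseMessage` columns (JMeter's default CSV field names), but the layout of the attached results file is not shown in this bug, so the column names may need adjusting.

```python
import csv
from collections import Counter


def count_503_by_label(csv_file, message="Service Temporarily Unavailable"):
    """Count rows whose response message contains `message`, per label.

    `csv_file` is any open text file (or file-like object) with a header
    row. The column names `label` and `responseMessage` are assumptions.
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        if message in (row.get("responseMessage") or ""):
            counts[row.get("label") or "(unknown)"] += 1
    return counts


# Usage (file name taken from the grep command above):
# with open("resultsTable200.csv", newline="") as f:
#     for label, n in count_503_by_label(f).most_common():
#         print(n, label)
```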

This will probably be highly dependent on the system that is hosting Conductor, and how many thin servers are running.

I'm ok with closing this for now, as it is a marked improvement over previous runs of this same test script.

Verified. aeolus-all-0.13.22-1.el6cf.noarch
Comment 12 errata-xmlrpc 2012-12-04 09:58:35 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2012-1516.html
