Bug 683921

Summary: Simulate production load for beaker-proxy
Product: [Retired] Beaker
Reporter: Bill Peck <bpeck>
Component: tests
Assignee: beaker-dev-list
Status: CLOSED WONTFIX
Severity: unspecified
Priority: medium
Version: 0.6
CC: bpeck, cbouchar, mcsontos, stl, tools-bugs
Keywords: FutureFeature
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard: ImplementationQuality
Doc Type: Enhancement
Last Closed: 2020-11-19 22:15:31 UTC
Bug Blocks: 681964

Description Bill Peck 2011-03-10 16:51:02 UTC
Description of problem:

When verifying that beaker is working as expected before a new release, we should have some datasets that represent real-world production usage of beaker.  These datasets should be used to verify we haven't regressed performance-wise.  They do not need to check for the validity of the data; we have unit tests and integration tests for that already.

Steps to Reproduce:
1. Take a week long data sample from proxy.log on the busiest lab controller
2. Record number of requests per hour
3. Record unique number of calls made
4. Record number of requests per hour per call
5. Look for the busiest hour (number of calls, load during that hour on the system)

The data from step 5 will be the dataset used for a production simulation test.
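
A rough sketch of how steps 2-5 could be summarised from a proxy.log sample is below. The log line format, field positions and call names here are assumptions for illustration, not the real beaker-proxy format, so the regex would need adjusting against actual logs.

    # Summarise requests per hour, per call, and per hour per call from a
    # proxy.log sample. The line format below is an assumed example:
    #   2011-03-07 14:02:11,123 beaker.proxy INFO task_result(...)
    import re
    from collections import Counter

    LINE = re.compile(r'^(?P<date>\S+) (?P<hour>\d\d):\d\d:\d\d\S*\s.*\s(?P<call>\w+)\(')

    per_hour = Counter()       # requests per (date, hour)
    per_call = Counter()       # requests per call name
    per_hour_call = Counter()  # requests per (date, hour, call)

    with open('proxy.log') as f:
        for line in f:
            m = LINE.match(line)
            if not m:
                continue
            hour = (m.group('date'), m.group('hour'))
            per_hour[hour] += 1
            per_call[m.group('call')] += 1
            per_hour_call[(hour, m.group('call'))] += 1

    busiest, count = per_hour.most_common(1)[0]
    print('unique calls:', len(per_call))
    print('busiest hour: %s %s:00 (%d requests)' % (busiest[0], busiest[1], count))
    for call, n in per_call.most_common():
        print('  %-30s %d' % (call, n))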

The test will need to verify that the server and lab controller are working as expected (no unexpected log errors or crashes).

After a few runs of the dataset we should be able to determine acceptable ranges for pass/fail. For a start, let's say that if the original dataset took an hour to run, then I would expect we should never go beyond 1.5x (i.e. an hour and a half).
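
A minimal sketch of that criterion (the one-hour baseline is just the example figure above):

    # Pass/fail check for the replay: fail if it takes more than 1.5x the
    # baseline runtime recorded from the original dataset run.
    BASELINE_SECONDS = 3600   # example: original dataset took one hour
    MAX_SLOWDOWN = 1.5

    def replay_passed(elapsed_seconds):
        return elapsed_seconds <= BASELINE_SECONDS * MAX_SLOWDOWN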

This should be automated to run as a beaker test and used as part of the release criteria for all future maintenance and minor releases.

Comment 1 Raymond Mancy 2011-05-25 13:03:59 UTC
(In reply to comment #0)
> Description of problem:
> 
> When verifying that beaker is working as expected before a new release we
> should have some datasets that represent real world production usage of beaker.
>  These datasets should be used to verify we haven't regressed performance wise.
>  They do not need to check for the validity of the data, we have unit tests and
> integration tests for that already.
> 
> Steps to Reproduce:
> 1. Take a week long data sample from proxy.log on the busiest lab controller
> 2. Record number of requests per hour
> 3. Record unique number of calls made
> 4. Record number of requests per hour per call
> 5. Look for the busiest hour (number of calls, load during that hour on the
> system)
> 
> The data from step 5 will be the dataset used for a production simulation test.

Don't we also need to actually determine what these calls are?

> 
> The test will need to verify that the server and lab controller are working as
> expected (no unexpected log errors or crashes).
> 

I thought this was what the unit tests were for. Or do you mean errors/crashes due to load?

> After a few runs of the dataset we should be able to determine acceptable
> ranges for pass/fail. For a start, let's say that if the original dataset
> took an hour to run, then I would expect we should never go beyond 1.5x
> (i.e. an hour and a half).
> 

Why not just have a standard load profile that we run, so we can compare the load etc. that it puts on the beaker server between code changes? Isn't that a better indicator?

> This should be automated to run as a beaker test and used for all future maint
> and minor release criteria.

Comment 2 Raymond Mancy 2011-05-26 00:29:56 UTC
The proxy is primarily (perhaps exclusively) called by test machines that are recording starts/stops/task_results/watchdogs?
Perhaps we should aim to create the number of jobs/recipes/recipetasks that will organically do these calls for us with out dummy beah.

I guess this would mean looking at the numbers to get a semi-accurate number of jobs being submitted, which will produce a semi-accurate number of calls through the proxy.

Comment 3 Raymond Mancy 2011-05-26 01:03:42 UTC
(In reply to comment #2)
> The proxy is primarily (perhaps exclusively) called by test machines that are
> recording starts/stops/task_results/watchdogs?
> Perhaps we should aim to create the number of jobs/recipes/recipetasks that
> will organically do these calls for us with out dummy beah.

Of course I meant 'with _our_ dummy beah'

> 
> I guess this would mean looking at the numbers to get a semi accurate number of
> jobs being submitted which will produce a semi accurate number of calls through
> the proxy.

Comment 4 Raymond Mancy 2011-05-26 12:35:04 UTC
Also I'm not sure how we can accurately gauge what XMLRPC methods are coming in from clients. They all hit the same /client URL and the details of the actual function are hidden.
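
(For reference, the method name does travel in the XML-RPC request body rather than the URL, so anything that can see the POST data, such as the proxy itself or a capture of the request bodies, can recover it. A minimal standard-library illustration follows; the call name and arguments are made up.)

    # The method name is only visible inside the XML-RPC POST body, not in
    # the URL, so plain access logs cannot distinguish the calls.
    from xmlrpc.client import dumps, loads

    # Build a request body the way any XML-RPC client would
    # ('task_result' and its arguments are made-up example values).
    body = dumps(('R:123', 'pass'), methodname='task_result')

    # Whatever sees the body can pull the method name back out:
    params, methodname = loads(body)
    print(methodname)   # task_result
    print(params)       # ('R:123', 'pass')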

Comment 5 Raymond Mancy 2012-08-31 05:23:11 UTC
I don't think it's beaker-proxy that we should be trying to directly imitate; most of the calls to beaker-proxy are secondary effects, i.e. tasks starting and so on.

I think it's better to take a top-down approach, where we replicate the number of jobs/recipes/tasks submitted, the users' actions via the UI, etc. Then the lower-level calls should just take care of themselves.
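
As a sketch of what that could look like from the top (the job XML and the per-hour figure are made-up example values; bkr job-submit is the standard Beaker client command):

    # Submit a production-like number of jobs and let the proxy traffic
    # fall out of that organically. Example values only.
    import subprocess

    JOBS_PER_HOUR = 120            # hypothetical figure taken from production logs
    JOB_XML = 'simulated-job.xml'  # a representative job definition

    for _ in range(JOBS_PER_HOUR):
        subprocess.check_call(['bkr', 'job-submit', JOB_XML])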

Up until now I have just been going through logs and querying databases to get these numbers. However, I'm hoping that an expansion of internally generated statistics (i.e. sent to Graphite) will help here.
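
If the numbers do end up in Graphite, pushing test-run counters in is just the Carbon plaintext protocol; a small sketch, with an assumed metric name and host:

    # Push a counter to Carbon over the plaintext protocol
    # ("metric.path value timestamp\n" on port 2003).
    import socket
    import time

    def send_metric(path, value, host='graphite.example.com', port=2003):
        line = '%s %s %d\n' % (path, value, int(time.time()))
        sock = socket.create_connection((host, port))
        try:
            sock.sendall(line.encode('ascii'))
        finally:
            sock.close()

    send_metric('beaker.proxy.task_result.count', 42)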

Comment 6 Raymond Mancy 2012-09-26 00:25:40 UTC
We need a better way of doing this. Ideally I think we should be taking metrics from the production server.

Comment 9 Dan Callaghan 2015-03-30 07:18:52 UTC
Back in October 2013, Nick said on an internal ticket (and I agree):

https://bugzilla.redhat.com/show_bug.cgi?id=1014875 is the latest example of a Beaker race condition we didn't hit until 0.15 was deployed and the lab controller had to deal with hundreds of systems running tasks in parallel.

We need to figure out a way to set up a test scenario that hammers the server with some serious load. For example, a power script that starts a gevent worker to hit the server with fake input, plus a driver to create the appropriate fake systems, submit the appropriate jobs, and check the results.

Set it all up to run inside Beaker in a way similar to the existing dogfood task.
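
A very rough sketch of the gevent worker idea (the proxy URL, method name and arguments are assumptions for illustration only):

    # Spawn many greenlets that hammer the lab controller's XML-RPC
    # interface with fake task results, to approximate hundreds of
    # recipes reporting in parallel.
    import gevent
    from gevent import monkey
    monkey.patch_all()   # make the XML-RPC client's sockets cooperative

    from xmlrpc.client import ServerProxy

    PROXY_URL = 'http://lab-controller.example.com:8000/RPC2'   # assumed

    def fake_task(n):
        server = ServerProxy(PROXY_URL)
        # Pretend to be a running recipe reporting a result (made-up call).
        server.task_result('T:%d' % n, 'pass', '/fake/path', 0, 'simulated load')

    greenlets = [gevent.spawn(fake_task, n) for n in range(500)]
    gevent.joinall(greenlets)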