Description of problem: When verifying that Beaker is working as expected before a new release, we should have datasets that represent real-world production usage of Beaker. These datasets should be used to verify that we haven't regressed performance-wise. They do not need to check the validity of the data; we already have unit tests and integration tests for that.

Steps to Reproduce:
1. Take a week-long data sample from Beaker production of the number of jobs running at one time.
2. Record the number of recipe sets, recipes, and tasks of those running jobs.
3. Look for the busiest hour (highest number of jobs running).

The data from step 3 will be the dataset used for a production simulation test. The test will need to verify that the server and client are working as expected (no unexpected log errors or crashes). Spawn a bkr job-watch for every active job. To make this accurate we will need to use the dataset from beaker-proxy, or at a minimum do status updates and result updates that closely resemble the production dataset.

After running the dataset a few times we should be able to determine acceptable ranges for pass/fail. As a starting point: if the original dataset took an hour to run, I would expect we should never go beyond 1.5x (i.e. an hour and a half). This should be automated to run as a Beaker test and used as release criteria for all future maintenance and minor releases.
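Step 3 could be sketched roughly as follows. This is only a minimal illustration, assuming the job records have been extracted from the production database as (start, finish) timestamp pairs; the function and variable names here are made up for the example, not part of Beaker:

```python
from datetime import datetime, timedelta

def busiest_hour(jobs):
    """Given (start, finish) datetime pairs, return the hour bucket
    during which the most jobs were running, and the job count."""
    counts = {}
    for start, finish in jobs:
        # Truncate the start to its hour boundary, then count every
        # hour bucket that the job's run time overlaps.
        hour = start.replace(minute=0, second=0, microsecond=0)
        while hour < finish:
            counts[hour] = counts.get(hour, 0) + 1
            hour += timedelta(hours=1)
    return max(counts.items(), key=lambda kv: kv[1])

# Tiny made-up sample standing in for a week of production data.
jobs = [
    (datetime(2011, 5, 2, 9, 10), datetime(2011, 5, 2, 11, 30)),
    (datetime(2011, 5, 2, 10, 5), datetime(2011, 5, 2, 10, 45)),
    (datetime(2011, 5, 2, 13, 0), datetime(2011, 5, 2, 14, 0)),
]
hour, n = busiest_hour(jobs)  # 10:00 is busiest: two jobs overlap there
```

The same per-hour counts would also give the baseline run time against which the proposed 1.5x threshold is checked.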
I'd imagine this will need to be done on dedicated hardware, otherwise the expected run time could vary wildly?
True, unless you run both old and new every time.
Old would then have to be the bkr code at the point in time when we originally designed the tests, and it wouldn't be able to change. E.g. if 0.7 is the version when the tests are developed, we would compare against it when testing 0.7.1. When testing 0.7.2 we would still need to compare against 0.7, as 0.7.1 may not have the same test time as 0.7 at all. Even then, what may be a 10% increase on one box may be a 15% increase on another.
Even better would be a system that is as close to production as possible....
(In reply to comment #4) > Even better would be a system that is as close to production as possible.... 1,587 real systems in production beaker.
Sorry, I meant the hardware that the Beaker server runs on.
That makes sense. This will be more complicated then, since the production servers are Xen guests. I would expect that we should look for a physical system that is close to what we use in production and then stick to it, like you say. But we will either need to pre-set-up these virt guests or build them from our workflow.
Yes, perhaps we should have an SQL file that we always import before running tests, so each time we're working on the same data set. When you say 'pre-setup these virt guests', you're referring to the xen guests that you mentioned? (and not the test machines)
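The SQL re-import step could be wrapped in a small helper like the one below. This is just a sketch: the dump file name, database name, and user are assumptions, not the real Beaker setup, and the runner is injectable so the command can be inspected without touching a database:

```python
import shlex
import subprocess

def reload_dataset(dump_path, db="beaker", user="root", run=subprocess.call):
    """Re-import the known SQL dump so every test run starts from the
    same data set. Returns the command string and the runner's result."""
    cmd = "mysql -u {} {} < {}".format(
        shlex.quote(user), shlex.quote(db), shlex.quote(dump_path))
    return cmd, run(cmd, shell=True)

# Inject a no-op runner here so we only inspect the command that
# would be executed; in real use, the default subprocess.call runs it.
cmd, _ = reload_dataset("beaker_dataset.sql", run=lambda c, shell: 0)
```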
(In reply to comment #8)
> Yes perhaps we should have an SQL file that we always import before running
> tests, so each time we're working on the same data set.

Good idea.

> When you say 'pre-setup these virt guests', you're referring to the xen guests
> that you mentioned? (and not test machines)

Correct. Think of these systems in beaker-stage:

hyperv-guest01
hyperv-guest02

They are virtual machines, but we don't have to set them up for each test.
I think they will need to be in the DB for each 'test'. What I'm envisioning is this: we have a beaker-server instance running with a set of known data. The data would have to be sufficiently large to somewhat replicate production (and it would be re-loaded before each test run).

We then have a program called bkr-load. When we run it, it imports a series of modules, each module containing classes which perform a certain activity (i.e. threading a bunch of job-watch commands, hitting the web UI, etc.). Each of these classes is responsible for recording its own timing, and the server will be recording the load.

How many of these 'scenarios' (dcallagh came up with this term and I like it) you run would be configurable, so you could test the load when running just a single scenario (i.e. UI), or with as many scenarios as exist (job-watch, job-creation, UI, proxy-update-jobs, etc.). Some scenarios will rely on others to do something meaningful; i.e. job-watch relies on proxy-update-jobs to do something useful.

These are my initial thoughts on the direction I'd like to take. I like it because it's highly configurable, and given the time and inclination we could extend it to the point where it comes close to simulating production.
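The scenario idea above could be sketched like this. All class and function names here are hypothetical, and the scenario bodies are placeholders; the point is only the shape: each scenario records its own timing, and a runner executes a configurable set of them concurrently:

```python
import threading
import time

class Scenario:
    """Base class: each scenario performs one kind of activity
    (job-watch, UI hits, proxy updates) and records its own timing."""
    name = "base"

    def run(self):
        raise NotImplementedError

    def timed_run(self):
        start = time.time()
        self.run()
        self.elapsed = time.time() - start

class JobWatchScenario(Scenario):
    name = "job-watch"
    def run(self):
        time.sleep(0.01)  # placeholder for spawning bkr job-watch per job

class UIScenario(Scenario):
    name = "ui"
    def run(self):
        time.sleep(0.01)  # placeholder for hitting the web UI

def run_scenarios(scenarios):
    """Run the configured scenarios concurrently; return name -> elapsed."""
    threads = [threading.Thread(target=s.timed_run) for s in scenarios]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return {s.name: s.elapsed for s in scenarios}

timings = run_scenarios([JobWatchScenario(), UIScenario()])
```

The configurable part would then just be which scenario classes bkr-load instantiates; dependencies between scenarios (e.g. job-watch needing proxy-update-jobs) could be declared on the classes.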
kbaker's etherpad outlined the need to create a load test for each feature, so I've included it in the time on that bug. Although I'm not really sure how to get the job data; the time I've put in is just for getting the job data.
Done