Because we wait 3 days between baseline calculations and then recompute every stale baseline in one pass, the work that needs to be done might not complete in time. As a result, we were seeing baseline calculations time out. charles: "I was seeing call to this method MeasurementBaselineManagerBean._calculateAutoBaselinesDELETE(long startTime, long endTime) throws Exception { timeout as part of the DataPurge job in the perf job. This method along with _calculateAutoBaselinesINSERT, _calculateAutoBaselinesDELETE_HQL and _calculateAutoBaselinesINSERT_HQL are using the default txn timeout (10mins?). They have a longer timeout commented out //@TransactionTimeout( 60 * 60 )" This jira is to fix the baseline calculation job algorithm so that a baseline is purged and recalculated as soon as it becomes stale. This will help amortize the performance cost by doing the work in chunks that are as small and as frequent as possible.
btw, _calculateAutoBaselinesDELETE_HQL and _calculateAutoBaselinesINSERT_HQL are not currently in use. We wanted this to be a purely JPQL/HQL-driven solution, but multiple attempts fell short because the syntax could not support what we needed to do.
rev3179 - change baseline calculation algorithm to recompute on a fine-grained basis, as soon as a baseline becomes "stale"
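A minimal sketch of the fine-grained approach described above, not the code from rev3179. The class name, the `Baseline` type, and both method names are hypothetical, and the staleness rule (a baseline is stale once its compute time is at least one baseline frequency old) is an assumption made for illustration. The point is only that each run touches the baselines that are stale right now, rather than waiting and recomputing everything at once.

```java
import java.util.ArrayList;
import java.util.List;

class StaleBaselineSketch {
    /** Hypothetical stand-in for one row of rhq_measurement_bline. */
    static final class Baseline {
        final int scheduleId;
        final long computeTime; // epoch millis, mirrors bl_compute_time

        Baseline(int scheduleId, long computeTime) {
            this.scheduleId = scheduleId;
            this.computeTime = computeTime;
        }
    }

    /** Returns the baselines whose age has reached the baseline frequency (assumed rule). */
    static List<Baseline> findStale(List<Baseline> all, long now, long frequencyMillis) {
        List<Baseline> stale = new ArrayList<>();
        for (Baseline b : all) {
            if (now - b.computeTime >= frequencyMillis) {
                stale.add(b);
            }
        }
        return stale;
    }

    /** Replaces only the stale baselines with ones stamped "now"; fresh ones are kept as-is. */
    static List<Baseline> recomputeStale(List<Baseline> all, long now, long frequencyMillis) {
        List<Baseline> result = new ArrayList<>();
        for (Baseline b : all) {
            boolean stale = now - b.computeTime >= frequencyMillis;
            result.add(stale ? new Baseline(b.scheduleId, now) : b);
        }
        return result;
    }

    public static void main(String[] args) {
        long hour = 60L * 60L * 1000L;
        long day = 24L * hour;
        List<Baseline> all = new ArrayList<>();
        all.add(new Baseline(1, 0L));        // computed at t0
        all.add(new Baseline(2, 3L * hour)); // computed 3 hours later
        // One day after t0, only the first baseline has aged a full day.
        System.out.println(findStale(all, day, day).size());
    }
}
```

Run frequently, a job shaped like this would spread the purge-and-insert work across many small batches, which is what should keep each transaction inside the default timeout.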
reproduction steps:
step 1: reduce time necessary to see results
* go to server configuration under the administration section of the site
* change BOTH the baseline frequency and baseline dataset to 1 day
step 2: skew the import times of resources, this will skew the baseline calculation times
* start with an empty inventory @ time "t0"
* import SOME but not all resources from AD portlet, let's call the number of resources in inventory at this time A
* wait at least X hours, where X >= 2, let's call this "t1"
* import the rest of the resources, let's call the number of resources imported by this B
* thus, A+B is the total number of resources in inventory, let's call this C
step 3: verify results
* 24 hours after t0 (or 24-X hours after t1), let's call this "t2", the first baselines will have been calculated
** go to /admin/test/sql.jsp and execute "select count(id) from rhq_measurement_bline"
** the count should be equal to A
* 24 hours after t1 (or X more hours after "t2"), the rest of the baselines will have been calculated
** go to /admin/test/sql.jsp and execute "select count(id) from rhq_measurement_bline"
** the count should be equal to C
** now execute "select bl_compute_time, count(id) from rhq_measurement_bline group by bl_compute_time"
** you should have two results, one count should be equal to A, one count should be equal to B
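The expected counts in step 3 can be worked through with a small sketch. Everything here is hypothetical: the class and method names are made up, time is measured in hours since t0, A and B are treated as baseline counts exactly as the steps do, and compute times are idealized to land precisely 24 hours after each import. It covers only the first cycle (less than 48 hours after t0), before any baseline would be recomputed.

```java
import java.util.Map;
import java.util.TreeMap;

class ReproExpectationSketch {
    /**
     * Expected result of "group by bl_compute_time" at nowHours after t0.
     * Key: idealized compute time in hours since t0. Value: baseline count.
     * Assumes xHours >= 2 (as in the steps) and nowHours < 48.
     */
    static Map<Long, Integer> expectedGroups(int a, int b, long xHours, long nowHours) {
        final long frequencyHours = 24L; // baseline frequency set to 1 day in step 1
        Map<Long, Integer> groups = new TreeMap<>();
        if (nowHours >= frequencyHours) {
            groups.put(frequencyHours, a); // first batch, computed at t2
        }
        if (nowHours >= xHours + frequencyHours) {
            groups.put(xHours + frequencyHours, b); // second batch, X hours after t2
        }
        return groups;
    }

    /** Expected result of "select count(id) from rhq_measurement_bline". */
    static int total(Map<Long, Integer> groups) {
        int sum = 0;
        for (int count : groups.values()) {
            sum += count;
        }
        return sum;
    }

    public static void main(String[] args) {
        int a = 10, b = 5; // example values for A and B
        long x = 3;        // example value for X
        System.out.println(total(expectedGroups(a, b, x, 24)));     // count at t2
        System.out.println(total(expectedGroups(a, b, x, 24 + x))); // count X hours after t2
    }
}
```

With A = 10, B = 5 and X = 3, the first query should return A at t2 and C = A + B three hours later, and the grouped query should then show two rows, one per import batch.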
QA Verified (finally).
This bug was previously known as http://jira.rhq-project.org/browse/RHQ-1661