Bug 1011114 - Baseline calculations is slow
Summary: Baseline calculations is slow
Keywords:
Status: CLOSED DUPLICATE of bug 1011107
Alias: None
Product: RHQ Project
Classification: Other
Component: Core Server
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ER03
Target Release: RHQ 4.10
Assignee: RHQ Project Maintainer
QA Contact: Mike Foley
URL:
Whiteboard:
Depends On:
Blocks: 1011084 951619
Reported: 2013-09-23 16:02 UTC by John Sanda
Modified: 2013-09-23 19:05 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2013-09-23 19:05:09 UTC
Embargoed:



Description John Sanda 2013-09-23 16:02:30 UTC
Description of problem:
Baseline calculations can take a long time when there are a large number of schedules that need baselines. Here are some stats from a 4.10-SNAPSHOT environment:

18:07:29,675 INFO  [org.rhq.enterprise.server.measurement.MeasurementBaselineManagerBean] (RHQScheduler_Worker-4) Calculated and inserted [41831] new baselines. (1716527)ms
18:07:29,688 INFO  [org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob] (RHQScheduler_Worker-4) Auto-calculation of baselines completed in [1716690]ms
18:07:29,688 INFO  [org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob] (RHQScheduler_Worker-4) Auto-calculation of OOBs starting
18:07:29,761 INFO  [org.rhq.enterprise.server.measurement.MeasurementOOBManagerBean] (RHQScheduler_Worker-4) Removed [21772] outdated OOBs
18:07:29,905 INFO  [org.rhq.enterprise.server.measurement.MeasurementOOBManagerBean] (RHQScheduler_Worker-4) Computing OOBs
18:07:47,902 INFO  [org.rhq.enterprise.server.cloud.instance.CacheConsistencyManagerBean] (EJB default - 3) jsanda-dev03.bc.jonqe.lab.eng.bos.redhat.com took [283]ms to reload cache for 2 agents
18:08:55,892 INFO  [org.rhq.enterprise.server.measurement.MeasurementOOBManagerBean] (RHQScheduler_Worker-4) Finished calculating 82 OOBs in 85987 ms
18:08:55,892 INFO  [org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob] (RHQScheduler_Worker-4) Auto-calculation of OOBs completed in [86204]ms
18:08:55,892 INFO  [org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob] (RHQScheduler_Worker-4) Data Purge Job FINISHED [6064715]ms


From the server.log output above, we can see that baseline calculations took 28.6 minutes (1,716,527 ms for 41,831 baselines, or roughly 41 ms per baseline). This is the same environment that was used for bug 1009945, so it is not an overly large environment. As with the aggregation, the cause is straightforward: the calculations for each schedule are done serially. Calculating baselines for multiple schedules concurrently should yield a dramatic improvement.
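
A minimal sketch of the concurrent approach, assuming Java 8 or later and an in-memory map of one-hour values per schedule. The names ConcurrentBaselines, Baseline, calculate, and calculateAll are hypothetical and are not RHQ APIs; the real work, per the log above, happens in MeasurementBaselineManagerBean against the database. The sketch only illustrates fanning independent per-schedule calculations out to a fixed-size thread pool instead of running them one after another.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ConcurrentBaselines {

    /** Hypothetical stand-in for a computed baseline; not an RHQ class. */
    public static final class Baseline {
        public final int scheduleId;
        public final double min;
        public final double max;
        public final double mean;

        public Baseline(int scheduleId, double min, double max, double mean) {
            this.scheduleId = scheduleId;
            this.min = min;
            this.max = max;
            this.mean = mean;
        }
    }

    /** Hypothetical per-schedule calculation; assumes a non-empty array. */
    public static Baseline calculate(int scheduleId, double[] oneHourValues) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum = 0.0;
        for (double v : oneHourValues) {
            min = Math.min(min, v);
            max = Math.max(max, v);
            sum += v;
        }
        return new Baseline(scheduleId, min, max, sum / oneHourValues.length);
    }

    /** Runs one task per schedule on a fixed-size pool and collects the results. */
    public static List<Baseline> calculateAll(Map<Integer, double[]> dataBySchedule, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Baseline>> tasks = new ArrayList<>();
            for (Map.Entry<Integer, double[]> e : dataBySchedule.entrySet()) {
                tasks.add(() -> calculate(e.getKey(), e.getValue()));
            }
            List<Baseline> results = new ArrayList<>();
            // invokeAll blocks until every task has finished and returns the
            // futures in the same order as the submitted tasks.
            for (Future<Baseline> f : pool.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

In a real server the pool size would have to be balanced against database connection limits, since each calculation presumably issues its own queries.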

I think we can do even better than simply calculating multiple baselines concurrently, though. We can in effect create a pipeline for the calculations that need to be done. Once the one hour data for a schedule is calculated, we can go ahead and generate the baseline, and then we can do the OOBs. Right now, we first do all the compression, then we do all the baselines, and then we do the OOBs.

raw data --> 1hr data --> 6 hr data --> 24 hr data
             \
              \
               --> baseline (if necessary) --> OOBs


The above diagram shows what the pipeline would look like. For a given schedule, once we calculate the 1 hr data, we can go ahead and calculate the baselines (if necessary) and then do the OOB calculations. We can generate the 6 hr and 24 hr data in parallel with the baseline and OOB calculations. Right now we see a big memory spike during the data purge job because the generated 1 hr data is passed to MeasurementOOBManagerLocal.computeOOBsForLastHour. This could be a sizable amount of memory depending on the number of raw metrics that are being aggregated. The pipeline is a more iterative approach where we could keep the increase in memory usage fixed regardless of the number of schedules being aggregated.
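
The per-schedule pipeline in the diagram could be sketched roughly as follows, assuming Java 8 or later. Everything here is hypothetical and for illustration only: SchedulePipeline, Result, oneHourAvg, rollUp, and baselineMax are not RHQ classes or methods, the stage bodies are trivial placeholders, and a null baseline argument stands in for "a baseline is necessary". What the sketch tries to show is the shape: the 1 hr stage feeds two independent branches, and a semaphore caps how many schedules are in flight so memory use stays bounded no matter how many schedules are submitted.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

public class SchedulePipeline {

    /** Hypothetical outcome of the baseline/OOB branch for one schedule. */
    public static final class Result {
        public final int scheduleId;
        public final double oneHourAvg;
        public final boolean outOfBounds;

        public Result(int scheduleId, double oneHourAvg, boolean outOfBounds) {
            this.scheduleId = scheduleId;
            this.oneHourAvg = oneHourAvg;
            this.outOfBounds = outOfBounds;
        }
    }

    private final ExecutorService pool;
    private final Semaphore inFlight;

    public SchedulePipeline(int threads, int maxInFlight) {
        this.pool = Executors.newFixedThreadPool(threads);
        this.inFlight = new Semaphore(maxInFlight);
    }

    // Placeholder stages; assumes a non-empty raw array.
    static double oneHourAvg(double[] raw) {
        double sum = 0.0;
        for (double v : raw) sum += v;
        return sum / raw.length;
    }

    // Stand-in for the 6 hr and 24 hr roll-ups.
    static double rollUp(double avg) {
        return avg;
    }

    // Uses the existing baseline if there is one, otherwise derives one.
    static double baselineMax(Double existing, double oneHourAvg) {
        return existing != null ? existing : oneHourAvg;
    }

    /** Blocks the caller while maxInFlight schedules are already being processed. */
    public CompletableFuture<Result> submit(int scheduleId, double[] raw, Double existingBaselineMax)
            throws InterruptedException {
        inFlight.acquire();
        CompletableFuture<Double> oneHour =
                CompletableFuture.supplyAsync(() -> oneHourAvg(raw), pool);

        // Branch 1: 1 hr data --> 6 hr data --> 24 hr data
        CompletableFuture<Double> rollUps = oneHour
                .thenApplyAsync(SchedulePipeline::rollUp, pool)
                .thenApplyAsync(SchedulePipeline::rollUp, pool);

        // Branch 2: 1 hr data --> baseline (if necessary) --> OOB check
        CompletableFuture<Result> oob = oneHour.thenApplyAsync(avg -> {
            double max = baselineMax(existingBaselineMax, avg);
            return new Result(scheduleId, avg, avg > max);
        }, pool);

        // The schedule is done when both branches finish; the permit is
        // released whether they succeeded or failed.
        return oob.thenCombine(rollUps, (result, ignored) -> result)
                .whenComplete((result, error) -> inFlight.release());
    }

    public void shutdown() {
        pool.shutdown();
    }
}
```

The semaphore is one possible way to get the fixed memory ceiling; a bounded queue between stages would be another.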


Comment 1 Heiko W. Rupp 2013-09-23 19:05:09 UTC

*** This bug has been marked as a duplicate of bug 1011107 ***

