Bug 1059412 - OOB calculation can take longer than an hour; either optimize it or have a way to disable it
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: RHQ Project
Classification: Other
Component: Performance
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: RHQ 4.10
Assignee: John Sanda
QA Contact: Mike Foley
URL:
Whiteboard:
Depends On:
Blocks: 1063430
 
Reported: 2014-01-29 19:56 UTC by Elias Ross
Modified: 2014-04-23 12:31 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1063430
Environment:
Last Closed: 2014-04-23 12:31:28 UTC
Embargoed:


Attachments
Possible patch (7.44 KB, patch)
2014-02-05 07:33 UTC, Heiko W. Rupp

Description Elias Ross 2014-01-29 19:56:58 UTC
Description of problem:

See: https://community.jboss.org/message/855449#855449

"17:49:03,425 INFO  [org.rhq.enterprise.server.measurement.MeasurementOOBManagerBean] (RHQScheduler_Worker-4) Finished calculating 32 OOBs in 3086252 ms  
17:49:03,425 INFO  [org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob] (RHQScheduler_Worker-4) Auto-calculation of OOBs completed in [3086260]ms  

(This is roughly 51 minutes, not good.)

I'm not sure why OOB calculations would take this long. I'm digging into why but if somebody knows what to look for here, that'd be awesome.
 
I'm thinking it'd be nice to turn off OOB stuff since I don't use it, and it looks like it's making my purge job take more than an hour.
 
It seems to me there are simply too many metrics: querying Oracle once for each one takes more time than the job can afford. It seems the answer might be to:
1) Have an option to disable OOB completely
2) Only do a partial calculation (like 30%)
3) Optimize the round trip to the database, e.g. query all (or a portion of the) baselines at once, then merge the results.
"

Version-Release number of selected component (if applicable): 4.9


How reproducible: Always, with enough schedules


Additional info:

Should be possible to optimize this. Or simply disable OOB calculations. This impacts data purge.

Comment 1 Heiko W. Rupp 2014-02-04 12:44:50 UTC
It is possible to disable automatic baseline calculation by setting the frequency to 0. 
So here we could check whether baseline calculation is disabled and, if it is, also skip the OOB calculation, since OOBs need the baseline high/low as reference points.

It is still possible to have baselines though (e.g. set via the REST API), so we can't just completely switch off OOB computation if baselineFrequency == 0.

One option could be, in case baselineFrequency == 0, to check the Baseline table for existing baselines and then feed only those metrics into org.rhq.enterprise.server.measurement.MeasurementOOBManagerBean#calculateOOB instead of all metrics (basically filtering the list passed into org.rhq.enterprise.server.measurement.MeasurementOOBManagerBean#computeOOBsForLastHour).
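To illustrate the filtering idea, here is a minimal sketch. It is not RHQ code and not the attached patch: the class and method names are hypothetical, and schedule ids are modelled as plain integers, with the set of ids that have a baseline assumed to come from a single lookup against the Baseline table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of RHQ.
final class BaselineFilterSketch {

    /**
     * Keeps only the schedule ids that have a stored baseline,
     * preserving the order of the input list.
     */
    static List<Integer> schedulesWithBaseline(List<Integer> scheduleIds,
                                               Set<Integer> idsWithBaseline) {
        List<Integer> result = new ArrayList<>();
        for (Integer id : scheduleIds) {
            if (idsWithBaseline.contains(id)) {
                result.add(id);
            }
        }
        return result;
    }
}
```

Only the filtered list would then be handed on for OOB calculation, so metrics without a baseline never cause a database call.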

Also, it should be possible in #computeOOBsForLastHour to get all existing baselines in one DB round trip and then pass the relevant value into #calculateOOB, instead of making all the EM (EntityManager) calls inside #calculateOOB.
(And the query in #calculateOOB should be a NamedQuery so that it can be prepared in advance.)
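The single-round-trip idea can be sketched as follows. This is an illustration under stated assumptions, not the actual RHQ implementation or the attached patch: the classes Baseline and HourValue are hypothetical stand-ins, the list of baselines is assumed to be the result of one bulk query, and the OOB factor formula (overshoot as a percentage of the baseline band width) is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not part of RHQ.
final class BulkOobSketch {

    /** Assumed shape of a baseline: low/high band for one schedule. */
    static final class Baseline {
        final int scheduleId;
        final double low;
        final double high;

        Baseline(int scheduleId, double low, double high) {
            this.scheduleId = scheduleId;
            this.low = low;
            this.high = high;
        }
    }

    /** Assumed shape of the last hour's aggregate for one schedule. */
    static final class HourValue {
        final int scheduleId;
        final double min;
        final double max;

        HourValue(int scheduleId, double min, double max) {
            this.scheduleId = scheduleId;
            this.min = min;
            this.max = max;
        }
    }

    /** Indexes the result of one bulk baseline query by schedule id. */
    static Map<Integer, Baseline> indexByScheduleId(List<Baseline> baselines) {
        Map<Integer, Baseline> byId = new HashMap<>();
        for (Baseline b : baselines) {
            byId.put(b.scheduleId, b);
        }
        return byId;
    }

    /**
     * Returns scheduleId -> OOB factor (percent) for every value that left
     * its baseline band. Lookups hit the in-memory map, not the database.
     */
    static Map<Integer, Integer> computeOobs(List<HourValue> values,
                                             Map<Integer, Baseline> byId) {
        Map<Integer, Integer> oobs = new HashMap<>();
        for (HourValue v : new ArrayList<>(values)) {
            Baseline b = byId.get(v.scheduleId);
            if (b == null) {
                continue; // no baseline, so no reference points
            }
            double band = b.high - b.low;
            double over = Math.max(v.max - b.high, b.low - v.min);
            if (band <= 0 || over <= 0) {
                continue; // degenerate band or value inside the band
            }
            oobs.put(v.scheduleId, (int) Math.round(100.0 * over / band));
        }
        return oobs;
    }
}
```

With a baseline band of 10 to 20 and an hourly maximum of 25, the sketch reports a factor of 50 for that schedule. The point of the design is that the per-metric work becomes a map lookup, so the number of database round trips no longer grows with the number of schedules.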

Comment 2 Heiko W. Rupp 2014-02-05 07:33:03 UTC
Created attachment 859504
Possible patch

The attached patch reduces the run time for me from 

17:06:50,501 INFO  [org.rhq.enterprise.server.measurement.MeasurementOOBManagerBean] (RHQScheduler_Worker-3) Finished calculating 1223 OOBs in 116039 ms

to

05:04:39,678 INFO  [org.rhq.enterprise.server.measurement.MeasurementOOBManagerBean] (RHQScheduler_Worker-4) Finished calculating 1213 OOBs in 16359 ms

The patch probably needs a little more work with respect to the signature change detector.

Comment 3 John Sanda 2014-02-10 16:37:50 UTC
The patch has been tested and applied to master.

master commit hash: 3b9bbfd4355

Comment 4 Heiko W. Rupp 2014-04-23 12:31:28 UTC
Bulk closing of 4.10 issues.

If an issue is not solved for you, please open a new BZ (or clone the existing one) with a version designator of 4.10.

