Bug 1063430 - OOB calculation can take longer than an hour; either optimize it or have a way to disable it
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: JBoss Operations Network
Classification: JBoss
Component: Performance
Version: JON 3.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: DR03
Target Release: JON 3.2.2
Assignee: Lukas Krejci
QA Contact: Garik Khachikyan
URL:
Whiteboard:
Depends On: 1059412
Blocks:
Reported: 2014-02-10 17:36 UTC by Larry O'Leary
Modified: 2015-01-04 22:00 UTC

Fixed In Version:
Clone Of: 1059412
Environment:
Last Closed: 2014-07-29 00:17:33 UTC
Type: Bug
Embargoed:


Description Larry O'Leary 2014-02-10 17:36:00 UTC
Candidate for 3.2.x. Without this fix OOB calculations should be considered broken for all practical purposes. 

+++ This bug was initially created as a clone of Bug #1059412 +++

Description of problem:

See: https://community.jboss.org/message/855449#855449

"17:49:03,425 INFO  [org.rhq.enterprise.server.measurement.MeasurementOOBManagerBean] (RHQScheduler_Worker-4) Finished calculating 32 OOBs in 3086252 ms  
17:49:03,425 INFO  [org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob] (RHQScheduler_Worker-4) Auto-calculation of OOBs completed in [3086260]ms  

(This is roughly 51 minutes, not good.)

I'm not sure why OOB calculations would take this long. I'm digging into why but if somebody knows what to look for here, that'd be awesome.
 
I'm thinking it'd be nice to turn off OOB stuff as I don't use it anyway, and it looks like it's making my purge job take more than an hour anyway.
 
Seems to me it's just the number of metrics is too much, that querying Oracle for each one takes more time than it can support. It seems the answer might be to:
1) Have an option to disable OOB completely
2) Only do a partial calculation (like 30%)
3) Optimize the round trip to the database, e.g. query all (or a portion) baselines at once, then merge the results.
"

Version-Release number of selected component (if applicable): 4.9


How reproducible: Always, with enough schedules


Additional info:

Should be possible to optimize this. Or simply disable OOB calculations. This impacts data purge.

--- Additional comment from Heiko W. Rupp on 2014-02-04 07:44:50 EST ---

It is possible to disable automatic baseline calculation by setting the frequency to 0. 
So here we could check that if baseline calculation is disabled, no OOBs are calculated either, as they would need the baseline high/low as reference points.

It is still possible to have baselines though (e.g. via the REST API), so we can't just completely switch off OOB computation if baselineFrequency == 0.

One option could be to check, when baselineFrequency == 0, for baselines in the Baseline table and then feed only those metrics into org.rhq.enterprise.server.measurement.MeasurementOOBManagerBean#calculateOOB instead of all metrics (basically filtering the list passed into org.rhq.enterprise.server.measurement.MeasurementOOBManagerBean#computeOOBsForLastHour).
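
A minimal sketch of that filtering idea, in plain Java rather than against the RHQ entities. The class and method names (BaselineFilter, schedulesToCheck), the integer schedule ids and the set of ids that have a baseline row are all assumptions for illustration, not the actual MeasurementOOBManagerBean code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class BaselineFilter {

    /**
     * Hypothetical helper: when automatic baseline calculation is disabled
     * (baselineFrequency == 0), keep only the schedules that already have a
     * baseline; otherwise pass the full list through unchanged.
     */
    static List<Integer> schedulesToCheck(List<Integer> allScheduleIds,
                                          Set<Integer> idsWithBaseline,
                                          long baselineFrequency) {
        if (baselineFrequency != 0) {
            return allScheduleIds;
        }
        List<Integer> filtered = new ArrayList<>();
        for (Integer id : allScheduleIds) {
            if (idsWithBaseline.contains(id)) {
                filtered.add(id);
            }
        }
        return filtered;
    }
}
```

With few manually created baselines, the list handed to the OOB computation shrinks to just those schedules.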

It should also be possible for #computeOOBsForLastHour to fetch all existing baselines in one DB round trip and then pass the relevant value into #calculateOOB, instead of making all the EM calls inside #calculateOOB (and the query in #calculateOOB should be a NamedQuery so that it can be prepared in advance).
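
Roughly, the single-round-trip idea could look like the sketch below. It is not the attached patch: the Baseline and HourlyValue types are simplified, hypothetical stand-ins, the bulk query is represented by an in-memory list, and treating a value as out of bounds when its hourly min or max leaves the baseline low/high band is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class OobBatchSketch {

    /** Hypothetical stand-in for a baseline row: low/high band per schedule. */
    static final class Baseline {
        final int scheduleId;
        final double low;
        final double high;

        Baseline(int scheduleId, double low, double high) {
            this.scheduleId = scheduleId;
            this.low = low;
            this.high = high;
        }
    }

    /** Hypothetical stand-in for one schedule's min/max over the last hour. */
    static final class HourlyValue {
        final int scheduleId;
        final double min;
        final double max;

        HourlyValue(int scheduleId, double min, double max) {
            this.scheduleId = scheduleId;
            this.min = min;
            this.max = max;
        }
    }

    /** Index the result of ONE bulk baseline query by schedule id. */
    static Map<Integer, Baseline> indexByScheduleId(List<Baseline> allBaselines) {
        Map<Integer, Baseline> byId = new HashMap<>();
        for (Baseline b : allBaselines) {
            byId.put(b.scheduleId, b);
        }
        return byId;
    }

    /** Return ids whose hourly min or max left the baseline band; no DB access here. */
    static List<Integer> findOutOfBounds(List<HourlyValue> values,
                                         Map<Integer, Baseline> baselines) {
        List<Integer> oobIds = new ArrayList<>();
        for (HourlyValue v : values) {
            Baseline b = baselines.get(v.scheduleId);
            if (b == null) {
                continue; // no baseline, nothing to compare against
            }
            if (v.min < b.low || v.max > b.high) {
                oobIds.add(v.scheduleId);
            }
        }
        return oobIds;
    }
}
```

The point of the map is that the per-metric work becomes a hash lookup instead of a database call.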

--- Additional comment from Heiko W. Rupp on 2014-02-05 02:33:03 EST ---

The attached patch reduces the run time for me from 

17:06:50,501 INFO  [org.rhq.enterprise.server.measurement.MeasurementOOBManagerBean] (RHQScheduler_Worker-3) Finished calculating 1223 OOBs in 116039 ms

to

05:04:39,678 INFO  [org.rhq.enterprise.server.measurement.MeasurementOOBManagerBean] (RHQScheduler_Worker-4) Finished calculating 1213 OOBs in 16359 ms

The patch probably needs a little more work with respect to the signature change detector.

--- Additional comment from John Sanda on 2014-02-10 11:37:50 EST ---

The patch has been tested and applied to master.

master commit hash: 3b9bbfd4355

Comment 1 Lukas Krejci 2014-06-21 06:43:51 UTC
commit 6eb9c50486316610f7874040f97c64d8ba8cd660
Author: John Sanda <jsanda>
Date:   Sun Feb 9 20:00:20 2014 -0500

    [BZ 1059412] Applying patch to speed up OOB calculations
    
    (cherry picked from commit 3b9bbfd4355f466f128074f603d0dab48d1766d5)
    Signed-off-by: Lukas Krejci <lkrejci>

Comment 2 Heiko W. Rupp 2014-06-23 18:45:08 UTC
(That got cherry-picked on 06-21 and is thus in DR3.)

Comment 3 Simeon Pinder 2014-06-30 06:03:11 UTC
Moving to ON_QA as available for test in latest build:
http://jon01.mw.lab.eng.bos.redhat.com:8042/dist/release/jon/3.2.2.GA/6-28-2014/

Comment 4 Garik Khachikyan 2014-07-03 14:40:05 UTC
# COMMENT

taking for verification.

Comment 5 Garik Khachikyan 2014-07-03 14:50:01 UTC
# COMMENT

scenario to perform:

1. setup a JON 3.2.0 GA server with several agents (to fill up statistics fast)
2. wait for some period of time to make some data available
3. grep logs to collect: `grep "Auto-calculation of OOBs completed in " ~/jon-server-3.2.0.GA/logs/server.log`
4. upgrade to JON 3.2.2 DR3
5. wait some more time
6. grep again 

compare performance.
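
For the comparison itself, a small helper along these lines could pull the durations out of the grepped lines and average them. It is only a sketch: the class and method names (OobLogStats, parseDurations, averageMillis) are made up for this example, and the pattern assumes the DataPurgeJob message format quoted in this bug.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OobLogStats {

    // Matches the DataPurgeJob message, e.g. "... completed in [13867]ms".
    private static final Pattern DURATION =
        Pattern.compile("Auto-calculation of OOBs completed in \\[(\\d+)\\]\\s*ms");

    /** Extract the millisecond durations from matching log lines, in order. */
    static List<Long> parseDurations(List<String> logLines) {
        List<Long> durations = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = DURATION.matcher(line);
            if (m.find()) {
                durations.add(Long.parseLong(m.group(1)));
            }
        }
        return durations;
    }

    /** Arithmetic mean in milliseconds; NaN when there is nothing to average. */
    static double averageMillis(List<Long> durations) {
        if (durations.isEmpty()) {
            return Double.NaN;
        }
        long sum = 0;
        for (long d : durations) {
            sum += d;
        }
        return (double) sum / durations.size();
    }
}
```

Running it once over the pre-upgrade lines and once over the post-upgrade lines gives two averages to compare.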

Comment 6 Garik Khachikyan 2014-07-07 13:44:18 UTC
# VERIFIED

JON 3.2.2 DR4 has the improvement: for me (with 5 agents) the OOB auto-calculation time decreased from ~15 sec. to ~4 sec.

===
00:00:43,971 INFO  [org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob] (RHQScheduler_Worker-3) Auto-calculation of OOBs completed in [13867]ms
01:00:26,189 INFO  [org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob] (RHQScheduler_Worker-2) Auto-calculation of OOBs completed in [14221]ms
02:00:25,705 INFO  [org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob] (RHQScheduler_Worker-5) Auto-calculation of OOBs completed in [14256]ms
03:00:27,403 INFO  [org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob] (RHQScheduler_Worker-4) Auto-calculation of OOBs completed in [15425]ms
04:00:31,674 INFO  [org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob] (RHQScheduler_Worker-5) Auto-calculation of OOBs completed in [6701]ms
05:00:20,577 INFO  [org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob] (RHQScheduler_Worker-3) Auto-calculation of OOBs completed in [6454]ms
06:00:41,183 INFO  [org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob] (RHQScheduler_Worker-2) Auto-calculation of OOBs completed in [5072]ms
07:00:21,776 INFO  [org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob] (RHQScheduler_Worker-1) Auto-calculation of OOBs completed in [4025]ms
08:00:44,999 INFO  [org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob] (RHQScheduler_Worker-5) Auto-calculation of OOBs completed in [2902]ms
09:00:17,677 INFO  [org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob] (RHQScheduler_Worker-2) Auto-calculation of OOBs completed in [3417]ms
===

Comment 7 Larry O'Leary 2014-07-29 00:17:33 UTC
This has been verified and released in Red Hat JBoss Operations Network 3.2 Update 02 (3.2.2) available from the Red Hat Customer Portal[1].



[1]: https://access.redhat.com/jbossnetwork/restricted/softwareDetail.html?softwareId=31783

