Bug 993513

Summary: Baseline calculation fails after upgrading from RHQ4.5.1
Product: [Other] RHQ Project
Reporter: m.qaimari
Component: Core Server
Assignee: Stefan Negrea <snegrea>
Status: CLOSED CURRENTRELEASE
QA Contact: Armine Hovsepyan <ahovsepy>
Severity: high
Priority: unspecified
Version: 4.8
CC: genman, hrupp, jsanda, mfoley, stianlund+bugzilla
Target Milestone: ---   
Target Release: RHQ 4.9   
Hardware: x86_64   
OS: Windows   
Doc Type: Bug Fix
Last Closed: 2014-03-25 20:58:49 UTC
Type: Bug
Attachments:
stack-traces found in the server logs (flags: none)

Description m.qaimari 2013-08-06 05:56:55 UTC
Created attachment 783202
stack-traces found in the server logs

Description of problem:

 Baseline calculation fails.

Version-Release number of selected component (if applicable):


How reproducible:

 It happens with no human intervention every hour, at the time baseline values are calculated.

Steps to Reproduce:
 N/A

Actual results:
 java.sql.BatchUpdateException: InternalError: Overflow Exception trying to bind Nan.
 (Oracle JDBC Driver used) 

Expected results:
 Baseline values calculated with no problems

Additional info:

 Attached are the stack-traces found in the server logs.

Comment 1 Elias Ross 2013-08-13 19:06:15 UTC
I've seen the same issue on a plain install, not an upgrade from the 4.5.1 release.

Comment 2 Stian Lund 2013-08-16 08:34:20 UTC
Seeing the same when upgrading from 4.8 to a 4.9 snapshot.

Comment 3 Stefan Negrea 2013-08-18 01:08:18 UTC
Fixed the problem by eliminating the need to store baselines with NaN for schedules without data. Going forward only schedules that are enabled and have data will get a baseline entry.

https://git.fedorahosted.org/cgit/rhq/rhq.git/commit/?id=3c8110a8abd138ec47eec6cd60d2ad61cad8b56d
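
For readers who want the idea behind the fix in code form, here is a minimal sketch, not the code from the commit above. It assumes a hypothetical value type `BaselineCandidate` and helper `BaselineFilter` (neither is an RHQ class) and keeps only candidates whose schedule is enabled and whose min, max and mean are real numbers, so that no NaN ever reaches the database.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical value type for illustration only; not an RHQ class.
final class BaselineCandidate {
    final int scheduleId;
    final boolean enabled;
    final double min;
    final double max;
    final double mean;

    BaselineCandidate(int scheduleId, boolean enabled, double min, double max, double mean) {
        this.scheduleId = scheduleId;
        this.enabled = enabled;
        this.min = min;
        this.max = max;
        this.mean = mean;
    }
}

// Hypothetical helper: decides which candidates are worth persisting.
final class BaselineFilter {
    // A candidate is storable when its schedule is enabled and none of its
    // values is NaN (NaN is what an aggregate over "no data" would yield here).
    static boolean isStorable(BaselineCandidate c) {
        return c.enabled
            && !Double.isNaN(c.min)
            && !Double.isNaN(c.max)
            && !Double.isNaN(c.mean);
    }

    // Returns a new list containing only the storable candidates.
    static List<BaselineCandidate> storable(List<BaselineCandidate> candidates) {
        List<BaselineCandidate> result = new ArrayList<BaselineCandidate>();
        for (BaselineCandidate c : candidates) {
            if (isStorable(c)) {
                result.add(c);
            }
        }
        return result;
    }
}
```

Filtering before the batch insert means a single schedule without data cannot fail the whole batch.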

Comment 4 John Sanda 2013-08-19 01:34:16 UTC
There were actually a couple of problems here. The first and obvious one is that the Oracle JDBC driver does not support Double.NaN. The second problem is that we were storing baselines for schedules that had no one hour data. The latter deviates from the behavior of the legacy implementation.

I believe that the legacy implementation generated baselines for schedules that are disabled, and I think we should continue to do so unless we have a good reason for doing otherwise. If I disable a schedule within a few hours or minutes before the next baseline calculation, I think it still makes sense to generate the baseline. If a schedule is disabled, we are not collecting metrics for it, which means eventually we won't generate baselines for it anyway since there won't be any new one hour data.
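
As an illustration of the first problem in this comment (the driver rejecting Double.NaN), a defensive guard at the JDBC layer could look like the sketch below. This is an alternative shown for discussion, not what the fix does. The class `NanSafeBinder` is hypothetical, it assumes the target column is nullable, and treating infinities the same way as NaN is an additional assumption.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Types;

// Hypothetical helper, not part of RHQ or of any JDBC driver.
final class NanSafeBinder {
    // True when the value is an ordinary finite number.
    static boolean isBindable(double value) {
        return !Double.isNaN(value) && !Double.isInfinite(value);
    }

    // Binds finite values with setDouble and everything else as SQL NULL.
    // Assumption: the column accepts NULL.
    static void bindDouble(PreparedStatement ps, int index, double value) throws SQLException {
        if (isBindable(value)) {
            ps.setDouble(index, value);
        } else {
            ps.setNull(index, Types.DOUBLE);
        }
    }
}
```

A guard like this only hides the symptom. Not producing baselines for schedules without data, the second problem described above, removes the NaN values at the source.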

Comment 5 John Sanda 2013-08-19 02:10:32 UTC
The implementation for calculating baselines is now necessarily different since we store one hour data in Cassandra, and baselines are calculated from that one hour data.

The criteria for deciding which schedules to include in the auto baseline calculations are pretty simple. All schedules that do not have baselines (and have a data type of measurement) will be included in the set for which baselines are generated. There are two scenarios in which a schedule will not have a baseline. The first is that one has never been generated before. This occurs the first time we generate one hour data for a schedule. The second is when the time specified by the baseline frequency system setting has elapsed (the default is 3 days).
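
The selection criteria described above can be sketched roughly as follows. The types `ScheduleInfo` and `BaselineSelector` are hypothetical and are not taken from the RHQ code base. The sketch models "baseline frequency has elapsed" as a comparison against the time the last baseline was computed, which is an assumption about how expiry is tracked, and it uses `java.time` for brevity.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical view of a schedule; field names are illustrative only.
final class ScheduleInfo {
    final int id;
    final boolean measurementType;    // true when the data type is "measurement"
    final Instant baselineComputedAt; // null when no baseline exists yet

    ScheduleInfo(int id, boolean measurementType, Instant baselineComputedAt) {
        this.id = id;
        this.measurementType = measurementType;
        this.baselineComputedAt = baselineComputedAt;
    }
}

final class BaselineSelector {
    // Default baseline frequency mentioned in the comment above.
    static final Duration DEFAULT_FREQUENCY = Duration.ofDays(3);

    // A schedule needs a baseline when it is of type measurement and either
    // has never had a baseline or its baseline is at least `frequency` old.
    static boolean needsBaseline(ScheduleInfo s, Instant now, Duration frequency) {
        if (!s.measurementType) {
            return false;
        }
        if (s.baselineComputedAt == null) {
            return true;
        }
        return !s.baselineComputedAt.plus(frequency).isAfter(now);
    }
}
```

For example, with the default frequency a measurement schedule whose baseline was computed one day ago is skipped, while one computed three or more days ago is selected again.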

I have been thinking about this issue for a few days now. I have been confident that we can leverage Cassandra to efficiently and precisely determine the set of schedules for which baselines need to be generated. I also thought it would make sense that this could be done during data aggregation since we are already processing one hour data. I have come up with something that I think can do exactly this as well as make it easier to extend our existing solution to 1) process multiple schedules concurrently and 2) process schedules iteratively.

I will push my work to a branch within the next few days and write up some thoughts on it.