993513 – Baseline calculation fails after upgrading from RHQ4.5.1


Summary: Baseline calculation fails after upgrading from RHQ4.5.1

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	RHQ Project
Classification:	Other
Component:	Core Server
Sub Component:
Version:	4.8
Hardware:	x86_64
OS:	Windows
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	RHQ 4.9
Assignee:	Stefan Negrea
QA Contact:	Armine Hovsepyan
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2013-08-06 05:56 UTC by m.qaimari
Modified:	2015-09-03 00:01 UTC
CC List:	5 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2014-03-25 20:58:49 UTC
Embargoed:

Attachments
stack-traces found in the server logs (156.27 KB, text/plain) 2013-08-06 05:56 UTC, m.qaimari	no flags

Description m.qaimari 2013-08-06 05:56:55 UTC

Created attachment 783202
stack-traces found in the server logs

Description of problem:

 Baseline calculation fails.

Version-Release number of selected component (if applicable):


How reproducible:

 It happens without any user intervention, every hour, when baseline values are calculated.

Steps to Reproduce:
 N/A

Actual results:
 java.sql.BatchUpdateException: InternalError: Overflow Exception trying to bind Nan.
 (using the Oracle JDBC driver)

Expected results:
 Baseline values are calculated without errors.

Additional info:

 Attached are the stack traces found in the server logs.

Comment 1 Elias Ross 2013-08-13 19:06:15 UTC

I've seen the same issue on a plain install, not an upgrade from the 4.5.1 release.

Comment 2 Stian Lund 2013-08-16 08:34:20 UTC

I am seeing the same problem after upgrading from 4.8 to a 4.9 snapshot.

Comment 3 Stefan Negrea 2013-08-18 01:08:18 UTC

Fixed the problem by eliminating the need to store baselines with NaN values for schedules without data. Going forward, only schedules that are enabled and have data will get a baseline entry.

https://git.fedorahosted.org/cgit/rhq/rhq.git/commit/?id=3c8110a8abd138ec47eec6cd60d2ad61cad8b56d
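A minimal sketch of the eligibility rule described in the comment above, assuming a simplified, hypothetical `Schedule` record (not RHQ's own schedule class) that carries an enabled flag and a count of one-hour data points. Only enabled schedules that have data are passed on to the baseline calculation, so a schedule without data never produces a row for the JDBC batch:

```java
import java.util.List;
import java.util.stream.Collectors;

class BaselineEligibility {

    /** Hypothetical, simplified view of a measurement schedule (illustration only). */
    record Schedule(int id, boolean enabled, int oneHourDataPoints) {}

    /** Keeps only enabled schedules that have at least one one-hour data point. */
    static List<Integer> eligibleScheduleIds(List<Schedule> schedules) {
        return schedules.stream()
                .filter(Schedule::enabled)                 // skip disabled schedules
                .filter(s -> s.oneHourDataPoints() > 0)    // skip schedules with no data
                .map(Schedule::id)
                .collect(Collectors.toList());
    }
}
```

If disabled schedules that still have one-hour data should keep getting baselines, the `enabled` filter would simply be dropped and only the data check kept.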

Comment 4 John Sanda 2013-08-19 01:34:16 UTC

There were actually a couple of problems here. The first, and most obvious, is that the Oracle JDBC driver does not support Double.NaN. The second is that we were storing baselines for schedules that had no one-hour data. The latter deviates from the behavior of the legacy implementation.

I believe that the legacy implementation generated baselines for disabled schedules, and I think we should continue to do so unless we have a good reason not to. If I disable a schedule a few hours or minutes before the next baseline calculation, I think it still makes sense to generate the baseline. If a schedule is disabled, we are not collecting metrics for it, which means we will eventually stop generating baselines for it anyway, since there won't be any new one-hour data.
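The NaN in the first problem described above is easy to reproduce in plain Java: a naive `sum / count` over zero rows is `0.0 / 0`, which evaluates to `NaN`. The hedged sketch below computes a baseline from one-hour aggregates and returns `Optional.empty()` when there is no data, so nothing unbindable reaches the driver. It assumes a hypothetical `OneHourAggregate` row and assumes the baseline is the minimum of the hourly minimums, the maximum of the hourly maximums and the mean of the hourly averages; the real RHQ formula may differ:

```java
import java.util.List;
import java.util.Optional;

class BaselineCalculator {

    /** Hypothetical one-hour aggregate row: min, max and average for one hour. */
    record OneHourAggregate(double min, double max, double avg) {}

    /** Hypothetical baseline values derived from one-hour aggregates. */
    record Baseline(double min, double max, double mean) {}

    /**
     * Returns Optional.empty() when there is no one-hour data, instead of the
     * NaN that sum / count would yield for an empty input.
     */
    static Optional<Baseline> compute(List<OneHourAggregate> rows) {
        if (rows.isEmpty()) {
            return Optional.empty();
        }
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum = 0.0;
        for (OneHourAggregate row : rows) {
            min = Math.min(min, row.min());
            max = Math.max(max, row.max());
            sum += row.avg();
        }
        return Optional.of(new Baseline(min, max, sum / rows.size()));
    }
}
```

Note that this only guards against missing data; if an individual aggregate already contains `NaN`, it will still propagate into the result.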

Comment 5 John Sanda 2013-08-19 02:10:32 UTC

The implementation for calculating baselines is now necessarily different, since we store one-hour data in Cassandra and baselines are calculated from that one-hour data.

The criteria for deciding which schedules to include in the automatic baseline calculation are fairly simple. All schedules that do not have baselines (and that have a data type of measurement) are included in the set for which baselines are generated. There are two scenarios in which a schedule will not have a baseline. The first is that one has never been generated, which is the case the first time we generate one-hour data for the schedule. The second is when the time specified by the baseline frequency system setting has elapsed (the default is 3 days).
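The selection criteria above can be expressed as a small predicate. This is a sketch only: the `DataType` enum and `Schedule` record are simplified, hypothetical stand-ins, `lastBaseline` is `null` when no baseline has ever been generated, and a baseline is treated as due once a full baseline frequency (3 days by default) has passed since it was last computed:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;

class BaselineSelection {

    /** Default baseline frequency from the comment above: 3 days. */
    static final Duration DEFAULT_BASELINE_FREQUENCY = Duration.ofDays(3);

    /** Hypothetical, simplified data types; only MEASUREMENT gets baselines. */
    enum DataType { MEASUREMENT, TRAIT, CALLTIME }

    /** Hypothetical schedule view; lastBaseline is null if none was ever generated. */
    record Schedule(int id, DataType dataType, Instant lastBaseline) {}

    /** True if a measurement schedule has no baseline or its baseline has expired. */
    static boolean needsBaseline(Schedule s, Instant now, Duration frequency) {
        if (s.dataType() != DataType.MEASUREMENT) {
            return false;
        }
        // Due when lastBaseline + frequency is at or before "now".
        return s.lastBaseline() == null
                || !s.lastBaseline().plus(frequency).isAfter(now);
    }

    /** Ids of all schedules that should be included in this baseline run. */
    static List<Integer> select(List<Schedule> schedules, Instant now, Duration frequency) {
        return schedules.stream()
                .filter(s -> needsBaseline(s, now, frequency))
                .map(Schedule::id)
                .collect(Collectors.toList());
    }
}
```

For example, with the default frequency a measurement schedule whose last baseline is one day old is skipped, while one whose baseline is four days old is selected again.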

I have been thinking about this issue for a few days now. I am confident that we can leverage Cassandra to efficiently and precisely determine the set of schedules for which baselines need to be generated. It also makes sense to do this during data aggregation, since we are already processing one-hour data at that point. I have come up with something that I think does exactly this, and that also makes it easier to extend our existing solution to 1) process multiple schedules concurrently and 2) process schedules iteratively.

I will push my work to a branch within the next few days and write up some thoughts on it.
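One common way to process multiple schedules concurrently is to hand the per-schedule work to a thread pool. The sketch below is an illustrative assumption, not the approach John describes here: the input map is a hypothetical stand-in for one-hour averages read during aggregation, the baseline is reduced to a simple mean, and schedules without data are skipped rather than given a NaN baseline:

```java
import java.util.List;
import java.util.Map;
import java.util.OptionalDouble;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ConcurrentBaselines {

    /**
     * Computes a mean per schedule on a fixed-size thread pool. Schedules with an
     * empty list of one-hour averages get no entry, so no NaN is ever produced.
     */
    static Map<Integer, Double> compute(Map<Integer, List<Double>> oneHourAverages, int threads)
            throws InterruptedException {
        Map<Integer, Double> baselines = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (Map.Entry<Integer, List<Double>> entry : oneHourAverages.entrySet()) {
                pool.execute(() -> {
                    // average() is empty for an empty list instead of returning NaN
                    OptionalDouble mean = entry.getValue().stream()
                            .mapToDouble(Double::doubleValue)
                            .average();
                    mean.ifPresent(m -> baselines.put(entry.getKey(), m));
                });
            }
        } finally {
            pool.shutdown(); // no new tasks; already submitted tasks still run
        }
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("baseline tasks did not finish in time");
        }
        return baselines;
    }
}
```

A `ConcurrentHashMap` is used for the results because several pool threads write to it at the same time.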
