Bug 993513

Summary:

Baseline calculation fails after upgrading from RHQ4.5.1

Product:

[Other] RHQ Project

Reporter:

m.qaimari

Component:

Core Server

Assignee:

Stefan Negrea <snegrea>

Status:

CLOSED CURRENTRELEASE

QA Contact:

Armine Hovsepyan <ahovsepy>

Severity:

high

Priority:

unspecified

Version:

4.8

CC:

genman, hrupp, jsanda, mfoley, stianlund+bugzilla

Target Release:

RHQ 4.9

Hardware:

x86_64

OS:

Windows

Doc Type:

Bug Fix

Last Closed:

2014-03-25 20:58:49 UTC

Type:

Bug

Attachments:

Description	Flags
stack-traces found in the server logs	none

Description m.qaimari 2013-08-06 05:56:55 UTC

Created attachment 783202
stack-traces found in the server logs

Description of problem:

 Baseline calculation fails.

Version-Release number of selected component (if applicable):


How reproducible:

 It happens without human intervention every hour, at the time baseline values are calculated.

Steps to Reproduce:
 N/A

Actual results:
 java.sql.BatchUpdateException: InternalError: Overflow Exception trying to bind Nan.
 (Oracle JDBC Driver used) 

Expected results:
 Baseline values calculated with no problems

Additional info:

 Attached are the stack-traces found in the server logs.

Comment 1 Elias Ross 2013-08-13 19:06:15 UTC

I've seen the same issue from a plain install, not an upgrade from 4.5.1 release.

Comment 2 Stian Lund 2013-08-16 08:34:20 UTC

Seeing the same upgrading from 4.8 to 4.9 snapshot.

Comment 3 Stefan Negrea 2013-08-18 01:08:18 UTC

Fixed the problem by eliminating the need to store baselines with NaN for schedules without data. Going forward only schedules that are enabled and have data will get a baseline entry.

https://git.fedorahosted.org/cgit/rhq/rhq.git/commit/?id=3c8110a8abd138ec47eec6cd60d2ad61cad8b56d

Comment 4 John Sanda 2013-08-19 01:34:16 UTC

There were actually a couple of problems here. The first and obvious one is that the Oracle JDBC driver does not support Double.NaN. The second problem is that we were storing baselines for schedules that had no one hour data. The latter deviates from the behavior of the legacy implementation.
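
To illustrate the two problems, here is a minimal sketch of how an empty set of one hour data produces NaN and how such results can be skipped before they reach the database. The class and method names (BaselineFilter, Value, mean, storable) are hypothetical and chosen for illustration only; this is not the code from the actual fix.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from the RHQ code base.
final class BaselineFilter {

    // Simple holder for one computed baseline.
    static final class Value {
        final int scheduleId;
        final double min;
        final double max;
        final double mean;

        Value(int scheduleId, double min, double max, double mean) {
            this.scheduleId = scheduleId;
            this.min = min;
            this.max = max;
            this.mean = mean;
        }

        // Storable only if every value is a finite number (not NaN, not infinite).
        boolean isStorable() {
            return Double.isFinite(min) && Double.isFinite(max) && Double.isFinite(mean);
        }
    }

    // Averaging an empty array divides 0.0 by 0, which yields NaN in Java.
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    // Keeps only the baselines that can safely be bound to numeric columns.
    static List<Value> storable(List<Value> computed) {
        List<Value> result = new ArrayList<>();
        for (Value v : computed) {
            if (v.isStorable()) {
                result.add(v);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double empty = mean(new double[0]); // schedule without one hour data
        List<Value> computed = new ArrayList<>();
        computed.add(new Value(1, 1.0, 3.0, mean(new double[] {1.0, 2.0, 3.0})));
        computed.add(new Value(2, empty, empty, empty));
        System.out.println("computed: " + computed.size()
            + ", storable: " + storable(computed).size());
    }
}
```

Filtering before the batch insert means the driver never has to bind a NaN, whichever database is used.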

I believe that the legacy implementation generated baselines for schedules that are disabled, and I think we should continue to do so unless we have a good reason for doing otherwise. If I disable a schedule within a few hours or minutes before the next baseline calculation, I think it still makes sense to generate the baseline. If a schedule is disabled, we are not collecting metrics for it, which means eventually we won't generate baselines for it anyway since there won't be any new one hour data.

Comment 5 John Sanda 2013-08-19 02:10:32 UTC

The implementation for calculating baselines is now necessarily different since we store one hour data in Cassandra, and baselines are calculated from that one hour data.

The criteria for deciding which schedules to include in the auto baseline calculations are pretty simple. All schedules that do not have baselines (and have a data type of measurement) will be included in the set for which baselines are generated. There are two scenarios in which a schedule will not have a baseline. The first is that one has never been generated before. This occurs the first time we generate one hour data for a schedule. The second scenario is when the time specified by the baseline frequency system setting has elapsed (the default is 3 days).
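
The selection criteria above can be sketched as a single predicate. This is only an illustration under stated assumptions: the names BaselineEligibility and needsBaseline are hypothetical, and an expired baseline is modeled here by comparing a timestamp against the frequency, which may not be how the server actually tracks it.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch; none of these names come from the RHQ code base.
final class BaselineEligibility {

    // Default baseline frequency mentioned in the comment: 3 days.
    static final Duration DEFAULT_FREQUENCY = Duration.ofDays(3);

    // lastBaselineTime is null when no baseline has ever been generated.
    static boolean needsBaseline(boolean isMeasurement, Instant lastBaselineTime,
                                 Instant now, Duration frequency) {
        if (!isMeasurement) {
            return false; // only schedules with a data type of measurement qualify
        }
        if (lastBaselineTime == null) {
            return true; // scenario 1: a baseline has never been generated
        }
        // scenario 2: the configured frequency has elapsed since the last baseline
        return !lastBaselineTime.plus(frequency).isAfter(now);
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2013-08-19T00:00:00Z");
        System.out.println(needsBaseline(true, null, now, DEFAULT_FREQUENCY));
        System.out.println(needsBaseline(true, now.minus(Duration.ofDays(1)), now, DEFAULT_FREQUENCY));
        System.out.println(needsBaseline(true, now.minus(Duration.ofDays(4)), now, DEFAULT_FREQUENCY));
    }
}
```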

I have been thinking about this issue for a few days now. I have been confident that we can leverage Cassandra to efficiently and precisely determine the set of schedules for which baselines need to be generated. I also thought it would make sense that this could be done during data aggregation since we are already processing one hour data. I have come up with something that I think can do exactly this as well as make it easier to extend our existing solution to 1) process multiple schedules concurrently and 2) process schedules iteratively.

I will push my work to a branch within the next few days and write up some thoughts on it.