Red Hat Bugzilla – Bug 993513
Baseline calculation fails after upgrading from RHQ4.5.1
Last modified: 2015-09-02 20:01:51 EDT
Created attachment 783202
stack-traces found in the server logs
Description of problem:
Baseline calculation fails.
Version-Release number of selected component (if applicable):
How reproducible:
It happens every hour, with no human intervention, when baseline values are calculated.
Steps to Reproduce:
Actual results:
java.sql.BatchUpdateException: InternalError: Overflow Exception trying to bind Nan.
(Oracle JDBC Driver used)
Expected results:
Baseline values calculated with no problems
Additional info:
Attached are the stack-traces found in the server logs.
I've seen the same issue from a plain install, not an upgrade from 4.5.1 release.
Seeing the same when upgrading from 4.8 to a 4.9 snapshot.
Fixed the problem by eliminating the need to store baselines with NaN for schedules without data. Going forward only schedules that are enabled and have data will get a baseline entry.
There were actually a couple problems here. The first and obvious one is that the Oracle JDBC driver does not support Double.NaN. The second problem is that we were storing baselines for which there was no one hour data. The latter deviates from the behavior of the legacy implementation.
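To illustrate how the two problems interact, here is a minimal sketch, not the actual RHQ code. It assumes the baseline mean is a plain average over a schedule's one hour values, so a schedule with no data yields 0.0 / 0, which Java evaluates to Double.NaN. The class and method names (BaselineNaNSketch, mean, schedulesWithStorableBaseline) are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the class and method names are not taken from the RHQ code base.
class BaselineNaNSketch {

    // Mean of the one hour values for a schedule. For an empty array this is
    // 0.0 / 0, which evaluates to Double.NaN in Java instead of throwing.
    static double mean(double[] oneHourValues) {
        double sum = 0.0;
        for (double v : oneHourValues) {
            sum += v;
        }
        return sum / oneHourValues.length;
    }

    // Returns the ids of schedules whose computed mean is a real number,
    // skipping those that would otherwise be bound as NaN in the batch insert.
    static List<Integer> schedulesWithStorableBaseline(Map<Integer, double[]> dataBySchedule) {
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<Integer, double[]> entry : dataBySchedule.entrySet()) {
            if (!Double.isNaN(mean(entry.getValue()))) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, double[]> data = new LinkedHashMap<>();
        data.put(1, new double[] {2.0, 4.0});
        data.put(2, new double[0]); // schedule with no one hour data
        System.out.println(schedulesWithStorableBaseline(data));
    }
}
```

Filtering before the JDBC batch is built means the driver never sees a NaN parameter, regardless of which database backs the server.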
I believe that the legacy implementation generated baselines for schedules that are disabled, and I think we should continue to do so unless we have a good reason for doing otherwise. If I disable a schedule within a few hours or minutes before the next baseline calculation, I think it still makes sense to generate the baseline. If a schedule is disabled, we are not collecting metrics for it, which means eventually we won't generate baselines for it anyway since there won't be any new one hour data.
The implementation for calculating baselines is now necessarily different since we store one hour data in Cassandra, and baselines are calculated from that one hour data.
The criteria for deciding which schedules to include in the auto baseline calculations are pretty simple. All schedules that do not have baselines (and have a data type of measurement) are included in the set for which baselines are generated. There are two scenarios in which a schedule will not have a baseline. The first is that one has never been generated before. This occurs the first time we generate one hour data for a schedule. The second is when the time specified by the baseline frequency system setting has elapsed (the default is 3 days).
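The selection rule above could be sketched roughly as follows. This is only an illustration under stated assumptions: the types (Schedule, DataType), the field baselineComputedAt, and the choice to model an expired baseline as an age check with >= are all hypothetical and do not come from the RHQ code base.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; these types are illustrative and are not RHQ's actual classes.
class BaselineSelectionSketch {

    enum DataType { MEASUREMENT, TRAIT }

    static final class Schedule {
        final int id;
        final DataType dataType;
        final Long baselineComputedAt; // epoch millis, or null if no baseline exists

        Schedule(int id, DataType dataType, Long baselineComputedAt) {
            this.id = id;
            this.dataType = dataType;
            this.baselineComputedAt = baselineComputedAt;
        }
    }

    // Default baseline frequency of 3 days, expressed in milliseconds.
    static final long DEFAULT_FREQUENCY_MILLIS = 3L * 24 * 60 * 60 * 1000;

    // A schedule qualifies when it is a measurement and either has never had a
    // baseline or its baseline is at least as old as the configured frequency
    // (assumption: an expired baseline is treated the same as a missing one).
    static boolean needsBaseline(Schedule s, long now, long frequencyMillis) {
        if (s.dataType != DataType.MEASUREMENT) {
            return false;
        }
        if (s.baselineComputedAt == null) {
            return true;
        }
        return now - s.baselineComputedAt >= frequencyMillis;
    }

    // Collects the ids of all schedules that qualify for baseline generation.
    static List<Integer> selectForBaseline(List<Schedule> schedules, long now, long frequencyMillis) {
        List<Integer> ids = new ArrayList<>();
        for (Schedule s : schedules) {
            if (needsBaseline(s, now, frequencyMillis)) {
                ids.add(s.id);
            }
        }
        return ids;
    }
}
```

Passing the current time in as a parameter, rather than reading the clock inside the method, keeps the rule easy to test.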
I have been thinking about this issue for a few days now. I have been confident that we can leverage Cassandra to efficiently and precisely determine the set of schedules for which baselines need to be generated. I also thought it would make sense that this could be done during data aggregation since we are already processing one hour data. I have come up with something that I think can do exactly this as well as make it easier to extend our existing solution to 1) process multiple schedules concurrently and 2) process schedules iteratively.
I will push my work to a branch within the next few days and write up some thoughts on it.