Bug 1028173 - Baseline calculation attempts to bind NaN, Overflow Exception for Oracle
Status: CLOSED CURRENTRELEASE
Product: JBoss Operations Network
Classification: JBoss
Component: Core Server
Version: JON 3.2
Hardware: Unspecified OS: Unspecified
Priority: unspecified Severity: unspecified
Target Milestone: ER07
Target Release: JON 3.2.0
Assigned To: Stefan Negrea
QA Contact: Mike Foley
Depends On: 1017474
Blocks: 1012435
Reported: 2013-11-07 15:26 EST by Stefan Negrea
Modified: 2014-01-02 15:43 EST (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1017474
Environment:
Last Closed: 2014-01-02 15:43:20 EST
Type: Bug

Description Stefan Negrea 2013-11-07 15:26:25 EST
+++ This bug was initially created as a clone of Bug #1017474 +++

See Bug 993513

This is RHQ 4.9, however.


01:05:17,366 ERROR [org.jboss.as.ejb3.invocation] (RHQScheduler_Worker-4) JBAS014134: EJB Invocation failed on component MeasurementBaselineManagerBean for method public abstract void org.rhq.enterprise.server.measurement.MeasurementBaselineManagerLocal.calculateAutoBaselines(): javax.ejb.EJBException: java.lang.RuntimeException: Auto-calculation failure
	at org.jboss.as.ejb3.tx.CMTTxInterceptor.handleExceptionInNoTx(CMTTxInterceptor.java:191) [jboss-as-ejb3-7.2.0.Alpha1-redhat-4.jar:7.2.0.Alpha1-redhat-4]
	at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInNoTx(CMTTxInterceptor.java:237) [jboss-as-ejb3-7.2.0.Alpha1-redhat-4.jar:7.2.0.Alpha1-redhat-4]
	at org.jboss.as.ejb3.tx.CMTTxInterceptor.never(CMTTxInterceptor.java:285) [jboss-as-ejb3-7.2.0.Alpha1-redhat-4.jar:7.2.0.Alpha1-redhat-4]

...
	at org.rhq.enterprise.server.measurement.MeasurementBaselineManagerLocal$$$view12.saveNewBaselines(Unknown Source) [rhq-server.jar:4.9.0]
	at org.rhq.enterprise.server.measurement.MeasurementBaselineManagerBean.calculateBaselines(MeasurementBaselineManagerBean.java:280) [rhq-server.jar:4.9.0]

...

Caused by: java.sql.BatchUpdateException: Internal Error: Overflow Exception trying to bind NaN
	at oracle.jdbc.driver.OraclePreparedStatement.executeBatch(OraclePreparedStatement.java:10296) [ojdbc6.jar:11.2.0.2.0]
	at oracle.jdbc.driver.OracleStatementWrapper.executeBatch(OracleStatementWrapper.java:216) [ojdbc6.jar:11.2.0.2.0]
....

--- Additional comment from Elias Ross on 2013-11-07 10:00:33 EST ---

diff --git a/modules/enterprise/server/server-metrics/src/main/java/org/rhq/server/metrics/MetricsBaselineCalculator.java b/modules/enterprise/server/server-metrics/src/main/java
index ef7d092..f132135 100644
--- a/modules/enterprise/server/server-metrics/src/main/java/org/rhq/server/metrics/MetricsBaselineCalculator.java
+++ b/modules/enterprise/server/server-metrics/src/main/java/org/rhq/server/metrics/MetricsBaselineCalculator.java
@@ -98,6 +98,9 @@ private MeasurementBaseline calculateBaseline(Integer schedule, long startTime,
                     }
                 }
             }
+            if (Double.isNaN(max) || Double.isNaN(min)) {
+                return null; // do not record this
+            }
 
             MeasurementBaseline baseline = new MeasurementBaseline();
             baseline.setMax(max);

^^ Not tested but maybe will work

--- Additional comment from Stefan Negrea on 2013-11-07 10:33:32 EST ---

Elias,

Your proposed fix would address a case that is not possible. All aggregates have a min, max, and average, so adding that check only guards against cases where data was deleted maliciously from the storage nodes by users and not by RHQ.


Do you have any other logs or data on how this is happening? Sample data or reproduction steps would greatly help us fix the root cause of the issue.

Also, is this an upgrade from a prior version of RHQ, or is it a fresh install?

--- Additional comment from Elias Ross on 2013-11-07 11:05:47 EST ---

Not possible?

The original bug appeared after an upgrade of a test cluster, so maybe some old data is the problem. I saw the problem again today, after my main server upgrade, which was nearly a week ago.

I've had problems with my Cassandra cluster, where data has been coming up empty. Meaning, you query it and the expected data is not there. Could this cause an issue? I've seen 0 avg, NaN on min/max in the UI, for the metrics table. So there are NaNs someplace.

Even if my fix fixes something that is not possible, it fixes something I've seen on two servers. I've had to disable baseline calculation.
Comment 1 Stefan Negrea 2013-11-07 15:54:05 EST
Elias, thanks for reporting the issue and a fix. There are two scenarios that could lead to the problem you've reported: either the data for a schedule is not completely persisted when the query for baselines runs, or the data is incomplete (deleted or missing entries). I applied your proposed fix along with a comment. By returning null, the baseline calculation for that particular schedule is postponed until the next data purge run.
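
For illustration only, here is a minimal, self-contained sketch of the guard described above. It is not the RHQ code: the class names BaselineSketch, Aggregate, and Baseline and the method calculateBaselines are hypothetical stand-ins, and the way min/max are accumulated is an assumption. It only shows how an empty or incomplete set of aggregates can produce NaN, and how returning null lets the caller skip that schedule instead of handing NaN to the database.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names do not come from the RHQ source.
class BaselineSketch {

    // Stand-in for one stored aggregate row (min, max, avg).
    static final class Aggregate {
        final double min, max, avg;
        Aggregate(double min, double max, double avg) {
            this.min = min;
            this.max = max;
            this.avg = avg;
        }
    }

    // Stand-in for the baseline that would be persisted.
    static final class Baseline {
        final double min, max, mean;
        Baseline(double min, double max, double mean) {
            this.min = min;
            this.max = max;
            this.mean = mean;
        }
    }

    // Returns null when min or max is NaN, i.e. when there were no
    // aggregates at all or at least one row had a NaN min/max.
    static Baseline calculateBaseline(List<Aggregate> aggregates) {
        double min = Double.NaN;
        double max = Double.NaN;
        double sum = 0.0;
        int count = 0;
        for (Aggregate a : aggregates) {
            // Math.min and Math.max return NaN if either argument is NaN,
            // so a single incomplete row carries through to the result.
            min = (count == 0) ? a.min : Math.min(min, a.min);
            max = (count == 0) ? a.max : Math.max(max, a.max);
            sum += a.avg;
            count++;
        }
        if (Double.isNaN(min) || Double.isNaN(max)) {
            return null; // do not record this; retry on a later run
        }
        return new Baseline(min, max, sum / count);
    }

    // Caller side: schedules whose baseline came back null are skipped,
    // so no NaN value ever reaches the batch insert.
    static List<Baseline> calculateBaselines(List<List<Aggregate>> perSchedule) {
        List<Baseline> result = new ArrayList<>();
        for (List<Aggregate> aggregates : perSchedule) {
            Baseline baseline = calculateBaseline(aggregates);
            if (baseline != null) {
                result.add(baseline);
            }
        }
        return result;
    }
}
```

Like the patch, the sketch checks only min and max. Whatever consumes the result has to tolerate a null entry, which is why the caller filters before saving.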


release/jon3.2.x branch commit:

https://git.fedorahosted.org/cgit/rhq/rhq.git/commit/?h=release/jon3.2.x&id=acbc27a5372c55a950d79fe55c60c12e21bab16d
Comment 2 Simeon Pinder 2013-11-19 10:48:39 EST
Moving to ON_QA as available for testing with new brew build.
Comment 3 Simeon Pinder 2013-11-22 00:14:05 EST
Mass moving all of these from ER6 to target milestone ER07 since the ER6 build was bad and QE was halted for the same reason.
