I'm seeing a message similar to the following show up repeatedly in my server logs for the latest build. 2009-03-17 21:33:41,094 WARN [org.rhq.enterprise.server.measurement.MeasurementDataManagerBean] Failure saving measurement numeric data: java.sql.BatchUpdateException:Batch entry 0 INSERT /*+ APPEND */ INTO RHQ_MEAS_DATA_NUM_R07(schedule_id,time_stamp,value) VALUES(46345,1237339981177,1.040216064E9) was aborted. Call getNextException to see the cause.[SQLException=Batch entry 0 INSERT /*+ APPEND */ INTO RHQ_MEAS_DATA_NUM_R07(schedule_id,time_stamp,value) VALUES(46345,1237339981177,1.040216064E9) was aborted. Call getNextException to see the cause. -> ERROR: duplicate key violates unique constraint "rhq_meas_data_num_r07_id_t_pk"(error-code=0,sql-state=23505)]
from heiko: "I did see that a very few times in the past too. I think shortly after an agent was reconnecting (to a downed server?). Perhaps there is a race condition when shuting down the agent so that data ends up being sent to the server and as in this process (perhaps before we hear the ack back from the server, the sender is shut down) the agent is spooling the data to disk. Now we have one commited copy on the server and a spooled one on the agent."
heiko, that makes a lot of sense, and i agree there's a possibility of that occurring. as a result, this exception should happen rarely, but even if it does happen it should resolve itself quickly assuming the connection between the agent and the server is stable. corey, was this a repeating exception, or a one-time issue?
corey, see my last comment.
Joseph, I don't have those logs now (I don't think), but I saw it multiple times. I did something like a grep "rhq_meas_data_num_r07_id_t_pk" rhq-server-log4j.log And saw it show up a good dozen/dozen and a half times or so.
corey, how many agents did you have attached to the system?
If you mean this particular server at the time, I don't know. There are three platforms in the HA cluster, each of which had an agent running, and /usually/ connected to a server running on the same box. Occasionally I would get some agents switching over to another box, but I don't know which agent was connected to which server, at any point in time.
i'm going to keep one eye open for this issue in the future, but in the worse case scenario this is due to some corner case in the workflow logic for reporting measurements that we haven't handled properly yet. the point is, if we try to insert some measurement report twice, we want that fail to in order to keep the data consistent with the business rules - perhaps we shouldn't throw a big, nasty stack in this case (or maybe only throw it when the server is in debug mode).
Just saw the following here: Agent has connected. Then there was some hickup, so that the agent got disconnected and it tried again. It looks like the agent was able to deliver some metrics before it got dropped, but as they were not acknowledged, they got send again after the next connect. 10:30:08,083 INFO [CoreServerServiceImpl] Agent [snert] has connected to this server at Tue Mar 31 10:30:08 CEST 2009 10:31:20,367 ERROR [ServerThread] Worker thread initialization failure java.net.SocketException: Connection reset at java.net.SocketInputStream.read(SocketInputStream.java:168) at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) at java.io.BufferedInputStream.read(BufferedInputStream.java:235) at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2196) at java.io.ObjectInputStream$BlockDataInputStream.readBlockHeader(ObjectInputStream.java:2376) at java.io.ObjectInputStream$BlockDataInputStream.refill(ObjectInputStream.java:2443) at java.io.ObjectInputStream$BlockDataInputStream.read(ObjectInputStream.java:2515) at java.io.ObjectInputStream.read(ObjectInputStream.java:800) at org.jboss.remoting.transport.socket.ServerThread.readVersion(ServerThread.java:810) at org.jboss.remoting.transport.socket.ServerThread.processInvocation(ServerThread.java:506) at org.jboss.remoting.transport.socket.ServerThread.dorun(ServerThread.java:369) at org.jboss.remoting.transport.socket.ServerThread.run(ServerThread.java:165) 10:31:21,680 INFO [CoreServerServiceImpl] Agent [snert][1.2.0-SNAPSHOT(0)] would like to connect to this server 10:31:21,893 INFO [CoreServerServiceImpl] Agent [snert] has connected to this server at Tue Mar 31 10:31:21 CEST 2009 10:31:22,276 WARN [TxConnectionManager] Connection error occured: org.jboss.resource.connectionmanager.TxConnectionManager$TxConnectionEventListener@b72301[state=NORMAL mc=org.jboss.resource.adapter.jdbc.xa.XAManagedConnection@4d9f99 handles=1 lastUse=1238488281893 permit=true trackByTx=true mcp=org.jboss.resource.connectionmanager.JBossManagedConnectionPool$OnePool@d649bf context=org.jboss.resource.connectionmanager.InternalManagedConnectionPool@83bce1 xaResource=org.jboss.resource.connectionmanager.xa.JcaXAResourceWrapper@6ba2e0 txSync=null] java.sql.BatchUpdateException: Batch-Eintrag 0 INSERT /*+ APPEND */ INTO RHQ_MEAS_DATA_NUM_R03(schedule_id,time_stamp,value) VALUES(10016,1238488237895,3.000050000833347) wurde abgebrochen. Rufen Sie 'getNextException' auf, um die Ursache zu erfahren. at org.postgresql.jdbc2.AbstractJdbc2Statement$BatchResultHandler.handleError(AbstractJdbc2Statement.java:2531) at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1344) at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:343) at org.postgresql.jdbc2.AbstractJdbc2Statement.executeBatch(AbstractJdbc2Statement.java:2668) at sun.reflect.GeneratedMethodAccessor138.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:585) at org.postgresql.ds.jdbc23.AbstractJdbc23PooledConnection$StatementHandler.invoke(AbstractJdbc23PooledConnection.java:477) at $Proxy48.executeBatch(Unknown Source) at org.jboss.resource.adapter.jdbc.CachedPreparedStatement.executeBatch(CachedPreparedStatement.java:478) at org.jboss.resource.adapter.jdbc.WrappedStatement.executeBatch(WrappedStatement.java:519) at org.rhq.enterprise.server.measurement.MeasurementDataManagerBean.addNumericData(MeasurementDataManagerBean.java:245) at sun.reflect.GeneratedMethodAccessor1386.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:585) at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:112) at org.jboss.ejb3.interceptor.InvocationContextImpl.proceed(InvocationContextImpl.java:166) at org.rhq.enterprise.server.common.TransactionInterruptInterceptor.addCheckedActionToTransactionManager(TransactionInterruptInterceptor.java:77) at sun.reflect.GeneratedMethodAccessor113.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:585) at org.jboss.ejb3.interceptor.InvocationContextImpl.proceed(InvocationContextImpl.java:118) at org.rhq.enterprise.server.authz.RequiredPermissionsInterceptor.checkRequiredPermissions(RequiredPermissionsInterceptor.java:153) at sun.reflect.GeneratedMethodAccessor112.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:585) at org.jboss.ejb3.interceptor.InvocationContextImpl.proceed(InvocationContextImpl.java:118) at org.jboss.ejb3.interceptor.EJB3InterceptorsInterceptor.invoke(EJB3InterceptorsInterceptor.java:63) at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101) at org.jboss.ejb3.entity.TransactionScopedEntityManagerInterceptor.invoke(TransactionScopedEntityManagerInterceptor.java:54) at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101) at org.jboss.ejb3.AllowedOperationsInterceptor.invoke(AllowedOperationsInterceptor.java:47) at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101) at org.jboss.aspects.tx.TxPolicy.invokeInOurTx(TxPolicy.java:79) at org.jboss.aspects.tx.TxInterceptor$RequiresNew.invoke(TxInterceptor.java:262) at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101) at org.jboss.aspects.tx.TxPropagationInterceptor.invoke(TxPropagationInterceptor.java:76) at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101) at org.jboss.ejb3.stateless.StatelessInstanceInterceptor.invoke(StatelessInstanceInterceptor.java:62) at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101) at org.jboss.aspects.security.
Reducing severity of this issue since it seems that it might be mostly transparent to user. Still ugly though.
Similar exceptions from Oracle: 2009-07-15 19:44:33,533 WARN [org.rhq.enterprise.server.measurement.MeasurementDataManagerBean] Failure saving measurement numeric data: java.sql.BatchUpdateException:ORA-00001: unique constraint (JON.RHQ_MEAS_DATA_NUM_R06_ID_T_PK) violated [SQLException=ORA-00001: unique constraint (JON.RHQ_MEAS_DATA_NUM_R06_ID_T_PK) violated -> ORA-00001: unique constraint (JON.RHQ_MEAS_DATA_NUM_R06_ID_T_PK) violated (error-code=1,sql-state=23000)] 2009-07-15 19:44:40,132 WARN [org.rhq.enterprise.server.measurement.MeasurementDataManagerBean] Failure saving measurement trait data: java.sql.BatchUpdateException:ORA-00001: unique constraint (JON.RHQ_MEAS_DATA_TRAIT_ID_TIME_PK) violated [SQLException=ORA-00001: unique constraint (JON.RHQ_MEAS_DATA_TRAIT_ID_TIME_PK) violated -> ORA-00001: unique constraint (JON.RHQ_MEAS_DATA_TRAIT_ID_TIME_PK) violated (error-code=1,sql-state=23000)]
This bug was previously known as http://jira.rhq-project.org/browse/RHQ-1801 This bug relates to RHQ-2282
This issue appears to have a very bad side-affect of a server crash due to OutOfMemoryError exception. In the event that the condition occurs multiple times in a very short period of time, it is believed that the transaction manager's recovery is eating up lot of valuable heap space during recovery. The end result is that heap is exhausted and the server is left in a bad state.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days