535067 – (RHQ-1801) "rhq_meas_data_num_r07_id_t_pk" constraint violations occuring on (some) measurement(s)

Bug 535067 (RHQ-1801) - "rhq_meas_data_num_r07_id_t_pk" constraint violations occuring on (some) measurement(s)

Summary: "rhq_meas_data_num_r07_id_t_pk" constraint violations occuring on (some) meas...

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	RHQ-1801
Product:	RHQ Project
Classification:	Other
Component:	Core Server
Sub Component:
Version:	4.3
Hardware:	All
OS:	All
Priority:	medium
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	RHQ Project Maintainer
QA Contact:	Mike Foley
Docs Contact:
URL:	http://jira.rhq-project.org/browse/RH...
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2009-03-18 01:40 UTC by Corey Welton
Modified:	2023-09-14 01:18 UTC (History)
CC List:	3 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2014-05-02 17:49:07 UTC
Embargoed:

Attachments	(Terms of Use)

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Knowledge Base (Legacy)	32633	0	None	None	None	Never

Description Corey Welton 2009-03-18 01:40:00 UTC

I'm seeing a message similar to the following show up repeatedly in my server logs for the latest build.

2009-03-17 21:33:41,094 WARN  [org.rhq.enterprise.server.measurement.MeasurementDataManagerBean] Failure saving measurement numeric data:
java.sql.BatchUpdateException:Batch entry 0 INSERT  /*+ APPEND */ INTO RHQ_MEAS_DATA_NUM_R07(schedule_id,time_stamp,value) VALUES(46345,1237339981177,1.040216064E9) was aborted.  Call getNextException to see the cause.[SQLException=Batch entry 0 INSERT  /*+ APPEND */ INTO RHQ_MEAS_DATA_NUM_R07(schedule_id,time_stamp,value) VALUES(46345,1237339981177,1.040216064E9) was aborted.  Call getNextException to see the cause. -> ERROR: duplicate key violates unique constraint "rhq_meas_data_num_r07_id_t_pk"(error-code=0,sql-state=23505)]

Comment 1 Joseph Marques 2009-03-19 17:13:30 UTC

from heiko:

"I did see that a very few times in the past too. I think shortly after an agent was reconnecting (to a downed server?).  Perhaps there is a race condition when shuting down the agent so that data ends up being sent to the server and as in this process (perhaps before we hear the ack back from the server, the sender is shut down) the agent is spooling the data to disk. Now we have one commited copy on the server and a spooled one on the agent."

Comment 2 Joseph Marques 2009-03-19 17:14:49 UTC

heiko, that makes a lot of sense, and i agree there's a possibility of that occurring.  as a result, this exception should happen rarely, but even if it does happen it should resolve itself quickly assuming the connection between the agent and the server is stable.  corey, was this a repeating exception, or a one-time issue?

Comment 3 Joseph Marques 2009-03-19 17:15:14 UTC

corey, see my last comment.

Comment 4 Corey Welton 2009-03-19 17:20:24 UTC

Joseph, 

I don't have those logs now (I don't think), but I saw it multiple times.  I did something like a 

grep "rhq_meas_data_num_r07_id_t_pk" rhq-server-log4j.log

And saw it show up a good dozen/dozen and a half times or so.

Comment 5 Joseph Marques 2009-03-21 10:59:58 UTC

corey, how many agents did you have attached to the system?

Comment 6 Corey Welton 2009-03-21 13:24:48 UTC

If you mean this particular server at the time, I don't know.  There are three platforms in the HA cluster, each of which had an agent running, and /usually/ connected to a server running on the same box.  

Occasionally I would get some agents switching over to another box, but I don't know which agent was connected to which server, at any point in time.

Comment 7 Joseph Marques 2009-03-27 08:18:49 UTC

i'm going to keep one eye open for this issue in the future, but in the worse case scenario this is due to some corner case in the workflow logic for reporting measurements that we haven't handled properly yet.  the point is, if we try to insert some measurement report twice, we want that fail to in order to keep the data consistent with the business rules - perhaps we shouldn't throw a big, nasty stack in this case (or maybe only throw it when the server is in debug mode).

Comment 8 Heiko W. Rupp 2009-03-31 08:53:25 UTC

Just saw the following here:

Agent has connected. Then there was some hickup, so that the agent got disconnected and it tried again.
It looks like the agent was able to deliver some metrics before it got dropped, but as they were not acknowledged, they got send again after the next connect.


10:30:08,083 INFO  [CoreServerServiceImpl] Agent [snert] has connected to this server at Tue Mar 31 10:30:08 CEST 2009
10:31:20,367 ERROR [ServerThread] Worker thread initialization failure
java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(SocketInputStream.java:168)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:235)
        at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2196)
        at java.io.ObjectInputStream$BlockDataInputStream.readBlockHeader(ObjectInputStream.java:2376)
        at java.io.ObjectInputStream$BlockDataInputStream.refill(ObjectInputStream.java:2443)
        at java.io.ObjectInputStream$BlockDataInputStream.read(ObjectInputStream.java:2515)
        at java.io.ObjectInputStream.read(ObjectInputStream.java:800)
        at org.jboss.remoting.transport.socket.ServerThread.readVersion(ServerThread.java:810)
        at org.jboss.remoting.transport.socket.ServerThread.processInvocation(ServerThread.java:506)
        at org.jboss.remoting.transport.socket.ServerThread.dorun(ServerThread.java:369)
        at org.jboss.remoting.transport.socket.ServerThread.run(ServerThread.java:165)
10:31:21,680 INFO  [CoreServerServiceImpl] Agent [snert][1.2.0-SNAPSHOT(0)] would like to connect to this server
10:31:21,893 INFO  [CoreServerServiceImpl] Agent [snert] has connected to this server at Tue Mar 31 10:31:21 CEST 2009
10:31:22,276 WARN  [TxConnectionManager] Connection error occured: org.jboss.resource.connectionmanager.TxConnectionManager$TxConnectionEventListener@b72301[state=NORMAL mc=org.jboss.resource.adapter.jdbc.xa.XAManagedConnection@4d9f99 handles=1 lastUse=1238488281893 permit=true trackByTx=true mcp=org.jboss.resource.connectionmanager.JBossManagedConnectionPool$OnePool@d649bf context=org.jboss.resource.connectionmanager.InternalManagedConnectionPool@83bce1 xaResource=org.jboss.resource.connectionmanager.xa.JcaXAResourceWrapper@6ba2e0 txSync=null]
java.sql.BatchUpdateException: Batch-Eintrag 0 INSERT  /*+ APPEND */ INTO RHQ_MEAS_DATA_NUM_R03(schedule_id,time_stamp,value) VALUES(10016,1238488237895,3.000050000833347) wurde abgebrochen.  Rufen Sie 'getNextException' auf, um die Ursache zu erfahren.
        at org.postgresql.jdbc2.AbstractJdbc2Statement$BatchResultHandler.handleError(AbstractJdbc2Statement.java:2531)
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1344)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:343)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeBatch(AbstractJdbc2Statement.java:2668)
        at sun.reflect.GeneratedMethodAccessor138.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.postgresql.ds.jdbc23.AbstractJdbc23PooledConnection$StatementHandler.invoke(AbstractJdbc23PooledConnection.java:477)
        at $Proxy48.executeBatch(Unknown Source)
        at org.jboss.resource.adapter.jdbc.CachedPreparedStatement.executeBatch(CachedPreparedStatement.java:478)
        at org.jboss.resource.adapter.jdbc.WrappedStatement.executeBatch(WrappedStatement.java:519)
        at org.rhq.enterprise.server.measurement.MeasurementDataManagerBean.addNumericData(MeasurementDataManagerBean.java:245)
        at sun.reflect.GeneratedMethodAccessor1386.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:112)
        at org.jboss.ejb3.interceptor.InvocationContextImpl.proceed(InvocationContextImpl.java:166)
        at org.rhq.enterprise.server.common.TransactionInterruptInterceptor.addCheckedActionToTransactionManager(TransactionInterruptInterceptor.java:77)
        at sun.reflect.GeneratedMethodAccessor113.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.jboss.ejb3.interceptor.InvocationContextImpl.proceed(InvocationContextImpl.java:118)
        at org.rhq.enterprise.server.authz.RequiredPermissionsInterceptor.checkRequiredPermissions(RequiredPermissionsInterceptor.java:153)
        at sun.reflect.GeneratedMethodAccessor112.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.jboss.ejb3.interceptor.InvocationContextImpl.proceed(InvocationContextImpl.java:118)
        at org.jboss.ejb3.interceptor.EJB3InterceptorsInterceptor.invoke(EJB3InterceptorsInterceptor.java:63)
        at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101)
        at org.jboss.ejb3.entity.TransactionScopedEntityManagerInterceptor.invoke(TransactionScopedEntityManagerInterceptor.java:54)
        at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101)
        at org.jboss.ejb3.AllowedOperationsInterceptor.invoke(AllowedOperationsInterceptor.java:47)
        at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101)
        at org.jboss.aspects.tx.TxPolicy.invokeInOurTx(TxPolicy.java:79)
        at org.jboss.aspects.tx.TxInterceptor$RequiresNew.invoke(TxInterceptor.java:262)
        at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101)
        at org.jboss.aspects.tx.TxPropagationInterceptor.invoke(TxPropagationInterceptor.java:76)
        at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101)
        at org.jboss.ejb3.stateless.StatelessInstanceInterceptor.invoke(StatelessInstanceInterceptor.java:62)
        at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101)
        at org.jboss.aspects.security.

Comment 9 Corey Welton 2009-04-01 18:14:31 UTC

Reducing severity of this issue since it seems that it might be mostly transparent to user.  Still ugly though.

Comment 10 Charles Crouch 2009-07-21 15:09:52 UTC

Similar exceptions from Oracle:

2009-07-15 19:44:33,533 WARN  [org.rhq.enterprise.server.measurement.MeasurementDataManagerBean] Failure saving measurement numeric data:
java.sql.BatchUpdateException:ORA-00001: unique constraint (JON.RHQ_MEAS_DATA_NUM_R06_ID_T_PK) violated
[SQLException=ORA-00001: unique constraint (JON.RHQ_MEAS_DATA_NUM_R06_ID_T_PK) violated
 -> ORA-00001: unique constraint (JON.RHQ_MEAS_DATA_NUM_R06_ID_T_PK) violated
(error-code=1,sql-state=23000)]
2009-07-15 19:44:40,132 WARN  [org.rhq.enterprise.server.measurement.MeasurementDataManagerBean] Failure saving measurement trait data:
java.sql.BatchUpdateException:ORA-00001: unique constraint (JON.RHQ_MEAS_DATA_TRAIT_ID_TIME_PK) violated
[SQLException=ORA-00001: unique constraint (JON.RHQ_MEAS_DATA_TRAIT_ID_TIME_PK) violated
 -> ORA-00001: unique constraint (JON.RHQ_MEAS_DATA_TRAIT_ID_TIME_PK) violated
(error-code=1,sql-state=23000)]

Comment 11 Red Hat Bugzilla 2009-11-10 20:46:58 UTC

This bug was previously known as http://jira.rhq-project.org/browse/RHQ-1801
This bug relates to RHQ-2282

Comment 12 Larry O'Leary 2010-05-21 19:51:32 UTC

This issue appears to have a very bad side-affect of a server crash due to OutOfMemoryError exception.  In the event that the condition occurs multiple times in a very short period of time, it is believed that the transaction manager's recovery is eating up lot of valuable heap space during recovery.  The end result is that heap is exhausted and the server is left in a bad state.

Comment 13 Red Hat Bugzilla 2023-09-14 01:18:41 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

Note You need to log in before you can comment on or make changes to this bug.