Bug 535067 - (RHQ-1801) "rhq_meas_data_num_r07_id_t_pk" constraint violations occuring on (some) measurement(s) [NEEDINFO]
"rhq_meas_data_num_r07_id_t_pk" constraint violations occuring on (some) meas...
Status: CLOSED CURRENTRELEASE
Product: RHQ Project
Classification: Other
Component: Core Server (Show other bugs)
4.3
All All
medium Severity high (vote)
: ---
: ---
Assigned To: RHQ Project Maintainer
Mike Foley
http://jira.rhq-project.org/browse/RH...
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2009-03-17 21:40 EDT by Corey Welton
Modified: 2014-05-02 13:49 EDT (History)
3 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-05-02 13:49:07 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
cwelton: needinfo? (jmarques)


Attachments (Terms of Use)


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Knowledge Base (Legacy) 32633 None None None Never

  None (edit)
Description Corey Welton 2009-03-17 21:40:00 EDT
I'm seeing a message similar to the following show up repeatedly in my server logs for the latest build.

2009-03-17 21:33:41,094 WARN  [org.rhq.enterprise.server.measurement.MeasurementDataManagerBean] Failure saving measurement numeric data:
java.sql.BatchUpdateException:Batch entry 0 INSERT  /*+ APPEND */ INTO RHQ_MEAS_DATA_NUM_R07(schedule_id,time_stamp,value) VALUES(46345,1237339981177,1.040216064E9) was aborted.  Call getNextException to see the cause.[SQLException=Batch entry 0 INSERT  /*+ APPEND */ INTO RHQ_MEAS_DATA_NUM_R07(schedule_id,time_stamp,value) VALUES(46345,1237339981177,1.040216064E9) was aborted.  Call getNextException to see the cause. -> ERROR: duplicate key violates unique constraint "rhq_meas_data_num_r07_id_t_pk"(error-code=0,sql-state=23505)]
Comment 1 Joseph Marques 2009-03-19 13:13:30 EDT
from heiko:

"I did see that a very few times in the past too. I think shortly after an agent was reconnecting (to a downed server?).  Perhaps there is a race condition when shuting down the agent so that data ends up being sent to the server and as in this process (perhaps before we hear the ack back from the server, the sender is shut down) the agent is spooling the data to disk. Now we have one commited copy on the server and a spooled one on the agent."
Comment 2 Joseph Marques 2009-03-19 13:14:49 EDT
heiko, that makes a lot of sense, and i agree there's a possibility of that occurring.  as a result, this exception should happen rarely, but even if it does happen it should resolve itself quickly assuming the connection between the agent and the server is stable.  corey, was this a repeating exception, or a one-time issue?
Comment 3 Joseph Marques 2009-03-19 13:15:14 EDT
corey, see my last comment.
Comment 4 Corey Welton 2009-03-19 13:20:24 EDT
Joseph, 

I don't have those logs now (I don't think), but I saw it multiple times.  I did something like a 

grep "rhq_meas_data_num_r07_id_t_pk" rhq-server-log4j.log

And saw it show up a good dozen/dozen and a half times or so.
Comment 5 Joseph Marques 2009-03-21 06:59:58 EDT
corey, how many agents did you have attached to the system?
Comment 6 Corey Welton 2009-03-21 09:24:48 EDT
If you mean this particular server at the time, I don't know.  There are three platforms in the HA cluster, each of which had an agent running, and /usually/ connected to a server running on the same box.  

Occasionally I would get some agents switching over to another box, but I don't know which agent was connected to which server, at any point in time.
Comment 7 Joseph Marques 2009-03-27 04:18:49 EDT
i'm going to keep one eye open for this issue in the future, but in the worse case scenario this is due to some corner case in the workflow logic for reporting measurements that we haven't handled properly yet.  the point is, if we try to insert some measurement report twice, we want that fail to in order to keep the data consistent with the business rules - perhaps we shouldn't throw a big, nasty stack in this case (or maybe only throw it when the server is in debug mode).
Comment 8 Heiko W. Rupp 2009-03-31 04:53:25 EDT
Just saw the following here:

Agent has connected. Then there was some hickup, so that the agent got disconnected and it tried again.
It looks like the agent was able to deliver some metrics before it got dropped, but as they were not acknowledged, they got send again after the next connect.


10:30:08,083 INFO  [CoreServerServiceImpl] Agent [snert] has connected to this server at Tue Mar 31 10:30:08 CEST 2009
10:31:20,367 ERROR [ServerThread] Worker thread initialization failure
java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(SocketInputStream.java:168)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:235)
        at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2196)
        at java.io.ObjectInputStream$BlockDataInputStream.readBlockHeader(ObjectInputStream.java:2376)
        at java.io.ObjectInputStream$BlockDataInputStream.refill(ObjectInputStream.java:2443)
        at java.io.ObjectInputStream$BlockDataInputStream.read(ObjectInputStream.java:2515)
        at java.io.ObjectInputStream.read(ObjectInputStream.java:800)
        at org.jboss.remoting.transport.socket.ServerThread.readVersion(ServerThread.java:810)
        at org.jboss.remoting.transport.socket.ServerThread.processInvocation(ServerThread.java:506)
        at org.jboss.remoting.transport.socket.ServerThread.dorun(ServerThread.java:369)
        at org.jboss.remoting.transport.socket.ServerThread.run(ServerThread.java:165)
10:31:21,680 INFO  [CoreServerServiceImpl] Agent [snert][1.2.0-SNAPSHOT(0)] would like to connect to this server
10:31:21,893 INFO  [CoreServerServiceImpl] Agent [snert] has connected to this server at Tue Mar 31 10:31:21 CEST 2009
10:31:22,276 WARN  [TxConnectionManager] Connection error occured: org.jboss.resource.connectionmanager.TxConnectionManager$TxConnectionEventListener@b72301[state=NORMAL mc=org.jboss.resource.adapter.jdbc.xa.XAManagedConnection@4d9f99 handles=1 lastUse=1238488281893 permit=true trackByTx=true mcp=org.jboss.resource.connectionmanager.JBossManagedConnectionPool$OnePool@d649bf context=org.jboss.resource.connectionmanager.InternalManagedConnectionPool@83bce1 xaResource=org.jboss.resource.connectionmanager.xa.JcaXAResourceWrapper@6ba2e0 txSync=null]
java.sql.BatchUpdateException: Batch-Eintrag 0 INSERT  /*+ APPEND */ INTO RHQ_MEAS_DATA_NUM_R03(schedule_id,time_stamp,value) VALUES(10016,1238488237895,3.000050000833347) wurde abgebrochen.  Rufen Sie 'getNextException' auf, um die Ursache zu erfahren.
        at org.postgresql.jdbc2.AbstractJdbc2Statement$BatchResultHandler.handleError(AbstractJdbc2Statement.java:2531)
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1344)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:343)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeBatch(AbstractJdbc2Statement.java:2668)
        at sun.reflect.GeneratedMethodAccessor138.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.postgresql.ds.jdbc23.AbstractJdbc23PooledConnection$StatementHandler.invoke(AbstractJdbc23PooledConnection.java:477)
        at $Proxy48.executeBatch(Unknown Source)
        at org.jboss.resource.adapter.jdbc.CachedPreparedStatement.executeBatch(CachedPreparedStatement.java:478)
        at org.jboss.resource.adapter.jdbc.WrappedStatement.executeBatch(WrappedStatement.java:519)
        at org.rhq.enterprise.server.measurement.MeasurementDataManagerBean.addNumericData(MeasurementDataManagerBean.java:245)
        at sun.reflect.GeneratedMethodAccessor1386.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:112)
        at org.jboss.ejb3.interceptor.InvocationContextImpl.proceed(InvocationContextImpl.java:166)
        at org.rhq.enterprise.server.common.TransactionInterruptInterceptor.addCheckedActionToTransactionManager(TransactionInterruptInterceptor.java:77)
        at sun.reflect.GeneratedMethodAccessor113.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.jboss.ejb3.interceptor.InvocationContextImpl.proceed(InvocationContextImpl.java:118)
        at org.rhq.enterprise.server.authz.RequiredPermissionsInterceptor.checkRequiredPermissions(RequiredPermissionsInterceptor.java:153)
        at sun.reflect.GeneratedMethodAccessor112.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.jboss.ejb3.interceptor.InvocationContextImpl.proceed(InvocationContextImpl.java:118)
        at org.jboss.ejb3.interceptor.EJB3InterceptorsInterceptor.invoke(EJB3InterceptorsInterceptor.java:63)
        at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101)
        at org.jboss.ejb3.entity.TransactionScopedEntityManagerInterceptor.invoke(TransactionScopedEntityManagerInterceptor.java:54)
        at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101)
        at org.jboss.ejb3.AllowedOperationsInterceptor.invoke(AllowedOperationsInterceptor.java:47)
        at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101)
        at org.jboss.aspects.tx.TxPolicy.invokeInOurTx(TxPolicy.java:79)
        at org.jboss.aspects.tx.TxInterceptor$RequiresNew.invoke(TxInterceptor.java:262)
        at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101)
        at org.jboss.aspects.tx.TxPropagationInterceptor.invoke(TxPropagationInterceptor.java:76)
        at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101)
        at org.jboss.ejb3.stateless.StatelessInstanceInterceptor.invoke(StatelessInstanceInterceptor.java:62)
        at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101)
        at org.jboss.aspects.security.
Comment 9 Corey Welton 2009-04-01 14:14:31 EDT
Reducing severity of this issue since it seems that it might be mostly transparent to user.  Still ugly though.
Comment 10 Charles Crouch 2009-07-21 11:09:52 EDT
Similar exceptions from Oracle:

2009-07-15 19:44:33,533 WARN  [org.rhq.enterprise.server.measurement.MeasurementDataManagerBean] Failure saving measurement numeric data:
java.sql.BatchUpdateException:ORA-00001: unique constraint (JON.RHQ_MEAS_DATA_NUM_R06_ID_T_PK) violated
[SQLException=ORA-00001: unique constraint (JON.RHQ_MEAS_DATA_NUM_R06_ID_T_PK) violated
 -> ORA-00001: unique constraint (JON.RHQ_MEAS_DATA_NUM_R06_ID_T_PK) violated
(error-code=1,sql-state=23000)]
2009-07-15 19:44:40,132 WARN  [org.rhq.enterprise.server.measurement.MeasurementDataManagerBean] Failure saving measurement trait data:
java.sql.BatchUpdateException:ORA-00001: unique constraint (JON.RHQ_MEAS_DATA_TRAIT_ID_TIME_PK) violated
[SQLException=ORA-00001: unique constraint (JON.RHQ_MEAS_DATA_TRAIT_ID_TIME_PK) violated
 -> ORA-00001: unique constraint (JON.RHQ_MEAS_DATA_TRAIT_ID_TIME_PK) violated
(error-code=1,sql-state=23000)]
Comment 11 Red Hat Bugzilla 2009-11-10 15:46:58 EST
This bug was previously known as http://jira.rhq-project.org/browse/RHQ-1801
This bug relates to RHQ-2282
Comment 12 Larry O'Leary 2010-05-21 15:51:32 EDT
This issue appears to have a very bad side-affect of a server crash due to OutOfMemoryError exception.  In the event that the condition occurs multiple times in a very short period of time, it is believed that the transaction manager's recovery is eating up lot of valuable heap space during recovery.  The end result is that heap is exhausted and the server is left in a bad state.

Note You need to log in before you can comment on or make changes to this bug.