I have no idea if this is really a problem or not, but I wanted to document it anyway. I rebuilt my dev container, and restarted it after the build finished. I had two agents running at the time, so they just waited for the server to restart before being able to reconnect. They did seem to reconnect. However, I saw an exception get thrown several times before it stopped. I'll add the stack in a separate comment, but the problem was when the agent tried calling setResourceError.
2010-12-18 07:53:18,359 WARN [com.arjuna.ats.arjuna.logging.arjLoggerI18N] [com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator_2] TwoPhaseCoordinator.beforeCompletion - failed for com.arjuna.ats.internal.jta.resources.arjunacore.SynchronizationImple@6c1e9dc4 javax.persistence.OptimisticLockException: org.hibernate.StaleStateException: Batch update returned unexpected row count from update [0]; actual row count: 0; expected: 1 at org.hibernate.ejb.AbstractEntityManagerImpl.wrapStaleStateException(AbstractEntityManagerImpl.java:653) at org.hibernate.ejb.AbstractEntityManagerImpl.throwPersistenceException(AbstractEntityManagerImpl.java:599) at org.hibernate.ejb.AbstractEntityManagerImpl$1.beforeCompletion(AbstractEntityManagerImpl.java:524) at com.arjuna.ats.internal.jta.resources.arjunacore.SynchronizationImple.beforeCompletion(SynchronizationImple.java:114) at com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.beforeCompletion(TwoPhaseCoordinator.java:247) at com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.end(TwoPhaseCoordinator.java:86) at com.arjuna.ats.arjuna.AtomicAction.commit(AtomicAction.java:177) at com.arjuna.ats.internal.jta.transaction.arjunacore.TransactionImple.commitAndDisassociate(TransactionImple.java:1389) at com.arjuna.ats.internal.jta.transaction.arjunacore.BaseTransaction.commit(BaseTransaction.java:135) at com.arjuna.ats.jbossatx.BaseTransactionManagerDelegate.commit(BaseTransactionManagerDelegate.java:87) at org.jboss.aspects.tx.TxPolicy.endTransaction(TxPolicy.java:175) at org.jboss.aspects.tx.TxPolicy.invokeInOurTx(TxPolicy.java:87) at org.jboss.aspects.tx.TxInterceptor$Required.invoke(TxInterceptor.java:191) at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101) at org.jboss.aspects.tx.TxPropagationInterceptor.invoke(TxPropagationInterceptor.java:95) at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101) at org.jboss.ejb3.stateless.StatelessInstanceInterceptor.invoke(StatelessInstanceInterceptor.java:62) at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101) at org.jboss.aspects.security.AuthenticationInterceptor.invoke(AuthenticationInterceptor.java:77) at org.jboss.ejb3.security.Ejb3AuthenticationInterceptor.invoke(Ejb3AuthenticationInterceptor.java:110) at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101) at org.jboss.ejb3.ENCPropagationInterceptor.invoke(ENCPropagationInterceptor.java:46) at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101) at org.jboss.ejb3.asynchronous.AsynchronousInterceptor.invoke(AsynchronousInterceptor.java:106) at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101) at org.jboss.ejb3.stateless.StatelessContainer.localInvoke(StatelessContainer.java:240) at org.jboss.ejb3.stateless.StatelessContainer.localInvoke(StatelessContainer.java:210) at org.jboss.ejb3.stateless.StatelessLocalProxy.invoke(StatelessLocalProxy.java:84) at $Proxy337.addResourceError(Unknown Source) at org.rhq.enterprise.server.discovery.DiscoveryServerServiceImpl.setResourceError(DiscoveryServerServiceImpl.java:165) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.rhq.enterprise.communications.command.impl.remotepojo.server.RemotePojoInvocationCommandService.execute(RemotePojoInvocationCommandService.java:184) at sun.reflect.GeneratedMethodAccessor281.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.jboss.mx.interceptor.ReflectedDispatcher.invoke(ReflectedDispatcher.java:155) at org.jboss.mx.server.Invocation.dispatch(Invocation.java:94) at org.jboss.mx.server.Invocation.invoke(Invocation.java:86) at org.jboss.mx.server.AbstractMBeanInvoker.invoke(AbstractMBeanInvoker.java:264) at org.jboss.mx.server.MBeanServerImpl.invoke(MBeanServerImpl.java:659) at javax.management.MBeanServerInvocationHandler.invoke(MBeanServerInvocationHandler.java:288) at $Proxy544.execute(Unknown Source) at org.rhq.enterprise.communications.command.server.CommandProcessor.handleIncomingInvocationRequest(CommandProcessor.java:290) at org.rhq.enterprise.communications.command.server.CommandProcessor.invoke(CommandProcessor.java:184) at org.jboss.remoting.ServerInvoker.invoke(ServerInvoker.java:809) at org.jboss.remoting.transport.servlet.ServletServerInvoker.processRequest(ServletServerInvoker.java:232) at sun.reflect.GeneratedMethodAccessor261.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.jboss.mx.interceptor.ReflectedDispatcher.invoke(ReflectedDispatcher.java:155) at org.jboss.mx.server.Invocation.dispatch(Invocation.java:94) at org.jboss.mx.server.Invocation.invoke(Invocation.java:86) at org.jboss.mx.server.AbstractMBeanInvoker.invoke(AbstractMBeanInvoker.java:264) at org.jboss.mx.server.MBeanServerImpl.invoke(MBeanServerImpl.java:659) at javax.management.MBeanServerInvocationHandler.invoke(MBeanServerInvocationHandler.java:288) at $Proxy426.processRequest(Unknown Source) at org.jboss.remoting.transport.servlet.web.ServerInvokerServlet.processRequest(ServerInvokerServlet.java:128) at org.jboss.remoting.transport.servlet.web.ServerInvokerServlet.doPost(ServerInvokerServlet.java:157) at javax.servlet.http.HttpServlet.service(HttpServlet.java:710) at javax.servlet.http.HttpServlet.service(HttpServlet.java:803) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.rhq.helpers.rtfilter.filter.RtFilter.doFilter(RtFilter.java:124) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:96) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:230) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175) at org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:182) at org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:84) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.jboss.web.tomcat.service.jca.CachedConnectionValve.invoke(CachedConnectionValve.java:157) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:262) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:446) at java.lang.Thread.run(Thread.java:619) Caused by: org.hibernate.StaleStateException: Batch update returned unexpected row count from update [0]; actual row count: 0; expected: 1 at org.hibernate.jdbc.Expectations$BasicExpectation.checkBatched(Expectations.java:61) at org.hibernate.jdbc.Expectations$BasicExpectation.verifyOutcome(Expectations.java:46) at org.hibernate.jdbc.BatchingBatcher.checkRowCounts(BatchingBatcher.java:68) at org.hibernate.jdbc.BatchingBatcher.doExecuteBatch(BatchingBatcher.java:48) at org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:247) at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:237) at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:146) at org.hibernate.event.def.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:298) at org.hibernate.event.def.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:27) at org.hibernate.impl.SessionImpl.flush(SessionImpl.java:1000) at org.hibernate.impl.SessionImpl.managedFlush(SessionImpl.java:338) at org.hibernate.ejb.AbstractEntityManagerImpl$1.beforeCompletion(AbstractEntityManagerImpl.java:515) ... 81 more
This is the RHQ API where it all began: org.rhq.enterprise.server.discovery.DiscoveryServerServiceImpl.setResourceError(DiscoveryServerServiceImpl.java:165)
*** Bug 612690 has been marked as a duplicate of this bug. ***
commit 29960f759d35ace3e8bac334ee49abafaefd2055 there were several concurrent calls being made to setResourceError. Needed to make sure things are done in their own transactions. The way I was able to replicate this: 1) on an agent box, start an EAP 4.3 instance and an RHQ Agent 2) import the EAP instance into inventory 3) shutdown everything: the RHQ Server, the RHQ Agent and the EAP instance 4) restart the RHQ Server 5) restart the RHQ Agent - DO NOT START THE EAP INSTANCE (it must remain down) 6) wait several minutes - look in the server logs and make sure there are no exceptions about hibernate failing in the setResourceError method. If you go 10 minutes or more without seeing those errors in the log, this is verified 7) you could double check this by shutting down the RHQ Agent and restarting it (again, keep the EAP instance down). Same thing as 6) - wait and see that no errors occur.
This is verified, RHQ4 Master ... by following the steps listed above.
Bulk closing of issues that were VERIFIED, had no target release and where the status changed more than a year ago.