Bug 535352 (RHQ-2057)

Summary:	runaway avail reporting
Product:	[Other] RHQ Project	Reporter:	John Mazzitelli <mazz>
Component:	Agent	Assignee:	RHQ Project Maintainer <rhq-maint>
Status:	CLOSED WORKSFORME	QA Contact:
Severity:	medium	Docs Contact:
Priority:	high
Version:	unspecified	CC:	cwelton, fredrik.joakim.sandstrom, jshaughn
Target Milestone:	---
Target Release:	---
Hardware:	All
OS:	All
URL:	http://jira.rhq-project.org/browse/RHQ-2057
Whiteboard:
Fixed In Version:	1.4	Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2012-02-28 19:47:26 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	534375

Description John Mazzitelli 2009-05-05 01:48:00 UTC

I'm seeing the agent send availability reports one after the other in quick succession for no apparent reason:

21:40:39,568 INFO  [DiscoveryServerServiceImpl] Processed AV:[mazzthink][287][full] - need full=[false] in (640)ms
21:40:43,006 INFO  [DiscoveryServerServiceImpl] Processed AV:[mazzthink][287][full] - need full=[false] in (656)ms
21:40:45,896 INFO  [DiscoveryServerServiceImpl] Processed AV:[mazzthink][287][full] - need full=[false] in (687)ms
21:40:49,318 INFO  [DiscoveryServerServiceImpl] Processed AV:[mazzthink][287][full] - need full=[false] in (1328)ms
21:40:52,006 INFO  [DiscoveryServerServiceImpl] Processed AV:[mazzthink][287][full] - need full=[false] in (688)ms

This repeats itself  - looks like an infinite loop.

I'm also seeing this:

2009-05-04 02:44:59,709 ERROR [org.hibernate.jdbc.AbstractBatcher] Exception executing batch: 
org.hibernate.StaleStateException: Batch update returned unexpected row count from update [0]; actual row count: 0; expected: 1
	at org.hibernate.jdbc.Expectations$BasicExpectation.checkBatched(Expectations.java:61)
	at org.hibernate.jdbc.Expectations$BasicExpectation.verifyOutcome(Expectations.java:46)
	at org.hibernate.jdbc.BatchingBatcher.checkRowCounts(BatchingBatcher.java:68)
	at org.hibernate.jdbc.BatchingBatcher.doExecuteBatch(BatchingBatcher.java:48)
	at org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:247)
	at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:237)
	at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:146)
	at org.hibernate.event.def.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:298)
	at org.hibernate.event.def.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:27)
	at org.hibernate.impl.SessionImpl.flush(SessionImpl.java:1000)
	at org.hibernate.impl.SessionImpl.managedFlush(SessionImpl.java:338)
	at org.hibernate.ejb.AbstractEntityManagerImpl$1.beforeCompletion(AbstractEntityManagerImpl.java:515)
	at com.arjuna.ats.internal.jta.resources.arjunacore.SynchronizationImple.beforeCompletion(SynchronizationImple.java:114)
	at com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.beforeCompletion(TwoPhaseCoordinator.java:247)
	at com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.end(TwoPhaseCoordinator.java:86)
	at com.arjuna.ats.arjuna.AtomicAction.commit(AtomicAction.java:177)
	at com.arjuna.ats.internal.jta.transaction.arjunacore.TransactionImple.commitAndDisassociate(TransactionImple.java:1382)
	at com.arjuna.ats.internal.jta.transaction.arjunacore.BaseTransaction.commit(BaseTransaction.java:135)
	at com.arjuna.ats.jbossatx.BaseTransactionManagerDelegate.commit(BaseTransactionManagerDelegate.java:87)
	at org.jboss.aspects.tx.TxPolicy.endTransaction(TxPolicy.java:175)
	at org.jboss.aspects.tx.TxPolicy.invokeInOurTx(TxPolicy.java:87)
	at org.jboss.aspects.tx.TxInterceptor$Required.invoke(TxInterceptor.java:191)
	at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101)
	at org.jboss.aspects.tx.TxPropagationInterceptor.invoke(TxPropagationInterceptor.java:76)
	at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101)
	at org.jboss.ejb3.stateless.StatelessInstanceInterceptor.invoke(StatelessInstanceInterceptor.java:62)
	at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101)
	at org.jboss.aspects.security.AuthenticationInterceptor.invoke(AuthenticationInterceptor.java:77)
	at org.jboss.ejb3.security.Ejb3AuthenticationInterceptor.invoke(Ejb3AuthenticationInterceptor.java:106)
	at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101)
	at org.jboss.ejb3.ENCPropagationInterceptor.invoke(ENCPropagationInterceptor.java:46)
	at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101)
	at org.jboss.ejb3.asynchronous.AsynchronousInterceptor.invoke(AsynchronousInterceptor.java:106)
	at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101)
	at org.jboss.ejb3.stateless.StatelessContainer.localInvoke(StatelessContainer.java:214)
	at org.jboss.ejb3.stateless.StatelessContainer.localInvoke(StatelessContainer.java:184)
	at org.jboss.ejb3.stateless.StatelessLocalProxy.invoke(StatelessLocalProxy.java:81)
	at $Proxy290.addResourceError(Unknown Source)
	at org.rhq.enterprise.server.discovery.DiscoveryServerServiceImpl.setResourceError(DiscoveryServerServiceImpl.java:148)
	at sun.reflect.GeneratedMethodAccessor228.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.rhq.enterprise.communications.command.impl.remotepojo.server.RemotePojoInvocationCommandService.execute(RemotePojoInvocationCommandService.java:184)
	at sun.reflect.GeneratedMethodAccessor220.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.jboss.mx.interceptor.ReflectedDispatcher.invoke(ReflectedDispatcher.java:155)
	at org.jboss.mx.server.Invocation.dispatch(Invocation.java:94)
	at org.jboss.mx.server.Invocation.invoke(Invocation.java:86)
	at org.jboss.mx.server.AbstractMBeanInvoker.invoke(AbstractMBeanInvoker.java:264)
	at org.jboss.mx.server.MBeanServerImpl.invoke(MBeanServerImpl.java:659)
	at javax.management.MBeanServerInvocationHandler.invoke(MBeanServerInvocationHandler.java:288)
	at $Proxy412.execute(Unknown Source)
	at org.rhq.enterprise.communications.command.server.CommandProcessor.handleIncomingInvocationRequest(CommandProcessor.java:290)
	at org.rhq.enterprise.communications.command.server.CommandProcessor.invoke(CommandProcessor.java:184)
	at org.jboss.remoting.ServerInvoker.invoke(ServerInvoker.java:734)
	at org.jboss.remoting.transport.servlet.ServletServerInvoker.processRequest(ServletServerInvoker.java:231)
	at sun.reflect.GeneratedMethodAccessor216.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.jboss.mx.interceptor.ReflectedDispatcher.invoke(ReflectedDispatcher.java:155)
	at org.jboss.mx.server.Invocation.dispatch(Invocation.java:94)
	at org.jboss.mx.server.Invocation.invoke(Invocation.java:86)
	at org.jboss.mx.server.AbstractMBeanInvoker.invoke(AbstractMBeanInvoker.java:264)
	at org.jboss.mx.server.MBeanServerImpl.invoke(MBeanServerImpl.java:659)
	at javax.management.MBeanServerInvocationHandler.invoke(MBeanServerInvocationHandler.java:288)
	at $Proxy372.processRequest(Unknown Source)
	at org.jboss.remoting.transport.servlet.web.ServerInvokerServlet.processRequest(ServerInvokerServlet.java:128)
	at org.jboss.remoting.transport.servlet.web.ServerInvokerServlet.doPost(ServerInvokerServlet.java:157)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:710)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.rhq.helpers.rtfilter.filter.RtFilter.doFilter(RtFilter.java:124)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:96)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:230)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
	at org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:179)
	at org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:84)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:104)
	at org.jboss.web.tomcat.service.jca.CachedConnectionValve.invoke(CachedConnectionValve.java:157)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:241)
	at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
	at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:580)
	at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)

Comment 1 John Mazzitelli 2009-05-05 02:26:57 UTC

seeing this in cmd trace:

2009-05-04 22:25:24,318 TRACE {send.complete}=>>DiscoveryServerService.setResourceError|{rhq.agent-name=mazzthink, rhq.retry=1, rhq.security-token=1241245112171-1980743266-8190587527248950016, rhq.send-throttle=true, rhq.guaranteed-delivery=true}|failed:java.lang.reflect.InvocationTargetException:null -> org.rhq.core.util.exception.WrappedRemotingException:javax.persistence.PersistenceException: org.hibernate.PropertyValueException: not-null property references a null or transient value: org.rhq.core.domain.resource.ResourceError.summary -> org.rhq.core.util.exception.WrappedRemotingException:org.hibernate.PropertyValueException: not-null property references a null or transient value: org.rhq.core.domain.resource.ResourceError.summary -> org.rhq.core.util.exception.WrappedRemotingException:not-null property references a null or transient value: org.rhq.core.domain.resource.ResourceError.summary

I have it where the agent goes into an infinite loop constantly sending avai reports... I don't know why yet.

Comment 2 John Mazzitelli 2009-05-05 03:15:59 UTC

This sendResourceErrorToServer is getting called every 1 second or so. I do have two Embedded Tomcat connector resources that are failing their avialability checks. but why is this constantly calling setRersourceError - the resource errors are getting stored in the DB (I checked) and the agent isn't getting exceptions from server.

Daemon Thread [InventoryManager.availability-1] (Suspended (breakpoint at line 1192 in org.rhq.core.pc.inventory.InventoryManager))	
	owns: java.lang.Object  (id=7013)	
	org.rhq.core.pc.inventory.InventoryManager.sendResourceErrorToServer(org.rhq.core.domain.resource.ResourceError) line: 1192	
	org.rhq.core.pc.inventory.AvailabilityExecutor.checkInventory(org.rhq.core.domain.resource.Resource, org.rhq.core.domain.discovery.AvailabilityReport, boolean, boolean) line: 178	
	org.rhq.core.pc.inventory.AvailabilityExecutor.checkInventory(org.rhq.core.domain.resource.Resource, org.rhq.core.domain.discovery.AvailabilityReport, boolean, boolean) line: 210	
	org.rhq.core.pc.inventory.AvailabilityExecutor.checkInventory(org.rhq.core.domain.resource.Resource, org.rhq.core.domain.discovery.AvailabilityReport, boolean, boolean) line: 210	
	org.rhq.core.pc.inventory.AvailabilityExecutor.checkInventory(org.rhq.core.domain.resource.Resource, org.rhq.core.domain.discovery.AvailabilityReport, boolean, boolean) line: 210	
	org.rhq.core.pc.inventory.AvailabilityExecutor.call() line: 106	
	org.rhq.core.pc.inventory.AvailabilityExecutor.run() line: 70	
	java.util.concurrent.Executors$RunnableAdapter<T>.call() line: 441	
	java.util.concurrent.FutureTask$Sync.innerRun() line: 303	
	java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask<V>(java.util.concurrent.FutureTask<V>).run() line: 138	
	java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask<V>.access$301(java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask) line: 98	
	java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask<V>.run() line: 207	
	java.util.concurrent.ThreadPoolExecutor$Worker.runTask(java.lang.Runnable) line: 885	
	java.util.concurrent.ThreadPoolExecutor$Worker.run() line: 907	
	java.lang.Thread.run() line: 619

Comment 3 John Mazzitelli 2009-05-05 04:59:59 UTC

I think the root cause is this:

id	uuid	name	resource_key	parent_resource_id
500639	92f9a2a2-df5a-4673-a251-aac93f89a00c	Tomcat Connector  (127.0.0.1:8009)	jboss.web:address=127.0.0.1,port=8009,type=Connector	501246

500640	92f9a2a2-df5a-4673-a251-aac93f89a00c	Tomcat Connector  (127.0.0.1:8009)	jboss.web:address=127.0.0.1,port=8009,type=Connector	501246

500641	b71e8fb5-c050-4ce5-be4d-a694a5c32c06	Tomcat Connector  (127.0.0.1:8080)	jboss.web:address=127.0.0.1,port=8080,type=Connector	501246

500642	b71e8fb5-c050-4ce5-be4d-a694a5c32c06	Tomcat Connector  (127.0.0.1:8080)	jboss.web:address=127.0.0.1,port=8080,type=Connector	501246

resource IDs 39,40 are dups - as is 41,42. Some how, we inventoried the same resources twice.

Then what happens is in InventoryManager.synchInventory:

            if (!syncedResources.isEmpty() || !unknownResourceIds.isEmpty() || !modifiedResourceIds.isEmpty()) {
                performAvailabilityChecks(true);
                this.inventoryThreadPoolExecutor.schedule((Callable<? extends Object>) this.serviceScanExecutor, 5,
                    TimeUnit.SECONDS);
            }

the modifiedResourceIds is never empty - first it has two elements of 500639, 500641 - then it comes back within a second and it has 500640,500642 and then it comes back and repeats. It always thinks its needs to perform new avail checks and service scans - which puts us in an infinite loop (we scan, avail check, then scan avail check, scan....and so on)

Comment 4 John Mazzitelli 2009-05-05 05:29:49 UTC

i think we need a unique constraint on the uuid column in rhq_resource - at the very least it would have caught this and not allowed the data to get into the DB (and its those dup rows that caused the agent to wig out). 

I manually deleted the 2 dup rows from the database and the agent is acting fine now... no infinite loop

Comment 5 John Mazzitelli 2009-05-05 13:02:09 UTC

I did the ?debug=1 trick on the resource hub so I can uninventory those dup services. Now, every 15 minutes, I'm seeing this in teh server log:

08:07:11,537 INFO  [CoreServerServiceImpl] Agent [mazzthink][1.3.0-SNAPSHOT(3877)] would like to connect to this server
08:07:11,803 INFO  [CoreServerServiceImpl] Agent [mazzthink] has connected to this server at Tue May 05 08:07:11 EDT 2009
08:22:16,943 INFO  [CoreServerServiceImpl] Agent [mazzthink][1.3.0-SNAPSHOT(3877)] would like to connect to this server
08:22:17,115 INFO  [CoreServerServiceImpl] Agent [mazzthink] has connected to this server at Tue May 05 08:22:17 EDT 2009
08:37:22,162 INFO  [CoreServerServiceImpl] Agent [mazzthink][1.3.0-SNAPSHOT(3877)] would like to connect to this server
08:37:22,287 INFO  [CoreServerServiceImpl] Agent [mazzthink] has connected to this server at Tue May 05 08:37:22 EDT 2009
08:52:28,646 INFO  [CoreServerServiceImpl] Agent [mazzthink][1.3.0-SNAPSHOT(3877)] would like to connect to this server
08:52:29,068 INFO  [CoreServerServiceImpl] Agent [mazzthink] has connected to this server at Tue May 05 08:52:29 EDT 2009

Comment 6 John Mazzitelli 2009-05-05 13:56:51 UTC

that last comment may be a red herring - currently debugging this and this is getting caused by the failover code - my box recently lost its VPN and then I reconnected to the VPN, I have a feeling this is causing the problem. I see CannotConnectExceptions when trying to send up inventory and meas. reports, probably due to DNS and IP changes when going on and off VPN

Comment 7 John Mazzitelli 2009-05-07 04:45:45 UTC

running for a while again, and I saw my avails running away again. so I checked to see if I have dups:

id	name	agent_id	parent_resource_id	uuid	resource_key
502550	mazzthink RHQ Server,  JBossAS 4.2.1.GA default (0.0.0.0:2099)	500550	500550	c9c201a3-ea1a-4827-9cdf-846255157f28	C:\mazz\source\rhq\trunk\modules\enterprise\server\container\target\rhq-server-1.3.0-SNAPSHOT\jbossas\server\default
502551	mazzthink RHQ Server,  JBossAS 4.2.1.GA default (0.0.0.0:2099)	500550	500550	c9c201a3-ea1a-4827-9cdf-846255157f28	C:\mazz\source\rhq\trunk\modules\enterprise\server\container\target\rhq-server-1.3.0-SNAPSHOT\jbossas\server\default
502552	mazzthink RHQ Server,  JBossAS 4.2.1.GA default (0.0.0.0:2099)	500550	500550	c9c201a3-ea1a-4827-9cdf-846255157f28	C:\mazz\source\rhq\trunk\modules\enterprise\server\container\target\rhq-server-1.3.0-SNAPSHOT\jbossas\server\default

502553	mazzthink RHQ Server,  JBossAS 4.2.1.GA default (0.0.0.0:2099)	500550	500550	95b5bc13-dbf0-46c1-b610-f86b02719058	C:\mazz\source\jopr\trunk\modules\dist\server\target\jopr-server-2.3.0-SNAPSHOT\jbossas\server\default
502554	mazzthink RHQ Server,  JBossAS 4.2.1.GA default (0.0.0.0:2099)	500550	500550	95b5bc13-dbf0-46c1-b610-f86b02719058	C:\mazz\source\jopr\trunk\modules\dist\server\target\jopr-server-2.3.0-SNAPSHOT\jbossas\server\default

now, I know for sure that I had network outages (my comcast went down several times today - not long, but 1 minute here, 10 minutes there, about 3 times or so today). This means that I lost my VPN. I'm not sure how that would cause this problem of duplicate resources, but the two times this has happened to me, I did lose my VPN connection.

Comment 8 John Mazzitelli 2009-05-11 13:48:28 UTC

This is the SQL you can use to find out if you have any duplicate resources:

select * from rhq_resource a, rhq_resource b where a.id != b.id and a.uuid = b.uuid

Comment 9 John Mazzitelli 2009-05-20 20:33:35 UTC

I'm getting this again, but I do not have dup resources.

I compeltely deleted my /data directory, started with -l and I still get the problem.

Comment 10 John Mazzitelli 2009-05-20 20:51:44 UTC

BTW: I saw this happen when I shutdown the agent, deleted the agent install dir, and re-installed a new agent but kept the agent config prefs intact.

Comment 11 John Mazzitelli 2009-05-20 21:16:47 UTC

does:

                    boolean ok = configuration.getServerServices().getDiscoveryServerService().mergeAvailabilityReport(
                        report);

turn around and ask for another avail report to be sent? I hit this in a debugger, tried to step over while the VM was paused and it never returned. Once I restarted all the VM threads, everthing kept going but it kept looping

Comment 12 John Mazzitelli 2009-05-21 06:10:01 UTC

put a breakpoint in InventoryManager.syncInventory - it keeps getting called.

I execute this while in this method:

System.out.print(this.resourceContainers.get("783cb684-ce28-4316-a657-ed32cc432a40"));
System.out.print(this.resourceContainers.get("582ff136-024d-4067-8f9e-032ff78cbc06"));

and look what comes out - notice the UUIDs are different but the IDs are the same!!!  Two resources, same ID in the agent inventory:

ResourceContainer: resource=[Resource[id=500067, type=Java Service Wrapper Launcher, key=jsw, name=RHQ Agent Java Service Wrapper, parent=mazzthink RHQ Agent, version=1.3.0-SNAPSHOT]]

ResourceContainer: resource=[Resource[id=500067, type=Datasource, key=jboss.jca:name=RHQDS,service=DataSourceBinding, name=RHQDS Datasource, parent=mazzthink RHQ Server,  JBossAS 4.2.3.GA default (0.0.0.0:2099)]]

Comment 13 John Mazzitelli 2009-05-21 06:12:24 UTC

inventory -x shows this:

            <resource>
               <id>500067</id>
               <key>jboss.jca:name=RHQDS,service=DataSourceBinding</key>
               <name>RHQDS Datasource</name>
               <version></version>
               <uuid>582ff136-024d-4067-8f9e-032ff78cbc06</uuid>

but database shows this:

500067	783cb684-ce28-4316-a657-ed32cc432a40	RHQ Agent Java Service Wrapper

somehow the agent sync'ed incorrectly - the DB says 500067 is the JSW, but the agent things that ID is for the datasource.

this is the root cause of the avail looping because syncInventory goes batty and keeps getting called because the agent things it isn't fully sync'ed

Comment 14 John Mazzitelli 2009-05-21 13:30:53 UTC

i was using an old inventory.dat to try to get teh agent to be out of sync, but I think that was the problem. I had a bogus test. this may not be a problem.

i'm leaving this jira open just in case i see this under different circumstances.

Comment 15 John Mazzitelli 2009-05-21 13:36:48 UTC

the problem with the duplicate resources still needs to be fixed.

Comment 16 Corey Welton 2009-08-05 18:03:48 UTC

Push to  1.4

Comment 17 Charles Crouch 2009-09-11 14:37:46 UTC

I'm seeing this again in the GSS environment (a QA build RHQ r5013)

..
2009-09-11 09:15:55,144 INFO  [org.rhq.enterprise.server.discovery.DiscoveryServerS
erviceImpl] Processed AV:[vm138.gsslab.rdu.redhat.com][348][full] - need full=[fals
e] in (156)ms
2009-09-11 09:15:56,786 INFO  [org.rhq.enterprise.server.discovery.DiscoveryServerS
erviceImpl] Processed AV:[vm138.gsslab.rdu.redhat.com][348][full] - need full=[fals
e] in (147)ms
...

I started seeing this after I restarted the agent with -u (previously I was seeing my JBAS instance show red when it was infact up)
I checked for dupe resources using the SQL and saw none.

Restarting the agent with --clean seems to have resolved the multiple avail reports.

------------------

After some more thought I think this can probably be explained by some nasty clock issues.
The fast avail reporting seem to start around mindnight last night. Yesterday afternoon the clocks were reset using ntpd underneath the feet of the agent and server, from around midnight back to 4pm:

(3:46:00 PM) ccrouch: [gss@vm138 bin]$ date
(3:46:00 PM) ccrouch: Fri Sep 11 00:30:33 EDT 2009
(3:46:00 PM) ccrouch: [gss@vm138 bin]$ date
(3:46:00 PM) ccrouch: Thu Sep 10 16:46:02 EDT 2009

I bet when we came across old date with a timestamp in the future we got confused :-/

Comment 18 Red Hat Bugzilla 2009-11-10 20:56:52 UTC

This bug was previously known as http://jira.rhq-project.org/browse/RHQ-2057
This bug relates to RHQ-1885

Comment 19 Corey Welton 2010-08-30 17:33:30 UTC

 mazz - what's left here?  can we revisit unique restraint solution?

Comment 20 John Mazzitelli 2010-08-30 18:02:13 UTC

we've done lots of changes to inventory sync since this was created. I would close this as this problem hasn't been seen in a while.

That said, this might still be something we should consider:

"i think we need a unique constraint on the uuid column in rhq_resource - at the
very least it would have caught this and not allowed the data to get into the
DB (and its those dup rows that caused the agent to wig out). "

The UUIDs should be unique across the entire inventory.

This would require a change to dbsetup/dbupgrade scripts (or liquibase if we ever move to that)

Comment 21 John Mazzitelli 2011-03-15 04:58:39 UTC

FYI: the hibernate errors during calls to setResourceError are being fixed via bug 664126

Comment 22 Jay Shaughnessy 2012-02-28 19:47:26 UTC

I'm closing this since it's been stale for a while and we have massive
amounts of refactored/change code in the area of question.