Bug 535352 - (RHQ-2057) runaway avail reporting
runaway avail reporting
Status: CLOSED WORKSFORME
Product: RHQ Project
Classification: Other
Component: Agent (Show other bugs)
unspecified
All All
high Severity medium (vote)
: ---
: ---
Assigned To: RHQ Project Maintainer
http://jira.rhq-project.org/browse/RH...
:
Depends On:
Blocks: RHQ-1178
  Show dependency treegraph
 
Reported: 2009-05-04 21:48 EDT by John Mazzitelli
Modified: 2012-02-28 14:47 EST (History)
3 users (show)

See Also:
Fixed In Version: 1.4
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-02-28 14:47:26 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:


Attachments (Terms of Use)

  None (edit)
Description John Mazzitelli 2009-05-04 21:48:00 EDT
I'm seeing the agent send availability reports one after the other in quick succession for no apparent reason:

21:40:39,568 INFO  [DiscoveryServerServiceImpl] Processed AV:[mazzthink][287][full] - need full=[false] in (640)ms
21:40:43,006 INFO  [DiscoveryServerServiceImpl] Processed AV:[mazzthink][287][full] - need full=[false] in (656)ms
21:40:45,896 INFO  [DiscoveryServerServiceImpl] Processed AV:[mazzthink][287][full] - need full=[false] in (687)ms
21:40:49,318 INFO  [DiscoveryServerServiceImpl] Processed AV:[mazzthink][287][full] - need full=[false] in (1328)ms
21:40:52,006 INFO  [DiscoveryServerServiceImpl] Processed AV:[mazzthink][287][full] - need full=[false] in (688)ms

This repeats itself  - looks like an infinite loop.

I'm also seeing this:

2009-05-04 02:44:59,709 ERROR [org.hibernate.jdbc.AbstractBatcher] Exception executing batch: 
org.hibernate.StaleStateException: Batch update returned unexpected row count from update [0]; actual row count: 0; expected: 1
	at org.hibernate.jdbc.Expectations$BasicExpectation.checkBatched(Expectations.java:61)
	at org.hibernate.jdbc.Expectations$BasicExpectation.verifyOutcome(Expectations.java:46)
	at org.hibernate.jdbc.BatchingBatcher.checkRowCounts(BatchingBatcher.java:68)
	at org.hibernate.jdbc.BatchingBatcher.doExecuteBatch(BatchingBatcher.java:48)
	at org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:247)
	at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:237)
	at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:146)
	at org.hibernate.event.def.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:298)
	at org.hibernate.event.def.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:27)
	at org.hibernate.impl.SessionImpl.flush(SessionImpl.java:1000)
	at org.hibernate.impl.SessionImpl.managedFlush(SessionImpl.java:338)
	at org.hibernate.ejb.AbstractEntityManagerImpl$1.beforeCompletion(AbstractEntityManagerImpl.java:515)
	at com.arjuna.ats.internal.jta.resources.arjunacore.SynchronizationImple.beforeCompletion(SynchronizationImple.java:114)
	at com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.beforeCompletion(TwoPhaseCoordinator.java:247)
	at com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.end(TwoPhaseCoordinator.java:86)
	at com.arjuna.ats.arjuna.AtomicAction.commit(AtomicAction.java:177)
	at com.arjuna.ats.internal.jta.transaction.arjunacore.TransactionImple.commitAndDisassociate(TransactionImple.java:1382)
	at com.arjuna.ats.internal.jta.transaction.arjunacore.BaseTransaction.commit(BaseTransaction.java:135)
	at com.arjuna.ats.jbossatx.BaseTransactionManagerDelegate.commit(BaseTransactionManagerDelegate.java:87)
	at org.jboss.aspects.tx.TxPolicy.endTransaction(TxPolicy.java:175)
	at org.jboss.aspects.tx.TxPolicy.invokeInOurTx(TxPolicy.java:87)
	at org.jboss.aspects.tx.TxInterceptor$Required.invoke(TxInterceptor.java:191)
	at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101)
	at org.jboss.aspects.tx.TxPropagationInterceptor.invoke(TxPropagationInterceptor.java:76)
	at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101)
	at org.jboss.ejb3.stateless.StatelessInstanceInterceptor.invoke(StatelessInstanceInterceptor.java:62)
	at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101)
	at org.jboss.aspects.security.AuthenticationInterceptor.invoke(AuthenticationInterceptor.java:77)
	at org.jboss.ejb3.security.Ejb3AuthenticationInterceptor.invoke(Ejb3AuthenticationInterceptor.java:106)
	at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101)
	at org.jboss.ejb3.ENCPropagationInterceptor.invoke(ENCPropagationInterceptor.java:46)
	at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101)
	at org.jboss.ejb3.asynchronous.AsynchronousInterceptor.invoke(AsynchronousInterceptor.java:106)
	at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101)
	at org.jboss.ejb3.stateless.StatelessContainer.localInvoke(StatelessContainer.java:214)
	at org.jboss.ejb3.stateless.StatelessContainer.localInvoke(StatelessContainer.java:184)
	at org.jboss.ejb3.stateless.StatelessLocalProxy.invoke(StatelessLocalProxy.java:81)
	at $Proxy290.addResourceError(Unknown Source)
	at org.rhq.enterprise.server.discovery.DiscoveryServerServiceImpl.setResourceError(DiscoveryServerServiceImpl.java:148)
	at sun.reflect.GeneratedMethodAccessor228.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.rhq.enterprise.communications.command.impl.remotepojo.server.RemotePojoInvocationCommandService.execute(RemotePojoInvocationCommandService.java:184)
	at sun.reflect.GeneratedMethodAccessor220.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.jboss.mx.interceptor.ReflectedDispatcher.invoke(ReflectedDispatcher.java:155)
	at org.jboss.mx.server.Invocation.dispatch(Invocation.java:94)
	at org.jboss.mx.server.Invocation.invoke(Invocation.java:86)
	at org.jboss.mx.server.AbstractMBeanInvoker.invoke(AbstractMBeanInvoker.java:264)
	at org.jboss.mx.server.MBeanServerImpl.invoke(MBeanServerImpl.java:659)
	at javax.management.MBeanServerInvocationHandler.invoke(MBeanServerInvocationHandler.java:288)
	at $Proxy412.execute(Unknown Source)
	at org.rhq.enterprise.communications.command.server.CommandProcessor.handleIncomingInvocationRequest(CommandProcessor.java:290)
	at org.rhq.enterprise.communications.command.server.CommandProcessor.invoke(CommandProcessor.java:184)
	at org.jboss.remoting.ServerInvoker.invoke(ServerInvoker.java:734)
	at org.jboss.remoting.transport.servlet.ServletServerInvoker.processRequest(ServletServerInvoker.java:231)
	at sun.reflect.GeneratedMethodAccessor216.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.jboss.mx.interceptor.ReflectedDispatcher.invoke(ReflectedDispatcher.java:155)
	at org.jboss.mx.server.Invocation.dispatch(Invocation.java:94)
	at org.jboss.mx.server.Invocation.invoke(Invocation.java:86)
	at org.jboss.mx.server.AbstractMBeanInvoker.invoke(AbstractMBeanInvoker.java:264)
	at org.jboss.mx.server.MBeanServerImpl.invoke(MBeanServerImpl.java:659)
	at javax.management.MBeanServerInvocationHandler.invoke(MBeanServerInvocationHandler.java:288)
	at $Proxy372.processRequest(Unknown Source)
	at org.jboss.remoting.transport.servlet.web.ServerInvokerServlet.processRequest(ServerInvokerServlet.java:128)
	at org.jboss.remoting.transport.servlet.web.ServerInvokerServlet.doPost(ServerInvokerServlet.java:157)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:710)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.rhq.helpers.rtfilter.filter.RtFilter.doFilter(RtFilter.java:124)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:96)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:230)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
	at org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:179)
	at org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:84)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:104)
	at org.jboss.web.tomcat.service.jca.CachedConnectionValve.invoke(CachedConnectionValve.java:157)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:241)
	at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
	at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:580)
	at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
	
Comment 1 John Mazzitelli 2009-05-04 22:26:57 EDT
seeing this in cmd trace:

2009-05-04 22:25:24,318 TRACE {send.complete}=>>DiscoveryServerService.setResourceError|{rhq.agent-name=mazzthink, rhq.retry=1, rhq.security-token=1241245112171-1980743266-8190587527248950016, rhq.send-throttle=true, rhq.guaranteed-delivery=true}|failed:java.lang.reflect.InvocationTargetException:null -> org.rhq.core.util.exception.WrappedRemotingException:javax.persistence.PersistenceException: org.hibernate.PropertyValueException: not-null property references a null or transient value: org.rhq.core.domain.resource.ResourceError.summary -> org.rhq.core.util.exception.WrappedRemotingException:org.hibernate.PropertyValueException: not-null property references a null or transient value: org.rhq.core.domain.resource.ResourceError.summary -> org.rhq.core.util.exception.WrappedRemotingException:not-null property references a null or transient value: org.rhq.core.domain.resource.ResourceError.summary

I have it where the agent goes into an infinite loop constantly sending avai reports... I don't know why yet.
Comment 2 John Mazzitelli 2009-05-04 23:15:59 EDT
This sendResourceErrorToServer is getting called every 1 second or so. I do have two Embedded Tomcat connector resources that are failing their avialability checks. but why is this constantly calling setRersourceError - the resource errors are getting stored in the DB (I checked) and the agent isn't getting exceptions from server.

Daemon Thread [InventoryManager.availability-1] (Suspended (breakpoint at line 1192 in org.rhq.core.pc.inventory.InventoryManager))	
	owns: java.lang.Object  (id=7013)	
	org.rhq.core.pc.inventory.InventoryManager.sendResourceErrorToServer(org.rhq.core.domain.resource.ResourceError) line: 1192	
	org.rhq.core.pc.inventory.AvailabilityExecutor.checkInventory(org.rhq.core.domain.resource.Resource, org.rhq.core.domain.discovery.AvailabilityReport, boolean, boolean) line: 178	
	org.rhq.core.pc.inventory.AvailabilityExecutor.checkInventory(org.rhq.core.domain.resource.Resource, org.rhq.core.domain.discovery.AvailabilityReport, boolean, boolean) line: 210	
	org.rhq.core.pc.inventory.AvailabilityExecutor.checkInventory(org.rhq.core.domain.resource.Resource, org.rhq.core.domain.discovery.AvailabilityReport, boolean, boolean) line: 210	
	org.rhq.core.pc.inventory.AvailabilityExecutor.checkInventory(org.rhq.core.domain.resource.Resource, org.rhq.core.domain.discovery.AvailabilityReport, boolean, boolean) line: 210	
	org.rhq.core.pc.inventory.AvailabilityExecutor.call() line: 106	
	org.rhq.core.pc.inventory.AvailabilityExecutor.run() line: 70	
	java.util.concurrent.Executors$RunnableAdapter<T>.call() line: 441	
	java.util.concurrent.FutureTask$Sync.innerRun() line: 303	
	java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask<V>(java.util.concurrent.FutureTask<V>).run() line: 138	
	java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask<V>.access$301(java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask) line: 98	
	java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask<V>.run() line: 207	
	java.util.concurrent.ThreadPoolExecutor$Worker.runTask(java.lang.Runnable) line: 885	
	java.util.concurrent.ThreadPoolExecutor$Worker.run() line: 907	
	java.lang.Thread.run() line: 619	
Comment 3 John Mazzitelli 2009-05-05 00:59:59 EDT
I think the root cause is this:

id	uuid	name	resource_key	parent_resource_id
500639	92f9a2a2-df5a-4673-a251-aac93f89a00c	Tomcat Connector  (127.0.0.1:8009)	jboss.web:address=127.0.0.1,port=8009,type=Connector	501246

500640	92f9a2a2-df5a-4673-a251-aac93f89a00c	Tomcat Connector  (127.0.0.1:8009)	jboss.web:address=127.0.0.1,port=8009,type=Connector	501246

500641	b71e8fb5-c050-4ce5-be4d-a694a5c32c06	Tomcat Connector  (127.0.0.1:8080)	jboss.web:address=127.0.0.1,port=8080,type=Connector	501246

500642	b71e8fb5-c050-4ce5-be4d-a694a5c32c06	Tomcat Connector  (127.0.0.1:8080)	jboss.web:address=127.0.0.1,port=8080,type=Connector	501246

resource IDs 39,40 are dups - as is 41,42. Some how, we inventoried the same resources twice.

Then what happens is in InventoryManager.synchInventory:

            if (!syncedResources.isEmpty() || !unknownResourceIds.isEmpty() || !modifiedResourceIds.isEmpty()) {
                performAvailabilityChecks(true);
                this.inventoryThreadPoolExecutor.schedule((Callable<? extends Object>) this.serviceScanExecutor, 5,
                    TimeUnit.SECONDS);
            }

the modifiedResourceIds is never empty - first it has two elements of 500639, 500641 - then it comes back within a second and it has 500640,500642 and then it comes back and repeats. It always thinks its needs to perform new avail checks and service scans - which puts us in an infinite loop (we scan, avail check, then scan avail check, scan....and so on)
Comment 4 John Mazzitelli 2009-05-05 01:29:49 EDT
i think we need a unique constraint on the uuid column in rhq_resource - at the very least it would have caught this and not allowed the data to get into the DB (and its those dup rows that caused the agent to wig out). 

I manually deleted the 2 dup rows from the database and the agent is acting fine now... no infinite loop
Comment 5 John Mazzitelli 2009-05-05 09:02:09 EDT
I did the ?debug=1 trick on the resource hub so I can uninventory those dup services. Now, every 15 minutes, I'm seeing this in teh server log:

08:07:11,537 INFO  [CoreServerServiceImpl] Agent [mazzthink][1.3.0-SNAPSHOT(3877)] would like to connect to this server
08:07:11,803 INFO  [CoreServerServiceImpl] Agent [mazzthink] has connected to this server at Tue May 05 08:07:11 EDT 2009
08:22:16,943 INFO  [CoreServerServiceImpl] Agent [mazzthink][1.3.0-SNAPSHOT(3877)] would like to connect to this server
08:22:17,115 INFO  [CoreServerServiceImpl] Agent [mazzthink] has connected to this server at Tue May 05 08:22:17 EDT 2009
08:37:22,162 INFO  [CoreServerServiceImpl] Agent [mazzthink][1.3.0-SNAPSHOT(3877)] would like to connect to this server
08:37:22,287 INFO  [CoreServerServiceImpl] Agent [mazzthink] has connected to this server at Tue May 05 08:37:22 EDT 2009
08:52:28,646 INFO  [CoreServerServiceImpl] Agent [mazzthink][1.3.0-SNAPSHOT(3877)] would like to connect to this server
08:52:29,068 INFO  [CoreServerServiceImpl] Agent [mazzthink] has connected to this server at Tue May 05 08:52:29 EDT 2009
Comment 6 John Mazzitelli 2009-05-05 09:56:51 EDT
that last comment may be a red herring - currently debugging this and this is getting caused by the failover code - my box recently lost its VPN and then I reconnected to the VPN, I have a feeling this is causing the problem. I see CannotConnectExceptions when trying to send up inventory and meas. reports, probably due to DNS and IP changes when going on and off VPN
Comment 7 John Mazzitelli 2009-05-07 00:45:45 EDT
running for a while again, and I saw my avails running away again. so I checked to see if I have dups:

id	name	agent_id	parent_resource_id	uuid	resource_key
502550	mazzthink RHQ Server,  JBossAS 4.2.1.GA default (0.0.0.0:2099)	500550	500550	c9c201a3-ea1a-4827-9cdf-846255157f28	C:\mazz\source\rhq\trunk\modules\enterprise\server\container\target\rhq-server-1.3.0-SNAPSHOT\jbossas\server\default
502551	mazzthink RHQ Server,  JBossAS 4.2.1.GA default (0.0.0.0:2099)	500550	500550	c9c201a3-ea1a-4827-9cdf-846255157f28	C:\mazz\source\rhq\trunk\modules\enterprise\server\container\target\rhq-server-1.3.0-SNAPSHOT\jbossas\server\default
502552	mazzthink RHQ Server,  JBossAS 4.2.1.GA default (0.0.0.0:2099)	500550	500550	c9c201a3-ea1a-4827-9cdf-846255157f28	C:\mazz\source\rhq\trunk\modules\enterprise\server\container\target\rhq-server-1.3.0-SNAPSHOT\jbossas\server\default

502553	mazzthink RHQ Server,  JBossAS 4.2.1.GA default (0.0.0.0:2099)	500550	500550	95b5bc13-dbf0-46c1-b610-f86b02719058	C:\mazz\source\jopr\trunk\modules\dist\server\target\jopr-server-2.3.0-SNAPSHOT\jbossas\server\default
502554	mazzthink RHQ Server,  JBossAS 4.2.1.GA default (0.0.0.0:2099)	500550	500550	95b5bc13-dbf0-46c1-b610-f86b02719058	C:\mazz\source\jopr\trunk\modules\dist\server\target\jopr-server-2.3.0-SNAPSHOT\jbossas\server\default

now, I know for sure that I had network outages (my comcast went down several times today - not long, but 1 minute here, 10 minutes there, about 3 times or so today). This means that I lost my VPN. I'm not sure how that would cause this problem of duplicate resources, but the two times this has happened to me, I did lose my VPN connection.
Comment 8 John Mazzitelli 2009-05-11 09:48:28 EDT
This is the SQL you can use to find out if you have any duplicate resources:

select * from rhq_resource a, rhq_resource b where a.id != b.id and a.uuid = b.uuid
Comment 9 John Mazzitelli 2009-05-20 16:33:35 EDT
I'm getting this again, but I do not have dup resources.

I compeltely deleted my /data directory, started with -l and I still get the problem.
Comment 10 John Mazzitelli 2009-05-20 16:51:44 EDT
BTW: I saw this happen when I shutdown the agent, deleted the agent install dir, and re-installed a new agent but kept the agent config prefs intact.
Comment 11 John Mazzitelli 2009-05-20 17:16:47 EDT
does:

                    boolean ok = configuration.getServerServices().getDiscoveryServerService().mergeAvailabilityReport(
                        report);

turn around and ask for another avail report to be sent? I hit this in a debugger, tried to step over while the VM was paused and it never returned. Once I restarted all the VM threads, everthing kept going but it kept looping
Comment 12 John Mazzitelli 2009-05-21 02:10:01 EDT
put a breakpoint in InventoryManager.syncInventory - it keeps getting called.

I execute this while in this method:

System.out.print(this.resourceContainers.get("783cb684-ce28-4316-a657-ed32cc432a40"));
System.out.print(this.resourceContainers.get("582ff136-024d-4067-8f9e-032ff78cbc06"));

and look what comes out - notice the UUIDs are different but the IDs are the same!!!  Two resources, same ID in the agent inventory:

ResourceContainer: resource=[Resource[id=500067, type=Java Service Wrapper Launcher, key=jsw, name=RHQ Agent Java Service Wrapper, parent=mazzthink RHQ Agent, version=1.3.0-SNAPSHOT]]

ResourceContainer: resource=[Resource[id=500067, type=Datasource, key=jboss.jca:name=RHQDS,service=DataSourceBinding, name=RHQDS Datasource, parent=mazzthink RHQ Server,  JBossAS 4.2.3.GA default (0.0.0.0:2099)]]
Comment 13 John Mazzitelli 2009-05-21 02:12:24 EDT
inventory -x shows this:

            <resource>
               <id>500067</id>
               <key>jboss.jca:name=RHQDS,service=DataSourceBinding</key>
               <name>RHQDS Datasource</name>
               <version></version>
               <uuid>582ff136-024d-4067-8f9e-032ff78cbc06</uuid>

but database shows this:

500067	783cb684-ce28-4316-a657-ed32cc432a40	RHQ Agent Java Service Wrapper

somehow the agent sync'ed incorrectly - the DB says 500067 is the JSW, but the agent things that ID is for the datasource.

this is the root cause of the avail looping because syncInventory goes batty and keeps getting called because the agent things it isn't fully sync'ed
Comment 14 John Mazzitelli 2009-05-21 09:30:53 EDT
i was using an old inventory.dat to try to get teh agent to be out of sync, but I think that was the problem. I had a bogus test. this may not be a problem.

i'm leaving this jira open just in case i see this under different circumstances.
Comment 15 John Mazzitelli 2009-05-21 09:36:48 EDT
the problem with the duplicate resources still needs to be fixed.
Comment 16 Corey Welton 2009-08-05 14:03:48 EDT
Push to  1.4
Comment 17 Charles Crouch 2009-09-11 10:37:46 EDT
I'm seeing this again in the GSS environment (a QA build RHQ r5013)

..
2009-09-11 09:15:55,144 INFO  [org.rhq.enterprise.server.discovery.DiscoveryServerS
erviceImpl] Processed AV:[vm138.gsslab.rdu.redhat.com][348][full] - need full=[fals
e] in (156)ms
2009-09-11 09:15:56,786 INFO  [org.rhq.enterprise.server.discovery.DiscoveryServerS
erviceImpl] Processed AV:[vm138.gsslab.rdu.redhat.com][348][full] - need full=[fals
e] in (147)ms
...

I started seeing this after I restarted the agent with -u (previously I was seeing my JBAS instance show red when it was infact up)
I checked for dupe resources using the SQL and saw none.

Restarting the agent with --clean seems to have resolved the multiple avail reports.

------------------

After some more thought I think this can probably be explained by some nasty clock issues.
The fast avail reporting seem to start around mindnight last night. Yesterday afternoon the clocks were reset using ntpd underneath the feet of the agent and server, from around midnight back to 4pm:

(3:46:00 PM) ccrouch: [gss@vm138 bin]$ date
(3:46:00 PM) ccrouch: Fri Sep 11 00:30:33 EDT 2009
(3:46:00 PM) ccrouch: [gss@vm138 bin]$ date
(3:46:00 PM) ccrouch: Thu Sep 10 16:46:02 EDT 2009

I bet when we came across old date with a timestamp in the future we got confused :-/
Comment 18 Red Hat Bugzilla 2009-11-10 15:56:52 EST
This bug was previously known as http://jira.rhq-project.org/browse/RHQ-2057
This bug relates to RHQ-1885
Comment 19 Corey Welton 2010-08-30 13:33:30 EDT
 mazz - what's left here?  can we revisit unique restraint solution?
Comment 20 John Mazzitelli 2010-08-30 14:02:13 EDT
we've done lots of changes to inventory sync since this was created. I would close this as this problem hasn't been seen in a while.

That said, this might still be something we should consider:

"i think we need a unique constraint on the uuid column in rhq_resource - at the
very least it would have caught this and not allowed the data to get into the
DB (and its those dup rows that caused the agent to wig out). "

The UUIDs should be unique across the entire inventory.

This would require a change to dbsetup/dbupgrade scripts (or liquibase if we ever move to that)
Comment 21 John Mazzitelli 2011-03-15 00:58:39 EDT
FYI: the hibernate errors during calls to setResourceError are being fixed via bug 664126
Comment 22 Jay Shaughnessy 2012-02-28 14:47:26 EST
I'm closing this since it's been stale for a while and we have massive
amounts of refactored/change code in the area of question.

Note You need to log in before you can comment on or make changes to this bug.