There seems to be a problem related to the AS5 plugin's Queue (and possibly Topic) parent type reassignment in the 3.0.0 plugin. The problem can manifest itself as invalid inventory items on the agent, which can in turn cause availability report processing failures on the server. It seems reproducible.

Steps to Reproduce (using JON). Not every step may be necessary, but I'll include them anyway to most closely resemble my env:

1. Install 2.3.1 Server (EAP plugin pack) and Agent
2. Import an EAP5 instance. Also the RHQ Agent and Server and anything else you may want. Keep the EAP running throughout this procedure
3. On the agent execute > inventory --export=inv-23.dat
4. Shut down 2.3.1 Server and Agent
5. Upgrade server to 2.4.0 (EAP plugin pack)
6. Install 2.4.0 Agent and start with > rhq-agent and no params. It should pick up the previous config. The inventory will be empty since it's newly installed.
7. On the agent execute > inventory --export=inv-24-1.dat
8. Import any new resources, minimally the new RHQ Server and Agent
9. On the agent execute > inventory --export=inv-24-2.dat
10. Uninventory the old RHQ Server and Agent
11. On the agent execute > inventory --export=inv-24-3.dat
12. Shut down the EAP instance
13. On the agent execute > avail

If the problem is reproduced you should see an error on the server at this point like:

19:01:30,944 INFO [DiscoveryServerServiceImpl] Error processing availability report from [jshaughn]: javax.ejb.EJBException:javax.persistence.PersistenceException: org.hibernate.PropertyValueException: not-null property references a null or transient value: org.rhq.core.domain.measurement.Availability.resource

This would be due to the fact that the inventory report contains resources with 0 ids. These bad resources can be found in the .dat file collected.
Here you'd see weirdness like:

Resource[id=10040, type=Queue, key=/queue/ExpiryQueue, name=ExpiryQueue, parent=jshaughnessy-PC JBoss EAP 5.0.0.GA default (0.0.0.0:1099)] (sync=SYNCHRONIZED, state=STARTED

Note the parent. The parent is the EAP server. That was supposed to change during plugin registration; see the startup log during plugin registration:

17:16:31,018 INFO [ResourceMetadataManagerBean] Adding ResourceType [JBossAS5:Queue(id=10072)] as child of ResourceType [JBossAS5:JBoss Messaging(id=10169)]...
17:16:31,018 INFO [ResourceMetadataManagerBean] Removing type [JBossAS5:Queue(id=10072)] from parent type [JBossAS5:JBossAS Server(id=10050)]...

This says it added RT Queue as child of JBoss Messaging and removed it as child of JBossAS Server. Then later, I end up with the following added to my inventory report:

Resource[id=0, type=Queue, key=/queue/ExpiryQueue, name=ExpiryQueue, parent=JBoss Messaging] (sync=NEW, state=STOPPED, avail=UNKNOWN

If you see entries like this (search for id=0) in your inventory report then you've got the problem. An Agent restart with --clean will eliminate the issue. I'm not sure if it goes away permanently.
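The search for id=0 entries can be scripted. The following is a minimal sketch, assuming the exported inventory is readable as text and contains entries in the "Resource[id=..., type=..., key=..." form quoted above. The function name find_unsynced_resources is illustrative and not part of the agent.

```python
import re

# Matches the start of a resource entry in the form quoted in this report, e.g.
# "Resource[id=0, type=Queue, key=/queue/ExpiryQueue, name=ExpiryQueue, ..."
# Assumption: type and key values contain no commas.
RESOURCE_RE = re.compile(r"Resource\[id=(\d+), type=([^,]+), key=([^,]+),")


def find_unsynced_resources(text):
    """Return (type, key) pairs for every resource entry whose id is 0."""
    return [
        (m.group(2), m.group(3))
        for m in RESOURCE_RE.finditer(text)
        if int(m.group(1)) == 0
    ]


# Example using the two entries quoted in this comment.
sample = (
    "Resource[id=10040, type=Queue, key=/queue/ExpiryQueue, name=ExpiryQueue, "
    "parent=jshaughnessy-PC JBoss EAP 5.0.0.GA default (0.0.0.0:1099)]\n"
    "Resource[id=0, type=Queue, key=/queue/ExpiryQueue, name=ExpiryQueue, "
    "parent=JBoss Messaging]\n"
)
for rtype, key in find_unsynced_resources(sample):
    print(f"id=0 resource: type={rtype} key={key}")
```

To check a real export, read the file contents (for example inv-24-3.dat from step 11) into a string and pass it to the same function.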
Setting to urgent for investigation
Created attachment 434896 [details] patch for avail id==0 Server-side exception half of this issue
Created attachment 435050 [details] patch of the patch to avoid ID=0 resources and not get previous if a full report
Created attachment 435084 [details] patch w/ fix so Agent inventory sync can handle Resources that got moved to new parents
My patch for the Resource id==0 Server-side exception during avail report processing, along with Mazz's tweaks to it, has been committed and pushed to release-3.0.0. When I return from lunch, I will add a set of test cases to verify there have been no regressions in avail reporting or report processing as a result of this change.
As for the test case in the initial comment, I think we are fine there. I get errors in neither the agent nor server logs. Snippets from the time period in which the EAP was shut down:

* server log

pe=RHQ Agent, key=core-02.usersys.redhat.com RHQ Agent, name=core-02.usersys.redhat.com RHQ Agent, parent=core-02.usersys.redhat.com]] for asynchronous uninventory
2010-07-28 21:29:59,041 WARN [org.hibernate.hql.ast.QueryTranslatorImpl] firstResult/maxResults specified with collection fetch; applying in memory!
2010-07-28 21:32:14,303 WARN [org.hibernate.hql.ast.QueryTranslatorImpl] firstResult/maxResults specified with collection fetch; applying in memory!
2010-07-28 21:32:28,199 WARN [org.hibernate.hql.ast.QueryTranslatorImpl] firstResult/maxResults specified with collection fetch; applying in memory!
2010-07-28 21:33:13,505 INFO [org.rhq.enterprise.server.scheduler.jobs.AsyncResourceDeleteJob] Async resource deletion - 237 successful, 0 failed, took [18434] ms
2010-07-28 21:37:48,878 INFO [org.rhq.enterprise.server.discovery.DiscoveryServerServiceImpl] Processed AV:[core-02.usersys.redhat.com][886][full] - need full=[false] in (426)ms

* agent log

2010-07-28 21:31:58,079 INFO [InventoryManager.availability-1] (rhq.core.pc.inventory.InventoryManager)- Sending availability report to Server...
2010-07-28 21:32:07,574 INFO [MeasurementManager.sender-1] (rhq.core.pc.measurement.MeasurementCollectorRunner)- Measurement collection for [144] metrics took 2774ms - sending report to Server...
2010-07-28 21:32:37,575 INFO [MeasurementManager.sender-2] (rhq.core.pc.measurement.MeasurementCollectorRunner)- Measurement collection for [1932] metrics took 4163ms - sending report to Server...
2010-07-28 21:33:07,572 INFO [MeasurementManager.sender-1] (rhq.core.pc.measurement.MeasurementCollectorRunner)- Measurement collection for [203] metrics took 683ms - sending report to Server...

Furthermore, I am not seeing the id=0 items in the avail report to the server. We seem to be good here.
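When checking many such snippets, the "Processed AV" line can be pulled apart mechanically. This is a rough sketch that only mirrors the line format visible in the server log above. The field names are my own, and reading the second bracketed number as the report size is an assumption, not something the log states.

```python
import re

# Pattern derived from the server log line quoted above:
# "Processed AV:[<agent>][<number>][<kind>] - need full=[<bool>] in (<ms>)ms"
AV_RE = re.compile(
    r"Processed AV:\[(?P<agent>[^\]]+)\]\[(?P<size>\d+)\]\[(?P<kind>[^\]]+)\]"
    r" - need full=\[(?P<need_full>true|false)\] in \((?P<ms>\d+)\)ms"
)


def parse_processed_av(line):
    """Return a dict of fields from a 'Processed AV' line, or None if absent."""
    m = AV_RE.search(line)
    if m is None:
        return None
    return {
        "agent": m.group("agent"),
        "size": int(m.group("size")),  # assumed: number of entries in the report
        "kind": m.group("kind"),       # "full" in the line quoted above
        "need_full": m.group("need_full") == "true",
        "ms": int(m.group("ms")),
    }


line = (
    "2010-07-28 21:37:48,878 INFO "
    "[org.rhq.enterprise.server.discovery.DiscoveryServerServiceImpl] "
    "Processed AV:[core-02.usersys.redhat.com][886][full] - need full=[false] in (426)ms"
)
print(parse_processed_av(line))
```

A line that parses cleanly shortly after running avail, with no error line next to it, is the outcome this comment describes.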
Awaiting additional test cases from dev before signing off.
Other things to test.

[21:46] <@ips> also doublecheck that the topic and queue resources are green and collecting metrics
[21:46] <@ips> and that operations, config, etc. are all working

Dev will be adding additional test cases "for testing general availability reporting functionality".
(The above means, of course, that you will have to restart your EAP server that you took down in step 12 of the initial testcase)
Created attachment 435173 [details] Possible Missing Topics issue

I think somewhere along the line I lost my Topics. I suppose there's an off-chance that it happened after the upgrade (someone should check that out), but the possibility exists that it occurred due to this fix.
Sigh, possible false alarm in comment #9 -- reinstalled a 2.3.1 and inventoried, to find out that this EAP 5.0.1 server does not appear to have had any Topics in the first place...?
Yep, that's normal. Out of box, EAP 5.x has two queues deployed (DLQ and ExpiryQueue) and no topics.
TEST STEPS
==========
1) Install JON 2.3.1.
2) Install an EAP 5.1 instance.
3) Get some test JBoss Messaging topics and queues deployed to the EAP instance as follows:
   cd $JBOSS_EAP_HOME
   cp docs/examples/jms/example-destinations-service.xml server/$CONFIG_NAME/deploy/messaging/destinations-service.xml
4) Start up the EAP instance.
5) Discover the EAP instance and import it into JON inventory.
6) Wait a few minutes and then verify the test topics and queues have been discovered, are listed as children of the EAP server resource, and are green.
7) Upgrade to JON 2.4 (Server and Agents).
8) Wait a few minutes, then verify in the GUI that the topics and queues are now children of a newly added JBoss Messaging singleton resource, rather than being direct children of the EAP server resource as they were before the upgrade. Verify the topics and queues are all still green and that the JBoss Messaging resource is also green.
9) Stop the EAP instance (no need to stop it via JON - just Ctrl-C it); wait for it to fully shut down.
10) Run the 'avail' command from the prompt of the Agent on the same box as the EAP instance.
11) Verify that you do *not* see any errors in the Server log like the following shortly after running the 'avail' command:
    19:01:30,944 INFO [DiscoveryServerServiceImpl] Error processing availability report from [jshaughn]: javax.ejb.EJBException:javax.persistence.PersistenceException: org.hibernate.PropertyValueException: not-null property references a null or transient value: org.rhq.core.domain.measurement.Availability.resource
12) Refresh the GUI, then verify that the EAP server resource and all of its descendant resources turn red in the GUI.
13) Restart the EAP instance and wait for it to fully start up.
14) Run the 'avail' prompt command again.
15) Refresh the GUI, then verify that the EAP server resource and all of its descendant resources turn green again in the GUI.
MORE TEST STEPS
===============
16) Stop the EAP instance (no need to stop it via JON - just Ctrl-C it); wait for it to fully shut down.
17) Run the 'avail --changed' command from the prompt of the Agent on the same box as the EAP instance.
18) Verify that you do not see any errors related to availability processing in the Agent or Server log.
19) Refresh the GUI, then verify that the EAP server resource and all of its descendant resources turn red in the GUI.
20) Restart the EAP instance and wait for it to fully start up.
21) Run the 'avail --changed' prompt command again.
22) Refresh the GUI, then verify that the EAP server resource and all of its descendant resources turn green again in the GUI.
23) Use the jmx-console and find the MBean for one of the sample queues (e.g. jboss.messaging.destination:name=A,service=Queue) and invoke its Stop operation.
24) Wait a couple of minutes, then run the 'avail' prompt command.
25) Refresh the GUI, then verify that the Queue resource corresponding to the queue you just stopped has turned red in the GUI.
26) Use the jmx-console and find the MBean for the sample queue you just stopped and invoke its Start operation.
27) Wait a couple of minutes, then run the 'avail' prompt command.
28) Refresh the GUI, then verify that the Queue resource corresponding to the queue you just started has turned green in the GUI.
29) Run any other testing you can think of to ensure there have been no regressions in availability reporting.
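The log checks in steps 11 and 18 can be done with a small script instead of by eye. This is a minimal sketch that looks only for the "Error processing availability report" message quoted in step 11. The function name is illustrative, and other kinds of availability errors would need their own markers.

```python
# Message text taken from the server log error quoted in step 11.
ERROR_MARKER = "Error processing availability report"


def find_avail_report_errors(log_lines):
    """Return the log lines that report a failed availability report."""
    return [line.rstrip("\n") for line in log_lines if ERROR_MARKER in line]


# Example: one clean line and one failing line, both taken from this report.
lines = [
    "2010-07-28 21:37:48,878 INFO [org.rhq.enterprise.server.discovery."
    "DiscoveryServerServiceImpl] Processed AV:[core-02.usersys.redhat.com]"
    "[886][full] - need full=[false] in (426)ms\n",
    "19:01:30,944 INFO [DiscoveryServerServiceImpl] Error processing "
    "availability report from [jshaughn]: javax.ejb.EJBException:"
    "javax.persistence.PersistenceException\n",
]
for hit in find_avail_report_errors(lines):
    print("FAIL:", hit)
```

For a real run, pass the open Server log file object as log_lines; an empty result after running 'avail' or 'avail --changed' means the step passes for this particular error.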
I have tested all steps and I have not found any errors.
Tested as per comment-7/12/13 with JON 2.4 GA (build #93). Test result looks good to me. The following error does not occur:

javax.ejb.EJBException:javax.persistence.PersistenceException: org.hibernate.PropertyValueException: not-null property references a null or transient value: org.rhq.core.domain.measurement.Availability.resource

** server log

2010-07-29 16:12:57,687 INFO [org.rhq.enterprise.server.core.plugin.AgentPluginScanner] Filesystem has a plugin [JBossESB] at the file [/NotBackedUp/install/jon2.4_93/jon-server-2.4.0.GA/jbossas/server/default/deploy/rhq.ear/rhq-downloads/rhq-plugins/rhq-jbossesb-soa-plugin-SOA.4.3.0.GA_CP02.jar] which is different than where the DB thinks it should be [/NotBackedUp/install/jon2.4_93/jon-server-2.4.0.GA/jbossas/server/default/deploy/rhq.ear/rhq-downloads/rhq-plugins/rhq-jbossesb-plugin-SOA.4.3.0.GA_CP02.jar]
2010-07-29 16:12:57,687 INFO [org.rhq.enterprise.server.core.plugin.AgentPluginScanner] Filesystem has a plugin [JBossESB5] at the file [/NotBackedUp/install/jon2.4_93/jon-server-2.4.0.GA/jbossas/server/default/deploy/rhq.ear/rhq-downloads/rhq-plugins/rhq-jbossesb-soa5-plugin-5.0.0.GA.jar] which is different than where the DB thinks it should be [/NotBackedUp/install/jon2.4_93/jon-server-2.4.0.GA/jbossas/server/default/deploy/rhq.ear/rhq-downloads/rhq-plugins/rhq-jbossesb-plugin-2.3.2-as5.jar]
2010-07-29 16:12:59,844 INFO [org.rhq.enterprise.server.discovery.DiscoveryServerServiceImpl] Processed AV:[10.65.193.1][763][full] - need full=[false] in (557)ms

** agent log

2010-07-29 16:13:24,469 WARN [InventoryManager.discovery-1] (rhq.core.pc.inventory.InventoryManager)- Cannot start component for Resource[id=10005, type=Apache HTTP Server, key=/etc/httpd, name=rajanlaptop Apache 2.2.3 (/etc/httpd/), parent=10.65.193.1, version=2.2.3] from synchronized merge due to invalid plugin config: Failed to start component for resource Resource[id=10005, type=Apache HTTP Server, key=/etc/httpd, name=rajanlaptop Apache 2.2.3 (/etc/httpd/), parent=10.65.193.1, version=2.2.3].
2010-07-29 16:13:24,493 INFO [InventoryManager.discovery-1] (rhq.core.pc.inventory.AutoDiscoveryExecutor)- Found 0 servers.
2010-07-29 16:13:35,212 INFO [ResourceContainer.invoker.daemon-16] (org.rhq.plugins.apache.ApacheServerComponent)- Initializing server component for server [/etc/httpd]...
2010-07-29 16:13:35,214 INFO [ResourceContainer.invoker.daemon-16] (rhq.plugins.www.snmp.SNMPClient)- Initialized SNMP session for agent at /127.0.0.1:1610
2010-07-29 16:13:35,318 WARN [ResourceContainer.invoker.daemon-16] (org.rhq.plugins.apache.ApacheServerComponent)- Failed to connect to SNMP agent at 127.0.0.1/1610/public . Make sure 1) the managed Apache server has been instrumented with the JON SNMP module, 2) the Apache server is running, and 3) the SNMP agent host, port, and community are set correctly in this resource's connection properties. The agent will not be able to record metrics from apache httpd without SNMP
2010-07-29 16:13:38,422 INFO [InventoryManager.availability-1] (rhq.core.pc.inventory.InventoryManager)- Sending availability report to Server...
2010-07-29 16:13:51,277 INFO [MeasurementManager.sender-1] (rhq.core.pc.measurement.MeasurementCollectorRunner)- Measurement collection for [11] metrics took 1411ms - sending report to Server...
Mass-closure of verified bugs against JON.