Bug 618879 - Issues with AS Queue sync causes bad inventory.dat and failed avail processing
Status: CLOSED CURRENTRELEASE
Product: RHQ Project
Classification: Other
Component: Core Server
Version: 3.0.0
Hardware/OS: All / All
Priority: urgent
Severity: urgent
Assigned To: Ian Springer
QA Contact: Corey Welton
Depends On:
Blocks: jon-sprint12-bugs
Reported: 2010-07-27 19:10 EDT by Jay Shaughnessy
Modified: 2013-08-05 20:37 EDT
Fixed In Version: 2.4
Doc Type: Bug Fix
Last Closed: 2010-08-12 12:50:42 EDT

Attachments
patch for avail id==0 Server-side exception half of this issue (14.99 KB, patch)
2010-07-27 23:21 EDT, Ian Springer
patch of the patch to avoid ID=0 resources and not get previous if a full report (2.72 KB, patch)
2010-07-28 11:38 EDT, John Mazzitelli
patch w/ fix so Agent inventory sync can handle Resources that got moved to new parents (10.38 KB, patch)
2010-07-28 13:20 EDT, Ian Springer
Possible Missing Topics issue (28.18 KB, image/png)
2010-07-28 22:15 EDT, Corey Welton

Description Jay Shaughnessy 2010-07-27 19:10:40 EDT
There seems to be a problem related to the AS5 plugin's Queue (and possibly Topic) parent type reassignment in the 3.0.0 plugin.

The problem can manifest itself with invalid inventory items on the agent, which can in turn cause availability report processing failures on the server.

It seems reproducible. Steps to reproduce (using JON) follow. Not every step may be necessary, but I'll include them anyway to most closely resemble my env:

1. Install 2.3.1 Server (EAP plugin pack) and Agent
2. Import an EAP5 instance. Also the RHQ agent and Server and anything else you may want. Keep the EAP running throughout this procedure
3. On the agent execute > inventory --export=inv-23.dat
4. Shut down 2.3.1 Server and Agent
5. Upgrade server to 2.4.0 (EAP plugin pack)
6. Install 2.4.0 Agent and start with > rhq-agent and no params. It should pick up the previous config. The inventory will be empty since it's newly installed.
7. On the agent execute > inventory --export=inv-24-1.dat
8. Import any new resources, minimally the new RHQ Server and Agent
9. On the agent execute > inventory --export=inv-24-2.dat
10. Uninventory the old RHQ Server and Agent
11. On the agent execute > inventory --export=inv-24-3.dat
12. Shut down the EAP instance
13. On the agent execute > avail

If the problem is reproduced you should see an error on the server at this point like:

19:01:30,944 INFO  [DiscoveryServerServiceImpl] Error processing availability report from [jshaughn]: javax.ejb.EJBException:javax.persistence.PersistenceException: org.hibernate.PropertyValueException: not-null property references a null or transient value: org.rhq.core.domain.measurement.Availability.resource

This would be because the inventory report contains resources with an id of 0. These bad resources can be found in the .dat files collected. There you'd see weirdness like:

Resource[id=10040, type=Queue, key=/queue/ExpiryQueue, name=ExpiryQueue, parent=jshaughnessy-PC JBoss EAP 5.0.0.GA default (0.0.0.0:1099)] (sync=SYNCHRONIZED, state=STARTED

Note the parent. The parent is the EAP server. That was supposed to change during plugin registration; see the startup log from plugin registration:

(5:38:01 PM) jshaughn: 17:16:31,018 INFO  [ResourceMetadataManagerBean] Adding ResourceType [JBossAS5:Queue(id=10072)] as child of ResourceType [JBossAS5:JBoss Messaging(id=10169)]...
(5:38:01 PM) jshaughn: 17:16:31,018 INFO  [ResourceMetadataManagerBean] Removing type [JBossAS5:Queue(id=10072)] from parent type [JBossAS5:JBossAS Server(id=10050)]...
(5:39:12 PM) jshaughn: This says it added RT Queue as child of JBoss Messaging and removed it as child of JBossAS Server
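
A small sketch for pulling these type moves out of a server log, in case it helps when checking other plugins. It assumes each message sits on one unwrapped line and is worded exactly like the two messages quoted above; the function name type_moves is made up for this illustration and is not part of RHQ.

```python
import re

# Patterns modeled on the two ResourceMetadataManagerBean messages quoted above.
# Assumption: other RHQ versions may word these messages differently.
ADD_RE = re.compile(
    r"Adding ResourceType \[(?P<child>[^\]]+)\] as child of ResourceType\s+\[(?P<parent>[^\]]+)\]"
)
REMOVE_RE = re.compile(
    r"Removing type \[(?P<child>[^\]]+)\] from parent type\s+\[(?P<parent>[^\]]+)\]"
)


def type_moves(log_text):
    """Map each child type seen in log_text to its new and old parent type."""
    moves = {}
    for m in ADD_RE.finditer(log_text):
        moves.setdefault(m.group("child"), {})["added_to"] = m.group("parent")
    for m in REMOVE_RE.finditer(log_text):
        moves.setdefault(m.group("child"), {})["removed_from"] = m.group("parent")
    return moves
```

A type that shows up with both "added_to" and "removed_from" was reparented during plugin registration, which is the case that matters for this bug.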

Then later, I end up with the following added to my inventory report:

Resource[id=0, type=Queue, key=/queue/ExpiryQueue, name=ExpiryQueue, parent=JBoss Messaging] (sync=NEW, state=STOPPED, avail=UNKNOWN

If you see entries like this (search for id=0) in your inventory report then you've got the problem.
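
The search can be scripted. The following is only a sketch: it assumes the exported inventory is plain text containing Resource[...] entries formatted like the two quoted above, and that keys and type names contain no commas. The helper names parse_resources and find_unsynced are invented for this example.

```python
import re

# Matches the leading fields of an entry such as
#   Resource[id=0, type=Queue, key=/queue/ExpiryQueue, name=ExpiryQueue, parent=JBoss Messaging] (sync=NEW, ...
# Assumption: type and key contain no commas, and parent contains no comma or ']'.
RESOURCE_RE = re.compile(
    r"Resource\[id=(?P<id>\d+), type=(?P<type>[^,]+), key=(?P<key>[^,]+), "
    r"name=(?P<name>.*?), parent=(?P<parent>[^\],]+)"
)


def parse_resources(text):
    """Return one dict per Resource[...] entry found in text, with id as an int."""
    return [
        {**m.groupdict(), "id": int(m.group("id"))}
        for m in RESOURCE_RE.finditer(text)
    ]


def find_unsynced(text):
    """Pair each id=0 resource with same-type, same-key resources that have a real id."""
    resources = parse_resources(text)
    pairs = []
    for res in resources:
        if res["id"] != 0:
            continue
        twins = [
            other
            for other in resources
            if other["id"] != 0
            and (other["type"], other["key"]) == (res["type"], res["key"])
        ]
        pairs.append((res, twins))
    return pairs
```

Calling find_unsynced(open("inv-24-3.dat").read()) on one of the exports from the steps above would list every id=0 entry. A non-empty twins list means the same queue is also present under its old parent with a real id.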

An Agent restart --clean will eliminate the issue.  I'm not sure if it goes away permanently.
Comment 1 Charles Crouch 2010-07-27 19:21:58 EDT
Setting to urgent for investigation
Comment 2 Ian Springer 2010-07-27 23:21:18 EDT
Created attachment 434896
patch for avail id==0 Server-side exception half of this issue
Comment 3 John Mazzitelli 2010-07-28 11:38:42 EDT
Created attachment 435050
patch of the patch to avoid ID=0 resources and not get previous if a full report
Comment 4 Ian Springer 2010-07-28 13:20:07 EDT
Created attachment 435084
patch w/ fix so Agent inventory sync can handle Resources that got moved to new parents
Comment 5 Ian Springer 2010-07-28 14:02:26 EDT
My patch for the Resource id==0 Server-side exception during avail report processing, along with Mazz's tweaks to it, has been committed and pushed to release-3.0.0. When I return from lunch, I will add a set of test cases to verify there have been no regressions in avail reporting or report processing as a result of this change.
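
For readers without the attachments at hand, the general idea of guarding against id==0 entries can be illustrated as below. This is not the committed patch (that is Java code in the RHQ Server); it is a hypothetical, simplified Python sketch, and AvailabilityEntry and split_report are names invented for it.

```python
from dataclasses import dataclass


@dataclass
class AvailabilityEntry:
    """Hypothetical stand-in for one availability record in an agent report."""
    resource_id: int
    available: bool


def split_report(entries):
    """Separate entries that reference a real resource from those with id 0.

    A resource id of 0 means the agent has not yet synced that resource with
    the server, so there is no persisted resource to attach availability to.
    """
    persistable = [e for e in entries if e.resource_id != 0]
    skipped = [e for e in entries if e.resource_id == 0]
    return persistable, skipped
```

Under this sketch, only the persistable entries would be stored, and the skipped ones would be logged rather than failing the whole report.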
Comment 6 Corey Welton 2010-07-28 21:47:17 EDT
As for the test case in the initial comment, I think we are fine there. I get no errors in either the agent or server logs.

Snippets from the time period in which the EAP was shut down:

* server log
pe=RHQ Agent, key=core-02.usersys.redhat.com RHQ Agent, name=core-02.usersys.redhat.com RHQ Agent, parent=core-02.usersys.redhat.com]] for asynchronous uninventory
2010-07-28 21:29:59,041 WARN  [org.hibernate.hql.ast.QueryTranslatorImpl] firstResult/maxResults specified with collection fetch; applying in memory!
2010-07-28 21:32:14,303 WARN  [org.hibernate.hql.ast.QueryTranslatorImpl] firstResult/maxResults specified with collection fetch; applying in memory!
2010-07-28 21:32:28,199 WARN  [org.hibernate.hql.ast.QueryTranslatorImpl] firstResult/maxResults specified with collection fetch; applying in memory!
2010-07-28 21:33:13,505 INFO  [org.rhq.enterprise.server.scheduler.jobs.AsyncResourceDeleteJob] Async resource deletion - 237 successful, 0 failed, took [18434] ms
2010-07-28 21:37:48,878 INFO  [org.rhq.enterprise.server.discovery.DiscoveryServerServiceImpl] Processed AV:[core-02.usersys.redhat.com][886][full] - need full=[false] in (426)ms

* agent log
2010-07-28 21:31:58,079 INFO  [InventoryManager.availability-1] (rhq.core.pc.inventory.InventoryManager)- Sending availability report to Server...
2010-07-28 21:32:07,574 INFO  [MeasurementManager.sender-1] (rhq.core.pc.measurement.MeasurementCollectorRunner)- Measurement collection for [144] metrics took 2774ms - sending report to Server...
2010-07-28 21:32:37,575 INFO  [MeasurementManager.sender-2] (rhq.core.pc.measurement.MeasurementCollectorRunner)- Measurement collection for [1932] metrics took 4163ms - sending report to Server...
2010-07-28 21:33:07,572 INFO  [MeasurementManager.sender-1] (rhq.core.pc.measurement.MeasurementCollectorRunner)- Measurement collection for [203] metrics took 683ms - sending report to Server...

Furthermore, I am not seeing the items in the avail report to the server.

We seem to be good here.  Awaiting additional test cases from dev before signing off.
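
The "Processed AV" line in the server log snippet above can be parsed when checking many reports at once. This is a sketch that assumes the message format matches that one line exactly. The meaning of the second bracketed number is a guess (presumably the number of entries in the report), so the field is just called count here. parse_processed_av is a made-up helper name.

```python
import re

# Modeled on: Processed AV:[core-02.usersys.redhat.com][886][full] - need full=[false] in (426)ms
AV_RE = re.compile(
    r"Processed AV:\[(?P<agent>[^\]]+)\]\[(?P<count>\d+)\]\[(?P<kind>[^\]]+)\]"
    r" - need full=\[(?P<need_full>true|false)\] in \((?P<millis>\d+)\)ms"
)


def parse_processed_av(line):
    """Return the fields of a 'Processed AV' log line, or None if it does not match."""
    m = AV_RE.search(line)
    if m is None:
        return None
    return {
        "agent": m.group("agent"),
        "count": int(m.group("count")),  # assumed: number of entries in the report
        "kind": m.group("kind"),
        "need_full": m.group("need_full") == "true",
        "millis": int(m.group("millis")),
    }
```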
Comment 7 Corey Welton 2010-07-28 21:59:30 EDT
Other things to test.

[21:46] <@ips> also doublecheck that the topic and queue resources are green and collecting metrics
[21:46] <@ips> and that operations, config, etc. are all working

Dev will be adding additional test cases "for testing general availability reporting functionality".
Comment 8 Corey Welton 2010-07-28 22:02:10 EDT
(The above means, of course, that you will have to restart your EAP server that you took down in step 12 of the initial testcase)
Comment 9 Corey Welton 2010-07-28 22:15:17 EDT
Created attachment 435173
Possible Missing Topics issue

I think somewhere along the line I lost my Topics.  I suppose there's an off-chance that it happened after the upgrade (someone should check that out) but the possibility exists that it occurred due to this fix.
Comment 10 Corey Welton 2010-07-28 22:46:58 EDT
Sigh, possible false alarm in comment #9 -- reinstalled a 2.3.1 and inventoried, to find out that this EAP 5.0.1 server does not appear to have had any Topics in the first place...?
Comment 11 Ian Springer 2010-07-28 23:06:13 EDT
Yep, that's normal. Out of box, EAP 5.x has two queues deployed (DLQ and ExpiryQueue) and no topics.
Comment 12 Ian Springer 2010-07-29 00:04:16 EDT
TEST STEPS
==========
1) install JON 2.3.1
2) install an EAP 5.1 instance
3) get some test JBoss Messaging topics and queues deployed to the EAP instance as follows:

  cd $JBOSS_EAP_HOME
  cp docs/examples/jms/example-destinations-service.xml server/$CONFIG_NAME/deploy/messaging/destinations-service.xml 

4) start up the EAP instance
5) discover the EAP instance and import it into JON inv
6) wait a few minutes and then verify the test topics and queues have been discovered and are listed as children of the EAP server resource and are green
7) upgrade to JON 2.4 (Server and Agents)
8) wait a few minutes, then verify in the GUI that the topics and queues are now children of a newly added JBoss Messaging singleton resource, rather than being direct children of the EAP server resource as they were before the upgrade. Verify the topics and queues are all still green and that the JBoss Messaging resource is also green
9) stop the EAP instance (no need to stop it via JON - just Ctrl-C it); wait for it to fully shut down
10) run the 'avail' command from the prompt of the Agent on the same box as the EAP instance
11) Verify that you do *not* see any errors in the Server log like the following shortly after running the 'avail' command:

19:01:30,944 INFO  [DiscoveryServerServiceImpl] Error processing availability report from [jshaughn]: javax.ejb.EJBException:javax.persistence.PersistenceException: org.hibernate.PropertyValueException: not-null property references a null or transient value: org.rhq.core.domain.measurement.Availability.resource

12) Refresh the GUI, then verify that the EAP server resource and all of its descendant resources turn red in the GUI.
13) Restart the EAP instance and wait for it to fully start up.
14) Run the 'avail' prompt command again.
15) Refresh the GUI, then verify that the EAP server resource and all of its descendant resources turn green again in the GUI.
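
The log check in step 11 can be automated with something like the following. It is a sketch only: it assumes the error message begins exactly as in the example above, and availability_errors is a hypothetical helper name.

```python
import re

# Matches the start of the failure message quoted in step 11.
ERROR_RE = re.compile(
    r"Error processing availability report from \[(?P<agent>[^\]]+)\]"
)


def availability_errors(log_text):
    """Return the agent names for which an availability-report failure was logged."""
    return [m.group("agent") for m in ERROR_RE.finditer(log_text)]
```

An empty list from availability_errors(open(path_to_server_log).read()) after running 'avail' corresponds to a pass for step 11. The server log path depends on the install and is left to the tester.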
Comment 13 Ian Springer 2010-07-29 00:32:11 EDT
MORE TEST STEPS
===============
16) stop the EAP instance (no need to stop it via JON - just Ctrl-C it); wait for it to fully shut down
17) run the 'avail --changed' command from the prompt of the Agent on the same box as the EAP instance
18) Verify that you do not see any errors related to availability processing in the Agent or Server log
19) Refresh the GUI, then verify that the EAP server resource and all of its descendant resources turn red in the GUI.
20) Restart the EAP instance and wait for it to fully start up.
21) Run the 'avail --changed' prompt command again.
22) Refresh the GUI, then verify that the EAP server resource and all of its descendant resources turn green again in the GUI.
23) Use the jmx-console and find the MBean for one of the sample queues (e.g. jboss.messaging.destination:name=A,service=Queue) and invoke its Stop operation.
24) Wait a couple minutes, then run the 'avail' prompt command
25) Refresh the GUI, then verify that the Queue resource corresponding to the queue you just stopped has turned red in the GUI.
26) Use the jmx-console and find the MBean for the sample queue you just stopped and invoke its Start operation.
27) Wait a couple minutes, then run the 'avail' prompt command
28) Refresh the GUI, then verify that the Queue resource corresponding to the queue you just started has turned green in the GUI.
29) Run any other testing you can think of to ensure there have been no regressions in availability reporting.
Comment 14 Filip Drabek 2010-07-29 04:02:04 EDT
I have tested all steps and I have not found any errors.
Comment 15 Rajan Timaniya 2010-07-29 06:55:19 EDT
Tested as per comment-7/12/13 with JON 2.4 GA (build #93). Test result looks good to me.

The following error does not occur:
javax.ejb.EJBException:javax.persistence.PersistenceException: org.hibernate.PropertyValueException: not-null property references a null or transient value: org.rhq.core.domain.measurement.Availability.resource


** server log
2010-07-29 16:12:57,687 INFO  [org.rhq.enterprise.server.core.plugin.AgentPluginScanner] Filesystem has a plugin [JBossESB] at the file [/NotBackedUp/install/jon2.4_93/jon-server-2.4.0.GA/jbossas/server/default/deploy/rhq.ear/rhq-downloads/rhq-plugins/rhq-jbossesb-soa-plugin-SOA.4.3.0.GA_CP02.jar] which is different than where the DB thinks it should be [/NotBackedUp/install/jon2.4_93/jon-server-2.4.0.GA/jbossas/server/default/deploy/rhq.ear/rhq-downloads/rhq-plugins/rhq-jbossesb-plugin-SOA.4.3.0.GA_CP02.jar]
2010-07-29 16:12:57,687 INFO  [org.rhq.enterprise.server.core.plugin.AgentPluginScanner] Filesystem has a plugin [JBossESB5] at the file [/NotBackedUp/install/jon2.4_93/jon-server-2.4.0.GA/jbossas/server/default/deploy/rhq.ear/rhq-downloads/rhq-plugins/rhq-jbossesb-soa5-plugin-5.0.0.GA.jar] which is different than where the DB thinks it should be [/NotBackedUp/install/jon2.4_93/jon-server-2.4.0.GA/jbossas/server/default/deploy/rhq.ear/rhq-downloads/rhq-plugins/rhq-jbossesb-plugin-2.3.2-as5.jar]
2010-07-29 16:12:59,844 INFO  [org.rhq.enterprise.server.discovery.DiscoveryServerServiceImpl] Processed AV:[10.65.193.1][763][full] - need full=[false] in (557)ms


** agent log
2010-07-29 16:13:24,469 WARN  [InventoryManager.discovery-1] (rhq.core.pc.inventory.InventoryManager)- Cannot start component for Resource[id=10005, type=Apache HTTP Server, key=/etc/httpd, name=rajanlaptop Apache 2.2.3 (/etc/httpd/), parent=10.65.193.1, version=2.2.3] from synchronized merge due to invalid plugin config: Failed to start component for resource Resource[id=10005, type=Apache HTTP Server, key=/etc/httpd, name=rajanlaptop Apache 2.2.3 (/etc/httpd/), parent=10.65.193.1, version=2.2.3].
2010-07-29 16:13:24,493 INFO  [InventoryManager.discovery-1] (rhq.core.pc.inventory.AutoDiscoveryExecutor)- Found 0 servers.
2010-07-29 16:13:35,212 INFO  [ResourceContainer.invoker.daemon-16] (org.rhq.plugins.apache.ApacheServerComponent)- Initializing server component for server [/etc/httpd]...
2010-07-29 16:13:35,214 INFO  [ResourceContainer.invoker.daemon-16] (rhq.plugins.www.snmp.SNMPClient)- Initialized SNMP session for agent at /127.0.0.1:1610
2010-07-29 16:13:35,318 WARN  [ResourceContainer.invoker.daemon-16] (org.rhq.plugins.apache.ApacheServerComponent)- Failed to connect to SNMP agent at 127.0.0.1/1610/public
. Make sure
1) the managed Apache server has been instrumented with the JON SNMP module,
2) the Apache server is running, and
3) the SNMP agent host, port, and community are set correctly in this resource's connection properties.
The agent will not be able to record metrics from apache httpd without SNMP
2010-07-29 16:13:38,422 INFO  [InventoryManager.availability-1] (rhq.core.pc.inventory.InventoryManager)- Sending availability report to Server...
2010-07-29 16:13:51,277 INFO  [MeasurementManager.sender-1] (rhq.core.pc.measurement.MeasurementCollectorRunner)- Measurement collection for [11] metrics took 1411ms - sending report to Server...
Comment 16 Corey Welton 2010-08-12 12:50:42 EDT
Mass-closure of verified bugs against JON.
