Bug 850860 - JON Drools knowledge session table has invalid values

Summary: JON Drools knowledge session table has invalid values

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	JBoss Operations Network
Classification:	JBoss
Component:	Plugin -- BRMS 5
Sub Component:
Version:	JON 3.1.1
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	JON 3.1.2
Assignee:	RHQ Project Maintainer
QA Contact:	Mike Foley
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2012-08-22 14:52 UTC by Pavel Kralik
Modified:	2013-09-11 11:03 UTC
CC List:	5 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2013-09-11 11:03:04 UTC
Type:	Bug
Embargoed:

Attachments
Drools application (19.03 KB, application/x-gzip) 2012-08-22 14:52 UTC, Pavel Kralik	no flags
JON log files (245.03 KB, application/x-gzip) 2012-08-22 14:53 UTC, Pavel Kralik	no flags
JON monitoring page (94.43 KB, image/png) 2012-08-22 14:54 UTC, Pavel Kralik	no flags
JConsole monitoring page (91.94 KB, image/png) 2012-08-22 14:54 UTC, Pavel Kralik	no flags
JON 3.1.1.CR1 DEBUG logs for BRMS 5.2.0 (200.06 KB, application/x-gzip) 2012-09-04 11:44 UTC, Pavel Kralik	no flags
JON 3.1.1.CR1 DEBUG logs for BRMS 5.3.0 (223.19 KB, application/x-gzip) 2012-09-04 11:44 UTC, Pavel Kralik	no flags
Drools session 0 and NaN values (90.07 KB, image/png) 2012-09-05 12:11 UTC, Pavel Kralik	no flags
Drools session 0 and NaN values BRMS 5.3.0 (111.38 KB, image/png) 2012-09-05 12:12 UTC, Pavel Kralik	no flags

Description Pavel Kralik 2012-08-22 14:52:42 UTC

Created attachment 606302
Drools application

When the Drools example application (BRMS 5.2.0.GA, BRMS 5.3.0.GA) is monitored with JON 3.1.1, the knowledge session metrics show different values from those shown in JConsole.

Version-Release number of selected component (if applicable):

BRMS-5.2.0.GA/5.3.0.GA
JON 3.1.1.ER2:
https://brewweb.devel.redhat.com/buildinfo?buildID=228250

How reproducible:

Always

Steps to Reproduce:
1. Install the JON 3.1.1.ER2 server with the BRMS plugin.

2. Run the drools-jon-example application.

3. Discover the Drools application.

4. Go to Drools Knowledge > Session-1 > Monitoring > Tables and compare the values with JConsole.

Actual results:

The values displayed in JON differ from those in JConsole.

Expected results:

JON and JConsole should display the same values.

Additional info:

Steps to get Drools application running:

1) Unpack drools_application.tar.gz
2) Download the BRMS deployable zip file:

BRMS 5.3.0.GA:
http://download.devel.redhat.com/released/JBossBRMS/5.3.0/brms-p-5.3.0.GA-deployable.zip

BRMS 5.2.0.GA:
http://download.devel.redhat.com/released/JBossBRMS/5.2.0/brms-p-5.2.0.GA-deployable.zip

3) Unzip brms-p-5.*-deployable.zip
4) Unzip jboss-brms-engine.zip
5) Create a link from the binaries to drools-jon-example/drools
6) Run the application:
     ant compile; ant server
     ant client.phase1; ant client.phase2
7) Monitor the application with JConsole and JON

Comment 1 Pavel Kralik 2012-08-22 14:53:33 UTC

Created attachment 606304
JON log files

Comment 2 Pavel Kralik 2012-08-22 14:54:11 UTC

Created attachment 606306
JON monitoring page

Comment 3 Pavel Kralik 2012-08-22 14:54:45 UTC

Created attachment 606307
JConsole monitoring page

Comment 4 Charles Crouch 2012-08-24 13:59:54 UTC

Hi Pavel
For the Drools team to get to the bottom of this issue, I expect they will need DEBUG-level logs from the JON server and agent.

Comment 5 Pavel Kralik 2012-09-04 11:44:04 UTC

Created attachment 609647
JON 3.1.1.CR1 DEBUG logs for BRMS 5.2.0

JON 3.1.1.CR1 (https://brewweb.devel.redhat.com/buildinfo?buildID=231258) did not fix the bug for BRMS 5.2.0/5.3.0.

Attaching JON DEBUG logs.

Comment 6 Pavel Kralik 2012-09-04 11:44:33 UTC

Created attachment 609649
JON 3.1.1.CR1 DEBUG logs for BRMS 5.3.0

Comment 7 Pavel Kralik 2012-09-05 12:11:53 UTC

Created attachment 610014
Drools session 0 and NaN values

Comment 8 Pavel Kralik 2012-09-05 12:12:34 UTC

Created attachment 610015
Drools session 0 and NaN values BRMS 5.3.0

Comment 9 Simeon Pinder 2012-09-12 13:37:58 UTC

First, we should be clear that the RHQ tables and graphs are not expected to always display the same data as JConsole. JConsole shows the current runtime values, whereas RHQ only takes snapshots of the data at an interval configured by the customer. If the default collection interval for these attributes is 10-20 minutes, the NaN values may well be valid, because RHQ has not yet been asked to collect the data. A quick test is to select a metric and click the 'Get Live Value' button: if that returns data, the value exists but simply has not yet been collected for the tables/graphs.

To start, you should also focus on collecting the plain-value metrics rather than the "Per Minute" metrics, because the "Per Minute" values take longer to display.
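The difference between a live value and a snapshot-based value can be illustrated with a small model. This is a minimal, hypothetical sketch, not RHQ/JON code: the `SnapshotMetric` class and its methods are illustrative names, and it assumes that a plain metric reports its most recent snapshot while a "Per Minute" metric is derived from the last two snapshots of a cumulative counter.

```python
import math
from dataclasses import dataclass, field


@dataclass
class SnapshotMetric:
    """Hypothetical model of one scheduled metric; NOT RHQ/JON code.

    Assumption: a plain metric reports its latest snapshot, and a
    "Per Minute" metric is derived from the last two snapshots of a
    cumulative counter. Before enough snapshots exist, the value is NaN.
    """

    # (collection time in seconds, cumulative value read at that time)
    samples: list = field(default_factory=list)

    def collect(self, now_s: float, live_value: float) -> None:
        # One scheduled collection: record a snapshot of the live value
        # (the value JConsole would show at that moment).
        self.samples.append((now_s, live_value))

    def latest(self) -> float:
        # Plain metric: nothing to show until the first collection.
        return self.samples[-1][1] if self.samples else math.nan

    def per_minute(self) -> float:
        # Rate metric: needs two snapshots, delta(value) / delta(time) per 60 s.
        if len(self.samples) < 2:
            return math.nan
        (t0, v0), (t1, v1) = self.samples[-2], self.samples[-1]
        return (v1 - v0) / (t1 - t0) * 60.0


# Hypothetical counter values, one collection every 30 s.
metric = SnapshotMetric()
print(metric.latest(), metric.per_minute())  # nan nan: nothing collected yet
metric.collect(30, 120)
print(metric.latest(), metric.per_minute())  # 120 nan: one snapshot only
metric.collect(60, 150)
print(metric.latest(), metric.per_minute())  # 150 60.0
```

Under these assumptions, a plain metric has data after the first collection but a rate metric only after the second, which is why starting with the plain-value metrics gives a quicker signal.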

That said, what were the metric collection intervals set to for the properties displayed in the graph? By default, some of these metrics are not collected for 10-20 minutes. The second thing to check is that the time range you are displaying is valid: for newly imported resources, the default 'Time Range' may not cover the period you expect.

Another thing to check is that the agent and server clocks are synchronized. If you import the RHQ-Agent resource into your inventory, the agent's default metrics will help you track that.

Even so, with a JON 3.0.1 install I still see some disparity in the displayed metric values: they all appear to show up after some time, but not initially. I'll continue to look into the timing disparity, but some delay in display is expected here.

Comment 10 Simeon Pinder 2012-09-14 13:09:10 UTC

Hi Pavel, can you weigh in on the previous comment? It would help to know which collection intervals were scheduled in your reproduction steps, and whether the data does show up later once enough time has passed. Thanks.

Comment 11 Pavel Kralik 2012-09-14 16:02:40 UTC

Hi Simeon, I used a 1-minute collection interval.

Comment 12 Pavel Kralik 2012-09-14 16:04:01 UTC

After the collection interval passed there were still NaN and 0 values.

Comment 13 Simeon Pinder 2012-09-14 16:07:54 UTC

Do the values show up after, say, 10 or 15 minutes? I'm trying to figure out whether you are seeing data that never loads or just a timing issue. What happens when you select a specific metric and do getLiveData? Does that work?

Comment 14 Pavel Kralik 2012-09-14 16:11:12 UTC

After 15 minutes the NaN values change to 0 values.

Comment 15 Simeon Pinder 2012-09-18 13:34:52 UTC

Hi Pavel,

I spent quite a bit of time on this but found only a small, unexpected delay on the very first metric collection (attributable to the closed BZs 751231 and 536181). I initially thought this was a sign of a deeper problem, but the delay is intentional, to ensure that metric collection happens consistently. After accounting for this, I no longer see any significant failures or delays collecting metric values for the "Drools Knowledge Session" with your test application.

That said, the customer should expect the following initialization delays before seeing valid collected data:
i) The Drools application resources are not discoverable until the [JMX Server] node has been discovered and imported, and valid connection settings have been entered.
ii) After step i), allow time for another discovery scan to run (or initiate one) so that the child Drools resources are discovered.
iii) Then wait for (or initiate) an availability scan as well, because metrics are not collected for resources that are 'down'.
iv) Metric collection intervals need to be set to appropriate values, because by default no collection occurs for 10-20 minutes. As currently configured, you need to change all the metric schedules for the specific session to 30 seconds to ensure prompt collection.
v) Expect the very first collection to take a bit longer than 30 s, and allow for the delay between the UI request and the start of the actual agent collection.
vi) Expect all initial "per Minute" metrics to take twice as long to show up, because RHQ needs at least two collections before it can report valid data.

Additionally, because of how the metrics are defined in this plugin, it is mainly the "per Minute" metrics that are enabled by default on discovery. For the reasons above, at least two collections, plus the delays listed, should therefore be expected before valid data is seen.
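The delays above can be turned into a rough estimate of when a session metric should first show valid data. This is a back-of-the-envelope sketch under stated assumptions, not RHQ's scheduler logic: it assumes steps i) to iii) are already complete, that collection n happens at n intervals after the schedule change plus a fixed extra delay (standing in for the UI-to-agent hand-off and the intentional first-collection delay), and that a "per Minute" metric needs two collections. The function name `earliest_valid_data_s` and the 30 s extra delay are hypothetical.

```python
def earliest_valid_data_s(interval_s: float, extra_delay_s: float, per_minute: bool) -> float:
    """Hypothetical lower bound, in seconds after the schedule change, before a
    metric shows valid data. Not RHQ code; assumes steps i) to iii) are done.

    Collection n (n = 1, 2, ...) is assumed to happen at n * interval_s + extra_delay_s.
    A plain metric needs one collection; a "per Minute" metric needs two.
    """
    collections_needed = 2 if per_minute else 1
    return collections_needed * interval_s + extra_delay_s


# 30 s is the interval suggested above; 600 s stands in for a 10-minute default.
for interval_s in (30, 600):
    plain = earliest_valid_data_s(interval_s, extra_delay_s=30, per_minute=False)
    rate = earliest_valid_data_s(interval_s, extra_delay_s=30, per_minute=True)
    print(f"interval={interval_s}s: plain >= {plain}s, per minute >= {rate}s")
```

Under these assumptions, a 30 s schedule gives roughly one minute for plain metrics and a minute and a half for "per Minute" metrics, whereas with a 10-minute interval a "per Minute" metric would not show valid data for about 20 minutes.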

Given the above explanation, we should retest this BZ as follows to see whether this is still considered a problem:
- Enable all metrics on the "Drools Knowledge Sessions" and set their intervals to 30 s.
- Allow a minute or more for the very first collection, depending on whether the collection request is honored immediately or only on the next request.
- Expect the "per Minute" metrics to take longer to appear than the other metrics.

Moving this to ON_QA for retesting.

Comment 16 Mike Foley 2012-11-15 19:52:18 UTC

Added Len to the CC list.

Len, can someone on your team verify this? Thanks!

Comment 18 Pavel Kralik 2012-12-12 15:09:33 UTC

Verified with JON 3.1.2.ER4 and BRMS 5.2.0.GA/5.3.0.GA/5.3.1.CR1.
