Bug 726202 - alerts not firing despite valid baseline values and alert conditions templates for some agents
Status: CLOSED CURRENTRELEASE
Alias: None
Product: RHQ Project
Classification: Other
Component: Alerts
Version: 3.0.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Assignee: Jay Shaughnessy
QA Contact: Mike Foley
URL:
Whiteboard:
Keywords:
Depends On:
Blocks: jon3 rhq41beta
Reported: 2011-07-27 20:05 UTC by Simeon Pinder
Modified: 2012-02-07 19:31 UTC (History)
Clone Of:
Last Closed: 2012-02-07 19:31:01 UTC


Description Simeon Pinder 2011-07-27 20:05:46 UTC
Description of problem: Alerts based on baseline values take a while to fire even when (i) metrics are being collected, (ii) baseline values are set, and (iii) a valid alert definition is defined. After an hour or so the alerts begin firing correctly.

Version-Release number of selected component (if applicable):
2.4.1

How reproducible:
Regularly.

Steps to Reproduce:
1. Register an agent with a jon server and import a JBossAS instance that has a datasource (Ex. DefaultDS) resource.
2. Navigate to the 'Available Connections' metric and set the baseline values. (Ex. 20) 
3. Create an alert relative to the baseline values for the 'Available Connections' resource.  Ex. Fire alert if Available Connection count is > 50% of max value.
4. Let the JON server monitor the results until alerts begin to fire. Despite correct metric collection, the alerts do not fire for a while.  

Actual results:
Sometimes it takes over an hour before alerts begin to fire regularly.

Expected results:
Alerts begin to report immediately after the next agent report when all components are correctly set.

Additional info:

Comment 1 Jay Shaughnessy 2011-07-27 20:17:55 UTC
*** IMPORTANT TEST/RECREATION NOTE ***

Actually, after talking with Simeon, the recreation steps above aren't
quite right. The difference is subtle: for the problem to occur, steps
2 and 3 need to be reversed:

1. Register an agent with a jon server and import a JBossAS instance that has a
datasource (Ex. DefaultDS) resource.
2. Create an alert def relative to the baseline values for the 'Available
Connections' resource.  Ex. Fire alert if Available Connection count is > 50%
of max value.
3. Navigate to the 'Available Connections' metric and set the baseline values.
(Ex. 20) 
4. Let the JON server monitor the results until alerts begin to fire. Despite
correct metric collection, the alerts do not fire for a while.
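
For reference, the example condition in step 2 reduces to a simple threshold on the baseline. The following is a minimal sketch, assuming the baseline max is the value 20 from step 3 and the condition reads "value > 50% of baseline max". The function names are hypothetical and are not RHQ APIs.

```python
def baseline_threshold(baseline_max: float, percent: float) -> float:
    """Return the alert threshold as a percentage of the baseline max."""
    return baseline_max * percent / 100.0


def should_fire(value: float, baseline_max: float, percent: float) -> bool:
    """True when the collected value exceeds the baseline-relative threshold."""
    return value > baseline_threshold(baseline_max, percent)


# Baseline max of 20 with a "> 50% of max" condition gives a threshold of 10.
print(baseline_threshold(20, 50))   # 10.0
print(should_fire(15, 20, 50))      # True
print(should_fire(8, 20, 50))       # False
```

So with these example numbers, any collected 'Available Connections' value above 10 would be expected to fire once the condition cache knows about the baseline.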

Comment 2 Jay Shaughnessy 2011-07-27 20:24:34 UTC
The issue here was that user-created baselines did not properly trigger a
measurement condition cache reload for the affected agent. This was due
to the new baseline not being committed/visible to the transaction executing the
agent status update.
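
The visibility problem can be illustrated outside of RHQ. The sketch below is not the RHQ code: it uses SQLite from the Python standard library as a stand-in database, and the `baseline` table and its columns are hypothetical. It only shows the general effect, that a row inserted in one transaction cannot be seen by another connection until the insert is committed.

```python
import os
import sqlite3
import tempfile


def count_baselines(conn):
    """Count baseline rows visible to this connection."""
    cur = conn.execute("SELECT COUNT(*) FROM baseline")
    try:
        return cur.fetchone()[0]
    finally:
        cur.close()


def demo_visibility():
    """Return (rows seen before commit, rows seen after commit) by a second connection."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    writer = sqlite3.connect(path)   # stands in for the transaction saving the baseline
    reader = sqlite3.connect(path)   # stands in for the transaction updating agent status
    try:
        writer.execute("CREATE TABLE baseline (schedule_id INTEGER, max_value REAL)")
        writer.commit()
        # The INSERT opens an implicit transaction that is not yet committed.
        writer.execute("INSERT INTO baseline VALUES (1, 20.0)")
        seen_before_commit = count_baselines(reader)
        writer.commit()
        seen_after_commit = count_baselines(reader)
        return seen_before_commit, seen_after_commit
    finally:
        writer.close()
        reader.close()
        os.remove(path)


print(demo_visibility())  # (0, 1)
```

If the status update runs at the point where the reader sees 0 rows, it has no reason to request a cache reload, which matches the behavior described above.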

Put another way, if the alert def came before the baseline (alert template
generated or manually added) it would not fire until the cache was
reloaded for that agent, even with a valid baseline established. That
would typically happen when db maintenance ran, by default, hourly.  So,
typically you'd see the alerts magically start appearing sometime within
an hour's time from when the baseline was created. Definitely confusing.
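
To make the ordering dependency concrete, here is a toy model of a per-agent condition cache. It is a sketch only: `ConditionCache` and its methods are hypothetical names, and the real cache is more involved. The assumption is that the cache snapshots thresholds from the baselines known at load time and ignores later baselines until it is reloaded.

```python
class ConditionCache:
    """Per-agent snapshot of alert thresholds, built from baselines known at load time."""

    def __init__(self, baselines, percent):
        self.percent = percent
        self.thresholds = {}
        self.reload(baselines)

    def reload(self, baselines):
        # Rebuild thresholds from whatever baselines exist right now.
        self.thresholds = {
            metric: max_value * self.percent / 100.0
            for metric, max_value in baselines.items()
        }

    def fires(self, metric, value):
        # No cached threshold means the condition can never match.
        threshold = self.thresholds.get(metric)
        return threshold is not None and value > threshold


baselines = {}
cache = ConditionCache(baselines, percent=50)   # alert def created first, no baseline yet
baselines["Available Connections"] = 20         # baseline set afterwards

fires_with_stale_cache = cache.fires("Available Connections", 15)
cache.reload(baselines)                         # e.g. the hourly maintenance reload
fires_after_reload = cache.fires("Available Connections", 15)

print(fires_with_stale_cache, fires_after_reload)  # False True
```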

With the fix the agent's condition cache should be reloaded within 1
minute of the baseline addition. And alerting should work normally on
subsequent reception of the necessary metrics.
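
As a rough illustration of the timing difference, the sketch below computes how long an agent would wait for the next periodic reload. It assumes reloads happen at fixed multiples of an interval (60 minutes for the default db maintenance, 1 minute for the check that picks up the baseline change after the fix). The function name and the fixed-schedule model are assumptions made for illustration, not RHQ internals.

```python
def minutes_until_reload(baseline_minute: int, reload_interval: int) -> int:
    """Minutes from baseline creation until the next reload at a multiple of reload_interval."""
    next_reload = (baseline_minute // reload_interval + 1) * reload_interval
    return next_reload - baseline_minute


# Baseline created 5 minutes after the last hourly maintenance run.
print(minutes_until_reload(5, 60))  # 55  (before the fix: wait for db maintenance)
print(minutes_until_reload(5, 1))   # 1   (after the fix: picked up by the next check)
```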

The fix is going into master, commit hash pending.  No back patching
recommended as the problem does resolve itself.

Comment 3 Jay Shaughnessy 2011-07-27 21:47:02 UTC
master commit: 0890ce0e4f2e6331d3bcca15c799eecba8a15f38

moving to on-qa

Comment 4 Sunil Kondkar 2011-07-29 09:18:19 UTC
Verified on build224 (Version: 4.1.0-SNAPSHOT Build Number: 60c8260)

Created an alert for DefaultDS resource as 'Available Connection count is > 50%
of max value'. Navigated to the 'Available Connections' metric and set the baseline values to 20.

Verified that alerts begin to report immediately after agent measurement collection for the metrics took place.

Marking as verified.

Comment 5 Mike Foley 2012-02-07 19:31:01 UTC
marking VERIFIED BZs to CLOSED/CURRENTRELEASE

Comment 6 Mike Foley 2012-02-07 19:31:01 UTC
changing status of VERIFIED BZs for JON 2.4.2 and JON 3.0 to CLOSED/CURRENTRELEASE

