Bug 726202 - alerts not firing despite valid baseline values and alert conditions templates for some agents
Status: CLOSED CURRENTRELEASE
Product: RHQ Project
Classification: Other
Component: Alerts
Version: 3.0.1
Hardware: Unspecified, OS: Unspecified
Priority: high
Severity: medium
Assigned To: Jay Shaughnessy
QA Contact: Mike Foley
Depends On:
Blocks: jon3 rhq41beta
Reported: 2011-07-27 16:05 EDT by Simeon Pinder
Modified: 2012-02-07 14:31 EST (History)

See Also:
Fixed In Version: 4.1
Doc Type: Bug Fix
Last Closed: 2012-02-07 14:31:01 EST
Attachments: None
Description Simeon Pinder 2011-07-27 16:05:46 EDT
Description of problem: Alerts based on baseline values take a while to fire even when (i) metrics are being collected, (ii) baseline values are set, and (iii) a valid alert definition is defined. After an hour or so the alerts begin firing correctly.

Version-Release number of selected component (if applicable):
2.4.1

How reproducible:
Regularly.

Steps to Reproduce:
1. Register an agent with a JON server and import a JBossAS instance that has a datasource (Ex. DefaultDS) resource.
2. Navigate to the 'Available Connections' metric and set the baseline values. (Ex. 20)
3. Create an alert relative to the baseline values for the 'Available Connections' resource.  Ex. Fire alert if Available Connection count is > 50% of max value.
4. Let the JON server monitor the results until alerts begin to fire. Despite correct metric collection, the alerts do not fire for a while.

Actual results:
Sometimes it takes over an hour before alerts begin to fire regularly.

Expected results:
Alerts begin to report immediately after the next agent report once all components are correctly set.

Additional info:
Comment 1 Jay Shaughnessy 2011-07-27 16:17:55 EDT
*** IMPORTANT TEST/RECREATION NOTE ***

Actually, after talking with Simeon, the recreation steps above aren't
quite right. The difference is subtle: for the problem to occur, steps
2 and 3 need to be reversed:

1. Register an agent with a JON server and import a JBossAS instance that has a
datasource (Ex. DefaultDS) resource.
2. Create an alert def relative to the baseline values for the 'Available
Connections' resource.  Ex. Fire alert if Available Connection count is > 50%
of max value.
3. Navigate to the 'Available Connections' metric and set the baseline values.
(Ex. 20)
4. Let the JON server monitor the results until alerts begin to fire. Despite
correct metric collection, the alerts do not fire for a while.
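
For reference, the example condition in step 2 can be worked through as a small sketch. This is not RHQ source code: the class and method names are hypothetical, and it assumes the condition means "metric value > percentage of the baseline max value". With a baseline max of 20 and a 50% condition, the threshold is 10.

```java
// Hypothetical illustration only; names and semantics are assumptions, not RHQ APIs.
final class BaselineConditionSketch {

    /** Threshold derived from a baseline value and a percentage (e.g. 50 for 50%). */
    static double threshold(double baselineValue, double percent) {
        return baselineValue * percent / 100.0;
    }

    /** True when the collected metric value exceeds the given percentage of the baseline. */
    static boolean fires(double metricValue, double baselineValue, double percent) {
        return metricValue > threshold(baselineValue, percent);
    }

    public static void main(String[] args) {
        double baselineMax = 20.0; // the example baseline from step 3
        System.out.println(threshold(baselineMax, 50.0));   // 10.0
        System.out.println(fires(15.0, baselineMax, 50.0)); // true: 15 > 10
        System.out.println(fires(10.0, baselineMax, 50.0)); // false: 10 is not > 10
    }
}
```

So with these example numbers, any collected 'Available Connections' value above 10 would be expected to fire the alert on the next agent report.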
Comment 2 Jay Shaughnessy 2011-07-27 16:24:34 EDT
The issue here was that user-created baselines did not properly trigger a
measurement condition cache reload for the affected agent. This was because
the new Baseline was not yet committed/visible to the transaction executing
the agent status update.

Put another way, if the alert def came before the baseline (alert template
generated or manually added) it would not fire until the cache was
reloaded for that agent, even with a valid baseline established. That
would typically happen when db maintenance ran, by default, hourly.  So,
typically you'd see the alerts magically start appearing sometime within
an hour's time from when the baseline was created. Definitely confusing.

With the fix the agent's condition cache should be reloaded within 1
minute of the baseline addition. And alerting should work normally on
subsequent reception of the necessary metrics.

The fix is going into master, commit hash pending.  No back patching
recommended as the problem does resolve itself.
Comment 3 Jay Shaughnessy 2011-07-27 17:47:02 EDT
master commit: 0890ce0e4f2e6331d3bcca15c799eecba8a15f38

moving to on-qa
Comment 4 Sunil Kondkar 2011-07-29 05:18:19 EDT
Verified on build224 (Version: 4.1.0-SNAPSHOT Build Number: 60c8260)

Created an alert for DefaultDS resource as 'Available Connection count is > 50%
of max value'. Navigated to the 'Available Connections' metric and set the baseline values to 20.

Verified that alerts begin to report immediately after agent measurement collection for the metrics took place.

Marking as verified.
Comment 5 Mike Foley 2012-02-07 14:31:01 EST
marking VERIFIED BZs to CLOSED/CURRENTRELEASE
Comment 6 Mike Foley 2012-02-07 14:31:01 EST
changing status of VERIFIED BZs for JON 2.4.2 and JON 3.0 to CLOSED/CURRENTRELEASE
