Bug 726202

Summary: alerts not firing despite valid baseline values and alert conditions templates for some agents
Product: [Other] RHQ Project Reporter: Simeon Pinder <spinder>
Component: Alerts Assignee: Jay Shaughnessy <jshaughn>
Status: CLOSED CURRENTRELEASE QA Contact: Mike Foley <mfoley>
Severity: medium Docs Contact:
Priority: high    
Version: 3.0.1 CC: hrupp, jshaughn, skondkar
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 4.1 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-02-07 19:31:01 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 678340, 725459    

Description Simeon Pinder 2011-07-27 20:05:46 UTC
Description of problem: Alerts based on baseline values take a while to start firing even when (i) metrics are being collected, (ii) baseline values are set, and (iii) a valid alert definition exists. After an hour or so the alerts begin firing correctly.

Version-Release number of selected component (if applicable):
2.4.1

How reproducible:
Regularly.

Steps to Reproduce:
1. Register an agent with a JON server and import a JBossAS instance that has a datasource (Ex. DefaultDS) resource.
2. Navigate to the 'Available Connections' metric and set the baseline values. (Ex. 20)
3. Create an alert relative to the baseline values for the 'Available Connections' resource.  Ex. Fire alert if Available Connection count is > 50% of max value.
4. Let the JON server monitor the results until alerts begin to fire. Despite correct metric collection, the alerts do not fire for a while.

Actual results:
Sometimes it takes over an hour before alerts begin to fire regularly.

Expected results:
Alerts begin to report immediately after the next agent report once all components are correctly set.

Additional info:

Comment 1 Jay Shaughnessy 2011-07-27 20:17:55 UTC
*** IMPORTANT TEST/RECREATION NOTE ***

Actually, after talking with Simeon the recreation steps above aren't
quite right. The difference is subtle: for the problem to occur, steps
2 and 3 need to be reversed:

1. Register an agent with a JON server and import a JBossAS instance that has a
datasource (Ex. DefaultDS) resource.
2. Create an alert def relative to the baseline values for the 'Available
Connections' resource.  Ex. Fire alert if Available Connection count is > 50%
of max value.
3. Navigate to the 'Available Connections' metric and set the baseline values.
(Ex. 20) 
4. Let the JON server monitor the results until alerts begin to fire. Despite
correct metric collection, the alerts do not fire for a while.
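
For reference, the example condition in step 2 can be sketched as below. This is only an illustration of the arithmetic, not the RHQ implementation; the function name and parameters are hypothetical.

```python
def exceeds_percent_of_baseline(value, baseline_max, percent):
    """Return True when value is above the given percentage of the baseline max.

    Hypothetical helper: models "Available Connection count is > 50% of max value".
    """
    threshold = baseline_max * percent / 100.0
    return value > threshold
```

With a baseline max of 20 and a 50% condition the threshold works out to 10, so a collected value of 15 should match while a value of exactly 10 should not.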

Comment 2 Jay Shaughnessy 2011-07-27 20:24:34 UTC
The issue here was that user created baselines did not properly trigger a
measurement condition cache reload for the affected agent. This was due
to the new baseline not being committed/visible to the transaction executing
the agent status update.
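
The visibility problem can be reproduced in miniature with any transactional store. The sketch below uses Python's sqlite3 module purely as a stand-in (RHQ's actual persistence layer and schema are not shown here; the table and function names are made up): a row inserted by one connection is not visible to a second connection until the first one commits.

```python
import os
import sqlite3
import tempfile


def baseline_rows_seen_by_reader():
    """Count the baseline rows a second connection sees before and after
    the writer commits. All names here are hypothetical."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")

        # Writer: stands in for the transaction creating the baseline.
        writer = sqlite3.connect(path)
        writer.execute("CREATE TABLE baseline (resource TEXT, max_value REAL)")
        writer.commit()

        # The INSERT opens an implicit transaction that is left uncommitted.
        writer.execute("INSERT INTO baseline VALUES ('DefaultDS', 20.0)")

        # Reader: stands in for the agent status update.
        reader = sqlite3.connect(path)
        count = "SELECT COUNT(*) FROM baseline"
        before = reader.execute(count).fetchall()[0][0]

        writer.commit()
        after = reader.execute(count).fetchall()[0][0]

        reader.close()
        writer.close()
        return before, after
```

The first count misses the new row and the second one includes it, which mirrors how the status update could run without noticing the freshly created baseline.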

Put another way, if the alert def came before the baseline (alert template
generated or manually added) it would not fire until the cache was
reloaded for that agent, even with a valid baseline established. That
would typically happen when db maintenance ran, by default, hourly.  So,
typically you'd see the alerts magically start appearing sometime within
an hour's time from when the baseline was created. Definitely confusing.
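
A toy model of the stale cache, assuming a per-agent cache that only learns about baselines when it is (re)loaded. The class and method names are hypothetical and do not correspond to the real RHQ condition cache classes.

```python
class AgentConditionCache:
    """Hypothetical per-agent cache of baseline-relative thresholds."""

    def __init__(self, baseline_store, percent):
        self.baseline_store = baseline_store  # dict: resource -> baseline max
        self.percent = percent
        self.thresholds = {}
        self.reload()

    def reload(self):
        # Only resources that already have a baseline at load time get a threshold.
        self.thresholds = {
            resource: baseline_max * self.percent / 100.0
            for resource, baseline_max in self.baseline_store.items()
        }

    def should_fire(self, resource, value):
        threshold = self.thresholds.get(resource)
        return threshold is not None and value > threshold


def simulate():
    """Alert def first, baseline second: check before and after a reload."""
    baselines = {}
    cache = AgentConditionCache(baselines, percent=50)  # alert def exists first
    baselines["DefaultDS"] = 20.0                       # user sets baseline later

    before_reload = cache.should_fire("DefaultDS", 15.0)
    cache.reload()                                      # e.g. hourly maintenance
    after_reload = cache.should_fire("DefaultDS", 15.0)
    return before_reload, after_reload
```

In this model the same metric value is ignored until `reload()` runs and fires afterwards, which matches the delay described above.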

With the fix the agent's condition cache should be reloaded within 1
minute of the baseline addition. And alerting should work normally on
subsequent reception of the necessary metrics.

The fix is going into master, commit hash pending.  No back patching
recommended as the problem does resolve itself.

Comment 3 Jay Shaughnessy 2011-07-27 21:47:02 UTC
master commit: 0890ce0e4f2e6331d3bcca15c799eecba8a15f38

moving to on-qa

Comment 4 Sunil Kondkar 2011-07-29 09:18:19 UTC
Verified on build224 (Version: 4.1.0-SNAPSHOT Build Number: 60c8260)

Created an alert for DefaultDS resource as 'Available Connection count is > 50%
of max value'. Navigated to the 'Available Connections' metric and set the baseline values to 20.

Verified that alerts begin to report immediately after agent measurement collection for the metrics took place.

Marking as verified.

Comment 5 Mike Foley 2012-02-07 19:31:01 UTC
marking VERIFIED BZs to CLOSED/CURRENTRELEASE

Comment 6 Mike Foley 2012-02-07 19:31:01 UTC
changing status of VERIFIED BZs for JON 2.4.2 and JON 3.0 to CLOSED/CURRENTRELEASE