Description of problem:
It takes a while to get alerts based on baseline values even when (i) metrics are being collected, (ii) baseline values are set, and (iii) a valid alert definition is defined. After an hour or so the alerts begin firing correctly.

Version-Release number of selected component (if applicable):
2.4.1

How reproducible:
Regularly.

Steps to Reproduce:
1. Register an agent with a JON server and import a JBossAS instance that has a datasource (Ex. DefaultDS) resource.
2. Navigate to the 'Available Connections' metric and set the baseline values. (Ex. 20)
3. Create an alert relative to the baseline values for the 'Available Connections' resource. Ex. Fire alert if Available Connection count is > 50% of max value.
4. Let the JON server monitor the results until alerts begin to fire. Despite correct metric collection, the alerts do not fire for a while.

Actual results:
Sometimes it takes over an hour before alerts begin to fire regularly.

Expected results:
Alerts begin to report immediately after the next agent report when all components are correctly set.

Additional info:
*** IMPORTANT TEST/RECREATION NOTE ***

Actually, after talking with Simeon, the recreation steps above aren't quite right. The difference is subtle: for the problem to occur, steps 2 and 3 need to be reversed:

1. Register an agent with a JON server and import a JBossAS instance that has a datasource (Ex. DefaultDS) resource.
2. Create an alert def relative to the baseline values for the 'Available Connections' resource. Ex. Fire alert if Available Connection count is > 50% of max value.
3. Navigate to the 'Available Connections' metric and set the baseline values. (Ex. 20)
4. Let the JON server monitor the results until alerts begin to fire. Despite correct metric collection, the alerts do not fire for a while.
The issue here was that user-created baselines did not properly trigger a measurement condition cache reload for the affected agent. This was because the new baseline was not yet committed, and so not visible, to the transaction executing the agent status update.

Put another way, if the alert def came before the baseline (alert template generated or manually added), it would not fire until the cache was reloaded for that agent, even with a valid baseline established. That reload would typically happen when db maintenance ran, which by default is hourly. So you'd typically see the alerts magically start appearing sometime within an hour of when the baseline was created. Definitely confusing.

With the fix, the agent's condition cache should be reloaded within 1 minute of the baseline addition, and alerting should work normally on subsequent reception of the necessary metrics.

The fix is going into master, commit hash pending. No back patching recommended, as the problem does resolve itself.
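
For anyone trying to picture the failure mode, here is a minimal, self-contained sketch of the behaviour described above. It is not the actual RHQ/JON code: the class BaselineCacheSketch, its methods, and the markDirty flag are all hypothetical names invented for illustration, and the "dirty" flag simply stands in for whatever mechanism tells the server that an agent's condition cache needs reloading.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a per-agent condition cache; not RHQ's real classes.
final class BaselineCacheSketch {
    // Stand-in for committed baseline max values in the database, keyed by metric name.
    private final Map<String, Double> baselineMax = new HashMap<>();
    // Alert definitions: metric name -> fraction of the baseline max to compare against.
    private final Map<String, Double> alertFraction = new HashMap<>();
    // Cached absolute thresholds, rebuilt only when the cache is reloaded.
    private final Map<String, Double> cachedThreshold = new HashMap<>();
    private boolean dirty = false;

    void addAlertDefinition(String metric, double fraction) {
        alertFraction.put(metric, fraction);
        dirty = true; // a new alert def always requests a reload
    }

    // markDirty=false models the bug: the baseline is stored, but no reload is requested.
    void setBaseline(String metric, double max, boolean markDirty) {
        baselineMax.put(metric, max);
        if (markDirty) {
            dirty = true;
        }
    }

    // Frequent check (the "within 1 minute" path): reloads only if a reload was requested.
    void reloadIfDirty() {
        if (dirty) {
            reload();
        }
    }

    // Unconditional reload, standing in for the hourly db maintenance run.
    void reload() {
        cachedThreshold.clear();
        for (Map.Entry<String, Double> e : alertFraction.entrySet()) {
            Double max = baselineMax.get(e.getKey());
            if (max != null) {
                cachedThreshold.put(e.getKey(), max * e.getValue());
            }
        }
        dirty = false;
    }

    // An alert can fire only if the cache holds a threshold for the metric.
    boolean fires(String metric, double value) {
        Double threshold = cachedThreshold.get(metric);
        return threshold != null && value > threshold;
    }
}
```

With the alert def added first and the baseline (max 20) added afterwards with markDirty=false, a value of 15 does not fire after reloadIfDirty() and only fires once reload() runs. With markDirty=true, the same value fires after the next reloadIfDirty(), which is the behaviour the fix is meant to restore.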
master commit: 0890ce0e4f2e6331d3bcca15c799eecba8a15f38 moving to on-qa
Verified on build224 (Version: 4.1.0-SNAPSHOT Build Number: 60c8260) Created an alert for DefaultDS resource as 'Available Connection count is > 50% of max value'. Navigated to the 'Available Connections' metric and set the baseline values to 20. Verified that alerts begin to report immediately after agent measurement collection for the metrics took place. Marking as verified.
marking VERIFIED BZs to CLOSED/CURRENTRELEASE
changing status of VERIFIED BZs for JON 2.4.2 and JON 3.0 to CLOSED/CURRENTRELEASE