Bug 726202
Summary: | alerts not firing despite valid baseline values and alert conditions templates for some agents | ||
---|---|---|---|
Product: | [Other] RHQ Project | Reporter: | Simeon Pinder <spinder> |
Component: | Alerts | Assignee: | Jay Shaughnessy <jshaughn> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Mike Foley <mfoley> |
Severity: | medium | Docs Contact: | |
Priority: | high | ||
Version: | 3.0.1 | CC: | hrupp, jshaughn, skondkar |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | 4.1 | Doc Type: | Bug Fix |
Last Closed: | 2012-02-07 19:31:01 UTC | Type: | --- |
Bug Blocks: | 678340, 725459 |
Description
Simeon Pinder
2011-07-27 20:05:46 UTC
*** IMPORTANT TEST/RECREATION NOTE ***

Actually, after talking with Simeon, the recreation steps above aren't quite right. The difference is subtle: for the problem to occur, steps 2 and 3 need to be reversed:

1. Register an agent with a JON server and import a JBossAS instance that has a datasource (Ex. DefaultDS) resource.
2. Create an alert def relative to the baseline values for the 'Available Connections' resource. Ex. Fire alert if Available Connection count is > 50% of max value.
3. Navigate to the 'Available Connections' metric and set the baseline values. (Ex. 20)
4. Let the JON server monitor the results until alerts begin to fire.

Despite correct metric collection, the alerts do not fire for a while.

The issue here was that user-created baselines did not properly trigger a measurement condition cache reload for the affected agent. This was due to the new baseline not being committed/visible to the transaction executing the agent status update.

Put another way, if the alert def came before the baseline (alert template generated or manually added), it would not fire until the cache was reloaded for that agent, even with a valid baseline established. That reload would typically happen when db maintenance ran, which by default is hourly. So typically you'd see the alerts magically start appearing sometime within an hour of when the baseline was created. Definitely confusing.

With the fix, the agent's condition cache should be reloaded within 1 minute of the baseline addition, and alerting should work normally on subsequent reception of the necessary metrics.

The fix is going into master, commit hash pending. No back patching recommended, as the problem does resolve itself.

master commit: 0890ce0e4f2e6331d3bcca15c799eecba8a15f38

moving to on-qa

Verified on build224 (Version: 4.1.0-SNAPSHOT Build Number: 60c8260)

Created an alert for the DefaultDS resource as 'Available Connection count is > 50% of max value'. Navigated to the 'Available Connections' metric and set the baseline values to 20. Verified that alerts begin to report immediately after agent measurement collection for the metrics took place. Marking as verified.

marking VERIFIED BZs to CLOSED/CURRENTRELEASE

changing status of VERIFIED BZs for JON 2.4.2 and JON 3.0 to CLOSED/CURRENTRELEASE
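
For readers trying to picture the ordering problem described above, here is a minimal Python sketch of it. This is not RHQ code: `Database`, `AgentConditionCache`, `set_baseline_buggy`, `set_baseline_fixed`, and `simulate` are hypothetical names invented for illustration, and the "transaction" is modelled simply as a check that can only see baselines that are already committed. It assumes a baseline max of 20 and a "> 50% of max" condition, as in the steps above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class Database:
    """Hypothetical store: committed baselines plus agents flagged for a cache reload."""
    committed_baselines: Dict[str, float] = field(default_factory=dict)
    stale_agents: Set[str] = field(default_factory=set)


def mark_agent_if_baseline_visible(db: Database, agent: str, metric: str) -> None:
    """Stands in for the agent status update; it only sees committed baselines."""
    if metric in db.committed_baselines:
        db.stale_agents.add(agent)


def set_baseline_buggy(db: Database, agent: str, metric: str, value: float) -> None:
    # The status update runs before the baseline is committed, so it sees nothing.
    mark_agent_if_baseline_visible(db, agent, metric)
    db.committed_baselines[metric] = value


def set_baseline_fixed(db: Database, agent: str, metric: str, value: float) -> None:
    # The baseline is committed first, so the status update flags the agent.
    db.committed_baselines[metric] = value
    mark_agent_if_baseline_visible(db, agent, metric)


class AgentConditionCache:
    """Hypothetical per-agent cache holding one baseline-relative condition."""

    def __init__(self, agent: str, metric: str, percent_of_baseline: float) -> None:
        self.agent = agent
        self.metric = metric
        self.percent = percent_of_baseline
        self.threshold: Optional[float] = None  # unknown until a baseline is loaded

    def reload(self, db: Database) -> None:
        baseline = db.committed_baselines.get(self.metric)
        self.threshold = None if baseline is None else baseline * self.percent / 100.0
        db.stale_agents.discard(self.agent)

    def reload_if_stale(self, db: Database) -> None:
        if self.agent in db.stale_agents:
            self.reload(db)

    def should_fire(self, value: float) -> bool:
        return self.threshold is not None and value > self.threshold


def simulate(set_baseline) -> Tuple[bool, bool]:
    """Return (fires after the frequent status check, fires after a full reload)."""
    db = Database()
    cache = AgentConditionCache("agent-1", "Available Connections", 50.0)
    cache.reload(db)  # the alert definition exists before any baseline
    set_baseline(db, "agent-1", "Available Connections", 20.0)
    cache.reload_if_stale(db)  # frequent check: reloads only if the agent was flagged
    fires_soon = cache.should_fire(15.0)
    cache.reload(db)  # unconditional reload, e.g. during hourly maintenance
    fires_later = cache.should_fire(15.0)
    return fires_soon, fires_later
```

Calling `simulate(set_baseline_buggy)` models the reported behavior, where the alert only starts firing after the unconditional reload, while `simulate(set_baseline_fixed)` models the behavior after the fix, where the frequent status check is enough.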