Bug 535788 (RHQ-2448) - Alerts don't fire for up to 1.5 hours after creation
Summary: Alerts don't fire for up to 1.5 hours after creation
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: RHQ-2448
Product: RHQ Project
Classification: Other
Component: Alerts
Version: 1.3
Hardware: All
OS: All
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Joseph Marques
QA Contact: Corey Welton
URL: http://jira.rhq-project.org/browse/RH...
Whiteboard:
Depends On:
Blocks: RHQ_Alerts rhq_spearhead jon-sprint9-bugs
Reported: 2009-10-02 12:24 UTC by Jeff Weiss
Modified: 2014-11-09 22:49 UTC (History)

Fixed In Version: 2.4
Clone Of:
Environment:
postgres, linux, 1.3GA
Last Closed: 2010-08-12 16:59:35 UTC
Embargoed:



Description Jeff Weiss 2009-10-02 12:24:00 UTC
How to repeat:

On the RHQ Server resource, set the Metric Collection Interval for "Active Thread Count" to 1 minute. Now create an alert on the same resource, with the condition "Active Thread Count > -1". This condition is always true, so with the collection interval set to 1 minute, the alert should fire within 1 minute. It doesn't: the first firing takes anywhere from 15 to 90 minutes. Thereafter, it fires every minute as expected; the problem is the delay before the first firing.

Comment 1 John Mazzitelli 2009-10-02 12:52:07 UTC
Perhaps this has something to do with the need for the agent to "connect" to the server in order to load the alert cache. If for some reason the server fails to load the alert cache for the agent, alerts don't fire. That's the only reason I can think of for alerts not firing for a long period.

Comment 2 Red Hat Bugzilla 2009-11-10 21:04:38 UTC
This bug was previously known as http://jira.rhq-project.org/browse/RHQ-2448


Comment 3 wes hayutin 2010-02-16 16:59:16 UTC
Temporarily adding the keyword "SubBug" so we can be sure we have accounted for all the bugs.

keyword:
new = Tracking + FutureFeature + SubBug

Comment 4 wes hayutin 2010-02-16 17:04:09 UTC
making sure we're not missing any bugs in rhq_triage

Comment 5 wes hayutin 2010-02-17 13:27:28 UTC
mass move to rhq_chainsaw tracker bug

Comment 6 wes hayutin 2010-02-17 13:36:04 UTC
moving any remaining Alert related bugs to rhq_chainsaw

Comment 7 wes hayutin 2010-02-18 14:49:56 UTC
This bug has now been triaged by Chainsaw on 2/18. The expectation is that the bug will be addressed by the end of sprint06, roughly 3/10/10.

Comment 8 Jeff Weiss 2010-04-15 16:42:18 UTC
This bug is blocking ALL automated alert testing.  Needs to be addressed ASAP.

Comment 9 Joseph Marques 2010-04-30 15:26:54 UTC
commit 546f7dbcc30ecb0666f49958f275baac3c640151

fix for newly created event/measurement-based alerts not firing
    
* was previously trying to set agent status bit by alert definition id via pure JPQL
* however, at the time the JPQL is executed, the alert definition hasn't been persisted yet
* fix was to correlate the cache reload to resourceId instead, which is only required in the CREATE case
* added new method to StatusManagerBean called updateByResource to handle this new path
* updated logic in notifyAlertConditionCacheManager to switch on the AlertDefinitionEvent appropriately
* added more debug-level logging
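
The ordering problem described in the commit message can be illustrated with a minimal, in-memory sketch. It is not RHQ's code. Only the names `updateByResource`, `notifyAlertConditionCacheManager`, and `AlertDefinitionEvent` come from the commit message; the maps, the `updateByAlertDefinition` method, the event constants, and the integer ids are hypothetical stand-ins for the JPQL and entities, which are not shown in this report.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory model of the status-bit logic; not the actual StatusManagerBean. */
class AlertCacheReloadSketch {

    /** Event constants are assumed for illustration. */
    enum AlertDefinitionEvent { CREATED, UPDATED, DELETED }

    // Persisted alert definitions: definitionId -> resourceId (stands in for the database table).
    private final Map<Integer, Integer> persistedDefinitions = new HashMap<>();
    // Which agent manages which resource: resourceId -> agentId.
    private final Map<Integer, Integer> resourceToAgent = new HashMap<>();
    // Agents flagged as needing an alert condition cache reload (the "status bit").
    private final Set<Integer> agentsNeedingReload = new HashSet<>();

    void registerResource(int resourceId, int agentId) {
        resourceToAgent.put(resourceId, agentId);
    }

    void persistDefinition(int definitionId, int resourceId) {
        persistedDefinitions.put(definitionId, resourceId);
    }

    /** Mimics a bulk query keyed on definition id: it matches nothing if the row is not persisted yet. */
    int updateByAlertDefinition(int definitionId) {
        Integer resourceId = persistedDefinitions.get(definitionId);
        if (resourceId == null) {
            return 0; // zero rows updated, so no agent is flagged and no reload is triggered
        }
        return updateByResource(resourceId);
    }

    /** Flags the agent via the resource id, which already exists when a definition is being created. */
    int updateByResource(int resourceId) {
        Integer agentId = resourceToAgent.get(resourceId);
        if (agentId == null) {
            return 0;
        }
        agentsNeedingReload.add(agentId);
        return 1;
    }

    /** Switches on the event: only the CREATE case needs the resource-based path. */
    int notifyAlertConditionCacheManager(AlertDefinitionEvent event, int definitionId, int resourceId) {
        switch (event) {
            case CREATED:
                return updateByResource(resourceId);
            default:
                return updateByAlertDefinition(definitionId);
        }
    }

    boolean needsReload(int agentId) {
        return agentsNeedingReload.contains(agentId);
    }

    public static void main(String[] args) {
        AlertCacheReloadSketch sketch = new AlertCacheReloadSketch();
        sketch.registerResource(10, 1);

        // Old path: the definition (id 500) is not persisted yet, so nothing is flagged.
        System.out.println("old path rows updated: " + sketch.updateByAlertDefinition(500));

        // New path: the CREATE event is correlated to the resource id instead.
        int rows = sketch.notifyAlertConditionCacheManager(AlertDefinitionEvent.CREATED, 500, 10);
        System.out.println("new path rows updated: " + rows + ", agent flagged: " + sketch.needsReload(1));
    }
}
```

In this model, the definition-based lookup silently updates nothing during creation. That is consistent with the reported symptom that a new alert stays inactive until something else causes the agent's cache to reload.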

Comment 10 Corey Welton 2010-05-04 18:56:19 UTC
qa -> cwelton

Comment 11 Corey Welton 2010-05-04 19:00:37 UTC
QA Verified... alerts now begin to fire immediately.  Great!

Comment 12 Corey Welton 2010-08-12 16:59:35 UTC
Mass-closure of verified bugs against JON.

