535576 – (RHQ-2257) Alert Templates take hours to be applied to existing resources

Summary: Alert Templates take hours to be applied to existing resources

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	RHQ-2257
Product:	RHQ Project
Classification:	Other
Component:	Alerts
Sub Component:
Version:	1.3pre
Hardware:	All
OS:	All
Priority:	high
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Joseph Marques
QA Contact:	Chandrasekar Kannan
Docs Contact:
URL:	http://jira.rhq-project.org/browse/RH...
Whiteboard:
Depends On:
Blocks:	RHQ-905 RHQ_Alerts rhq_spearhead jon-sprint8-bugs jon-sprint9-bugs jon-sprint10-bugs

Reported:	2009-07-22 20:11 UTC by Jeff Weiss
Modified:	2015-01-04 23:40 UTC
CC List:	4 users
Fixed In Version:	2.4
Clone Of:
Environment:	rev4508
Last Closed:	2010-08-12 16:50:31 UTC
Embargoed:

Description Jeff Weiss 2009-07-22 20:11:00 UTC

How to repeat:

Create an alert template (I used the Jopr/RHQ Server resource) that always fires (I used "Active Thread Count > -1"). Now go to the Alert tab of the RHQ server. Notice that no child alert definition has been created and the alert does not fire. I have run this test twice; in each case the alert definition eventually appeared and started firing, but not until 1-3 hours later.

Tested on Solaris/Postgres and Linux/Postgres.

Comment 1 Corey Welton 2009-08-26 18:45:08 UTC

Pushed to 1.4

Comment 2 Red Hat Bugzilla 2009-11-10 21:00:56 UTC

This bug was previously known as http://jira.rhq-project.org/browse/RHQ-2257

Comment 3 wes hayutin 2010-02-07 14:40:14 UTC

Steve,
This still looks like an issue in rhq master. Can you please work with Heiko and find out why the scenario in the description takes 1-3 hours for alerts to fire?

Can we adjust a setting, or does something need to be fixed?
Thank you!

Comment 4 wes hayutin 2010-02-17 13:27:04 UTC

mass move to rhq_chainsaw tracker bug

Comment 5 wes hayutin 2010-02-18 14:49:31 UTC

This bug was triaged by Chainsaw on 2/18. The expectation is that the bug will be addressed by the end of sprint06, roughly 3/10/10.

Comment 6 Joseph Marques 2010-05-04 13:46:03 UTC

commit 6d09c27b3dee045a909e820eaaa1ecb7e378906a

fix performance when creating/updating alert definitions en masse
    
1) revamp mechanism from hibernate session manipulation to transactional boundary manipulation
2a) make aggregate (template & group) create/update NOT execute in transaction
2b) make individual/child create/update alert definition execute in their own transactions
2c) no longer continue create/update logic on exception; halt processing and show the exception in the UI
2d) purge unmatched AlertDampeningEvents and AlertConditionLogs in its own transaction
3a) change cache loading/reloading so that the main logic occurs outside a transaction
3b) when the cache queries the DB, that will occur in a transaction
4) remove flush/clear methods, which are no longer needed with small transactions
5) implement more robust cache reloading, retrying on exceptions due to hibernate session consistency issues
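
Item 5 above (retrying a cache reload whose DB query runs in its own short transaction) could be sketched roughly as follows. This is only an illustration, not the code from the commit: `RetryingReloader`, `reload`, and the `maxAttempts` parameter are hypothetical names, and the callable is assumed to open and commit its own transaction on every call.

```java
import java.util.concurrent.Callable;

// Hypothetical helper for illustration; not RHQ's actual cache code.
final class RetryingReloader {
    private RetryingReloader() {}

    /**
     * Calls loadInOwnTransaction up to maxAttempts times and returns the
     * first successful result. Each call is assumed to run its query in a
     * fresh, short transaction, so a retry does not reuse a stale session.
     * If every attempt fails, the last exception is rethrown.
     */
    static <T> T reload(Callable<T> loadInOwnTransaction, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return loadInOwnTransaction.call();
            } catch (Exception e) {
                last = e; // remember the failure and retry with a fresh call
            }
        }
        throw last;
    }
}
```

Keeping the retry loop outside the transaction means a failed attempt leaves nothing half-done for the next attempt to trip over.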

Comment 7 Joseph Marques 2010-05-12 21:02:51 UTC

commit de743f644555cd0d8857eabb779dca621e5a0cfa

change error-handling for template/group alert definition create/update workflows
    
* instead of failing fast on first error, presume that errors are rare and continue on failure
* after the create/update workflow completes, if there were any errors, throw the first exception back to the caller
* exceptions indicate that one or more updates failed and will give the cause
* any updates that didn't experience an exception will be updated, because each individual alert definition update occurred in its own transaction
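
The error-handling workflow described in this comment might look something like the sketch below. It is an assumption-laden illustration rather than the committed code: `BulkUpdater` and `updateAll` are hypothetical names, and the consumer passed in is assumed to commit or roll back its own transaction for each item.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical names throughout; a sketch of the workflow, not RHQ's code.
final class BulkUpdater {
    private BulkUpdater() {}

    /**
     * Applies updateInOwnTransaction to every item. A failure on one item
     * does not stop the loop; once all items have been attempted, the first
     * failure (if any) is rethrown to the caller. When nothing fails, the
     * list of updated items is returned.
     */
    static <T> List<T> updateAll(List<T> items, Consumer<T> updateInOwnTransaction) {
        List<T> updated = new ArrayList<>();
        RuntimeException first = null;
        for (T item : items) {
            try {
                // Assumed to commit or roll back independently of other items.
                updateInOwnTransaction.accept(item);
                updated.add(item);
            } catch (RuntimeException e) {
                if (first == null) {
                    first = e; // keep only the first failure for the caller
                }
            }
        }
        if (first != null) {
            throw first;
        }
        return updated;
    }
}
```

Because each item is handled in isolation, an exception reaching the caller means at least one update failed, while every item that did not throw has already been applied.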

Comment 8 Jeff Weiss 2010-05-13 12:49:29 UTC

QA Verified (by automated overnight run).  Postgres/Linux build b75aca8.

Comment 10 Corey Welton 2010-08-12 16:50:31 UTC

Mass-closure of verified bugs against JON.
