How to repeat:
Create an alert template (I used the Jopr/RHQ Server resource) that always fires (I used "Active Thread Count > -1"). Now go to the Alert tab of the RHQ server. Notice that no child alert definition has been created, and the alert does not fire. I have run this test twice; in each case, the alert definition eventually appears and starts firing, but not until 1-3 hours later.
Tested on Solaris/Postgres and Linux/Postgres.
Pushed to 1.4
This bug was previously known as http://jira.rhq-project.org/browse/RHQ-2257
This still looks like an issue in RHQ master. Can you please work with Heiko and find out why the scenario in the description takes 1-3 hours for alerts to fire?
Can we adjust a setting, or does something need to be fixed?
mass move to rhq_chainsaw tracker bug
This bug was triaged by Chainsaw on 2/18. The expectation is that the bug will be addressed by the end of sprint06, roughly 3/10/10.
fix performance when creating/updating alert definitions en masse
1) revamp mechanism from hibernate session manipulation to transactional boundary manipulation
2a) make aggregate (template & group) create/update NOT execute in transaction
2b) make individual/child create/update alert definition execute in their own transactions
2c) no longer continue create/update logic on exception; halt processing and show the exception in the UI
2d) purge unmatched AlertDampeningEvents and AlertConditionLogs in its own transaction
3a) change cache loading/reloading so that the main logic occurs outside a transaction
3b) when the cache queries the DB, that will occur in a transaction
4) remove flush/clear methods, which are no longer needed with small transactions
5) implement more robust cache reloading, retrying on exceptions due to hibernate session consistency issues
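Item 5 above could look roughly like the following. This is only a minimal sketch, not the actual RHQ code: the class name CacheReloadSketch, the method reloadWithRetry, and the maxAttempts parameter are hypothetical, and the Supplier stands in for whatever short-transaction DB query the cache really runs.

```java
import java.util.function.Supplier;

// Hypothetical sketch; names are illustrative and not taken from the RHQ code base.
final class CacheReloadSketch {

    /**
     * Runs the given load step (assumed to execute in its own short transaction)
     * and retries it when it throws, up to maxAttempts times in total.
     * If every attempt fails, the last exception is rethrown.
     */
    static <T> T reloadWithRetry(Supplier<T> loadInTransaction, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                // The retry loop itself runs outside any transaction.
                return loadInTransaction.get();
            } catch (RuntimeException e) {
                // e.g. a session consistency problem; remember it and try again.
                last = e;
            }
        }
        throw last;
    }
}
```

A real implementation would probably retry only on specific exception types and pause between attempts; the sketch retries on any RuntimeException to stay short.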
change error-handling for template/group alert definition create/update workflows
* instead of failing fast on first error, presume that errors are rare and continue on failure
* after the create/update workflow completes, if there were any errors, throw the first exception back to the caller
* exceptions indicate that one or more updates failed and will give the cause
* any updates that didn't experience an exception will be updated, because each individual alert definition update occurred in its own transaction
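The continue-on-failure workflow described in the bullets above can be sketched as follows. This is a hedged illustration, not the committed implementation: BulkUpdateSketch, BulkUpdateException, and updateAll are hypothetical names, and the Consumer stands in for the per-child update that is assumed to run in its own transaction.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch; names are illustrative and not taken from the RHQ code base.
final class BulkUpdateSketch {

    /** Signals that one or more child updates failed; the cause is the first failure. */
    static class BulkUpdateException extends RuntimeException {
        final int failedCount;

        BulkUpdateException(int failedCount, Throwable firstCause) {
            super(failedCount + " child update(s) failed", firstCause);
            this.failedCount = failedCount;
        }
    }

    /**
     * Applies the update to every child, continuing past failures.
     * Each call to updateInOwnTransaction is assumed to commit or roll back
     * independently, so successful children stay updated even if others fail.
     */
    static <T> void updateAll(List<T> children, Consumer<T> updateInOwnTransaction) {
        RuntimeException first = null;
        int failed = 0;
        for (T child : children) {
            try {
                updateInOwnTransaction.accept(child);
            } catch (RuntimeException e) {
                failed++;
                if (first == null) {
                    first = e; // keep only the first failure to report to the caller
                }
            }
        }
        if (first != null) {
            throw new BulkUpdateException(failed, first);
        }
    }
}
```

The caller sees a single exception whose cause is the first failure, while every child that did not fail remains updated.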
QA Verified (by automated overnight run). Postgres/Linux build b75aca8.
Mass-closure of verified bugs against JON.