Red Hat Bugzilla – Bug 644048
Recovery alerts not working
Last modified: 2011-05-23 21:07:08 EDT
Description of problem:
If you create an Alert for an AS server Availability DOWN, then another for Availability UP with the Recovery Alert "Availability DOWN Alert", the DOWN alert is sent, but the UP is not. If you remove the value for Recovery Alert, the UP alert is sent.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
I could not reproduce this, but I can see two potential issues that you might have hit. Neither of them is a bug as such; both are a consequence of the distributed nature of the alert processing.
Firstly, it is important to note that there can be a lag of up to 30s before an alert condition is found to be satisfied. This can happen with multiple RHQ servers in an HA environment, where the user logs in on one server and modifies an alert definition on a resource managed by an RHQ agent that is currently connected to the second RHQ server. It can take up to 30s for the other server to "realize" that the condition was edited.
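As a rough illustration of why such a lag can appear, here is a minimal sketch of a server that only re-reads alert conditions from a shared store at a fixed interval. This is an assumption made for illustration, not a description of how RHQ actually propagates changes; `StaleCacheSketch`, `SharedStore`, `ServerCache` and `conditionAt` are hypothetical names, and only the 30s figure comes from the comment above.

```java
public class StaleCacheSketch {
    // 30s, taken from the comment above; the polling mechanism itself is assumed.
    public static final long REFRESH_INTERVAL_MS = 30_000;

    /** Hypothetical stand-in for the database shared by all servers. */
    public static final class SharedStore {
        public volatile String condition;

        public SharedStore(String condition) {
            this.condition = condition;
        }
    }

    /** Hypothetical per-server cache that reloads only when the interval has elapsed. */
    public static final class ServerCache {
        private final SharedStore store;
        private String cached;
        private long lastRefreshMs;

        public ServerCache(SharedStore store, long nowMs) {
            this.store = store;
            this.cached = store.condition;
            this.lastRefreshMs = nowMs;
        }

        /** Returns the condition this server would evaluate at the given time. */
        public String conditionAt(long nowMs) {
            if (nowMs - lastRefreshMs >= REFRESH_INTERVAL_MS) {
                cached = store.condition;
                lastRefreshMs = nowMs;
            }
            return cached;
        }
    }

    public static void main(String[] args) {
        SharedStore db = new SharedStore("Availability goes DOWN");
        ServerCache secondServer = new ServerCache(db, 0);

        // The user edits the definition through the first server.
        db.condition = "Availability goes UP";

        // 10s later the second server still evaluates the old condition.
        System.out.println(secondServer.conditionAt(10_000));
        // Once 30s have passed it picks up the edit.
        System.out.println(secondServer.conditionAt(30_000));
    }
}
```

Time is passed in explicitly so the behaviour is deterministic; in this sketch an availability change reported to the second server within the stale window would be checked against the old definition.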
Secondly, and I think this is a more probable cause of what you're seeing, the recovery alert only fires if its conditions are met AND the original alert is found disabled (either manually or because it was set up to automatically disable after firing). This is basically a performance optimization which assumes that the only purpose of the recovery alert is to re-enable the original one.
If the original alert is enabled, there is no need for the recovery alert to fire, because it would have had no effect on the original alert.
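To make the rule concrete, here is a minimal sketch of the decision as I described it above. It is not the RHQ code; `RecoveryRuleSketch`, `AlertDef`, `recoveryTarget` and `shouldFire` are hypothetical names, and the check that a definition must itself be enabled in order to fire is an assumption.

```java
public class RecoveryRuleSketch {
    /** Hypothetical stand-in for an alert definition; not an RHQ class. */
    public static final class AlertDef {
        public final String name;
        public boolean enabled;
        public final AlertDef recoveryTarget; // null for an ordinary alert

        public AlertDef(String name, boolean enabled, AlertDef recoveryTarget) {
            this.name = name;
            this.enabled = enabled;
            this.recoveryTarget = recoveryTarget;
        }
    }

    /** Decides whether a definition fires, following the rule described above. */
    public static boolean shouldFire(AlertDef def, boolean conditionsMet) {
        // Assumption: a disabled definition never fires.
        if (!def.enabled || !conditionsMet) {
            return false;
        }
        if (def.recoveryTarget == null) {
            return true; // ordinary alert
        }
        // A recovery alert fires only when the alert it recovers is disabled.
        return !def.recoveryTarget.enabled;
    }

    public static void main(String[] args) {
        AlertDef down = new AlertDef("Availability DOWN", true, null);
        AlertDef up = new AlertDef("Availability UP", true, down);

        // DOWN is still enabled, so the recovery alert is suppressed.
        System.out.println(shouldFire(up, true)); // false

        // DOWN was disabled (manually or after firing), so the recovery alert fires.
        down.enabled = false;
        System.out.println(shouldFire(up, true)); // true
    }
}
```

With a DOWN alert that is not configured to disable itself after firing, the first case applies, which would match the behaviour reported here.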
Note that this area is up for debate, because there are many use cases that support the current logic as I described it above, as well as use cases that support the logic you were probably expecting, i.e. for the recovery alert to fire regardless of the state of the original alert. If you have some thoughts on this and think we should change the behaviour described above, please share your use cases.
If, on the other hand, you believe you were experiencing some issue other than the two I outlined above, I'd like to ask you for reproduction steps that are as detailed as possible, because alerting is a critical feature of RHQ/JON and we definitely want it to be as performant and bug-free as possible.
*** Bug 645505 has been marked as a duplicate of this bug. ***
Pushing to ON-QA to see if they can reproduce on 2.4.1.
Verified on Jon241 build#34
Created an alert for 'Availability Goes Down' on the Jbossas server (also selected the action filter 'Disable alert until re-enabled manually or by recovery alert').
Created a second alert for 'Availability Goes Up' and selected the 'Availability Goes Down' alert as its recovery alert.
The 'Availability Goes Down' alert is triggered when the jbossas server goes down, and the 'Availability Goes Up' alert is triggered when the jbossas server is restarted.
Also verified that the Availability Goes Down alert displays value 'Yes' for alert property 'Active' after 'Availability Goes Up' is triggered.
Bookkeeping - closing bug - fixed in recent release.