| Summary: | Adding recovery alert to existing alert causes alert to not fire anymore | ||
|---|---|---|---|
| Product: | [JBoss] JBoss Operations Network | Reporter: | Mike Thompson <mithomps> |
| Component: | Monitoring - Alerts | Assignee: | RHQ Project Maintainer <rhq-maint> |
| Status: | CLOSED WORKSFORME | QA Contact: | Mike Foley <mfoley> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | JON 3.1.2 | CC: | jshaughn, loleary, myarboro |
| Target Milestone: | --- | ||
| Target Release: | JON 3.2.1 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2014-01-07 21:12:34 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Description
Mike Thompson
2013-10-18 15:19:37 UTC

I'm not sure whether there is really an issue here. There may be, but it could also be a misunderstanding of recovery alerting and/or alert duration conditions. Moreover, some fixes put in place since 3.1.2 could feasibly affect this behavior. So, I'd suggest it be tested against RHQ 4.9 or JON 3.2 before any further investigation.

Remember that when using recovery alerts the idea is that you have two alert defs whose condition matching is mutually exclusive. The problem alert def should be initially enabled; if it fires, it is disabled and the recovery alert def becomes active. If the recovery alert def fires, it re-enables the problem alert def and goes back to sleep until it is needed again.

Availability change conditions match only when the relevant change of availability is detected. Availability duration conditions match only when the relevant change in availability is detected and the same availability type is still set when the duration period expires. In essence it's a "goes down and stays down for X minutes" condition (if using Down avail, for example).

So, in the scenario above, I would expect that if the logEvent alert def was created and enabled, and the recovery alert def was defined, then at the time of the ctrl-c the logEvent alert def should fire and be disabled automatically, and the recovery alert def should be enabled. But it's quite possible that the down availability has already been reported before the recovery alert def is ready to condition match (especially in 3.1.2; this was sped up in 3.2). In this situation the "goes down" portion of the avail duration condition will not match. Therefore the recovery alert def will not fire, perhaps until the server has cycled again completely.

Last comment: I'm not exactly sure why there would be a log event alert def that seems to be looking for a shutdown event, and then a recovery alert def for "goes down for X minutes". That seems like two defs for basically the same thing. Typically the recovery alert def would be a "goes up" condition.

I tested using the steps described by "Steps to Reproduce" in comment 0 but was unable to see any wrong behavior. It isn't clear what the log events have to do with this, as the steps don't seem to describe their use. However, setting up two alert definitions and then later making one recover the other seems to work just fine.

One note though: I see that in the test steps the alert to be recovered (Dummy Alert) is disabled to start. Perhaps an enable step was missing from the list, but if not, then the behavior would essentially be expected. In other words, the Dummy Alert would never be evaluated because it was disabled, and the recovery alert itself (Down Recovery Alert) would never be evaluated because the alert condition of the Dummy Alert never occurred and so never made Down Recovery Alert eligible for evaluation.

If I am missing anything and closing this in error, please provide more information as it relates to JBoss ON 3.2.
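To make the behavior described in the comments above easier to follow, here is a minimal Python sketch of the problem/recovery toggle and of a "goes down and stays down for X minutes" condition. It is only a simplified model and not RHQ or JBoss ON code: the class names `AlertDef`, `RecoveryPair` and `AvailDurationCondition`, the `run` helper, and the use of integer minutes for time are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertDef:
    # Hypothetical stand-in for an alert definition.
    name: str
    enabled: bool
    fired: int = 0


class RecoveryPair:
    """Hypothetical model of a problem alert def and its recovery alert def."""

    def __init__(self, problem_enabled: bool = True) -> None:
        self.problem = AlertDef("problem", enabled=problem_enabled)
        # The recovery def sleeps until the problem def has fired.
        self.recovery = AlertDef("recovery", enabled=False)

    def problem_condition_matched(self) -> bool:
        if not self.problem.enabled:
            return False  # a disabled def is never evaluated
        self.problem.fired += 1
        self.problem.enabled = False
        self.recovery.enabled = True
        return True

    def recovery_condition_matched(self) -> bool:
        if not self.recovery.enabled:
            return False
        self.recovery.fired += 1
        self.recovery.enabled = False
        self.problem.enabled = True
        return True


class AvailDurationCondition:
    """Simplified 'goes DOWN and stays DOWN for `minutes`' condition."""

    def __init__(self, minutes: int) -> None:
        self.minutes = minutes
        self.armed_at: Optional[int] = None

    def on_avail_change(self, now: int, new_avail: str, active: bool) -> None:
        # The "goes down" half only matches if the def is active
        # at the moment the change is reported.
        if new_avail == "DOWN" and active:
            self.armed_at = now
        elif new_avail != "DOWN":
            self.armed_at = None

    def matches(self, now: int, current_avail: str) -> bool:
        return (
            self.armed_at is not None
            and current_avail == "DOWN"
            and now - self.armed_at >= self.minutes
        )


def run(down_reported_at: int, recovery_active_at: int, minutes: int = 5) -> bool:
    """Return True if the duration-based recovery def fires in this model."""
    pair = RecoveryPair()
    cond = AvailDurationCondition(minutes)
    pair.problem_condition_matched()  # e.g. the log event def fires at ctrl-c
    active = down_reported_at >= recovery_active_at
    cond.on_avail_change(down_reported_at, "DOWN", active)
    if cond.matches(down_reported_at + minutes, "DOWN"):
        return pair.recovery_condition_matched()
    return False
```

In this model, `run(down_reported_at=10, recovery_active_at=5)` fires the recovery def, while `run(down_reported_at=2, recovery_active_at=5)` does not, because the DOWN change arrived before the recovery def could condition match. Likewise, `RecoveryPair(problem_enabled=False)` never evaluates either def, which mirrors the disabled Dummy Alert case.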