801504 – Using an Availability condition on a recovery Alert doesn't trigger Alert or Recovery

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	RHQ Project
Classification:	Other
Component:	Alerts
Sub Component:
Version:	4.2
Hardware:	All
OS:	All
Priority:	high
Severity:	medium
Target Milestone:	---
Target Release:	RHQ 4.4.0
Assignee:	RHQ Project Maintainer
QA Contact:	Mike Foley
Docs Contact:
URL:
Whiteboard:
Depends On:	787227
Blocks:	jon310-sprint11, rhq44-sprint11

Reported:	2012-03-08 17:27 UTC by Charles Crouch
Modified:	2015-02-01 23:27 UTC
CC List:	5 users
Fixed In Version:
Clone Of:	787227
Environment:
Last Closed:	2012-11-16 03:25:28 UTC
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	878224	0	high	CLOSED	Updated alert defs may not fire in an HA environment	2021-02-22 00:41:40 UTC

Internal Links: 878224

Description Charles Crouch 2012-03-08 17:27:00 UTC

+++ This bug was initially created as a clone of Bug #787227 +++

Description of problem:
Using an Availability condition on a recovery Alert doesn't trigger Alert or Recovery. If you define an Alert for Availability "Comes up" and use it as a Recovery Alert to re-enable another Alert Definition, you never receive the alert for the Availability change, nor does it re-enable the other alert.

Version-Release number of selected component (if applicable):
3.0

How reproducible:
Every time

Steps to Reproduce:
1. Define an alert "OOM" using the condition type "event detection", the event severity "error" and the regular expression "java.lang.OutOfMemoryError".
2. Add email notification and the "Restart" Resource Operation.
3. Select "Yes" for "Disable When Fired"
4. Define another alert "OOM (Recovery)" using condition type "Availability Change", Availability "Comes Up"
5. On the Recovery tab, select the "OOM" Alert for Recover Alert.
6. Trigger an OOM error on the EAP server (I set the Max Heap size low).
7. The alert will trigger for OOM, it will be disabled and the Restart is triggered.
8. You can see that the EAP is restarted and running, but the "OOM (Recovery)" alert is not triggered, nor does it re-enable the "OOM" Alert.
9. Then change the condition type on "OOM (Recovery)" to "event detection" with severity INFO, matching the Microcontainer "Started in" message:
INFO  [org.jboss.bootstrap.microcontainer.ServerImpl] (main) JBoss (Microcontainer) [5.2.0.GA_SOA (build: SVNTag=5.2.0.GA_SOA date=201111090730)] Started in 1m:34s:817ms
10. With this condition type everything works, "OOM (Recovery)" alert is fired and it re-enables the "OOM" Alert.

  
Actual results:

The "OOM (Recovery)" alert is not triggered and the "OOM" alert is not re-enabled.


Expected results:

The "OOM (Recovery)" alert is triggered and the "OOM" alert is re-enabled.

--- Additional comment from ccrouch on 2012-03-08 11:09:24 EST ---

Are you sure that the OOM of the AS instance and the restart operation actually triggered a change in the availability of the EAP Server? If you choose the EAP server from the inventory then go to its Monitoring>Availability subtab do you see a row in there indicating the EAP instance was unavailable at the time of the OOM condition and then showing available again after the restart operation completed?

--- Additional comment from dsteigne on 2012-03-08 11:59:43 EST ---

Yes, both conditions show on the Availability tab

Comment 1 Charles Crouch 2012-03-08 17:29:08 UTC

Having briefly discussed this with jshaughn, we couldn't quickly come up with a reason why this shouldn't work, given Debbie's last comment. We should try to repro this with the latest avail changes.

Comment 2 Charles Crouch 2012-03-08 17:32:43 UTC

(11:30:12 AM) jshaughn: ccrouch: note that this sort of thing was verified for the 241 release: see https://bugzilla.redhat.com/show_bug.cgi?id=644048#c6

Comment 3 Jay Shaughnessy 2012-04-03 16:21:42 UTC

I've tested a recovery alert that uses a "Goes UP" condition to re-enable
another alert definition, a "Goes DOWN" alert.

I set the "Goes DOWN" def to disable after firing and then waited for the
Goes UP def to fire and re-enable it.

This worked as expected.

I did not use the exact reproduction steps above because I think it may
be these steps that are flawed and producing the misleading results. I
think ccrouch's question/scenario above is exactly what is happening and
the test needs to be re-done.  For an EAP that is UP, a restart operation
likely will not report any change in availability, and if it did it would be
a change to DOWN.  The restart happens too fast to report a change to DOWN
and then a change to UP; our avail checking is not that granular.  The test
above would work only if the EAP server was DOWN when the restart operation
took place.
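
To illustrate the granularity point, here is a minimal sketch of availability
being sampled at fixed intervals. It is not RHQ code: the function names and
the 60-second scan interval are assumptions made up for the example. A restart
that completes between two scans produces no observed change at all, while a
longer outage is seen as a change to DOWN followed by a change to UP.

```python
def observed_changes(true_state, scan_times):
    """Return the availability changes seen when the real state is
    sampled only at the given scan times (hypothetical helper)."""
    changes = []
    last = None
    for t in scan_times:
        state = true_state(t)
        if last is not None and state != last:
            changes.append((t, state))
        last = state
    return changes


def outage(down_from, up_at):
    """Real state of a server that is DOWN during [down_from, up_at)."""
    return lambda t: "DOWN" if down_from <= t < up_at else "UP"


# Assumed scan interval of 60 seconds; the real interval may differ.
scans = range(0, 300, 60)

# A restart lasting 30 seconds falls between two scans and is never seen.
quick_restart = observed_changes(outage(70, 100), scans)

# An outage lasting 130 seconds spans two scans, so both changes are seen.
long_outage = observed_changes(outage(70, 200), scans)

print(quick_restart)
print(long_outage)
```

With a recovery alert keyed on a change to UP, only the second case gives the
alert anything to fire on.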

If you really want to keep the spirit of the above test, try the following
using 3 alert defs.

OOM Def
-> fire a shutdown operation
-> disable after firing

Goes DOWN Def
-> fire a start operation

Goes UP Def
-> Recovery for OOM Def

Then, with the EAP up, ensure the OOM alert def fires.
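
The difference between the original steps and the 3-def setup can be sketched
with a toy model. Everything here is hypothetical (the AlertDef class, the
event names, and the table of operation effects are not the RHQ API); the one
assumption carried over from the discussion above is that a restart yields no
reported availability change, while separate shutdown and start operations do.

```python
# Toy model only: these names are hypothetical and are not the RHQ API.
# Assumed effect of each operation on *reported* availability.
OPERATION_EVENTS = {
    "restart": [],               # assumed too fast for the avail check
    "shutdown": ["GOES_DOWN"],
    "start": ["GOES_UP"],
}


class AlertDef:
    def __init__(self, name, condition, operation=None,
                 disable_when_fired=False, recovers=None):
        self.name = name
        self.condition = condition
        self.operation = operation
        self.disable_when_fired = disable_when_fired
        self.recovers = recovers      # definition re-enabled when this fires
        self.enabled = True


def run(defs, first_event):
    """Process events in order; return the names of the defs that fired."""
    fired = []
    pending = [first_event]
    while pending:
        event = pending.pop(0)
        for d in defs:
            if d.enabled and d.condition == event:
                fired.append(d.name)
                if d.disable_when_fired:
                    d.enabled = False
                if d.recovers is not None:
                    d.recovers.enabled = True
                pending.extend(OPERATION_EVENTS.get(d.operation, []))
    return fired


# Original steps: OOM def runs a restart, recovery def waits for "comes up".
oom_a = AlertDef("OOM", "OOM_EVENT", "restart", disable_when_fired=True)
rec_a = AlertDef("OOM (Recovery)", "GOES_UP", recovers=oom_a)
fired_original = run([oom_a, rec_a], "OOM_EVENT")

# Suggested setup: shutdown, then a Goes DOWN def starts the server again.
oom_b = AlertDef("OOM", "OOM_EVENT", "shutdown", disable_when_fired=True)
down_b = AlertDef("Goes DOWN", "GOES_DOWN", "start")
up_b = AlertDef("Goes UP", "GOES_UP", recovers=oom_b)
fired_suggested = run([oom_b, down_b, up_b], "OOM_EVENT")

print(fired_original, oom_a.enabled)
print(fired_suggested, oom_b.enabled)
```

In this model the original setup leaves the OOM def disabled because no
availability event is ever produced, whereas the 3-def chain ends with the
OOM def re-enabled.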

Comment 4 Larry O'Leary 2012-11-16 03:25:28 UTC

Per comment 3, closing as NOTABUG. We were unable to reproduce this except with the original reproduction steps, which are invalid.
