Bug 1038202

Summary: recovery alert fired twice
Product: [JBoss] JBoss Operations Network
Reporter: Armine Hovsepyan <ahovsepy>
Component: Core Server
Assignee: RHQ Project Maintainer <rhq-maint>
Status: CLOSED WORKSFORME
QA Contact: Mike Foley <mfoley>
Severity: high
Docs Contact:
Priority: unspecified
Version: JON 3.2
CC: ahovsepy, jshaughn, loleary, mfoley
Target Milestone: ---
Target Release: JON 3.3.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-08-27 13:13:17 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Attachments:
Description            Flags
doubleRecoveryAlert    none
s1_server.log          none
s1_agent.log           none
s2_agent.log           none
s2_server.log          none

Description Armine Hovsepyan 2013-12-04 15:41:48 UTC
Created attachment 832709 [details]
doubleRecoveryAlert

Description of problem:
The recovery alert fired twice in an HA environment.

Version-Release number of selected component (if applicable):
JON 3.2 CR1

How reproducible:
Noticed once.

Steps to Reproduce: (steps taken from bug 1030108)
1.  Install and start EAP 6 standalone server.
2.  Install a two server (HA) JON 3.1.2 system.
4.  Start server-02 of JON HA system and wait for it to come up.
5.  Start server-01 of JON HA system and wait for it to come up.
    Be sure server 2 is started first so Quartz is running there.
6.  Start agent in foreground.
7.  Import EAP 6 standalone server into inventory and configure connection settings.
8.  Create new _stays down for 2 minutes_ alert definition for EAP resource:

    *Name*: `Alert - Profile Down`
    *Condition*:
        *Fire alert when*:          _ANY_
        *Condition Type*:           _Availability Duration_
        *Availability Duration*:    _Stays Down_
        *Duration*:                 `2` _minutes_
    *Recovery*:
        *Disable When Fired*:   _Yes_

9.  Create new _recovery_ alert definition for EAP default resource:

    *Name*: `Recovery - Profile Down`
    *Condition*:
        *Fire alert when*:  _ANY_
        *Condition Type*:   _Availability Change_
        *Availability*:     _Goes up_
    *Recovery*:
        *Recovery Alert*:   _Alert - Profile Down_

10. From outside of JBoss ON, shut down the EAP server.
11. From the agent prompt, execute `avail -f`.
12. Verify EAP availability shows DOWN.
13. Wait approximately 1 minute and 45 seconds.
14. Start EAP server from outside of JON.
15. Wait approximately 15 seconds.
16. From the agent prompt, execute `avail -f`.
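
The waits in steps 13 and 15 add up to roughly the 2-minute duration configured in step 8, so the final availability report seems to land close to the _Stays Down_ threshold. A minimal sketch of that arithmetic follows. The constant and function names are hypothetical, the values are the approximate waits from the steps rather than measured timings, and the time spent on steps 12 and 14 is ignored.

```python
from datetime import timedelta

# Values copied from the steps above; the names are illustrative only.
STAYS_DOWN_DURATION = timedelta(minutes=2)          # step 8: Duration
WAIT_AFTER_DOWN = timedelta(minutes=1, seconds=45)  # step 13
WAIT_AFTER_RESTART = timedelta(seconds=15)          # step 15


def elapsed_at_final_avail() -> timedelta:
    """Approximate time between the DOWN report (step 11) and the final avail -f (step 16)."""
    return WAIT_AFTER_DOWN + WAIT_AFTER_RESTART


def margin_to_threshold() -> timedelta:
    """How far the final availability report lands from the Stays Down threshold."""
    return elapsed_at_final_avail() - STAYS_DOWN_DURATION


if __name__ == "__main__":
    print("elapsed since DOWN report:", elapsed_at_final_avail())
    print("margin to threshold:", margin_to_threshold())
```

Under these assumptions the margin is about zero, so the _Stays Down_ alert and the _Goes up_ recovery alert may be evaluated very close together.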

Actual results:
The recovery alert fired twice, 1 second apart.

Expected results:
The recovery alert fires once.


Additional info:
Screenshot and logs attached.

Comment 1 Armine Hovsepyan 2013-12-04 15:42:22 UTC
Created attachment 832710 [details]
s1_server.log

Comment 2 Armine Hovsepyan 2013-12-04 15:43:14 UTC
Created attachment 832712 [details]
s1_agent.log

Comment 3 Armine Hovsepyan 2013-12-04 15:45:34 UTC
Created attachment 832713 [details]
s2_agent.log

Comment 4 Armine Hovsepyan 2013-12-04 15:46:00 UTC
Created attachment 832714 [details]
s2_server.log

Comment 5 Jay Shaughnessy 2014-04-04 20:48:18 UTC
I can't really explain this, but there are some odd entries in the s1 log that could indicate some strangeness in the DB or possibly something off with the CR build. Availability record repair was applied in one place, and there were odd issues with CriteriaQueryRunner and Alert.recoveryAlertDefinition.

Unless we see this in a GA or current master build, I'm not sure there is anything to do here.

Comment 6 Jay Shaughnessy 2014-08-26 22:25:38 UTC
Armine, there is nothing to do here that I can think of. Shall we close it, or do you have an idea for pursuing it further?

Comment 7 Armine Hovsepyan 2014-08-27 12:36:53 UTC
I haven't seen this again either, so I guess this can be closed as wont-fix or works-for-me.

Comment 8 Jay Shaughnessy 2014-08-27 13:13:17 UTC
Thanks, Armine. Closing as WorksForMe.