Created attachment 832709 [details]
doubleRecoveryAlert

Description of problem:
recovery alert fired twice in HA environment

Version-Release number of selected component (if applicable):
jon 3.2 CR1

How reproducible:
noticed once

Steps to Reproduce: (steps take under 1030108 )
1. Install and start EAP 6 standalone server.
2. Install a two server (HA) JON 3.1.2 system.
4. Start server-02 of JON HA system and wait for it to come up.
5. Start server-01 of JON HA system and wait for it to come up. Be sure server 2 is started first so Quartz is running there.
6. Start agent in foreground.
7. Import EAP 6 standalone server into inventory and configure connection settings.
8. Create new _stays down for 2 minutes_ alert definition for EAP resource:
   *Name*: `Alert - Profile Down`
   *Condition*:
      *Fire alert when*: _ANY_
      *Condition Type*: _Availability Duration_
      *Availability Duration*: _Stays Down_
      *Duration*: `2` _minutes_
   *Recovery*:
      *Disable When Fired*: _Yes_
9. Create new _recovery_ alert definition for EAP default resource:
   *Name*: `Recovery - Profile Down`
   *Condition*:
      *Fire alert when*: _ANY_
      *Condition Type*: _Availability Change_
      *Availability*: _Goes up_
   *Recovery*:
      *Recovery Alert*: _Alert - Profile Down_
10. From outside of JBoss ON, shutdown EAP server.
11. From agent prompt, execute avail -f
12. Verify EAP availability shows DOWN.
13. Wait approximately 1 minute and 45 seconds.
14. Start EAP server from outside of JON.
15. Wait approximately 15 seconds
16. From agent prompt, execute avail -f

Actual results:
recovery alert fired twice (1 sec difference)

Expected results:
recovery alert fired once

Additional info:
screen-shot and logs attached
Created attachment 832710 [details] s1_server.log
Created attachment 832712 [details] s1_agent.log
Created attachment 832713 [details] s2_agent.log
Created attachment 832714 [details] s2_server.log
I can't really explain this, but there are some odd entries in the s1 log that could indicate some strangeness in the DB or possibly something off with the CR build. Availability record repair was applied in one place, and there are odd issues with CriteriaQueryRunner and Alert.recoveryAlertDefinition. Unless we see this in a GA or current master build, I'm not sure there is anything to do here.
Armine, there is nothing to do here that I can think of. Shall we close it or do you have an idea for pursuing further?
I haven't seen this again either, so I guess this can be closed as wont-fix or works-for-me.
Thanks, Armine. Closing as WorksForMe.