Bug 1028526 - Availability duration alerts ignore availability changes during the specified interval
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: JBoss Operations Network
Classification: JBoss
Component: Monitoring - Alerts
Version: JON 3.1.2
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ER07
Target Release: JON 3.2.0
Assignee: Jay Shaughnessy
QA Contact: Mike Foley
URL:
Whiteboard:
Depends On: 1019941 1028473
Blocks: 1012435 1028528
Reported: 2013-11-08 16:48 UTC by Larry O'Leary
Modified: 2018-12-03 20:36 UTC (History)

Fixed In Version:
Clone Of: 1028473
: 1028528 (view as bug list)
Environment:
Last Closed: 2014-01-02 20:35:54 UTC
Type: Bug
Embargoed:


Attachments
eap-server.log (24.94 KB, text/x-log)
2013-11-27 16:48 UTC, Armine Hovsepyan


Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 535263 0 None None None Never

Description Larry O'Leary 2013-11-08 16:48:41 UTC
+++ This bug was initially created as a clone of Bug #1028473 +++

Availability duration alerts ignore availability changes during the specified interval.

This was actually part of the original design but has turned out to be a liability.  As an example, consider a "Stays Down for 10 Minutes" avail duration condition.

When a 'Goes Down' avail change is detected for resource X it starts the 10 minute interval.  When the 10 minute interval completes, if the current availability for X is 'Down' then the condition is satisfied (and typically an alert is fired, as these are usually single-condition alert definitions).

But what if during that 10 minutes X actually came up and then went down again?  In this case the alert would have fired but it really should not have.  It did not *stay down* for 10 minutes.  In fact, the second time it went down another 10 minute interval will have started.

The expectation is that the "stays down" or "stays not up" semantic is strictly followed.
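
To make the difference concrete, here is a minimal sketch of the two evaluation strategies described above. It is not the RHQ/JBoss ON implementation: the function names, the `(timestamp, availability)` tuple format, and the treatment of a change landing exactly on the deadline are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

def stays_down_legacy(changes, went_down_at, duration):
    """Original design (hypothetical sketch): only the availability at the
    moment the interval expires is checked."""
    deadline = went_down_at + duration
    state = None
    for ts, avail in changes:          # changes are sorted by timestamp
        if ts <= deadline:
            state = avail              # remember the latest state seen so far
    return state == "DOWN"

def stays_down_strict(changes, went_down_at, duration):
    """Strict semantic (hypothetical sketch): any availability change inside
    the interval means the resource did not *stay* down."""
    deadline = went_down_at + duration
    for ts, avail in changes:
        if went_down_at < ts <= deadline:
            return False               # availability changed during the interval
    return True

# X goes down, comes back up after 3 minutes, goes down again after 6 minutes.
t0 = datetime(2013, 11, 8, 11, 0, 0)
changes = [
    (t0, "DOWN"),
    (t0 + timedelta(minutes=3), "UP"),
    (t0 + timedelta(minutes=6), "DOWN"),
]
ten_minutes = timedelta(minutes=10)

print(stays_down_legacy(changes, t0, ten_minutes))  # True: alert fires
print(stays_down_strict(changes, t0, ten_minutes))  # False: no alert
```

Under the strict check, the second 'Goes Down' at the 6 minute mark would start its own interval, which is evaluated independently.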

--- Additional comment from Jay Shaughnessy on 2013-11-08 10:58:32 EST ---


master commit d5825367a91de8b8adc4491dae354aeb10802ae4
Author: Jay Shaughnessy <jshaughn>
Date:   Fri Nov 8 10:42:16 2013 -0500

Fixed such that avail duration conditions will not be satisfied unless the
avail stays constant throughout the specified time interval.

Also: Fix a jdoc mistake around param units in AvailabilityCriteria

Comment 1 Larry O'Leary 2013-11-08 17:51:02 UTC
Steps to reproduce:
1.  Install and start JBoss EAP 6 standalone server.
2.  Install and start JBoss ON 3.1.2 system.
3.  Import EAP standalone server into inventory and configure its connection settings.
4.  Create new _stays down for 5 minutes_ alert definition for EAP resource:

    *Name*: `Alert - Profile Down`
    *Condition*:
        *Fire alert when*:          _ANY_
        *Condition Type*:           _Availability Duration_
        *Availability Duration*:    _Stays Down_
        *Duration*:                 `5` _minutes_

5.  Set EAP resource's availability metric schedule to 30 seconds.
6.  Verify EAP resource is reported as UP.
7.  Shutdown EAP resource and wait for availability to report DOWN.
8.  Wait for approximately another minute.
9.  Start EAP resource and wait for availability to report UP.
10. Wait for approximately another 2 minutes.
11. Shutdown EAP resource.
12. Wait for `Alert - Profile Down` alert to fire.
13. Start EAP resource and wait for availability to report UP.

Actual results:
Availability
    Nov 8, 2013 11:37:43 AM - none                    ..... UP
    Nov 8, 2013 11:34:42 AM - Nov 8, 2013 11:37:43 AM 3.0 m DOWN
    Nov 8, 2013 11:33:38 AM - Nov 8, 2013 11:34:42 AM 1.1 m UP
    Nov 8, 2013 11:32:07 AM - Nov 8, 2013 11:33:38 AM 1.5 m DOWN
    Nov 8, 2013 11:29:32 AM - Nov 8, 2013 11:32:07 AM 2.6 m UP	

Alert: 
    Fri Nov 08 11:37:08 GMT-600 2013 Alert - Profile Down Availability Duration [Stays Down For 5 m] 
    
The alert gets fired even though the EAP resource was only DOWN for 3.0 minutes. 


Expected Results:
The alert should not be fired unless the DOWN duration is greater than or equal to 5 minutes.
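
The timestamps in the actual results can be checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes the availability history and the alert timestamp are in the same time zone, and the helper name `continuous_down_at` is made up for this example.

```python
from datetime import datetime, timedelta

def t(clock):
    """Parse a wall-clock time on 2013-11-08 (same time zone assumed throughout)."""
    return datetime.strptime("2013-11-08 " + clock, "%Y-%m-%d %H:%M:%S")

# Availability changes from the actual results, oldest first.
history = [
    (t("11:29:32"), "UP"),
    (t("11:32:07"), "DOWN"),
    (t("11:33:38"), "UP"),
    (t("11:34:42"), "DOWN"),
    (t("11:37:43"), "UP"),
]
alert_fired = t("11:37:08")
required = timedelta(minutes=5)

def continuous_down_at(history, when):
    """How long the resource had been continuously DOWN at `when`
    (zero if it was not DOWN at that moment)."""
    current = None
    for ts, avail in history:
        if ts <= when:
            current = (ts, avail)      # latest change at or before `when`
    if current is None or current[1] != "DOWN":
        return timedelta(0)
    return when - current[0]

down_for = continuous_down_at(history, alert_fired)
print(down_for)              # 0:02:26
print(down_for >= required)  # False

# The alert time lines up with the interval started by the first 'Goes Down'.
first_down = t("11:32:07")
print(alert_fired - (first_down + required))  # 0:00:01
```

When the alert fired, the resource had been continuously DOWN for under two and a half minutes. The alert time is one second after the first 'Goes Down' plus 5 minutes, which is consistent with the interval having been started by the first outage and never cancelled by the intervening UP.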

Comment 2 Jay Shaughnessy 2013-11-14 15:49:44 UTC
Test Case:
The reproduction steps above are valid.  You can use any resource that you can easily bring up and down and an avail duration long enough to be able to cycle the resource 1 time in between.  I used a Tomcat webapp and a duration of 1 minute.

The important thing is that the "Stays DOWN" semantic is now strict.  Therefore any availability change on the resource during the avail duration period should negate the potential alert.

release/jon3.2.x commit 5ef367384b1a8d67b2c69e38b92329fa62e6aeed
Author: Jay Shaughnessy <jshaughn>
Date:   Fri Nov 8 10:42:16 2013 -0500

    Fixed such that avail duration conditions will not be satisfied unless the
    avail stays constant throughout the specified time interval.

    Also: Fix a jdoc mistake around param units in AvailabilityCriteria

Comment 3 Simeon Pinder 2013-11-19 15:48:10 UTC
Moving to ON_QA as available for testing with new brew build.

Comment 4 Simeon Pinder 2013-11-22 05:13:43 UTC
Mass moving all of these from ER6 to target milestone ER07 since the ER6 build was bad and QE was halted for the same reason.

Comment 5 Armine Hovsepyan 2013-11-27 16:48:07 UTC
verified in er7 -> http://d.pr/i/kcV3
log attached with steps and times

Comment 6 Armine Hovsepyan 2013-11-27 16:48:44 UTC
Created attachment 829811
eap-server.log

