Bug 888927

Summary: Availability duration conditions limited to one alert definition per resource
Product: [Other] RHQ Project
Component: Alerts
Reporter: Jay Shaughnessy <jshaughn>
Assignee: Jay Shaughnessy <jshaughn>
QA Contact: Mike Foley <mfoley>
CC: hrupp, jkandasa
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: high
Version: 4.5
Target Release: RHQ 4.6
Hardware: All
OS: All
Doc Type: Bug Fix
Type: Bug
Last Closed: 2013-09-24 19:26:59 UTC
Bug Blocks: 1019941
Attachments: alert-defination (flags: none)

Description Jay Shaughnessy 2012-12-19 19:27:40 UTC
Currently availability duration conditions are limited to one alert definition per resource, per condition type (AVAIL_DURATION_DOWN or AVAIL_DURATION_NOT_UP).

If the same resource has multiple alert definitions with the same avail condition type, for example, two alert defs with AVAIL_DURATION_DOWN, then the check for one of those conditions may never happen.

In the server logs you would see something like:

2012-12-05 08:33:23,284 WARN [org.rhq.enterprise.server.alert.engine.model.AvailabilityDurationCacheElement] Unable to schedule availability duration job for [Resource[id=83726, uuid=null, type=<null>, key=null, name=null, parent=<null>]] with JobData [org.quartz.utils.DirtyFlagMap$DirtyFlagCollection@77ac2d96]

org.quartz.ObjectAlreadyExistsException: Unable to store Trigger with name: 'AVAIL_DURATION_DOWN-83726' and group: 'org.rhq.enterprise.server.scheduler.jobs.AlertAvailabilityDurationJob', because one already exists with this identification.

Comment 1 Jay Shaughnessy 2012-12-19 19:30:31 UTC
It looks like the issue here is that the Quartz job triggerName is not unique enough.  It is qualified by conditionOperator and resourceId, but should probably be further qualified by alertDefinitionId, thus allowing multiple alert defs for the same resource to schedule simultaneous duration job checks.
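
To illustrate the collision, here is a minimal sketch. It is not the RHQ code: the class, the method names, the in-memory set standing in for the Quartz trigger store, and the alert definition ids (10001, 10002) are all hypothetical. The old name format follows the 'AVAIL_DURATION_DOWN-83726' trigger name in the log above; the exact format of the further-qualified name is an assumption.

```java
import java.util.HashSet;
import java.util.Set;

public class TriggerNameSketch {
    // Stand-in for the scheduler's trigger store: names must be unique within a group.
    private final Set<String> triggerNames = new HashSet<>();

    // Name qualified only by condition operator and resource id, as seen in the log.
    static String oldTriggerName(String conditionOperator, int resourceId) {
        return conditionOperator + "-" + resourceId;
    }

    // Hypothetical format: additionally qualified by the alert definition id.
    static String newTriggerName(String conditionOperator, int resourceId, int alertDefinitionId) {
        return conditionOperator + "-" + resourceId + "-" + alertDefinitionId;
    }

    // Returns false when a trigger with this name is already stored.
    // (Quartz itself signals this case with ObjectAlreadyExistsException.)
    boolean schedule(String triggerName) {
        return triggerNames.add(triggerName);
    }

    public static void main(String[] args) {
        int resourceId = 83726;
        String op = "AVAIL_DURATION_DOWN";

        // Two alert defs on the same resource produce the same old-style name.
        TriggerNameSketch before = new TriggerNameSketch();
        System.out.println(before.schedule(oldTriggerName(op, resourceId)));
        System.out.println(before.schedule(oldTriggerName(op, resourceId)));

        // With the alert definition id included, the names no longer collide.
        TriggerNameSketch after = new TriggerNameSketch();
        System.out.println(after.schedule(newTriggerName(op, resourceId, 10001)));
        System.out.println(after.schedule(newTriggerName(op, resourceId, 10002)));
    }
}
```

In the sketch, the second old-style schedule call is rejected because both alert defs map to the same name, which corresponds to the condition check that never gets scheduled. Once the alert definition id is part of the name, each alert def gets its own trigger.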

Comment 2 Jay Shaughnessy 2012-12-19 22:12:17 UTC
master commit 17d9ac9e8b5a926b468da48eb4cde5c8765453c3
Author: Jay Shaughnessy <jshaughn>
Date:   Wed Dec 19 16:57:06 2012 -0500

Add the hooks to further qualify AvailDuration checkCondition quartz job
trigger names with the alertDefId.  This will allow avail duration conditions
on any number of alert defs for the same resource.


Test Notes
This could be tested by defining more than one alert def for the same resource, both with just avail duration down conditions.  Prior to the fix only one would fire and you would see the above error in the log.  With the fix you should see no error and both should fire.

Comment 3 Jay Shaughnessy 2012-12-20 18:42:21 UTC
There was a subsequent commit to take care of the API diff:

master commit 1769169f2126bc78cc5b237a818f2bd703d2ea83
Author: Jay Shaughnessy <jshaughn>
Date:   Thu Dec 20 12:32:28 2012 -0500

    Add API diff clirr exclusion

Comment 4 Jeeva Kandasamy 2013-09-23 10:44:18 UTC
Created attachment 801564: alert-defination

Version: 3.2.0.ER1
Build Number: 54dd29c:464a643
GWT Version: 2.5.0
SmartGWT Version: 3.0


Created two alert definitions with the same condition (agent-not-up-for-1-minute) on the same resource (RHQ Agent); both work as expected. Screenshot is attached.