Bug 888927 - Availability duration conditions limited to one alert definition per resource
Status: CLOSED CURRENTRELEASE
Product: RHQ Project
Classification: Other
Component: Alerts
Version: 4.5
Hardware: All
OS: All
Priority: high
Severity: high
Target Release: RHQ 4.6
Assigned To: Jay Shaughnessy
QA Contact: Mike Foley
Blocks: 1019941
Reported: 2012-12-19 14:27 EST by Jay Shaughnessy
Modified: 2013-11-15 16:45 EST
Doc Type: Bug Fix
Last Closed: 2013-09-24 15:26:59 EDT
Type: Bug

Attachments
alert-defination (40.83 KB, image/jpeg)
2013-09-23 06:44 EDT, Jeeva Kandasamy
Description Jay Shaughnessy 2012-12-19 14:27:40 EST
Currently, availability duration conditions are limited to one alert definition per resource, per condition type (AVAIL_DURATION_DOWN or AVAIL_DURATION_NOT_UP).

If the same resource has multiple alert definitions with the same avail condition type (for example, two alert defs with AVAIL_DURATION_DOWN), then the check for one of those conditions may never be scheduled.

In the server logs you would see something like:

2012-12-05 08:33:23,284 WARN [org.rhq.enterprise.server.alert.engine.model.AvailabilityDurationCacheElement] Unable to schedule availability duration job for [Resource[id=83726, uuid=null, type=<null>, key=null, name=null, parent=<null>]] with JobData [org.quartz.utils.DirtyFlagMap$DirtyFlagCollection@77ac2d96]

org.quartz.ObjectAlreadyExistsException: Unable to store Trigger with name: 'AVAIL_DURATION_DOWN-83726' and group: 'org.rhq.enterprise.server.scheduler.jobs.AlertAvailabilityDurationJob', because one already exists with this identification.
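
The collision behind this exception can be reproduced without Quartz. The sketch below is only an illustration: `TriggerCollisionSketch` and its `store` method are hypothetical stand-ins for a trigger store that identifies triggers by group and name, and `legacyTriggerName` assumes the name format seen in the log above (condition type, a dash, then the resource id).

```java
import java.util.HashSet;
import java.util.Set;

/** Toy stand-in for a scheduler's trigger store; this is NOT the Quartz API. */
final class TriggerCollisionSketch {

    // A trigger is identified by (group, name); the same pair cannot be stored twice.
    private final Set<String> storedTriggers = new HashSet<>();

    /** Returns true if stored, false if a trigger with this identity already exists. */
    boolean store(String group, String name) {
        return storedTriggers.add(group + "/" + name);
    }

    /** Assumed naming scheme, inferred from the log: "<conditionType>-<resourceId>". */
    static String legacyTriggerName(String conditionType, int resourceId) {
        return conditionType + "-" + resourceId;
    }

    public static void main(String[] args) {
        String group = "org.rhq.enterprise.server.scheduler.jobs.AlertAvailabilityDurationJob";
        TriggerCollisionSketch store = new TriggerCollisionSketch();

        // Two alert definitions on resource 83726, both using AVAIL_DURATION_DOWN,
        // produce the same trigger name, so the second store attempt is rejected.
        boolean first = store.store(group, legacyTriggerName("AVAIL_DURATION_DOWN", 83726));
        boolean second = store.store(group, legacyTriggerName("AVAIL_DURATION_DOWN", 83726));

        System.out.println("first stored: " + first);   // prints "first stored: true"
        System.out.println("second stored: " + second); // prints "second stored: false"
    }
}
```

In the real server the rejected store surfaces as the ObjectAlreadyExistsException shown above rather than as a boolean.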
Comment 1 Jay Shaughnessy 2012-12-19 14:30:31 EST
It looks like the issue here is that the Quartz job triggerName is not unique enough.  It is qualified by conditionOperator and resourceId, but should probably be further qualified by alertDefinitionId, thus allowing multiple alert defs for the same resource to schedule simultaneous duration job checks.
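
A minimal sketch of the proposed qualification, for illustration only. The class name, method name, separator, and the alert definition ids 10001 and 10002 are all hypothetical; the actual format used by the fix is not shown in this report. The point is only that adding alertDefinitionId makes the name unique per alert definition.

```java
/** Illustrative only; names and format are assumptions, not the RHQ implementation. */
final class QualifiedTriggerNameSketch {

    /** Builds "<conditionOperator>-<resourceId>-<alertDefinitionId>" (assumed format). */
    static String triggerName(String conditionOperator, int resourceId, int alertDefinitionId) {
        return conditionOperator + "-" + resourceId + "-" + alertDefinitionId;
    }

    public static void main(String[] args) {
        // Two hypothetical alert definitions (ids 10001 and 10002) on resource 83726.
        String a = triggerName("AVAIL_DURATION_DOWN", 83726, 10001);
        String b = triggerName("AVAIL_DURATION_DOWN", 83726, 10002);

        System.out.println(a);           // prints "AVAIL_DURATION_DOWN-83726-10001"
        System.out.println(b);           // prints "AVAIL_DURATION_DOWN-83726-10002"
        System.out.println(a.equals(b)); // prints "false"
    }
}
```

With distinct names, each alert definition can hold its own trigger in the same job group.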
Comment 2 Jay Shaughnessy 2012-12-19 17:12:17 EST
master commit 17d9ac9e8b5a926b468da48eb4cde5c8765453c3
Author: Jay Shaughnessy <jshaughn@redhat.com>
Date:   Wed Dec 19 16:57:06 2012 -0500

Add the hooks to further qualify AvailDuration checkCondition quartz job
trigger names with the alertDefId.  This will allow avail duration conditions
on any number of alert defs for the same resource.


Test Notes
This can be tested by defining two alert defs for the same resource, each with only an avail duration down condition.  Prior to the fix, only one would fire and the above error would appear in the log.  With the fix, there should be no error and both should fire.
Comment 3 Jay Shaughnessy 2012-12-20 13:42:21 EST
There was a subsequent commit to take care of the API diff:

master commit 1769169f2126bc78cc5b237a818f2bd703d2ea83
Author: Jay Shaughnessy <jshaughn@redhat.com>
Date:   Thu Dec 20 12:32:28 2012 -0500

    Add API diff clirr exclusion
Comment 4 Jeeva Kandasamy 2013-09-23 06:44:18 EDT
Created attachment 801564
alert-defination

Version: 3.2.0.ER1
Build Number: 54dd29c:464a643
GWT Version: 2.5.0
SmartGWT Version: 3.0


Created two alert definitions with the same condition (agent-not-up-for-1-minute) for the same resource (RHQ Agent); both work as expected. A screenshot is attached.
