1019941 – Only one availability duration alert can be triggered per resource due to non-unique scheduler trigger name/group

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	JBoss Operations Network
Classification:	JBoss
Component:	Monitoring - Alerts
Sub Component:
Version:	JON 3.1.2,JON 3.2
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	ER07
Target Release:	JON 3.2.0
Assignee:	Jay Shaughnessy
QA Contact:	Mike Foley
Docs Contact:
URL:
Whiteboard:
Depends On:	888927
Blocks:	1012435 1028526 1028527

Reported:	2013-10-16 16:28 UTC by Larry O'Leary
Modified:	2018-12-03 20:19 UTC
CC List:	2 users
Fixed In Version:
Clone Of:
Clones:	1028527
Environment:
Last Closed:
Type:	Bug
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Knowledge Base (Solution)	514433	0	None	None	None	Never

Description Larry O'Leary 2013-10-16 16:28:51 UTC

Description of problem:
If more than one alert definition includes a condition of type _Availability Duration_ using the same availability type, only one of those alert definitions will be valid and active when the availability duration condition is met.

Version-Release number of selected component (if applicable):
4.4.0.JON312GA

How reproducible:
Always

Steps to Reproduce:
1.  Start JBoss ON 3.1.2 system.
2.  Start JON 3.1.2 system.
3.  Import the agent's platform resource into inventory.
4.  Create the following 2 alerts for the platform resource:

    1.  Alert _Name_: `Alert - Platform Down for 1m`
    
        *   _Condition Type_: _Availability Duration_
        *   _Availability Duration_: _Stays Down_
        *   _Duration_: `1` _minutes_
        
    2.  Alert _Name_: `Alert - Platform Down for 2m`
    
        *   _Condition Type_: _Availability Duration_
        *   _Availability Duration_: _Stays Down_
        *   _Duration_: `2` _minutes_
        
5.  Shut down the agent.
6.  Wait for up to 4 minutes.

Actual results:
The platform's alert history will only contain _Alert - Platform Down for 1m_.

Expected results:
The platform's alert history will contain _Alert - Platform Down for 1m_ and _Alert - Platform Down for 2m_.

Additional info:
When testing, during the second execution of the test I could see the following warning/exception get logged which may explain this failure:

2013-10-16 11:19:05,784 INFO  [org.rhq.enterprise.server.core.AgentManagerBean] Agent with name [localhost.localdomain] just went down
2013-10-16 11:19:05,883 WARN  [org.rhq.enterprise.server.alert.engine.model.AvailabilityDurationCacheElement] Unable to schedule availability duration job for [Resource[id=10001, uuid=null, type=<null>, key=null, name=null, parent=<null>]] with JobData [org.quartz.utils.DirtyFlagMap$DirtyFlagCollection@2b8d41ce]
org.quartz.ObjectAlreadyExistsException: Unable to store Trigger with name: 'AVAIL_DURATION_DOWN-10001' and group: 'org.rhq.enterprise.server.scheduler.jobs.AlertAvailabilityDurationJob', because one already exists with this identification.
	at org.quartz.impl.jdbcjobstore.JobStoreSupport.storeTrigger(JobStoreSupport.java:1176)
	at org.quartz.impl.jdbcjobstore.JobStoreSupport$5.execute(JobStoreSupport.java:1152)
	at org.quartz.impl.jdbcjobstore.JobStoreSupport$40.execute(JobStoreSupport.java:3688)
	at org.quartz.impl.jdbcjobstore.JobStoreCMT.executeInLock(JobStoreCMT.java:244)
	at org.quartz.impl.jdbcjobstore.JobStoreSupport.executeInLock(JobStoreSupport.java:3684)
	at org.quartz.impl.jdbcjobstore.JobStoreSupport.storeTrigger(JobStoreSupport.java:1148)
	at org.quartz.core.QuartzScheduler.scheduleJob(QuartzScheduler.java:779)
	at org.quartz.impl.StdScheduler.scheduleJob(StdScheduler.java:276)
	at org.rhq.enterprise.server.scheduler.SchedulerService.scheduleJob(SchedulerService.java:193)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:616)
	at org.jboss.mx.interceptor.ReflectedDispatcher.invoke(ReflectedDispatcher.java:155)
	at org.jboss.mx.server.Invocation.dispatch(Invocation.java:94)
	at org.jboss.mx.server.Invocation.invoke(Invocation.java:86)
	at org.jboss.mx.server.AbstractMBeanInvoker.invoke(AbstractMBeanInvoker.java:264)
	at org.jboss.mx.server.MBeanServerImpl.invoke(MBeanServerImpl.java:659)
	at javax.management.MBeanServerInvocationHandler.invoke(MBeanServerInvocationHandler.java:305)
	at $Proxy580.scheduleJob(Unknown Source)
	at org.rhq.enterprise.server.scheduler.SchedulerBean.scheduleJob(SchedulerBean.java:211)
    ...


Not sure why this exception didn't occur in the first place so perhaps it is only partially related?

Comment 1 Larry O'Leary 2013-11-07 21:49:41 UTC

I updated the title as this issue does not only apply to a resource that has two or more availability duration conditions defined. It also applies to a resource with a single availability duration condition if the resource's availability changes to meet the condition, changes to something else, and then changes back to meet the condition a second time before the first duration has expired. For example:

Goes Down for 10 minutes

--> went down for 1 minute
--> went up for 5 minutes
--> went down for 15 minutes

In this case, the second "went down" would result in the exception mentioned in comment #0 due to the duplicate availability duration jobs running on the same resource.
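
The collision can be illustrated with a minimal sketch. It is not RHQ or Quartz code: `TriggerStoreSketch` is a hypothetical stand-in for a job store that keys triggers by name and group, and the only detail taken from this report is the name format `AVAIL_DURATION_DOWN-<resourceId>` seen in the logged exception.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for a scheduler store that keys triggers by (name, group). */
class TriggerStoreSketch {
    private final Set<String> keys = new HashSet<>();

    /** Returns false when a trigger with the same name and group is already stored. */
    boolean store(String name, String group) {
        return keys.add(group + "/" + name);
    }

    /** Name format as it appears in the logged exception: OPERATOR-resourceId. */
    static String triggerName(String operator, int resourceId) {
        return operator + "-" + resourceId;
    }
}
```

Storing `triggerName("AVAIL_DURATION_DOWN", 10001)` twice under the same group succeeds the first time and is rejected the second time. This is the situation both for two "Stays Down" definitions on one resource and for a second "went down" that arrives while the first timer is still pending.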

Comment 2 Jay Shaughnessy 2013-11-08 16:03:54 UTC

The release branch is not open for 3.2.1 Target as 3.2.0 is still underway.  So, leaving as ASSIGNED and recording only the push to master.

master commit b3ac322d1538d4e2789cdc85a4cb3358fddb6758
Author: Jay Shaughnessy <jshaughn>
Date:   Fri Nov 8 09:07:18 2013 -0500

Fix this assumption by making the trigger name more unique: add a timestamp.
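
As a rough illustration of the idea in this commit message, the sketch below appends a timestamp to the trigger name. The exact name format used by the commit is not shown in this report, so the class `UniqueTriggerNameSketch`, its method, and the `-<millis>` suffix are assumptions made for the example only.

```java
/** Hypothetical sketch: make the trigger name distinct per scheduling attempt. */
class UniqueTriggerNameSketch {
    /**
     * Builds OPERATOR-resourceId-timestamp. The suffix layout is an assumption;
     * the report only says that a timestamp was added.
     */
    static String triggerName(String operator, int resourceId, long timestampMillis) {
        return operator + "-" + resourceId + "-" + timestampMillis;
    }
}
```

With this scheme, two schedules for resource 10001 created at different times get different names, so a store keyed by name and group no longer rejects the second one. In this simplified form, two schedules created within the same millisecond would still produce the same name.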

Comment 3 Jay Shaughnessy 2013-11-13 15:18:36 UTC

release/jon3.2.x commit f4e76421928f54684f8d3dff14a6c4ca16a3ce4a
Author: Jay Shaughnessy <jshaughn>
Date:   Wed Nov 13 10:17:44 2013 -0500

    Fix this assumption by making the trigger name more unique: add a timestamp.

    Cherry-Pick master: b3ac322d1538d4e2789cdc85a4cb3358fddb6758

Comment 4 Jay Shaughnessy 2013-11-14 16:31:15 UTC

Test Case:
As a note, the move from Quartz triggers to EJB Timers (Bug 1030108) makes this fix somewhat irrelevant. But to ensure the use case still works, the test case should be executed. Pick any resource you can easily cycle.

AD-1:  "Stays Down"
  - Stays Down for 5 minutes

Bring down the resource and wait for the avail to change (sitting in the resource detail view is useful, that generates the 15s checks and quickly shows the change).
  - this will start the 5 minute timer

Bring up the resource and wait for the avail to change

Bring down the resource (again) and leave it down


There should be:
- no error generated in the server log
- no alert for the first down event (because it didn't stay down)
- an alert for the second down event (5 minutes after the second shutdown)
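
The expected outcome of this test case can be modelled with a small simulation. This is a sketch under stated assumptions, not the server's implementation: `StaysDownSketch`, `Change`, and `alertTimes` are hypothetical names, times are whole minutes, and the list of availability changes is assumed to be sorted by time.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a "Stays Down for N minutes" condition. */
class StaysDownSketch {
    /** One availability change: when it happened and whether the resource went down. */
    static final class Change {
        final int minute;
        final boolean down;

        Change(int minute, boolean down) {
            this.minute = minute;
            this.down = down;
        }
    }

    /** Returns the minutes at which an alert fires, given changes sorted by time. */
    static List<Integer> alertTimes(List<Change> changes, int durationMinutes) {
        List<Integer> fired = new ArrayList<>();
        for (int i = 0; i < changes.size(); i++) {
            Change c = changes.get(i);
            if (!c.down) {
                continue; // only a down event starts a timer
            }
            int expiry = c.minute + durationMinutes;
            // The timer counts only if no later change arrives before it expires.
            boolean interrupted = i + 1 < changes.size() && changes.get(i + 1).minute < expiry;
            if (!interrupted) {
                fired.add(expiry);
            }
        }
        return fired;
    }
}
```

For the sequence down at minute 0, up at minute 1, down at minute 2 with a 5 minute duration, the first timer is interrupted and only the second one fires, which matches the single alert listed above.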

Comment 5 Simeon Pinder 2013-11-19 15:47:49 UTC

Moving to ON_QA as available for testing with new brew build.

Comment 6 Simeon Pinder 2013-11-22 05:13:24 UTC

Mass moving all of these from ER6 to target milestone ER07 since the ER6 build was bad and QE was halted for the same reason.

Comment 7 Filip Brychta 2013-12-11 10:02:46 UTC

Verified on
Version: 3.2.0.GA
Build Number: 7b00246:6d13523

Verified scenarios from description and from comment 4
