Bug 1119331

Summary:

Job Misfire handler fails: 'AlertAvailabilityDurationJob' job: Job class must implement the Job interface

Product:

[JBoss] JBoss Operations Network

Reporter:

Larry O'Leary <loleary>

Component:

Monitoring - Alerts

Assignee:

Lukas Krejci <lkrejci>

Status:

CLOSED CURRENTRELEASE

QA Contact:

Mike Foley <mfoley>

Severity:

high

Priority:

unspecified

Version:

JON 3.1.2

CC:

genman, hrupp, jshaughn, lkrejci, loleary, mmahoney

Target Milestone:

DR02

Target Release:

JON 3.3.0

Hardware:

Unspecified

OS:

Unspecified

Doc Type:

Bug Fix

Doc Text:

A hotfix change to how availability duration jobs were scheduled (from using the Quartz job service to an EJB timer) caused jobs scheduled before the hotfix was applied--but not evaluated or expired when the server was shut down--to produce Job Misfire Handler errors. The fix now takes into account availability duration jobs left behind and changes the job to meet the new scheduling mechanism.

Last Closed:

2014-12-11 14:02:18 UTC

Type:

Bug

Attachments:

Description	Flags
Alert	none

Description Larry O'Leary 2014-07-14 14:25:31 UTC

Description of problem:
Every four minutes, the following ERROR is reported in the server log:

    ERROR [org.quartz.impl.jdbcjobstore.JobStoreCMT] MisfireHandler: Error handling misfires: Couldn't store trigger 'AVAIL_DURATION_DOWN-712781' for 'org.rhq.enterprise.server.scheduler.jobs.AlertAvailabilityDurationJob' job:Job class must implement the Job interface.
    org.quartz.JobPersistenceException: Couldn't store trigger 'AVAIL_DURATION_DOWN-712781' for 'org.rhq.enterprise.server.scheduler.jobs.AlertAvailabilityDurationJob' job:Job class must implement the Job interface. [See nested exception: java.lang.IllegalArgumentException: Job class must implement the Job interface.]
        at org.quartz.impl.jdbcjobstore.JobStoreSupport.storeTrigger(JobStoreSupport.java:1246)
        at org.quartz.impl.jdbcjobstore.JobStoreSupport.doUpdateOfMisfiredTrigger(JobStoreSupport.java:1014)
        at org.quartz.impl.jdbcjobstore.JobStoreSupport.recoverMisfiredJobs(JobStoreSupport.java:956)
        at org.quartz.impl.jdbcjobstore.JobStoreSupport.doRecoverMisfires(JobStoreSupport.java:3126)
        at org.quartz.impl.jdbcjobstore.JobStoreSupport$MisfireHandler.manage(JobStoreSupport.java:3887)
        at org.quartz.impl.jdbcjobstore.JobStoreSupport$MisfireHandler.run(JobStoreSupport.java:3907)
    Caused by: java.lang.IllegalArgumentException: Job class must implement the Job interface.
        at org.quartz.JobDetail.setJobClass(JobDetail.java:280)
        at org.quartz.impl.jdbcjobstore.StdJDBCDelegate.selectJobDetail(StdJDBCDelegate.java:897)
        at org.quartz.impl.jdbcjobstore.JobStoreSupport.storeTrigger(JobStoreSupport.java:1202)
        ... 5 more


A second server reports the same error, also every four minutes, but at different times.

On server start, it is evident that an availability duration alert was scheduled and missed:

    DEBUG [org.rhq.enterprise.server.scheduler.EnhancedSchedulerImpl] Looks like repeating job [org.rhq.enterprise.server.scheduler.jobs.SavedSearchResultCountRecalculationJob:org.rhq.enterprise.server.scheduler.jobs.SavedSearchResultCountRecalculationJob] is already scheduled - removing it so it can be rescheduled...


Version-Release number of selected component (if applicable):
3.1.2 - Build: d2b7ee5:38ab852 -- Server Hotfix-08

How reproducible:
It is not clear how this issue first occurs. However, once it does, the error recurs every four minutes.

Comment 1 Lukas Krejci 2014-07-15 21:39:16 UTC

Given that hotfixes are not allowed to touch the database, I believe this is a case of stale data in the database.

The org.rhq.enterprise.server.scheduler.jobs.AlertAvailabilityDurationJob used to be a Quartz job class that was later transformed into an EJB timer.

Note that this change is NOT present in the release/jon3.1.x branch, just the hotfix branch.

The only explanation I have for the exception is that the job was left in the database and, after the hotfix was applied, the job class no longer implemented the Quartz Job interface, so Quartz could not load it to carry out the job.
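
To illustrate the explanation above, here is a minimal, self-contained sketch of the kind of check that produces the "Job class must implement the Job interface" message. It does not use Quartz itself: the nested `Job` interface is a local stand-in for `org.quartz.Job`, and `OldAvailDurationJob` / `NewAvailDurationJob` are hypothetical names representing the job class before and after the hotfix. The real class and the real hotfix code are not reproduced here.

```java
// Sketch only: local stand-ins, not the Quartz or RHQ sources.
class JobClassCheck {

    // Stand-in for org.quartz.Job so the example compiles without Quartz.
    interface Job {
        void execute();
    }

    // Hypothetical pre-hotfix shape: the class is a Quartz-style job.
    static class OldAvailDurationJob implements Job {
        public void execute() {
            // availability duration check would run here
        }
    }

    // Hypothetical post-hotfix shape: the work moved elsewhere (an EJB timer),
    // so the class no longer implements Job.
    static class NewAvailDurationJob {
    }

    // Rejects any class that cannot be used as a job, with the same
    // message that appears in the stack trace.
    static void requireJobClass(Class<?> jobClass) {
        if (jobClass == null) {
            throw new IllegalArgumentException("Job class cannot be null.");
        }
        if (!Job.class.isAssignableFrom(jobClass)) {
            throw new IllegalArgumentException("Job class must implement the Job interface.");
        }
    }

    // Convenience wrapper: true if the class passes the check above.
    static boolean isLoadableAsJob(Class<?> jobClass) {
        try {
            requireJobClass(jobClass);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isLoadableAsJob(OldAvailDurationJob.class)); // prints "true"
        System.out.println(isLoadableAsJob(NewAvailDurationJob.class)); // prints "false"
    }
}
```

A job row stored in the database keeps only the class name, so if the class changes shape between the time the row is written and the time it is read back, a check like this would fail on every attempt to process the stored trigger.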

Comment 2 Lukas Krejci 2014-07-15 21:40:54 UTC

That said, I assume this exception is essentially harmless because the job is being run normally but as an EJB timer.

Comment 7 Lukas Krejci 2014-07-16 19:52:07 UTC

in master

commit 29d93c61270b35c9a8733fcf83d3a8dde78d907b
Author: Lukas Krejci <lkrejci>
Date:   Wed Jul 16 21:48:22 2014 +0200

    [BZ 1119331] Handle the jobs potentially left behind after conversion
    of the avail duration check job into an EJB time.

Comment 8 Lukas Krejci 2014-07-16 19:59:38 UTC

commit a90aae26a2574f74bcae1a7fab78e7164b759d1a
Author: Lukas Krejci <lkrejci>
Date:   Wed Jul 16 21:58:18 2014 +0200

    [BZ 1119331] Fix a typo in the DB upgrade step

Comment 12 Jay Shaughnessy 2014-07-17 13:43:04 UTC

*** Bug 1120445 has been marked as a duplicate of this bug. ***

Comment 16 Heiko W. Rupp 2014-07-31 15:27:31 UTC

Setting to MODIFIED, as this is in release/jon3.3.x.

Comment 17 Simeon Pinder 2014-07-31 15:51:33 UTC

Moving to ON_QA, as this is available to test with the brew build of DR01: https://brewweb.devel.redhat.com//buildinfo?buildID=373993

Comment 18 Larry O'Leary 2014-08-05 23:07:35 UTC

Steps to reproduce: 

1.  Install and start JBoss EAP 6 standalone server.
2.  Install and start JBoss ON 3.1.2.GA system (pre server hotfix-05).
3.  Import JBoss EAP standalone server into inventory.
4.  Configure JBoss EAP standalone resource's connection settings.
5.  Create availability duration alert definition (stays down 10 minutes) on JBoss EAP server.

    *Name*: `Down too long`
    *Condition Type*: _Availability Duration_
    *Availability Duration*: _Stays Down_
    *Duration*: _10_ _minutes_
    
6.  Verify alert definition is working:

    #.  Shut down JBoss EAP server.
    #.  Wait 11 minutes.
    
    Alert should have been triggered.
    
    #.  Start JBoss EAP server.
    #.  Wait for availability to report as UP.
    
7.  Shut down JBoss EAP server and wait 2 minutes.
8.  Shut down JBoss ON system.
9.  Wait 8 minutes.
10. Apply server hotfix-05 or later.
11. Start JBoss ON system.
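
The timing in steps 7 to 11 can be sketched as follows. This is only an illustration of why the stored trigger comes due while the JBoss ON server is stopped. It assumes that the DOWN availability is recorded at the moment the EAP server is shut down and that applying the hotfix takes about one minute; neither figure comes from the steps above. The timestamp, the class name `MisfireTimeline`, and the method `firesWhileServerDown` are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;

// Sketch only: models the reproduction timeline, not the scheduler itself.
class MisfireTimeline {

    // True when the trigger's fire time falls inside the window in which
    // the server was stopped (stop inclusive, start exclusive).
    static boolean firesWhileServerDown(Instant fireTime, Instant serverStop, Instant serverStart) {
        return !fireTime.isBefore(serverStop) && fireTime.isBefore(serverStart);
    }

    public static void main(String[] args) {
        // Arbitrary illustrative moment at which the EAP server goes down (step 7).
        Instant eapDown = Instant.parse("2014-07-22T13:00:00Z");

        // "Stays Down" for 10 minutes: the check is due 10 minutes after DOWN.
        Instant fireTime = eapDown.plus(Duration.ofMinutes(10));

        // Step 8: JBoss ON is shut down 2 minutes after the EAP server.
        Instant serverStop = eapDown.plus(Duration.ofMinutes(2));

        // Step 9 (wait 8 minutes) plus an assumed 1 minute to apply the hotfix.
        Instant serverStart = serverStop
                .plus(Duration.ofMinutes(8))
                .plus(Duration.ofMinutes(1));

        // The 10-minute mark passes while the server is stopped.
        System.out.println(firesWhileServerDown(fireTime, serverStop, serverStart)); // prints "true"
    }
}
```

Under these assumptions, the job scheduled before the hotfix is still in the database at restart with a fire time in the past, which is the situation in which the misfire handler picks it up.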


Actual results:
JBoss ON server.log reports the following ERROR every four minutes:

    2014-07-22 14:01:56,624 ERROR [org.quartz.impl.jdbcjobstore.JobStoreCMT] MisfireHandler: Error handling misfires: Couldn't store trigger 'AVAIL_DURATION_DOWN-10004' for 'org.rhq.enterprise.server.scheduler.jobs.AlertAvailabilityDurationJob' job:Job class must implement the Job interface.
    org.quartz.JobPersistenceException: Couldn't store trigger 'AVAIL_DURATION_DOWN-10004' for 'org.rhq.enterprise.server.scheduler.jobs.AlertAvailabilityDurationJob' job:Job class must implement the Job interface. [See nested exception: java.lang.IllegalArgumentException: Job class must implement the Job interface.]


Expected results:
No error.

Comment 19 Matt Mahoney 2014-08-06 16:01:48 UTC

Created attachment 924525 [details]
Alert