Bug 535204 (RHQ-1925)

Summary: Lots of quartz warnings in HA mode
Product: [Other] RHQ Project
Reporter: Heiko W. Rupp <hrupp>
Component: No Component
Assignee: RHQ Project Maintainer <rhq-maint>
Status: CLOSED WONTFIX
Severity: medium
Priority: low
Version: 1.2
Keywords: SubBug
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: All   
URL: http://jira.rhq-project.org/browse/RHQ-1925
Doc Type: Bug Fix
Last Closed: 2010-08-25 15:39:36 UTC
Bug Blocks: 565628    

Description Heiko W. Rupp 2009-04-03 13:02:00 UTC
I am running two servers in HA mode (with no agent connected yet).

The server that was started first repeatedly logs the following message:

14:51:15,503 WARN  [JobStoreCMT] This scheduler instance (fedora9.home.pilhuhn.de1238762655424) is still active but was recovered by another instance in the cluster.  This may cause inconsistent behavior.

The logging of this message seems to have started around the time the 2nd server was starting its scheduler.
Note that the clocks are not completely in sync (they differ by a few seconds).
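
To check how often the warning appears and when it first shows up, the server log can be scanned with a short script. The following is only a sketch: the pattern is modeled on the single log line quoted above, and the function name `summarize_recovery_warnings` is hypothetical, not part of RHQ or Quartz.

```python
import re
from collections import Counter

# Pattern modeled on the one log line quoted in this report; a different
# log layout would need a different expression.
RECOVERED_WARNING = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}),\d{3}\s+WARN\s+\[JobStoreCMT\]\s+"
    r"This scheduler instance \((?P<instance>[^)]+)\) is still active "
    r"but was recovered by another instance"
)


def summarize_recovery_warnings(lines):
    """Return (warning count per scheduler instance id, time of first warning or None)."""
    counts = Counter()
    first_seen = None
    for line in lines:
        match = RECOVERED_WARNING.match(line)
        if match is None:
            continue  # not the JobStoreCMT recovery warning
        counts[match.group("instance")] += 1
        if first_seen is None:
            first_seen = match.group("time")
    return counts, first_seen
```

Comparing the time of the first warning with the startup time of the second server would show whether the two events really coincide.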


Comment 1 Joseph Marques 2009-04-03 13:10:38 UTC
from my reading on this issue, i don't think this is fatal.  however, just to be on the safe side, please do a little investigation for me: try to figure out whether all servers are still round-robin'ing the clustered quartz jobs, and whether the ejb-timer-based jobs are still functioning on each server.

note: the quartz documentation notes that the clocks must be within 1 second of each other -- http://www.opensymphony.com/quartz/wikidocs/TutorialLesson11.html
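
The one-second tolerance from the note above can be checked with a small script. This is a sketch under stated assumptions: the clock readings are epoch seconds taken from every node at the same instant (how they are collected is left open), and `find_skewed_pairs` is a hypothetical helper, not a Quartz or RHQ API.

```python
from itertools import combinations

# Tolerance taken from the Quartz clustering note cited above.
MAX_SKEW_SECONDS = 1.0


def find_skewed_pairs(node_clocks, max_skew=MAX_SKEW_SECONDS):
    """Return (node_a, node_b, skew_seconds) for each pair of nodes whose
    clocks differ by more than max_skew.

    node_clocks maps a node name to its clock reading in epoch seconds;
    all readings are assumed to have been sampled at the same instant.
    """
    skewed = []
    for (name_a, time_a), (name_b, time_b) in combinations(
        sorted(node_clocks.items()), 2
    ):
        skew = abs(time_a - time_b)
        if skew > max_skew:
            skewed.append((name_a, name_b, skew))
    return skewed
```

An empty result means every pair of nodes is within the tolerance; any entry names the two nodes to resynchronize (for example via NTP) before looking for other causes.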

Comment 2 John Mazzitelli 2009-08-17 17:11:23 UTC
also saw this when the oracle database was killed, then the server was killed (the entire test environment was shut down unbeknownst to me).

I think this might have something to do with a stateful job that was in progress during that shutdown. Since stateful jobs can only run on a single box, if the system was killed and then restarted, that stateful job might get restarted on a different box from the one where it was originally running. This is just a guess, but I definitely started getting these warn messages when I restarted after the DB and server were abruptly killed.

Comment 3 John Mazzitelli 2009-08-17 17:13:53 UTC
just found out one of my nodes' clocks is off by 12 hours. I bet that's the cause

Comment 4 Red Hat Bugzilla 2009-11-10 20:49:36 UTC
This bug was previously known as http://jira.rhq-project.org/browse/RHQ-1925


Comment 5 wes hayutin 2010-02-16 16:52:25 UTC
Temporarily adding the keyword "SubBug" so we can be sure we have accounted for all the bugs.

keyword:
new = Tracking + FutureFeature + SubBug

Comment 6 wes hayutin 2010-02-16 16:58:12 UTC
making sure we're not missing any bugs in rhq_triage

Comment 7 Corey Welton 2010-08-25 15:39:36 UTC
Closing per 25-Aug triage.