I am running two servers in HA mode (with no agent connected yet). The server that was started first repeatedly logs the following message: 14:51:15,503 WARN [JobStoreCMT] This scheduler instance (fedora9.home.pilhuhn.de1238762655424) is still active but was recovered by another instance in the cluster. This may cause inconsistent behavior. The message seems to have started appearing around the time the second server was starting its scheduler. Note that the clocks are not completely in sync (they are a small number of seconds apart).
From my reading on this issue, I don't think this is fatal. However, just to be on the safe side, please do a little investigation for me: try to figure out whether all servers are still round-robining the clustered Quartz jobs, and whether the EJB-timer-based jobs are still functioning on each server. Note: the Quartz documentation says that the clocks must be within 1 second of each other -- http://www.opensymphony.com/quartz/wikidocs/TutorialLesson11.html
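For the clock requirement mentioned above, here is a minimal sketch of how the check could be expressed. It assumes you have already collected one clock reading from each server at (roughly) the same moment; the class name, method names, and sample timestamps are hypothetical and not part of Quartz or RHQ. Treating exactly 1 second as acceptable is also an assumption.

```java
import java.time.Duration;
import java.time.Instant;

public class ClockSkewCheck {
    // Tolerance taken from the Quartz clustering note quoted above.
    static final Duration MAX_SKEW = Duration.ofSeconds(1);

    /** Absolute difference between two clock readings taken at the same moment. */
    static Duration skew(Instant nodeA, Instant nodeB) {
        return Duration.between(nodeA, nodeB).abs();
    }

    /** True when the two readings differ by no more than MAX_SKEW. */
    static boolean withinTolerance(Instant nodeA, Instant nodeB) {
        return skew(nodeA, nodeB).compareTo(MAX_SKEW) <= 0;
    }

    public static void main(String[] args) {
        // Hypothetical readings, for illustration only.
        Instant reference = Instant.parse("2009-04-03T14:51:15Z");
        System.out.println(withinTolerance(reference, reference.plusMillis(300))); // true
        System.out.println(withinTolerance(reference, reference.plusSeconds(4)));  // false
    }
}
```

A skew of "a small number of seconds", as described in the original report, would already fail this check.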
I also saw this when the Oracle database was killed and then the server was killed (the entire test environment was shut down, unbeknownst to me). I think this might have something to do with a stateful job that was in progress during that shutdown. Since stateful jobs can only run on a single box, if the system was killed and then restarted, that stateful job might get restarted on a different box from the one where it was originally running. This is just a guess, but I definitely started getting these warn messages when I restarted after the DB and server were abruptly killed.
I just found out that the clock on one of my nodes is off by 12 hours. I bet that's the cause.
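To illustrate why a large clock offset could plausibly produce the "recovered by another instance" warning, here is a simplified model, not Quartz's actual code. It assumes each node stamps its cluster check-in with its own local clock and that a peer is treated as failed once its last check-in is older than some window on the observer's clock. The class name, method name, and the 30-second window are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;

public class StaleCheckinModel {
    /**
     * Simplified model (an assumption, not the Quartz implementation):
     * the observer treats a peer as failed when the peer's last check-in
     * timestamp plus the allowed window lies before the observer's "now".
     */
    static boolean looksFailed(Instant peerCheckin, Instant observerNow, Duration window) {
        return peerCheckin.plus(window).isBefore(observerNow);
    }

    public static void main(String[] args) {
        Duration window = Duration.ofSeconds(30);          // hypothetical window
        Instant trueTime = Instant.parse("2009-04-03T14:51:15Z");

        // Healthy peer: checked in 5 seconds ago, clock in sync.
        Instant healthy = trueTime.minusSeconds(5);
        System.out.println(looksFailed(healthy, trueTime, window)); // false

        // Same peer, but its clock runs 12 hours behind, so the stamp looks ancient.
        Instant skewed = healthy.minus(Duration.ofHours(12));
        System.out.println(looksFailed(skewed, trueTime, window));  // true
    }
}
```

Under this model the skewed node is alive and checking in on schedule, yet every check-in it writes looks 12 hours stale to its peer, which would match a node being "recovered" while still active.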
This bug was previously known as http://jira.rhq-project.org/browse/RHQ-1925
Temporarily adding the keyword "SubBug" so we can be sure we have accounted for all the bugs. keyword: new = Tracking + FutureFeature + SubBug
making sure we're not missing any bugs in rhq_triage
Closing per 25-Aug triage.