Red Hat Bugzilla – Bug 535675
Make compression job not happen immediately after startup
Last modified: 2015-02-01 18:25:28 EST
-Delay the compression so it only runs on the hour
-Execute the compression before Quartz starts up so that the compression job has the database to itself.
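For the first option, the wait until the next full hour is easy to compute. A minimal sketch in plain Java, assuming a hypothetical helper `untilNextHour` (it is not part of RHQ or Quartz) and an arbitrary example startup time:

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

class TopOfHourDelay {

    /** Hypothetical helper: how long to wait from 'now' until the next full hour. */
    static Duration untilNextHour(LocalDateTime now) {
        // Drop minutes and seconds, then step forward one hour
        LocalDateTime nextHour = now.truncatedTo(ChronoUnit.HOURS).plusHours(1);
        return Duration.between(now, nextHour);
    }

    public static void main(String[] args) {
        // Example: the server comes up at 14:03, so the job waits until 15:00
        LocalDateTime startup = LocalDateTime.of(2009, 11, 1, 14, 3, 0);
        System.out.println(untilNextHour(startup)); // PT57M
    }
}
```

A server that starts exactly on the hour waits a full hour in this sketch.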
StartupServlet should execute the compression job before anything else
We need to be careful to make sure that no more than a single server is doing this compression. Quartz does that, but we can't rely on Quartz to trigger this initial job, because Quartz will probably already have a compression job at startup and will trigger it, or another running server may already be doing compression.
rhq-server.properties can have a setting:
If set to true, the startup servlet will do this prior to starting Quartz.
unless you use quartz to do it...
Have a separate "startup data purge job". It is a stateful job with a known job name that all servers know.
1) start quartz in "paused" mode - all jobs must not trigger at quartz startup
2) pause all jobs
3) schedule stateful "startup purge job" for a single trigger and to trigger NOW
4) wait for that job to complete
5) unpause the other jobs
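The five steps above can be sketched in plain Java. This is only an illustration of the ordering: `ToyScheduler` is a hypothetical stand-in, not the Quartz API, and it does nothing about coordinating several servers.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class StartupPurgeSequence {

    /** Toy stand-in for a scheduler (hypothetical; this is not the Quartz API). */
    static class ToyScheduler {
        private final List<Runnable> heldJobs = new ArrayList<>();
        private boolean paused = true; // steps 1-2: come up with every job paused

        /** Regular trigger: the job is held back while the scheduler is paused. */
        synchronized void fire(Runnable job) {
            if (paused) {
                heldJobs.add(job);
            } else {
                job.run();
            }
        }

        /** Step 5: unpause and run whatever was held back. */
        synchronized void resumeAll() {
            paused = false;
            heldJobs.forEach(Runnable::run);
            heldJobs.clear();
        }
    }

    static List<String> startUp() {
        List<String> events = Collections.synchronizedList(new ArrayList<>());
        ToyScheduler scheduler = new ToyScheduler();
        // A trigger that arrives during startup is held, not executed
        scheduler.fire(() -> events.add("regular job"));

        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            // Step 3: run the one-shot startup purge job right now
            Future<?> purge = worker.submit(() -> { events.add("startup purge"); });
            // Step 4: block until the purge has finished
            purge.get();
        } catch (Exception e) {
            throw new IllegalStateException("startup purge failed", e);
        } finally {
            worker.shutdown();
        }
        scheduler.resumeAll(); // step 5
        return events;
    }

    public static void main(String[] args) {
        System.out.println(startUp()); // [startup purge, regular job]
    }
}
```

The held regular job only runs after the startup purge has completed, which is the point of the sequence.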
This bug was previously known as http://jira.rhq-project.org/browse/RHQ-2346
mass move off the qa triage list. These are tasks for dev.
As it stands today, DataPurgeJob *is* a Quartz job, so we can't execute it before Quartz starts up - Quartz needs to be started to run the job. We could fix that by refactoring the bulk of the job down into some SLSB method, and then having Quartz call that method.
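A rough sketch of that refactoring, in plain Java so it stays self-contained. `PurgeService` and `purgeAndCompress` are hypothetical names, and `Runnable` stands in for Quartz's job interface; the real service would be a stateless session bean.

```java
class PurgeRefactorSketch {

    /** Hypothetical home for the purge logic (a stateless session bean in the real server). */
    static class PurgeService {
        private int runs;

        void purgeAndCompress() {
            runs++; // placeholder for the real purge and compression work
        }

        int runs() {
            return runs;
        }
    }

    /** Thin scheduler-facing wrapper; Runnable stands in for the Quartz job interface. */
    static class DataPurgeJob implements Runnable {
        private final PurgeService service;

        DataPurgeJob(PurgeService service) {
            this.service = service;
        }

        @Override
        public void run() {
            service.purgeAndCompress(); // the job only delegates
        }
    }

    public static void main(String[] args) {
        PurgeService service = new PurgeService();
        service.purgeAndCompress();         // startup path: no scheduler needed
        new DataPurgeJob(service).run();    // scheduled path: same logic via the job
        System.out.println(service.runs()); // 2
    }
}
```

Calling the service directly at startup would bypass whatever single-server guarantee the scheduler provides, so that path would need its own guard.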
Keep in mind, DataPurgeJob is already a stateful job, which means that quartz will *ensure* that only one server is ever running that job at a time. The only reason we're seeing a DataPurgeJob run at startup, is because a previous schedule was missed. If quartz sees that the job should have been executed at t=60, and it's currently t=63 when the server starts, it means that it missed a trigger and will fire it immediately.
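The t=60 / t=63 case can be written down as a one-line rule. This is a simplified sketch with a hypothetical helper name; Quartz's real misfire handling also depends on its misfire threshold and on the trigger's misfire instruction, which are ignored here.

```java
class MisfireSketch {

    /**
     * Hypothetical helper: when should the job actually fire, given the
     * fire time stored for the trigger and the current time?
     */
    static long effectiveFireTime(long storedNextFireTime, long now) {
        // A stored fire time in the past means a trigger was missed while
        // the server was down, so the job fires immediately.
        return Math.max(storedNextFireTime, now);
    }

    public static void main(String[] args) {
        System.out.println(effectiveFireTime(60, 63));  // 63: missed trigger, fire now
        System.out.println(effectiveFireTime(120, 63)); // 120: nothing missed, wait
    }
}
```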
Have we seen issues with the current implementation in production?
IIRC the issues were around servers being down for a long time, having to do a large compression job and deal with many spooled agent reports at the same time. This bug would mitigate this, at least to a certain extent, i.e. as long as you don't start up at the top of the hour. The workaround is to stand up the servers in maintenance mode and let the Quartz job run, then switch to normal mode and let the agents in. I don't see huge value in this, so it's not something we should be spending a lot of time on, I think.