Bug 535675 (RHQ-2346)

Summary: Make compression job not happen immediately after startup
Product: [Other] RHQ Project
Reporter: Charles Crouch <ccrouch>
Component: No Component
Assignee: RHQ Project Maintainer <rhq-maint>
Status: CLOSED NOTABUG
Severity: medium
Priority: high
Version: unspecified
CC: ccrouch, hbrock, jshaughn
Target Milestone: ---
Keywords: FutureFeature, Task
Target Release: ---
Hardware: All
OS: All
URL: http://jira.rhq-project.org/browse/RHQ-2346
Fixed In Version: 1.4
Doc Type: Enhancement
Last Closed: 2014-05-09 15:42:19 UTC

Description Charles Crouch 2009-08-13 15:00:00 UTC
Options:
- Delay the compression so it only runs on the hour.
- Execute the compression before Quartz starts up so that the compression job has the database to itself.
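For the first option, a minimal sketch of how the delay until the next whole hour could be computed at startup. The class and method names are hypothetical and not part of RHQ; it only uses java.time and says nothing about how the job would actually be rescheduled.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical helper, not RHQ code: works out when "the top of the hour" is.
final class TopOfHourDelay {

    /** Returns the first whole-hour instant strictly after {@code now}. */
    static Instant nextTopOfHour(Instant now) {
        return now.truncatedTo(ChronoUnit.HOURS).plus(1, ChronoUnit.HOURS);
    }

    /** Returns how long a server starting at {@code now} would wait before compressing. */
    static Duration delayUntilTopOfHour(Instant now) {
        return Duration.between(now, nextTopOfHour(now));
    }

    public static void main(String[] args) {
        Instant startup = Instant.parse("2009-08-13T15:02:11Z");
        System.out.println("next run: " + nextTopOfHour(startup));
        System.out.println("delay:    " + delayUntilTopOfHour(startup));
    }
}
```

Note that a server started exactly on the hour would wait a full hour under this sketch; whether that is desirable is a design choice the bug does not settle.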

Comment 1 John Mazzitelli 2009-08-13 15:02:11 UTC
StartupServlet should execute the compression job before anything else.

Comment 2 John Mazzitelli 2009-08-13 15:06:26 UTC
We need to be careful that no more than a single server does this compression. Quartz normally guarantees that, but we can't rely on Quartz to trigger this initial job: on startup Quartz will probably already have a compression job and will trigger it, or another running server may already be doing compression.

Comment 3 John Mazzitelli 2009-08-13 15:26:53 UTC
rhq-server.properties can have a setting:

rhq.server.run-purge-job=true

If set to true, the startup servlet will run the purge prior to starting Quartz.

Unless you use Quartz to do it...
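A minimal sketch of how a startup component might read that setting. The property name comes from the comment above; the class name, the helper method, and the default of false when the property is absent are assumptions made for illustration, not existing RHQ behavior.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper, not RHQ code.
final class RunPurgeJobSetting {
    // Property name as proposed in the comment above.
    static final String KEY = "rhq.server.run-purge-job";

    /** Returns true only if the property is present and set to "true" (assumed default: false). */
    static boolean shouldPurgeAtStartup(Properties serverProps) {
        return Boolean.parseBoolean(serverProps.getProperty(KEY, "false").trim());
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for loading rhq-server.properties from disk.
        Properties props = new Properties();
        props.load(new StringReader("rhq.server.run-purge-job=true\n"));
        System.out.println("purge at startup: " + shouldPurgeAtStartup(props));
    }
}
```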

Comment 4 John Mazzitelli 2009-08-13 15:28:27 UTC
Have a separate "startup data purge job". It is a stateful job with a known job name that all servers know.

At startup:

1) Start Quartz in "paused" mode; no job may trigger at Quartz startup
2) Pause all jobs
3) Schedule the stateful "startup purge job" with a single trigger that fires NOW
4) Wait for that job to complete
5) Unpause the other jobs
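The ordering in the steps above can be sketched with plain java.util.concurrent. This is a single-process toy model and deliberately not the Quartz API: the class name, the method, and the use of a single-thread executor as a stand-in for the scheduler are all assumptions for illustration.

```java
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical toy model of the startup ordering; NOT Quartz and not RHQ code.
final class StartupPurgeSequence {

    static void runStartup(Runnable startupPurge, List<Runnable> regularJobs) {
        // Steps 1-2: the "scheduler" starts with the regular jobs held back;
        // here they are paused simply by not being submitted yet.
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            // Step 3: submit the one-shot startup purge so it runs now.
            // Step 4: block until it has completed.
            worker.submit(startupPurge).get();

            // Step 5: release the regular jobs.
            for (Runnable job : regularJobs) {
                worker.submit(job);
            }
            // The demo shuts down once the jobs finish; a real server would keep running.
            worker.shutdown();
            if (!worker.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("regular jobs did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("startup purge failed", e.getCause());
        } finally {
            worker.shutdownNow();
        }
    }
}
```

The model only shows that the purge finishes before anything else runs in one JVM. The cross-server guarantee discussed in this bug would still have to come from the stateful job in the shared scheduler.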

Comment 5 Red Hat Bugzilla 2009-11-10 21:02:38 UTC
This bug was previously known as http://jira.rhq-project.org/browse/RHQ-2346


Comment 6 wes hayutin 2010-02-16 15:44:46 UTC
Mass move off the QA triage list. These are tasks for dev.

Comment 7 Joseph Marques 2010-11-17 17:05:48 UTC
As it stands today, DataPurgeJob *is* a quartz job, so we can't execute it before quartz starts up - quartz needs to be started to run the job. We could fix that by refactoring the bulk of the job down into some SLSB method, and then having quartz call that method.
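A rough sketch of the shape of that refactoring, using plain Java types only. DataPurgeService, purgeAndCompress, and DataPurgeJobSketch are hypothetical names, and Runnable stands in for the scheduler's job interface; the real DataPurgeJob and any SLSB would look different.

```java
// Hypothetical sketch of the refactoring; none of these types exist in RHQ.
final class PurgeRefactoringSketch {

    /** Stand-in for the SLSB method that would hold the bulk of the purge logic. */
    interface DataPurgeService {
        void purgeAndCompress();
    }

    /** Thin job wrapper: the scheduler invokes run(), the logic lives in the service. */
    static final class DataPurgeJobSketch implements Runnable {
        private final DataPurgeService service;

        DataPurgeJobSketch(DataPurgeService service) {
            this.service = service;
        }

        @Override
        public void run() {
            service.purgeAndCompress();
        }
    }

    /** Startup path: calls the same logic directly, with no scheduler involved. */
    static void runAtStartup(DataPurgeService service) {
        service.purgeAndCompress();
    }
}
```

Calling the service directly at startup bypasses whatever single-server guarantee the scheduler provides, so the concern raised in Comment 2 would need its own answer on that path.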

Keep in mind, DataPurgeJob is already a stateful job, which means that quartz will *ensure* that only one server is ever running that job at a time. The only reason we're seeing a DataPurgeJob run at startup is that a previous schedule was missed. If quartz sees that the job should have been executed at t=60, and it's currently t=63 when the server starts, it concludes that it missed a trigger and fires the job immediately.

Have we seen issues with the current implementation in production?

Comment 8 Charles Crouch 2010-11-18 22:41:43 UTC
IIRC the issues were around servers being down for a long time, then having to do a large compression job and deal with many spooled agent reports at the same time. This bug would mitigate that, at least to a certain extent, i.e. as long as you don't start up at the top of the hour. The workaround is to stand up the servers in maintenance mode and let the quartz job run, then switch to normal mode and let the agents in. I don't see huge value in this, so it's not something we should be spending a lot of time on, I think.