Bug 535675 (RHQ-2346)

Summary:	Make compression job not happen immediately after startup
Product:	[Other] RHQ Project	Reporter:	Charles Crouch <ccrouch>
Component:	No Component	Assignee:	RHQ Project Maintainer <rhq-maint>
Status:	CLOSED NOTABUG	QA Contact:
Severity:	medium	Docs Contact:
Priority:	high
Version:	unspecified	CC:	ccrouch, hbrock, jshaughn
Target Milestone:	---	Keywords:	FutureFeature, Task
Target Release:	---
Hardware:	All
OS:	All
URL:	http://jira.rhq-project.org/browse/RHQ-2346
Whiteboard:
Fixed In Version:	1.4	Doc Type:	Enhancement
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2014-05-09 15:42:19 UTC	Type:	---

Description Charles Crouch 2009-08-13 15:00:00 UTC

Options:
-Delay the compression so it only runs on the hour
-Execute the compression before Quartz starts up so that the compression job has the database to itself.
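
For the first option, a minimal sketch of computing how long to wait until the next hour boundary. The class and method names are hypothetical and not part of RHQ; it only assumes the server can ask for the current time at startup.

```java
import java.time.Duration;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;

// Hypothetical helper for illustration only; not an RHQ class.
final class TopOfHourDelay {

    // Returns how long to wait from 'now' until the next hour boundary.
    // If 'now' is exactly on the hour, the result is a full hour.
    static Duration untilNextHour(ZonedDateTime now) {
        ZonedDateTime nextHour = now.truncatedTo(ChronoUnit.HOURS).plusHours(1);
        return Duration.between(now, nextHour);
    }

    public static void main(String[] args) {
        // Example startup time taken from the first comment's timestamp.
        ZonedDateTime startup = ZonedDateTime.of(2009, 8, 13, 15, 2, 11, 0, ZoneOffset.UTC);
        System.out.println(untilNextHour(startup)); // PT57M49S
    }
}
```

Whether a server that starts exactly on the hour should run the job right away or wait a full hour is a design choice; this sketch waits.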

Comment 1 John Mazzitelli 2009-08-13 15:02:11 UTC

StartupServlet should execute the compression job before anything else

Comment 2 John Mazzitelli 2009-08-13 15:06:26 UTC

We need to be careful to make sure no more than a single server is doing this compression. Quartz guarantees that, but we can't rely on Quartz to trigger this initial job, because Quartz will probably already have a compression job at startup and will trigger it, or another running server may already be doing compression.

Comment 3 John Mazzitelli 2009-08-13 15:26:53 UTC

rhq-server.properties can have a setting:

rhq.server.run-purge-job=true

If set to true, the startup servlet will run the purge job prior to starting Quartz.

unless you use quartz to do it...
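
A rough sketch of how such a flag could be read. The property name is the one proposed above; the helper class and the default of false when the property is absent are assumptions, not existing RHQ behavior.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper for illustration only; not an RHQ class.
final class PurgeAtStartupSetting {

    // Property name as proposed in this comment.
    static final String KEY = "rhq.server.run-purge-job";

    // Assumption: a missing property means "do not run the purge at startup".
    static boolean runPurgeAtStartup(Properties props) {
        return Boolean.parseBoolean(props.getProperty(KEY, "false"));
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        // Stand-in for loading rhq-server.properties from disk.
        props.load(new StringReader("rhq.server.run-purge-job=true\n"));
        System.out.println(runPurgeAtStartup(props)); // true
    }
}
```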

Comment 4 John Mazzitelli 2009-08-13 15:28:27 UTC

Have a separate "startup data purge job". It is a stateful job with a known job name that all servers know.

at startup:

1) start quartz in "paused" mode - no job may trigger at quartz startup
2) pause all jobs
3) schedule stateful "startup purge job" for a single trigger and to trigger NOW
4) wait for that job to complete
5) unpause the other jobs
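
The five steps can be sketched with a toy in-memory scheduler. This is not the Quartz API: ToyScheduler and all of its methods are hypothetical stand-ins, and the cross-server guarantee that a stateful job gives is not modeled. The sketch only shows the ordering: regular jobs are held, the startup purge runs to completion, then the held jobs are released.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; ToyScheduler is NOT Quartz, just a stand-in for the ordering.
final class StartupPurgeSketch {

    static final class ToyScheduler {
        private final ExecutorService pool = Executors.newSingleThreadExecutor();
        private final List<Runnable> heldJobs = new ArrayList<>();
        private boolean paused = true; // steps 1 and 2: starts paused, nothing fires

        // Regular triggers are held while paused and run otherwise.
        synchronized void trigger(Runnable job) {
            if (paused) {
                heldJobs.add(job);
            } else {
                pool.submit(job);
            }
        }

        // Step 3: the startup job bypasses the pause and runs now.
        Future<?> triggerNow(Runnable job) {
            return pool.submit(job);
        }

        // Step 5: release everything that was held back.
        synchronized void resumeAll() {
            paused = false;
            for (Runnable job : heldJobs) {
                pool.submit(job);
            }
            heldJobs.clear();
        }

        void shutdownAndWait() throws InterruptedException {
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS);
        }
    }

    // Runs the startup sequence and returns the order in which jobs executed.
    static List<String> runStartup() {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        ToyScheduler scheduler = new ToyScheduler();
        // A missed trigger that would otherwise fire at startup.
        scheduler.trigger(() -> log.add("regular job"));
        try {
            // Steps 3 and 4: run the startup purge and wait for it to finish.
            scheduler.triggerNow(() -> log.add("startup purge")).get();
            scheduler.resumeAll();
            scheduler.shutdownAndWait();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(runStartup()); // [startup purge, regular job]
    }
}
```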

Comment 5 Red Hat Bugzilla 2009-11-10 21:02:38 UTC

This bug was previously known as http://jira.rhq-project.org/browse/RHQ-2346

Comment 6 wes hayutin 2010-02-16 15:44:46 UTC

mass move off the qa triage list.  These are tasks for dev.

Comment 7 Joseph Marques 2010-11-17 17:05:48 UTC

As it stands today, DataPurgeJob *is* a quartz job, so we can't execute it before quartz starts up - quartz needs to be started to run the job.  We could fix that by refactoring the bulk of the job down into some SLSB method, and then having quartz call that method.

Keep in mind, DataPurgeJob is already a stateful job, which means that quartz will *ensure* that only one server is ever running that job at a time.  The only reason we're seeing a DataPurgeJob run at startup is that a previous schedule was missed.  If quartz sees that the job should have been executed at t=60, and it's currently t=63 when the server starts, it concludes that it missed a trigger and fires the job immediately.
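
The t=60 / t=63 case can be written out as a toy model. This is only an illustration of the reasoning above, with hypothetical names; it ignores Quartz's actual misfire configuration and policies.

```java
// Hypothetical toy model of a missed trigger; not Quartz code.
final class MisfireSketch {

    // A trigger was missed if its scheduled fire time is already in the past.
    static boolean missedTrigger(long scheduledFireTime, long now) {
        return now > scheduledFireTime;
    }

    // A missed trigger fires immediately (at 'now'); otherwise it fires on schedule.
    static long nextFireTime(long scheduledFireTime, long now) {
        return Math.max(scheduledFireTime, now);
    }

    public static void main(String[] args) {
        System.out.println(missedTrigger(60, 63)); // true
        System.out.println(nextFireTime(60, 63));  // 63
    }
}
```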

Have we seen issues with the current implementation in production?

Comment 8 Charles Crouch 2010-11-18 22:41:43 UTC

IIRC the issues were around servers being down for a long time, then having to do a large compression job and deal with many spooled agent reports at the same time. This bug would mitigate this, at least to a certain extent, i.e. as long as you don't start up at the top of the hour. The workaround is to stand up the servers in maintenance mode and let the quartz job run, then switch to normal mode and let the agents in. I don't see huge value in this, so it's not something we should be spending a lot of time on, I think.