730993 – only purge unused drift files that are older than a certain time

Summary: only purge unused drift files that are older than a certain time

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	RHQ Project
Classification:	Other
Component:	drift
Sub Component:
Version:	4.0.1
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	JON 3.0.0,RHQ 4.3.0
Assignee:	John Mazzitelli
QA Contact:	Mike Foley
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	707225

Reported:	2011-08-16 13:49 UTC by John Mazzitelli
Modified:	2012-02-07 19:21 UTC
CC List:	0 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2012-02-07 19:21:58 UTC
Embargoed:

Attachments
730993.diff (24.95 KB, patch) 2011-08-16 17:26 UTC, John Mazzitelli

Description John Mazzitelli 2011-08-16 13:49:57 UTC

I checked in code that extends the data purge job that runs hourly. It now purges drift files that no drift references. Every hour, when the data purge job emits its log messages, you will see messages about purging drift files among them.

This means we can clean up and reclaim space for drift files that are no longer used (that is, referenced as either an old or new file from any drift entry).

This goes through the drift server plugin - if we are using the RHQ DB backend, we'll purge unused rows in RHQ_DRIFT_FILE. I left a TODO in the MongoDB plugin to do the purging when using MongoDB as the drift backend.

Jay then suggested the following:

"one caveat here that we may be able to take care of with a simple flag or expiration date or something. We actually may want drift files that are not (yet) associated with drifts. This is the whole idea behind seeding the db with files we expect to be reported from agents. For example, we know we're going to deploy bundle Foo to 100 machines. We may very well want to slurp that bundle into the drift backend and create drift files in advance, so that we never actually need to download them from an agent. They'll already be there."

To support that, we should add a system setting like the other purge ones - the ones like "purge alerts older than X days" or "purge events older than X days". For example, "purge orphaned (or unused) drift files older than X days".

We can add an AND clause to the DELETE SQL (see JPADriftFile):

DELETE FROM RHQ_DRIFT_FILE
    WHERE (HASH_ID NOT IN (SELECT OLD_DRIFT_FILE FROM RHQ_DRIFT))
      AND (HASH_ID NOT IN (SELECT NEW_DRIFT_FILE FROM RHQ_DRIFT))
      AND CTIME < ?

where ? is bound to some value in the past (epoch millis) that
corresponds to how old an unused drift file is allowed to be without
getting purged.
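
As a rough illustration of what gets bound to ?, the sketch below converts an "older than X days" setting into an epoch-millis cutoff. The class name, method name, and the 30-day value are hypothetical and are not taken from the RHQ code or the attached patch; only the SQL text comes from this report.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not RHQ code: turns a "purge unused drift files
// older than X days" setting into the epoch-millis value bound to '?'.
class DriftFilePurgeCutoff {

    // The DELETE statement proposed in this report, kept as a plain string.
    static final String PURGE_SQL =
        "DELETE FROM RHQ_DRIFT_FILE"
        + " WHERE (HASH_ID NOT IN (SELECT OLD_DRIFT_FILE FROM RHQ_DRIFT))"
        + " AND (HASH_ID NOT IN (SELECT NEW_DRIFT_FILE FROM RHQ_DRIFT))"
        + " AND CTIME < ?";

    // Files whose CTIME is strictly less than the returned value are old
    // enough to be purged (if they are also unreferenced).
    static long purgeCutoffMillis(long nowMillis, int olderThanDays) {
        if (olderThanDays < 0) {
            throw new IllegalArgumentException("olderThanDays must be >= 0");
        }
        return nowMillis - TimeUnit.DAYS.toMillis(olderThanDays);
    }

    public static void main(String[] args) {
        // Assumed setting value of 30 days, for illustration only.
        long cutoff = purgeCutoffMillis(System.currentTimeMillis(), 30);
        System.out.println(PURGE_SQL);
        System.out.println("bind CTIME cutoff = " + cutoff);
    }
}
```

Passing the current time in as a parameter, rather than reading the clock inside the method, keeps the cutoff calculation easy to check in isolation.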

We'd need something similar in any drift server plugin impl (like the mongo plugin).
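
For a backend that cannot run the SQL above, the same rule could be expressed directly: a drift file is purgeable only if no drift references it as an old or new file and its creation time is before the cutoff. The sketch below models that rule in memory. The DriftFile and Drift classes here are hypothetical stand-ins, not the RHQ entities or any MongoDB plugin API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// In-memory model of the purge rule; all types here are illustrative only.
class UnusedDriftFilePurgeSketch {

    static final class DriftFile {
        final String hashId;
        final long ctime; // creation time, epoch millis
        DriftFile(String hashId, long ctime) { this.hashId = hashId; this.ctime = ctime; }
    }

    static final class Drift {
        final String oldDriftFile; // may be null (assumption: e.g. a newly added file)
        final String newDriftFile; // may be null (assumption: e.g. a removed file)
        Drift(String oldDriftFile, String newDriftFile) {
            this.oldDriftFile = oldDriftFile;
            this.newDriftFile = newDriftFile;
        }
    }

    // Returns the hash ids of files that are unreferenced AND older than the cutoff.
    static List<String> purgeableHashIds(List<DriftFile> files, List<Drift> drifts, long cutoffMillis) {
        Set<String> referenced = new HashSet<>();
        for (Drift d : drifts) {
            // Null references are simply skipped in this sketch.
            if (d.oldDriftFile != null) referenced.add(d.oldDriftFile);
            if (d.newDriftFile != null) referenced.add(d.newDriftFile);
        }
        List<String> purgeable = new ArrayList<>();
        for (DriftFile f : files) {
            if (!referenced.contains(f.hashId) && f.ctime < cutoffMillis) {
                purgeable.add(f.hashId);
            }
        }
        return purgeable;
    }

    public static void main(String[] args) {
        List<DriftFile> files = List.of(
            new DriftFile("a", 100L), new DriftFile("b", 100L),
            new DriftFile("c", 900L), new DriftFile("d", 100L));
        List<Drift> drifts = List.of(new Drift("a", "b"));
        System.out.println(purgeableHashIds(files, drifts, 500L));
    }
}
```

A pre-seeded file that no agent has reported yet stays in the backend until it ages past the cutoff, which is the behavior Jay's caveat asks for.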

Comment 1 John Mazzitelli 2011-08-16 17:26:18 UTC

Created attachment 518539
730993.diff

i implemented this locally. in case we want to have a global setting to say how old drift files must be before they can be purged, see the attached patch.

Comment 2 John Mazzitelli 2011-11-01 19:30:20 UTC

this was previously committed on August 16, 2011
 - commit 5bdcd83af54cd10ff4b78bfba66db7330a11abd5

Comment 3 John Mazzitelli 2011-11-04 20:59:28 UTC

to test

1. first, get some drift 
2. confirm "select id from rhq_drift_file" - make sure you see the rows in there
   (you'll use this later)
3. I think you will have to then uninventory the resource that the drift was on
4. select id from rhq_drift_file - make sure the drifts are still there

you then wait for the time to expire (the time defined in the system settings)
after that time, "select id from rhq_drift_file" should show you the rows are gone.

Comment 4 Mike Foley 2011-11-04 21:22:58 UTC

rows are gone.  verified.

Comment 5 John Mazzitelli 2011-11-04 22:44:57 UTC

> 2. confirm "select id from rhq_drift_file" - make sure you see the rows in
> there (you'll use this later)

just to correct this - "id" isn't a valid column - "hash_id" would work

Comment 6 Mike Foley 2012-02-07 19:21:58 UTC

changing status of VERIFIED BZs for JON 2.4.2 and JON 3.0 to CLOSED/CURRENTRELEASE
