Bug 730993 - only purge unused drift files that are older than a certain time
Summary: only purge unused drift files that are older than a certain time
Alias: None
Product: RHQ Project
Classification: Other
Component: drift
Version: 4.0.1
Hardware: Unspecified
OS: Unspecified
medium vote
Target Milestone: ---
: JON 3.0.0,RHQ 4.3.0
Assignee: John Mazzitelli
QA Contact: Mike Foley
Depends On:
Blocks: 707225
Reported: 2011-08-16 13:49 UTC by John Mazzitelli
Modified: 2012-02-07 19:21 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2012-02-07 19:21:58 UTC

Attachments
730993.diff (24.95 KB, patch)
2011-08-16 17:26 UTC, John Mazzitelli

Description John Mazzitelli 2011-08-16 13:49:57 UTC
I checked in code that extends the data purge job that runs hourly. It now purges any drift file that no drift references. Every hour, when the data purge job emits its log messages, you will see messages about purging drift files among them.

This means we can clean up and reclaim space for drift files that are no longer used, where "used" means referenced as either the old or the new file by any drift entry.
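To make the "unused" rule concrete, here is a minimal in-memory sketch. It is not the RHQ implementation: the class, the `Drift` type, and the method name are hypothetical, and the real check runs inside the drift server plugin against the backend.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model of the rule; the real data lives in the drift backend.
final class UnusedDriftFiles {

    /** A drift entry references an old and/or a new file by hash; either may be null. */
    static final class Drift {
        final String oldFileHash;
        final String newFileHash;

        Drift(String oldFileHash, String newFileHash) {
            this.oldFileHash = oldFileHash;
            this.newFileHash = newFileHash;
        }
    }

    /** Returns the hashes of files that no drift references as its old or new file. */
    static Set<String> findUnused(Set<String> allFileHashes, List<Drift> drifts) {
        Set<String> referenced = new HashSet<>();
        for (Drift d : drifts) {
            if (d.oldFileHash != null) {
                referenced.add(d.oldFileHash);
            }
            if (d.newFileHash != null) {
                referenced.add(d.newFileHash);
            }
        }
        // Copy first so the caller's set is left untouched.
        Set<String> unused = new HashSet<>(allFileHashes);
        unused.removeAll(referenced);
        return unused;
    }
}
```

A file counts as used if it appears on either side of at least one drift, so a file that is only ever an "old" version is still kept.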

This goes through the drift server plugin - if we are using the RHQ DB backend, we'll purge unused rows in RHQ_DRIFT_FILE. I left a TODO in the MongoDB plugin to do the purging when using MongoDB as the drift backend.

Jay then suggested the following:

"one caveat here that we may be able to take care of with a simple flag or expiration date or something. We actually may want drift files that are not (yet) associated with drifts. This is the whole idea behind seeding the db with files we expect to be reported from agents. For example, we know we're going to deploy bundle Foo to 100 machines. We may very well want to slurp that bundle into the drift backend and create drift files in advance, so that we never actually need to download them from an agent. They'll already be there."

To support that, we should add a system setting like the other purge ones - the ones like "purge alerts older than X days" or "purge events older than X days". For example, "purge orphaned (or unused) drift files older than X days".

We can add an AND clause in the DELETE SQL (see JPADriftFile):

      AND CTIME < ?

where ? is bound to some value in the past (epoch millis) that corresponds to how old an unused drift file is allowed to be without getting purged.
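As a rough sketch of how that bound value could be computed and applied: the table RHQ_DRIFT_FILE and the column CTIME come from this report, and HASH_ID from comment 5. Everything else is an assumption made for illustration, including the RHQ_DRIFT table, its OLD_DRIFT_FILE and NEW_DRIFT_FILE columns, the class and method names, and the use of plain JDBC. The actual query lives in JPADriftFile and may look different.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.concurrent.TimeUnit;

// Hypothetical helper; not the code from the attached patch.
final class DriftFilePurgeCutoff {

    // Assumed SQL shape. RHQ_DRIFT and its two file columns are illustrative names only.
    static final String PURGE_SQL =
        "DELETE FROM RHQ_DRIFT_FILE "
        + "WHERE NOT EXISTS (SELECT 1 FROM RHQ_DRIFT d "
        + "WHERE d.OLD_DRIFT_FILE = RHQ_DRIFT_FILE.HASH_ID "
        + "OR d.NEW_DRIFT_FILE = RHQ_DRIFT_FILE.HASH_ID) "
        + "AND CTIME < ?";

    /** Epoch millis before which an unused drift file is old enough to purge. */
    static long cutoffMillis(long nowMillis, int maxAgeDays) {
        return nowMillis - TimeUnit.DAYS.toMillis(maxAgeDays);
    }

    /** Binds the cutoff to the single ? parameter and returns the number of rows deleted. */
    static int purge(Connection conn, long nowMillis, int maxAgeDays) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(PURGE_SQL)) {
            ps.setLong(1, cutoffMillis(nowMillis, maxAgeDays));
            return ps.executeUpdate();
        }
    }
}
```

Passing the current time in as a parameter, rather than reading the clock inside the method, keeps the cutoff arithmetic easy to check in isolation. A file seeded in advance survives as long as its CTIME is newer than the cutoff, which is the behavior Jay asked for.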

We'd need something similar in any drift server plugin impl (like the mongo plugin).

Comment 1 John Mazzitelli 2011-08-16 17:26:18 UTC
Created attachment 518539

I implemented this locally. In case we want a global setting that says how old drift files must be before they can be purged, see the attached patch.

Comment 2 John Mazzitelli 2011-11-01 19:30:20 UTC
this was previously committed on August 16, 2011
 - commit 5bdcd83af54cd10ff4b78bfba66db7330a11abd5

Comment 3 John Mazzitelli 2011-11-04 20:59:28 UTC
to test

1. first, get some drift 
2. confirm "select id from rhq_drift_file" - make sure you see the rows in there
   (you'll use this later)
3. I think you will have to then uninventory the resource that the drift was on
4. run "select id from rhq_drift_file" again - make sure the rows are still there

you then wait for the time to expire (the time defined in the system settings).
after that time, "select id from rhq_drift_file" should show you the rows are gone.

Comment 4 Mike Foley 2011-11-04 21:22:58 UTC
rows are gone.  verified.

Comment 5 John Mazzitelli 2011-11-04 22:44:57 UTC
> 2. confirm "select id from rhq_drift_file" - make sure you see the rows in
> there (you'll use this later)

just to correct this - "id" isn't a valid column - "hash_id" would work

Comment 6 Mike Foley 2012-02-07 19:21:58 UTC
changing status of VERIFIED BZs for JON 2.4.2 and JON 3.0 to CLOSED/CURRENTRELEASE
