Bug 730993 - only purge unused drift files that are older than a certain time
Status: CLOSED CURRENTRELEASE
Product: RHQ Project
Classification: Other
Component: drift
Version: 4.0.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Release: JON 3.0.0, RHQ 4.3.0
Assigned To: John Mazzitelli
QA Contact: Mike Foley
Blocks: 707225
Reported: 2011-08-16 09:49 EDT by John Mazzitelli
Modified: 2012-02-07 14:21 EST
0 users

Doc Type: Bug Fix
Last Closed: 2012-02-07 14:21:58 EST


Attachments
730993.diff (24.95 KB, patch)
2011-08-16 13:26 EDT, John Mazzitelli

Description John Mazzitelli 2011-08-16 09:49:57 EDT
I checked in code that extends the data purge job, which runs hourly. It now purges drift files that no drift references. Every hour, the data purge job's log output will include messages about purging drift files.

This means we can clean up and reclaim space for drift files that are no longer used (that is, not referenced as either the old or the new file of any drift entry).

This goes through the drift server plugin: if we are using the RHQ DB backend, we purge unused rows in RHQ_DRIFT_FILE. I left a TODO in the MongoDB plugin to do the purging when MongoDB is the drift backend.
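The "unused" rule above can be illustrated with a small in-memory Java sketch. It is not RHQ's actual purge code: the names `OrphanDriftFiles`, `Drift`, `DriftFile` and `unusedHashIds` are hypothetical, and it assumes either file reference of a drift may be null (for example, a newly added file has no old version). It needs Java 16+ for records.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/**
 * Hypothetical in-memory model of the orphan check: a drift file is unused
 * when no drift references its hash as either the old or the new file.
 */
final class OrphanDriftFiles {

    /** A drift entry; either reference may be null (assumption). */
    record Drift(String oldDriftFile, String newDriftFile) {}

    /** A stored drift file, keyed by its content hash. */
    record DriftFile(String hashId) {}

    private OrphanDriftFiles() {}

    /** Returns the hash IDs of drift files that no drift references. */
    static Set<String> unusedHashIds(List<DriftFile> files, List<Drift> drifts) {
        // Collect every hash referenced as an old or a new file, skipping nulls.
        Set<String> referenced = new HashSet<>();
        for (Drift d : drifts) {
            if (d.oldDriftFile() != null) referenced.add(d.oldDriftFile());
            if (d.newDriftFile() != null) referenced.add(d.newDriftFile());
        }
        // Every stored file whose hash is not referenced is a purge candidate.
        return files.stream()
                .map(DriftFile::hashId)
                .filter(h -> !referenced.contains(h))
                .collect(Collectors.toSet());
    }
}
```

With files `a`, `b`, `c` and drifts `(null, a)` and `(b, null)`, only `c` is reported as unused. Note that this rule alone would also delete files seeded in advance, which is the concern raised next.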

Jay then suggested the following:

"One caveat here that we may be able to take care of with a simple flag or expiration date or something. We actually may want drift files that are not (yet) associated with drifts. This is the whole idea behind seeding the DB with files we expect to be reported from agents. For example, we know we're going to deploy bundle Foo to 100 machines. We may very well want to slurp that bundle into the drift backend and create drift files in advance, so that we never actually need to download them from an agent. They'll already be there."

To support that, we should add a system setting like the other purge settings, such as "purge alerts older than X days" or "purge events older than X days". For example: "purge orphaned (or unused) drift files older than X days".

We can add an AND clause to the DELETE SQL (see JPADriftFile):

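DELETE FROM RHQ_DRIFT_FILE
    -- skip NULLs: NOT IN is never true if the subquery returns a NULL
    WHERE (HASH_ID NOT IN (SELECT OLD_DRIFT_FILE FROM RHQ_DRIFT
                            WHERE OLD_DRIFT_FILE IS NOT NULL))
      AND (HASH_ID NOT IN (SELECT NEW_DRIFT_FILE FROM RHQ_DRIFT
                            WHERE NEW_DRIFT_FILE IS NOT NULL))
      AND CTIME < ?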

where ? is bound to a value in the past (epoch millis) that corresponds
to how old an unused drift file is allowed to be before it gets purged.
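A minimal JDBC sketch of how the cutoff could be computed and bound to the `?` parameter. Assumptions: `CTIME` stores epoch milliseconds (as stated above), the retention period in days comes from the proposed system setting and is passed in as the hypothetical `retentionDays` parameter, and `DriftFilePurgeSketch`, `cutoffMillis` and `purgeUnusedDriftFiles` are illustrative names, not RHQ's actual API. The query also filters NULLs out of both subqueries, because `HASH_ID NOT IN (...)` is never true when the subquery returns a NULL (for example, a drift with no old file).

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: purge unused drift files older than a retention period. */
final class DriftFilePurgeSketch {

    static final String PURGE_SQL =
        "DELETE FROM RHQ_DRIFT_FILE"
        + " WHERE (HASH_ID NOT IN (SELECT OLD_DRIFT_FILE FROM RHQ_DRIFT"
        + " WHERE OLD_DRIFT_FILE IS NOT NULL))"
        + " AND (HASH_ID NOT IN (SELECT NEW_DRIFT_FILE FROM RHQ_DRIFT"
        + " WHERE NEW_DRIFT_FILE IS NOT NULL))"
        + " AND CTIME < ?";

    private DriftFilePurgeSketch() {}

    /** Epoch millis before which an unused drift file is old enough to purge. */
    static long cutoffMillis(long nowMillis, long retentionDays) {
        return nowMillis - TimeUnit.DAYS.toMillis(retentionDays);
    }

    /** Runs the purge and returns the number of deleted rows. */
    static int purgeUnusedDriftFiles(Connection conn, long retentionDays) throws SQLException {
        long cutoff = cutoffMillis(System.currentTimeMillis(), retentionDays);
        try (PreparedStatement ps = conn.prepareStatement(PURGE_SQL)) {
            ps.setLong(1, cutoff); // bind the ? in "CTIME < ?"
            return ps.executeUpdate();
        }
    }
}
```

For example, with a 30-day setting, `purgeUnusedDriftFiles(conn, 30)` would delete only unused files created more than 30 days ago, so files seeded ahead of a bundle deployment survive until then.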

We'd need something similar in every drift server plugin implementation (such as the MongoDB plugin).
Comment 1 John Mazzitelli 2011-08-16 13:26:18 EDT
Created attachment 518539 [details]
730993.diff

I implemented this locally. In case we want a global setting that says how old drift files must be before they can be purged, see the attached patch.
Comment 2 John Mazzitelli 2011-11-01 15:30:20 EDT
This was previously committed on August 16, 2011:
 - commit 5bdcd83af54cd10ff4b78bfba66db7330a11abd5
Comment 3 John Mazzitelli 2011-11-04 16:59:28 EDT
To test:

1. First, get some drift.
2. confirm "select id from rhq_drift_file" - make sure you see the rows in there
   (you'll use this later)
3. I think you will then have to uninventory the resource that the drift was on.
4. Run "select id from rhq_drift_file" again and make sure the drift files are still there.

Then wait for the time defined in the system settings to expire. After that time
(once the hourly data purge job has run), "select id from rhq_drift_file" should
show that the rows are gone.
Comment 4 Mike Foley 2011-11-04 17:22:58 EDT
Rows are gone. Verified.
Comment 5 John Mazzitelli 2011-11-04 18:44:57 EDT
> 2. confirm "select id from rhq_drift_file" - make sure you see the rows in
> there (you'll use this later)

Just to correct this: "id" isn't a valid column; "hash_id" would work.
Comment 6 Mike Foley 2012-02-07 14:21:58 EST
changing status of VERIFIED BZs for JON 2.4.2 and JON 3.0 to CLOSED/CURRENTRELEASE
