I checked in code that extends the hourly data purge job. It now purges drift files that no drift references. Every hour, when the data purge job emits its log messages, you will see messages about purging drift files.
This means we can clean up and reclaim space for drift files that are no longer used (that is, referenced as either an old or new file from any drift entry).
This goes through the drift server plugin - if we are using the RHQ DB backend, we'll purge unused rows in RHQ_DRIFT_FILE. I left a TODO in the MongoDB plugin to do the purging when using MongoDB as the drift backend.
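To make the "unused" rule concrete, here is a minimal in-memory sketch of it. This is not the RHQ code: DriftRef and OrphanedDriftFiles are hypothetical names used only for illustration, and allowing a null old or new hash is an assumption about how added or removed files might be recorded.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for a drift entry; not an RHQ type.
final class DriftRef {
    final String oldFileHash; // assumption: may be null (no previous file)
    final String newFileHash; // assumption: may be null (no new file)

    DriftRef(String oldFileHash, String newFileHash) {
        this.oldFileHash = oldFileHash;
        this.newFileHash = newFileHash;
    }
}

final class OrphanedDriftFiles {
    /**
     * Returns the hashes in allFileHashes that no drift entry references,
     * either as its old file or as its new file.
     */
    static Set<String> findOrphans(Set<String> allFileHashes, List<DriftRef> drifts) {
        Set<String> referenced = new HashSet<>();
        for (DriftRef d : drifts) {
            if (d.oldFileHash != null) {
                referenced.add(d.oldFileHash);
            }
            if (d.newFileHash != null) {
                referenced.add(d.newFileHash);
            }
        }
        // Sorted copy so the result is stable when printed or logged.
        Set<String> orphans = new TreeSet<>(allFileHashes);
        orphans.removeAll(referenced);
        return orphans;
    }
}
```

A file stays as long as at least one drift points at it from either side; only files with no references at all are candidates for purging.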
Jay then suggested the following:
"One caveat here that we may be able to take care of with a simple flag or expiration date or something. We actually may want drift files that are not (yet) associated with drifts. This is the whole idea behind seeding the db with files we expect to be reported from agents. For example, we know we're going to deploy bundle Foo to 100 machines. We may very well want to slurp that bundle into the drift backend and create drift files in advance, so that we never actually need to download them from an agent. They'll already be there."
To support that, we should add a system setting like the other purge ones, such as "purge alerts older than X days" or "purge events older than X days". For example, "purge orphaned (or unused) drift files older than X days".
We can add an AND clause in the DELETE SQL (see JPADriftFile):
DELETE FROM RHQ_DRIFT_FILE
WHERE (HASH_ID NOT IN (SELECT OLD_DRIFT_FILE FROM RHQ_DRIFT))
AND (HASH_ID NOT IN (SELECT NEW_DRIFT_FILE FROM RHQ_DRIFT))
AND CTIME < ?
where ? is bound to some value in the past (epoch millis) that
corresponds to how old an unused drift file is allowed to be without
being purged.
We'd need something similar in any drift server plugin impl (like the mongo plugin).
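As a rough sketch of how the ? could be bound, assuming the setting is expressed in days and CTIME holds epoch millis: the class and method names below are hypothetical, and plain JDBC is used here only for illustration (the real statement lives in JPADriftFile and would go through JPA).

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of RHQ.
final class DriftFilePurgeSketch {
    // The DELETE from the comment above, including the age restriction.
    static final String PURGE_SQL =
        "DELETE FROM RHQ_DRIFT_FILE "
            + "WHERE (HASH_ID NOT IN (SELECT OLD_DRIFT_FILE FROM RHQ_DRIFT)) "
            + "AND (HASH_ID NOT IN (SELECT NEW_DRIFT_FILE FROM RHQ_DRIFT)) "
            + "AND CTIME < ?";

    /** Epoch millis before which an unreferenced drift file is old enough to purge. */
    static long cutoffMillis(long nowMillis, int maxAgeDays) {
        if (maxAgeDays < 0) {
            throw new IllegalArgumentException("maxAgeDays must not be negative");
        }
        return nowMillis - TimeUnit.DAYS.toMillis(maxAgeDays);
    }

    /** Runs the purge on a caller-supplied connection and returns the number of deleted rows. */
    static int purge(Connection conn, long nowMillis, int maxAgeDays) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(PURGE_SQL)) {
            ps.setLong(1, cutoffMillis(nowMillis, maxAgeDays));
            return ps.executeUpdate();
        }
    }
}
```

With this shape, a seeded file that no drift references yet survives until it is older than the configured number of days, which is the behavior Jay asked for.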
Created attachment 518539
I implemented this locally. In case we want a global setting that says how old drift files must be before they can be purged, see the attached patch.
this was previously committed on August 16, 2011
- commit 5bdcd83af54cd10ff4b78bfba66db7330a11abd5
1. first, get some drift
2. confirm "select id from rhq_drift_file" - make sure you see the rows in there
(you'll use this later)
3. I think you will have to then uninventory the resource that the drift was on
4. select id from rhq_drift_file - make sure the drifts are still there
5. wait for the time to expire (the time defined in the system settings)
6. after that time, "select id from rhq_drift_file" should show you the rows are gone.
rows are gone. verified.
> 2. confirm "select id from rhq_drift_file" - make sure you see the rows in
> there (you'll use this later)
just to correct this - "id" isn't a valid column - "hash_id" would work
changing status of VERIFIED BZs for JON 2.4.2 and JON 3.0 to CLOSED/CURRENTRELEASE