Bug 1428040 - RFE: De-duplication of vmcores - add "AllowDuplicates" option in /etc/retrace-server.conf or de-dedup cleanup job
Keywords:
Status: CLOSED DUPLICATE of bug 1558903
Alias: None
Product: Fedora EPEL
Classification: Fedora
Component: retrace-server
Version: epel7
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Assignee: Dave Wysochanski
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On: 1128972
Blocks:
 
Reported: 2017-03-01 18:38 UTC by Dave Wysochanski
Modified: 2018-04-10 14:23 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-04-10 14:23:03 UTC
Type: Bug
Embargoed:


Attachments: none

Description Dave Wysochanski 2017-03-01 18:38:19 UTC
Description of problem:
In our production system, a vmcore can, for various reasons, be submitted multiple times.  If the task is a non-local file (e.g., submitted via FTP), this takes up unnecessary space and leads to confusion.  This may happen in particular with damaged or very large vmcores, and in the case of larger vmcores it can quickly consume an enormous amount of space.  In my experience, many of the out-of-space conditions we have seen in production are the result of very large vmcores, often duplicates.

This RFE requests some sort of mechanism to avoid duplicates.  Any such mechanism depends on knowing that a submission is a duplicate, so it depends on the implementation of the md5sum bug https://bugzilla.redhat.com/show_bug.cgi?id=1128972 or something similar.
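For illustration, checksumming a multi-gigabyte vmcore has to be done in chunks; a minimal Python sketch (the helper name and chunk size are my own for illustration, not retrace-server code):

    import hashlib

    def file_md5(path, chunk_size=1 << 20):
        # Hash in 1 MiB chunks so a multi-GB vmcore never sits in memory at once.
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest()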

There are at least two approaches to solve the problem of duplicates:
Option 1: Implement logic that checks for duplicates at submission time and automatically rejects any task that would end up being a duplicate.  This option has the advantage that no cleanup is needed later, and the user submitting the vmcore can be redirected to the existing vmcore instead of waiting for the new task.

Option 2: Implement a cleanup job that periodically scans for duplicates and removes them.  The downside of this approach is that engineers may not be aware a task is a duplicate, so their task may be removed out from under them; if any files exist in the task's 'misc' directory, that could mean lost work for an engineer.  Even so, this seems feasible, and the cleanup job could email the submitter of the task, or anyone who owns a file in the 'misc' directory.
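A rough sketch of what the Option 2 scan could look like (the spool-directory layout and the file_md5 helper above are assumptions for illustration, not how retrace-server actually lays out tasks):

    import os
    from collections import defaultdict

    def find_duplicate_tasks(tasks_root="/var/spool/retrace-server"):
        # Group task directories by their vmcore checksum; any group with
        # more than one member is a set of duplicates to act on.
        by_sum = defaultdict(list)
        for task_id in os.listdir(tasks_root):
            vmcore = os.path.join(tasks_root, task_id, "crash", "vmcore")
            if os.path.isfile(vmcore):
                by_sum[file_md5(vmcore)].append(task_id)
        return {s: ids for s, ids in by_sum.items() if len(ids) > 1}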

I would probably lean toward Option 1, since it is better to avoid creating duplicates in the first place.

Whichever implementation is chosen, I think we need:

1. An "AllowDuplicates" option in /etc/retrace-server.conf.  This should default to 'N' so the cleanup works by default.

2. Some database to track tasks by md5sum (a minimal sketch also follows below).
a) When a task is submitted, its md5sum is computed and looked up in the database.  If a hit is found, the taskid stored in the database can be given to the user in the notification about the deleted task: "Task XYZ contained file foo.tar, which is a duplicate of taskid ABC.  Task XYZ has been cancelled and deleted.  Please use taskid ABC for your analysis."  If there is no hit, a record is added to the database.
b) When a task is deleted, its record in the database must also be deleted.
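For item 1, the option could look like this in /etc/retrace-server.conf (the [retrace] section already exists; the option itself is only proposed here, and the 0/1 boolean style is an assumption based on the existing options):

    [retrace]
    # Proposed by this RFE, not an existing option: reject duplicates by default
    AllowDuplicates = 0

For item 2, a minimal sketch of the lookup table using sqlite3 (table, column, and function names are illustrative assumptions, not retrace-server code; file_md5 is the helper sketched earlier):

    import sqlite3

    def init_db(path="md5sums.db"):
        # One row per live task: the vmcore checksum and the owning taskid.
        db = sqlite3.connect(path)
        db.execute("CREATE TABLE IF NOT EXISTS vmcore_md5 "
                   "(md5 TEXT PRIMARY KEY, taskid INTEGER NOT NULL)")
        return db

    def register_or_find(db, md5, taskid):
        # Item 2a: return the existing taskid on a duplicate, else record this one.
        row = db.execute("SELECT taskid FROM vmcore_md5 WHERE md5 = ?",
                         (md5,)).fetchone()
        if row:
            return row[0]  # duplicate: caller cancels the new task and notifies
        db.execute("INSERT INTO vmcore_md5 VALUES (?, ?)", (md5, taskid))
        db.commit()
        return None

    def forget_task(db, taskid):
        # Item 2b: drop the record when its task is deleted.
        db.execute("DELETE FROM vmcore_md5 WHERE taskid = ?", (taskid,))
        db.commit()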

Version-Release number of selected component (if applicable):
retrace-server-1.16-1.el6.noarch

How reproducible:
Every time

Steps to Reproduce:
1. Submit a vmcore to retrace-server
2. Submit the same vmcore a second time

Actual results:
Duplicate task which unnecessarily takes up disk space

Expected results:
No duplicate task

Additional info:
There may be instances when duplicates are desired.  The one case I can think of off the top of my head is testing: we submit a series of files / vmcores to verify proper function of a new retrace-server build.  In that use case, though, we can easily set AllowDuplicates=Yes.

This is a lower-priority item, but it comes up often enough to warrant a look, and the implementation is probably not very hard either.

Comment 1 Dave Wysochanski 2018-04-10 14:23:03 UTC

*** This bug has been marked as a duplicate of bug 1558903 ***

