Bug 1589865 - RFE: Reject a vmcore submission before download by comparing a remote md5sum with other tasks in the system
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora EPEL
Classification: Fedora
Component: retrace-server
Version: epel7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: abrt
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-06-11 14:48 UTC by Dave Wysochanski
Modified: 2023-08-08 14:09 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-08-08 14:09:56 UTC
Type: Bug
Embargoed:



Description Dave Wysochanski 2018-06-11 14:48:22 UTC
Description of problem:
This is an RFE as a follow-on to https://bugzilla.redhat.com/show_bug.cgi?id=1558903

Often someone will submit a duplicate vmcore without knowing that the same vmcore has already been submitted or is being submitted.  Unfortunately, today we cannot tell that it is a duplicate until we download it, run md5sum on it, and compare the result with existing tasks.
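
As a rough sketch of the post-download check described above (not retrace-server's actual code), the comparison could look like the following.  The function names and the known_md5s mapping of digest to task id are hypothetical and used only for illustration.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks
    so that a very large vmcore is never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_task(path, known_md5s):
    """Return the id of an existing task with the same md5sum, or None.

    known_md5s is a hypothetical dict mapping md5 hex digest -> task id;
    how existing tasks actually record their checksums is an assumption.
    """
    return known_md5s.get(md5_of_file(path))
```

Note that this check can only run once the whole file is local, which is exactly the cost this RFE is trying to avoid.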

This RFE would reject a task before we download it, but that requires being able to obtain an md5sum for the remote file before downloading it.  This capability does not exist today but may exist in the future if the download methods are expanded.

Version-Release number of selected component (if applicable):
upstream bca08b5fab8c018fb665b014687d1755525ae1db


How reproducible:
every time

Steps to Reproduce:
1. Submit a vmcore to retrace server
2. Submit the identical file (either the same filename or different)

Actual results:
retrace-server does not reject the task

Expected results:
Either one of the following:
1. Issues a warning and requires a '--force' option to submit a duplicate
2. Rejects any submission of a file with a duplicate md5sum

Additional info:
This is still a problem for us: even today I see a likely duplicate 150GB vmcore being downloaded.  The duplicate download alone will kill performance for other, smaller vmcores, so the impact is non-trivial.

There may be legitimate reasons to re-submit a vmcore.  For example, we run regression tests by submitting a set of vmcores to retrace-server so that we can compare against previous results.  Also, today I am not sure all steps are re-run if a vmcore is re-submitted, and I've seen instances where "retrace-server-worker --restart" triggered an error.  I suppose we could delete the old vmcores in these cases, but that may not be ideal.  In any case, it requires some thought whether a duplicate task can always be rejected outright based on md5sum or whether there are legitimate exceptions where you'd want a "--force" option.

Maybe we should implement the '--force' option today just based on the filename and filesize and then later consider adding md5sum?

Comment 1 Dave Wysochanski 2018-06-16 22:25:27 UTC
Depending on when other download interfaces arrive that can provide an md5sum remotely, we may want to split this off so that it does not depend on md5sum.  Instead, we would reject a vmcore, or require a "--force" flag, if the following match an existing (or in-progress) vmcore:
- vmcore name is the same
- vmcore size is the same
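
A minimal sketch of that name-and-size check, assuming both values must match and that the remote size is available before the download starts.  VmcoreInfo and should_reject are hypothetical names for illustration, not part of retrace-server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmcoreInfo:
    """Hypothetical record of what is known about a vmcore before download."""
    name: str
    size: int  # size in bytes, as reported by the remote side


def should_reject(candidate, existing, force=False):
    """Return True if the candidate looks like a duplicate and should be rejected.

    Assumption: a duplicate is a vmcore whose name AND size both match an
    existing or in-progress one.  Passing force=True models a "--force"
    override and always allows the submission.
    """
    if force:
        return False
    return any(
        candidate.name == e.name and candidate.size == e.size
        for e in existing
    )
```

For example, should_reject(VmcoreInfo("vmcore", 150), [VmcoreInfo("vmcore", 150)]) returns True, while the same call with force=True returns False.  A check like this would miss an identical file submitted under a different name, which is where the md5sum comparison would still be needed.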

Unfortunately we are still getting huge vmcores submitted multiple times by unsuspecting parties, which kills download bandwidth, etc.

Comment 2 Dave Wysochanski 2023-08-08 14:09:56 UTC
Closing this WONTFIX due to lack of bandwidth and unclear upstream project status going forward.

