Bug 1589865

Summary: RFE: Reject a vmcore submission before download by comparing a remote md5sum with other tasks in the system
Product: [Fedora] Fedora EPEL
Reporter: Dave Wysochanski <dwysocha>
Component: retrace-server
Assignee: abrt <abrt-devel-list>
Status: CLOSED WONTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: epel7
CC: abrt-devel-list, dwysocha, jakub, michal.toman, mmarusak, msuchy
Hardware: Unspecified
OS: Unspecified
Last Closed: 2023-08-08 14:09:56 UTC
Type: Bug

Description Dave Wysochanski 2018-06-11 14:48:22 UTC
Description of problem:
This is an RFE as a follow-on to https://bugzilla.redhat.com/show_bug.cgi?id=1558903

Often someone will submit a duplicate vmcore, not knowing that the same vmcore has already been submitted. Unfortunately, today we cannot tell whether a submission is a duplicate until we download it, run md5sum on it, and compare the result with existing tasks.
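For illustration, the current post-download check boils down to something like the Python sketch below (retrace-server itself is written in Python). It assumes a hypothetical in-memory `{task_id: md5}` mapping for existing tasks; how retrace-server actually stores task md5sums is not shown here. The file is hashed in chunks so a 150GB vmcore never has to fit in memory.

```python
import hashlib
from typing import Dict, Optional


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex md5 digest of a file, reading it in chunks so a
    multi-GB vmcore is never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_task(md5: str, task_md5s: Dict[int, str]) -> Optional[int]:
    """Return the ID of an existing task whose vmcore has the same md5sum,
    or None. task_md5s is a hypothetical {task_id: md5} mapping."""
    for task_id, existing in task_md5s.items():
        if existing == md5:
            return task_id
    return None
```

The weakness described above is visible here: `md5_of_file` can only run once the whole vmcore is on local disk, so the bandwidth has already been spent by the time a duplicate is found.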

This RFE would reject a task before we download it, but that requires being able to obtain an md5sum for the remote file before downloading it. This is not possible today, but it may become possible in the future if the download methods are expanded.
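A minimal sketch of the proposed pre-download check, assuming a hypothetical `RemoteVmcore` record whose `md5` field is filled in only when the download method can report a checksum up front. When it cannot (the situation today), the check returns False and the vmcore still has to be downloaded and compared afterwards.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class RemoteVmcore:
    """A vmcore announced for download (hypothetical record). `md5` is None
    when the download method cannot report a checksum before downloading."""
    url: str
    md5: Optional[str] = None


def reject_before_download(remote: RemoteVmcore, known_md5s: Iterable[str]) -> bool:
    """Return True if the submission can be rejected without downloading it.

    This is only possible when a remote md5sum is available; otherwise we
    must download first and fall back to the post-download comparison."""
    if remote.md5 is None:
        return False
    # Hex digests may differ only in letter case depending on the source.
    return remote.md5.lower() in {m.lower() for m in known_md5s}
```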

Version-Release number of selected component (if applicable):
upstream bca08b5fab8c018fb665b014687d1755525ae1db


How reproducible:
Every time

Steps to Reproduce:
1. Submit a vmcore to retrace server
2. Submit the identical file again (under the same filename or a different one)

Actual results:
retrace-server does not reject the task

Expected results:
Either one of the following:
1. A warning, with a '--force' option required to submit the duplicate
2. Rejection of any submission of a file with a duplicate md5sum
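The two expected behaviours can be expressed as a policy switch. This is a hypothetical sketch: `DuplicatePolicy`, `decide()` and the argparse-based CLI are illustrative names, not retrace-server's actual submission interface; only the '--force' option itself comes from this report.

```python
import argparse
from enum import Enum
from typing import List, Optional


class DuplicatePolicy(Enum):
    WARN = "warn"      # option 1: warn, accept only with --force
    REJECT = "reject"  # option 2: always reject duplicates


def decide(duplicate_of: Optional[int], policy: DuplicatePolicy, force: bool) -> str:
    """Return 'accept', 'warn' (refused until --force is given) or 'reject'.

    duplicate_of is the ID of a matching existing task, or None."""
    if duplicate_of is None:
        return "accept"
    if policy is DuplicatePolicy.REJECT:
        return "reject"  # --force does not override an outright rejection
    return "accept" if force else "warn"


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # Hypothetical submission CLI; only --force is taken from the report.
    parser = argparse.ArgumentParser(description="Submit a vmcore (sketch)")
    parser.add_argument("vmcore", help="path or URL of the vmcore")
    parser.add_argument("--force", action="store_true",
                        help="submit even if a duplicate task exists")
    return parser.parse_args(argv)
```

Option 1 keeps room for the legitimate re-submissions discussed below (for example regression runs), while option 2 is simpler but has no escape hatch.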

Additional info:
This is still a problem for us: even today I see what is likely a duplicate 150GB vmcore being downloaded. The second download alone will kill performance for other, smaller vmcores, so the impact is non-trivial.

There may be legitimate reasons to re-submit a vmcore. For example, we run regression tests by submitting a set of vmcores to retrace-server and comparing the results with previous runs. Also, I am not sure that all steps are re-run today when a vmcore is re-submitted, and I've seen instances where "retrace-server-worker --restart" triggered an error. I suppose we could delete the old vmcores in these cases, but that may not be ideal. In any case, it needs some thought whether a duplicate task can always be rejected outright based on md5sum, or whether there are legitimate exceptions where you'd want a "--force" option.

Maybe we should implement the '--force' option today based only on the filename and file size, and consider adding md5sum later?

Comment 1 Dave Wysochanski 2018-06-16 22:25:27 UTC
Depending on when other download interfaces that may provide an md5sum remotely arrive, we may want to split this off so that md5sum is not required, and instead reject a vmcore (or require a "--force" flag) if:
- vmcore name is the same
- vmcore size is the same
as an existing (or in-progress) vmcore
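A sketch of the name-plus-size heuristic, assuming a hypothetical `Task` record and that the remote size is known before download (for HTTP(S) sources, for example, from the Content-Length header of a HEAD request). Every task in the list is checked, whether finished or still downloading. Unlike md5sum, this can produce false positives, since two different vmcores can share a name and size, which argues for a '--force' override rather than outright rejection.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Task:
    """Hypothetical view of a retrace-server task (existing or in-progress)."""
    task_id: int
    vmcore_name: str
    vmcore_size: int  # bytes


def find_name_size_match(name: str, size: int, tasks: Iterable[Task]) -> Optional[Task]:
    """Return the first task whose vmcore has the same name and size, or None.

    Cheap (no download needed) but only a heuristic: two different vmcores
    can share both a name and a size."""
    for task in tasks:
        if task.vmcore_name == name and task.vmcore_size == size:
            return task
    return None
```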

Unfortunately, we are still getting huge vmcores submitted multiple times by unsuspecting parties, which kills download bandwidth, etc.

Comment 2 Dave Wysochanski 2023-08-08 14:09:56 UTC
Closing this WONTFIX due to lack of bandwidth and unclear upstream project status going forward.