Bug 1253908 - retrace-server cleanup job should not remove tasks with open vmcores
Summary: retrace-server cleanup job should not remove tasks with open vmcores
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora EPEL
Classification: Fedora
Component: retrace-server
Version: epel7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ---
Assignee: Martin Kutlak
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-08-15 11:25 UTC by Dave Wysochanski
Modified: 2019-03-21 22:49 UTC (History)

Fixed In Version: retrace-server-1.18.1-1.el7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-03-21 22:49:49 UTC
Type: Bug
Embargoed:



Description Dave Wysochanski 2015-08-15 11:25:46 UTC
Description of problem:
Today retrace-server just checks the mtime of the task directory and removes the task if it is too old (a failed task is removed on a different timeframe than a successful task).  We need another check to make sure the vmcore (crash/vmcore) is not open before removal.  Any instance of this occurring may be a case where someone forgot to close their crash session, but it could also be due to a 'failed task' whose vmcore is nonetheless usable.  The latter is a real possibility, and we have at least one open bug about it today (https://bugzilla.redhat.com/show_bug.cgi?id=1149356).  In any case, retrace-server should not remove a task that is associated with an open vmcore.

Version-Release number of selected component (if applicable):
retrace-server-1.12-3.el6.noarch

How reproducible:
Easily reproducible, but in practice on our system I've only seen it a couple of times (likely due to our DeleteFailedTaskAfter or DeleteTaskAfter settings).  I can't recall anyone complaining recently, though some have brought it up.

Steps to Reproduce:
1. Run 'retrace-server-worker <task> crash' on a vmcore and leave it open for longer than the delete rules (either DeleteFailedTaskAfter or DeleteTaskAfter)

Actual results:
task is removed even though someone has the 'crash/vmcore' file open

Expected results:
task is not removed if someone has the 'crash/vmcore' file open

Additional info:
Documented, but low priority for us due to the fairly large value of DeleteTaskAfter (we have 6 months right now) and the low incidence.  If we ever lower the values of DeleteFailedTaskAfter or DeleteTaskAfter, or if there's a higher rate of 'failed tasks' which are still useful, it may become more of an issue.
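
For illustration only, the age-based rule described above can be sketched roughly as follows. This is not the actual retrace-server code: the function name `is_expired`, its parameters, and the use of hours as the unit are assumptions made for the example. Only the mtime check and the two separate limits come from this report.

```python
import os
import time


def is_expired(task_dir, failed, delete_after_hours, delete_failed_after_hours, now=None):
    """Return True if the task directory's mtime is older than its limit.

    Hypothetical helper: the two limits stand in for the DeleteTaskAfter and
    DeleteFailedTaskAfter settings; expressing them in hours is an assumption.
    """
    now = time.time() if now is None else now
    # Failed tasks use their own (typically shorter) limit.
    limit_hours = delete_failed_after_hours if failed else delete_after_hours
    age_hours = (now - os.path.getmtime(task_dir)) / 3600.0
    return age_hours >= limit_hours
```

Nothing in this rule looks at whether crash/vmcore is in use, which is the gap this bug is about.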

Comment 1 Dave Wysochanski 2019-02-07 18:41:37 UTC
I _think_ this may be easy to fix but haven't coded anything up.  Probably in the retrace-server-cleanup job, we just need to do the equivalent of 'lsof <path-to-core>' before we delete any task that seems to have an older task_age.  We can log a message if we skip the deletion due to some user having the task open.
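
A minimal sketch of that idea, assuming a Linux host with /proc mounted. It is not the change that was eventually merged: `is_file_open` and `should_delete` are hypothetical names, and scanning /proc/<pid>/fd is just one way to get the effect of 'lsof <path-to-core>' without spawning a process.

```python
import os


def is_file_open(path, proc_root="/proc"):
    """Return True if any visible process holds an open descriptor on path."""
    try:
        target = os.stat(path)
    except OSError:
        return False  # no such file, so nobody can have it open by this path
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue
        fd_dir = os.path.join(proc_root, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission to inspect it
        for fd in fds:
            try:
                st = os.stat(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor closed while we were scanning
            # Compare device and inode so hard links and renames still match.
            if (st.st_dev, st.st_ino) == (target.st_dev, target.st_ino):
                return True
    return False


def should_delete(task_dir, expired, log=print):
    """Veto deletion of an expired task while its crash/vmcore is open."""
    vmcore = os.path.join(task_dir, "crash", "vmcore")
    if expired and is_file_open(vmcore):
        log("Skipping deletion of %s: %s is still open" % (task_dir, vmcore))
        return False
    return expired
```

Two caveats with this sketch: the cleanup job needs enough privilege to read other users' /proc/<pid>/fd entries, otherwise those processes are silently skipped, and the check is racy, since someone could open the vmcore between the check and the removal.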

Anyone have time for this?

Comment 2 Martin Kutlak 2019-02-08 12:50:25 UTC
I will try to come up with something to fix this bugzilla.

Comment 3 Martin Kutlak 2019-02-10 19:22:14 UTC
pull-request:

https://github.com/abrt/retrace-server/pull/224

Comment 4 Fedora Update System 2019-03-01 13:12:22 UTC
retrace-server-1.18.1-1.el7 has been submitted as an update to Fedora EPEL 7. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2019-a354cc1c0f

Comment 5 Fedora Update System 2019-03-02 02:05:24 UTC
retrace-server-1.18.1-1.el7 has been pushed to the Fedora EPEL 7 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2019-a354cc1c0f

Comment 6 Fedora Update System 2019-03-21 22:49:49 UTC
retrace-server-1.18.1-1.el7 has been pushed to the Fedora EPEL 7 stable repository. If problems still persist, please make note of it in this bug report.

