Description of problem:
Today retrace-server just checks the mtime of the task directory and removes the task if it is too old (a failed task is removed on a different timeframe than a successful task). We need another check to make sure the vmcore (crash/vmcore) is not open before removal. Any instance of this may be a case where someone has forgotten to close their crash session, but it could also be due to a 'failed task' whose vmcore is nonetheless usable. The latter is a real possibility, and we have at least one open bug about it today (https://bugzilla.redhat.com/show_bug.cgi?id=1149356). In any case, retrace should not remove a task if it is associated with an open vmcore.

Version-Release number of selected component (if applicable):
retrace-server-1.12-3.el6.noarch

How reproducible:
Easily reproducible, but in practice on our system I've only seen it a couple of times (likely due to our DeleteFailedTaskAfter or DeleteTaskAfter settings). I can't recall anyone complaining recently, though some have brought it up.

Steps to Reproduce:
1. Run 'retrace-server-worker <task> crash' on a vmcore and leave it open for longer than the delete rules allow (either DeleteFailedTaskAfter or DeleteTaskAfter)

Actual results:
The task is removed even though someone has the 'crash/vmcore' file open.

Expected results:
The task is not removed if someone has the 'crash/vmcore' file open.

Additional info:
Documented, but low priority for us due to our fairly large value of DeleteTaskAfter (6 months right now) and the low incidence. If we ever lower the values of DeleteFailedTaskAfter or DeleteTaskAfter, or if there is a higher rate of 'failed tasks' that are still useful, it may become more of an issue.
I _think_ this may be easy to fix but haven't coded anything up. Probably in the retrace-server-cleanup job, we just need to do the equivalent of 'lsof <path-to-core>' before we delete any task that seems to have an older task_age. We can log a message if we skip the deletion due to some user having the task open. Anyone have time for this?
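A rough sketch of the idea, not the actual retrace-server code: instead of shelling out to lsof, it scans /proc/<pid>/fd for a descriptor that points at the vmcore (Linux only). The names is_file_open and should_delete_task, and the hour-based arguments, are made up for illustration and are not taken from retrace-server-cleanup. It assumes the cleanup job runs with enough privileges (e.g. as root) to read other users' fd directories; processes it cannot inspect are silently skipped.

```python
import logging
import os


def is_file_open(path, proc_root="/proc"):
    """Return True if a process visible under proc_root has `path` open.

    Roughly what 'lsof <path>' reports, done by reading the
    /proc/<pid>/fd symlinks directly. Hypothetical helper.
    """
    target = os.path.realpath(path)
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return False
    for pid in entries:
        if not pid.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join(proc_root, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for fd in fds:
            try:
                if os.readlink(os.path.join(fd_dir, fd)) == target:
                    return True
            except OSError:
                continue  # descriptor was closed while we were scanning
    return False


def should_delete_task(task_dir, task_age_hours, max_age_hours,
                       proc_root="/proc"):
    """Decide whether a task that may be too old can really be removed.

    Hypothetical helper: the age arguments stand in for whatever the
    cleanup job compares against DeleteTaskAfter / DeleteFailedTaskAfter.
    """
    if task_age_hours < max_age_hours:
        return False
    vmcore = os.path.join(task_dir, "crash", "vmcore")
    if is_file_open(vmcore, proc_root):
        logging.info("Skipping removal of %s: %s is still open",
                     task_dir, vmcore)
        return False
    return True
```

The cleanup loop would call should_delete_task() right before removing a task directory. There is still a small race between the check and the removal, which seems acceptable given how long the delete intervals are.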
I will try to come up with something to fix this bugzilla.
pull-request: https://github.com/abrt/retrace-server/pull/224
retrace-server-1.18.1-1.el7 has been submitted as an update to Fedora EPEL 7. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2019-a354cc1c0f
retrace-server-1.18.1-1.el7 has been pushed to the Fedora EPEL 7 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2019-a354cc1c0f
retrace-server-1.18.1-1.el7 has been pushed to the Fedora EPEL 7 stable repository. If problems still persist, please make note of it in this bug report.