Red Hat Bugzilla – Bug 1253908
retrace-server cleanup job should not remove tasks with open vmcores
Last modified: 2018-02-05 13:47:23 EST
Description of problem:
Today the retrace server only checks the mtime of the task directory and removes the task if it is too old (a failed task is removed on a different timeframe than a successful task). We need another check to make sure the vmcore (crash/vmcore) is not open before removal. An instance of this may simply be a case where someone forgot to close their crash session, but it could also be a 'failed task' whose vmcore is nonetheless usable. The latter is a real possibility, and we have at least one open bug about it today (https://bugzilla.redhat.com/show_bug.cgi?id=1149356). In any case, retrace should not remove a task if it is associated with an open vmcore.
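One possible shape for such a check, as a rough sketch only and not retrace-server's actual code: on Linux, walk /proc/<pid>/fd and compare each open descriptor's device and inode with those of crash/vmcore. The names is_file_open and should_remove_task are hypothetical, as is the max_age_seconds parameter standing in for DeleteTaskAfter / DeleteFailedTaskAfter. It assumes the cleanup job has enough privilege to read other users' /proc entries, and it only detects files held open through a file descriptor.

```python
import os


def is_file_open(path, proc_root="/proc"):
    """Return True if any visible process has an open descriptor on path.

    Hypothetical helper. Compares (st_dev, st_ino) so hard links and
    renamed paths are still matched. Linux-only (relies on /proc).
    """
    try:
        target = os.stat(path)
    except OSError:
        return False  # file does not exist, so nobody can have it open by path
    target_id = (target.st_dev, target.st_ino)

    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission to inspect it
        for fd in fds:
            try:
                st = os.stat(os.path.join(fd_dir, fd))  # follows the fd link
            except OSError:
                continue  # descriptor closed between listdir and stat
            if (st.st_dev, st.st_ino) == target_id:
                return True
    return False


def should_remove_task(task_dir, now, max_age_seconds):
    """Hypothetical cleanup rule: old enough by mtime AND vmcore not open."""
    vmcore = os.path.join(task_dir, "crash", "vmcore")
    too_old = now - os.path.getmtime(task_dir) > max_age_seconds
    return too_old and not is_file_open(vmcore)
```

With a check like this, the cleanup job would skip a task on this pass and reconsider it on the next run, once the crash session has been closed.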
Version-Release number of selected component (if applicable):
How reproducible:
Easily reproducible, but in practice on our system I've only seen it a couple of times (likely due to our DeleteFailedTaskAfter or DeleteTaskAfter settings). I can't recall anyone complaining recently, though some have brought it up.
Steps to Reproduce:
1. Run 'retrace-server-worker <task> crash' on a vmcore and leave the session open for longer than the delete rules allow (either DeleteFailedTaskAfter or DeleteTaskAfter)
Actual results:
task is removed even though someone has the 'crash/vmcore' file open
Expected results:
task is not removed if someone has the 'crash/vmcore' file open
Additional info:
Documented, but low priority for us due to the fairly large value of DeleteTaskAfter (we have 6 months right now) and the low incidence. If we ever lower the values of DeleteFailedTaskAfter or DeleteTaskAfter, or if there is a higher rate of 'failed tasks' that are still useful, it may become more of an issue.