Description of problem:
A couple of vmcores we submitted to retrace-server ran into a crash bug where 'crash --osrelease' would spin:
https://bugzilla.redhat.com/show_bug.cgi?id=1114088

We put in a ticket to kill these processes, and crash got killed (it took a couple of tries), but the retrace-server processes ended up stuck permanently in "Preparing environment for backtrace generation". This corresponds to a value of '1' in the 'status' file.

To clean these up, we tried manually setting them to a failed status (i.e. echo 6 > /cores/retrace/tasks/<taskid>/status). Unfortunately this had the side effect of taking down the web UI. Putting the 'status' value back to '1' brought the web UI back up.

Version-Release number of selected component (if applicable):
retrace-server-1.11-4.el6.noarch

How reproducible:
Unsure.

Steps to Reproduce:
NOTE: Other steps may also lead to the non-terminal task state. This is just one example, and it is how we saw the bug.
1. Install a crash version affected by bz 1114088
2. Submit a vmcore which triggers the crash spinning from bz 1114088
3. Kill the crash process (probably multiple times)
4. The task ends up in a non-terminal state

Actual results:
The retrace-server task ends up with 'status' file == 1 and is permanently hung. An administrator cannot clean it up by writing a 'failed' value into the status file: if the 'status' file is set manually, the web UI does not load.

Expected results:
The web UI always loads. If there are tasks stuck in a non-final state (something other than success or fail), there is some way to get them out of this state safely.

Additional info:
There used to be a command-line option to force the state of a task, but it looks like it has been removed. Maybe we just need to delete such tasks manually? Is there some other way to clean up? Also, it does not seem like the web UI should be vulnerable to someone changing one task's 'status' file like this.
So there are two problems:
1. Missing set-success/set-fail commands
2. Missing finished_time bringing down the webui

Both are fixed upstream:

commit d8168b6b540b3d46651af0d21075c8e6ba7f8b13
Author: Michal Toman <mtoman>
Date:   Wed Jul 30 09:38:17 2014 +0200

    rs-interact: add 'set-success' and 'set-fail' actions

    Signed-off-by: Michal Toman <mtoman>

commit dc055c6340b532f631014899989995d3d1842f11
Author: Michal Toman <mtoman>
Date:   Wed Jul 30 13:31:45 2014 +0200

    rs-interact: set finish time if necessary

    Signed-off-by: Michal Toman <mtoman>
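
To illustrate the second problem, here is a minimal Python sketch of the idea behind "set finish time if necessary". It is not the upstream code. The function names `mark_task_failed` and `read_finished_time` are hypothetical, and the assumption that the finish time lives in a `finished_time` file next to `status` in the task directory is inferred only from the name used in this comment. The value 6 for "failed" is taken from the original report.

```python
import os
import time

STATUS_FAIL = 6  # "failed" value, as used in the original report


def mark_task_failed(task_dir, now=None):
    """Write the failed status and make sure a finish time is recorded.

    Hypothetical helper: 'finished_time' as a file in the task
    directory is an assumption, not confirmed retrace-server layout.
    """
    status_path = os.path.join(task_dir, "status")
    finished_path = os.path.join(task_dir, "finished_time")

    with open(status_path, "w") as f:
        f.write("%d" % STATUS_FAIL)

    # Only record a finish time if the task never got one.
    if not os.path.exists(finished_path):
        stamp = int(time.time() if now is None else now)
        with open(finished_path, "w") as f:
            f.write("%d" % stamp)


def read_finished_time(task_dir):
    """Return the finish time as an int, or None if missing or unparseable.

    A reader that tolerates a missing file lets one hand-edited task
    degrade gracefully instead of breaking the whole listing.
    """
    finished_path = os.path.join(task_dir, "finished_time")
    try:
        with open(finished_path) as f:
            return int(f.read().strip())
    except (IOError, OSError, ValueError):
        return None
```

Writing '6' into 'status' by hand, as in the original report, skips the finish-time step, which is the situation the second commit is meant to cover.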
retrace-server-1.12-2.el6 has been submitted as an update for Fedora EPEL 6. https://admin.fedoraproject.org/updates/retrace-server-1.12-2.el6
Package retrace-server-1.12-2.el6:
* should fix your issue,
* was pushed to the Fedora EPEL 6 testing repository,
* should be available at your local mirror within two days.

Update it with:
# su -c 'yum update --enablerepo=epel-testing retrace-server-1.12-2.el6'
as soon as you are able to.

Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-EPEL-2014-2089/retrace-server-1.12-2.el6
then log in and leave karma (feedback).
retrace-server-1.12-2.el6 has been pushed to the Fedora EPEL 6 stable repository. If problems still persist, please make note of it in this bug report.