Bug 1124477 - permanent non-final task state possible, forcing fail value into 'status' file crashes web UI
Alias: None
Product: Fedora EPEL
Classification: Fedora
Component: retrace-server
Version: epel7
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Michal Toman
QA Contact: Fedora Extras Quality Assurance
Depends On:
Reported: 2014-07-29 14:49 UTC by Dave Wysochanski
Modified: 2015-03-23 00:42 UTC

Fixed In Version: retrace-server-1.12-2.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2014-08-15 18:57:55 UTC
Type: Bug


Description Dave Wysochanski 2014-07-29 14:49:53 UTC
Description of problem:
A couple of vmcores we submitted to retrace-server ran into a bug in the crash utility where 'crash --osrelease' would spin.

We put in a ticket to kill these processes, and crash was eventually killed (it took a couple of tries), but the retrace-server tasks ended up permanently stuck in "Preparing environment for backtrace generation".
This corresponds to a value of '1' in the task's 'status' file.

To clean these up, we tried manually setting them to a failed status (i.e. echo 6 > /cores/retrace/tasks/<taskid>/status).  Unfortunately this had the side effect of taking down the web UI.  Putting the 'status' value back to '1' brought the web UI back up.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
NOTE: Other steps may also lead to a non-terminal task state.  This is just one example, and it is how we hit the bug.
1. Install a version of crash affected by bz 1114088
2. Submit a vmcore that triggers the crash spin described in bz 1114088
3. Kill the crash process (probably multiple times)
4. The task ends up in a non-terminal state

Actual results:
The retrace-server task ends up with 'status' == 1 and is permanently hung.  An administrator cannot clean it up by writing a 'failed' value into the status file: if the 'status' file is set manually, the web UI does not load.

Expected results:
Web UI always loads.  If there are tasks stuck in a non-final state (something other than success or fail), there is some way to get them out of this state safely.

Additional info:
There used to be a command line option to force the state of a task, but it looks like it has been removed.  Maybe we just need to delete such tasks manually?  Is there some other way to clean up?  Also, it does not seem like the web UI should be vulnerable to someone changing a single task's 'status' file like this.
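A minimal sketch, in Python, of the defensive behaviour asked for above: a task listing that survives one task with a missing or malformed value. It assumes, hypothetically, that each task directory holds small text files containing integers, such as 'status' and a 'finished_time' file (the field Comment 2 identifies as missing). The helpers `read_int` and `task_row` are illustrative names, not part of retrace-server.

```python
from pathlib import Path


def read_int(path, default=None):
    """Return the integer stored in `path`, or `default` if the file is
    missing, unreadable, empty, or does not contain a number."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return default


def task_row(task_dir):
    """Build one row of a hypothetical task listing.

    A bad value only affects this row's display; it never raises, so a
    single hand-edited task cannot take the whole page down.
    """
    task = Path(task_dir)
    status = read_int(task / "status")
    finished = read_int(task / "finished_time")  # assumed file name
    return {
        "id": task.name,
        "status": status if status is not None else "unknown",
        "finished": finished if finished is not None else "n/a",
    }
```

Treating an unreadable value as a display problem for one row, rather than an error for the page, is what keeps one edited task from affecting every other task in the listing.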

Comment 2 Michal Toman 2014-07-30 11:34:19 UTC
So there are two problems:
1. Missing set-success/set-fail commands
2. A missing finished_time bringing down the web UI
Both are fixed upstream.

commit d8168b6b540b3d46651af0d21075c8e6ba7f8b13
Author: Michal Toman <mtoman>
Date:   Wed Jul 30 09:38:17 2014 +0200

    rs-interact: add 'set-success' and 'set-fail' actions
    Signed-off-by: Michal Toman <mtoman>

commit dc055c6340b532f631014899989995d3d1842f11
Author: Michal Toman <mtoman>
Date:   Wed Jul 30 13:31:45 2014 +0200

    rs-interact: set finish time if necessary
    Signed-off-by: Michal Toman <mtoman>
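A minimal sketch, in Python, of what "set finish time if necessary" could mean when an administrator forces a task into a final state. Only the value 6 for a failed task comes from this report; the `finished_time` file name, its format (integer Unix seconds), and the `force_task_status` helper are assumptions for illustration, not the actual rs-interact code.

```python
import time
from pathlib import Path

# From this report: writing 6 into a task's 'status' file marks it as failed.
STATUS_FAIL = 6


def force_task_status(task_dir, status):
    """Hypothetical helper: force a task into a final status.

    Rewrites 'status' and, only if no finish time has been recorded yet,
    writes one, so code that expects a finished task to have a finish
    time does not break. The 'finished_time' file name and its format
    (integer Unix seconds) are assumptions.
    """
    task = Path(task_dir)
    (task / "status").write_text(f"{status}\n")
    finished = task / "finished_time"
    if not finished.exists():
        finished.write_text(f"{int(time.time())}\n")
```

Leaving an existing finish time untouched keeps the helper idempotent: running it twice on the same task does not move the recorded time.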

Comment 3 Fedora Update System 2014-07-31 11:52:40 UTC
retrace-server-1.12-2.el6 has been submitted as an update for Fedora EPEL 6.

Comment 4 Fedora Update System 2014-07-31 16:58:54 UTC
Package retrace-server-1.12-2.el6:
* should fix your issue,
* was pushed to the Fedora EPEL 6 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=epel-testing retrace-server-1.12-2.el6'
as soon as you are able to.
Please go to the following url:
then log in and leave karma (feedback).

Comment 5 Fedora Update System 2014-08-15 18:57:55 UTC
retrace-server-1.12-2.el6 has been pushed to the Fedora EPEL 6 stable repository.  If problems still persist, please make note of it in this bug report.
