Bug 998271

Summary: Add retrace-server-requeue-task command or something like that to requeue a failed task
Product: [Fedora] Fedora EPEL
Reporter: Dave Wysochanski <dwysocha>
Component: retrace-server
Assignee: Michal Toman <mtoman>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: el6
CC: jmoskovc, mtoman, pknirsch, rvokal
Target Milestone: ---
Keywords: Patch, TestCaseProvided
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: retrace-server-1.10-1.el6
Doc Type: Bug Fix
Last Closed: 2013-08-22 18:25:39 UTC
Type: Bug
Attachments:
Sample shell script which implements the current requeue task steps (flags: none)

Description Dave Wysochanski 2013-08-18 19:35:25 UTC
Created attachment 787821 [details]
Sample shell script which implements the current requeue task steps

Description of problem:
If a task fails, today we have the following workaround instructions:
https://docspace.corp.redhat.com/docs/DOC-128956#Requeueing_Failed_Tasks

"If a Task fails before it completes:
    The failed task would have created the required directory structure.
1. Download the vmcore and extract it to the crash/ sub-directory of the task.
2. Delete the status, started_time, finished_time, and progress files in the task directory. Also delete the files in the misc sub-directory.
3. retrace-server-worker <task_id>
4. Make sure the owner/group of the files and directories are set to values that allow others to access the Task."

It would be good to put this into a 'requeue' command so people don't start messing with files in the directory or get the requeue steps wrong.  I created a shell script which seems to do the above:

$ cat ~/bin/retrace-server-task-requeue
#!/bin/sh
# Automate some steps to requeue a retrace vmcore once it's placed into <retrace>/tasks/<taskid>/crash
# https://docspace.corp.redhat.com/docs/DOC-128956#Workaround_Instructions
#
ARGS=1
E_BADARGS=65
if [ $# -ne "$ARGS" ]; then
        echo "Usage: $(basename "$0") <retrace-task-id>" >&2
        exit $E_BADARGS
fi
TASK=$1
# SaveDir is read from the config file and is expected to contain the task directories
RETRACEDIR=$(awk '/SaveDir =/ { print $3 }' /etc/retrace-server.conf)
# Refuse to continue with an empty path; otherwise the rm calls below would act on /$TASK
if [ -z "$RETRACEDIR" ] || [ ! -d "$RETRACEDIR/$TASK" ]; then
        echo "Retrace task $TASK does not exist in $RETRACEDIR" >&2
        exit $E_BADARGS
fi
# Remove the state files left behind by the failed run
for f in status started_time finished_time progress; do rm -f "$RETRACEDIR/$TASK/$f"; done
rm -rf "$RETRACEDIR/$TASK"/misc/*
# Run the worker again for this task
retrace-server-worker "$TASK"
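
The script above does not cover step 4 of the quoted instructions (making the task accessible to others). A minimal sketch of that step follows. The function name fix_task_perms and the permission bits a+rX are assumptions, since the instructions do not say which owner, group, or mode is expected.

```shell
#!/bin/sh
# Hypothetical helper: make a task directory readable by everyone.
# The mode a+rX is an assumption; the quoted instructions only say that
# others must be able to access the task.
fix_task_perms() {
    task_dir=$1
    if [ ! -d "$task_dir" ]; then
        echo "Not a directory: $task_dir" >&2
        return 1
    fi
    # a+r grants read to all; X adds execute/search only on directories
    # (and on files that already have an execute bit set).
    chmod -R a+rX "$task_dir"
}
```

It could be called at the end of the script as: fix_task_perms "$RETRACEDIR/$TASK"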



Version-Release number of selected component (if applicable):
retrace-server-1.9-6.el6.noarch


How reproducible:
Every time, provided you have a vmcore whose task fails.


Steps to Reproduce:
Queue a vmcore which fails (for example, maybe a split vmcore?).



Actual results:
You have to follow these manual instructions:
https://docspace.corp.redhat.com/docs/DOC-128956#Workaround_Instructions


Expected results:
Be able to run something like this:
$ retrace-server-requeue <taskid>


Additional info:
If possible, we may be able to improve the handling of split vmcores as well. As I recall, though, there is a problem when someone replaces the file in 'crash/vmcore' with a new one.

Comment 2 Michal Toman 2013-08-20 13:34:16 UTC
Fixed upstream. I added a --restart option to retrace-server-worker, so that a restart has to be requested explicitly.

commit 848988edf2f5f47ebec4ffb88ab887d4d19fbd5f
Author: Michal Toman <mtoman>
Date:   Tue Aug 20 15:32:43 2013 +0200

    worker: add --restart option
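
A minimal sketch of restarting several failed tasks with the new option follows. It assumes the option is passed as retrace-server-worker --restart <task_id>; the argument order is not stated in this report. The restart_tasks function and the WORKER variable are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper around the worker's --restart option.
# Assumption: the worker is invoked as `retrace-server-worker --restart <task_id>`.
WORKER=${WORKER:-retrace-server-worker}

restart_tasks() {
    failed=0
    for task in "$@"; do
        if "$WORKER" --restart "$task"; then
            echo "restarted task $task"
        else
            echo "failed to restart task $task" >&2
            failed=1
        fi
    done
    # Non-zero if at least one task could not be restarted.
    return "$failed"
}
```

For example, restart_tasks 101 102 would run the worker once per task ID and report which ones failed.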

Comment 3 Dave Wysochanski 2013-08-20 15:21:14 UTC
(In reply to Michal Toman from comment #2)
> Fixed in upstream. Just added --restart option to retrace-server-worker so
> that the restart needs to be explicitly mentioned.
> 
> commit 848988edf2f5f47ebec4ffb88ab887d4d19fbd5f
> Author: Michal Toman <mtoman>
> Date:   Tue Aug 20 15:32:43 2013 +0200
> 
>     worker: add --restart option

Thanks!

Comment 4 Fedora Update System 2013-08-21 11:37:23 UTC
retrace-server-1.10-1.el6 has been submitted as an update for Fedora EPEL 6.
https://admin.fedoraproject.org/updates/retrace-server-1.10-1.el6

Comment 5 Fedora Update System 2013-08-21 19:01:29 UTC
Package retrace-server-1.10-1.el6:
* should fix your issue,
* was pushed to the Fedora EPEL 6 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=epel-testing retrace-server-1.10-1.el6'
as soon as you are able to.
Please go to the following URL:
https://admin.fedoraproject.org/updates/FEDORA-EPEL-2013-11280/retrace-server-1.10-1.el6
then log in and leave karma (feedback).
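
To confirm that the update took effect, the installed version can be compared against 1.10, the release listed under "Fixed In Version". A minimal sketch, assuming GNU sort with the -V option is available; version_ge and check_retrace_version are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: succeed if version $1 >= version $2.
# Relies on the -V (version sort) option of GNU sort.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: report whether the installed package is new enough.
check_retrace_version() {
    # Query only the version field of the installed package.
    if installed=$(rpm -q --qf '%{VERSION}' retrace-server 2>/dev/null); then
        if version_ge "$installed" 1.10; then
            echo "retrace-server $installed should include the --restart option"
        else
            echo "retrace-server $installed predates the fix; update to 1.10 or later"
        fi
    else
        echo "retrace-server does not appear to be installed"
    fi
}
```

Running check_retrace_version after the yum update prints one of the three messages above.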

Comment 6 Fedora Update System 2013-08-22 18:25:39 UTC
retrace-server-1.10-1.el6 has been pushed to the Fedora EPEL 6 stable repository.  If problems still persist, please make note of it in this bug report.