Bug 998271 - Add retrace-server-requeue-task command or something like it to requeue a failed task
Status: CLOSED ERRATA
Product: Fedora EPEL
Classification: Fedora
Component: retrace-server
Version: el6
Assigned To: Michal Toman
QA Contact: Fedora Extras Quality Assurance
Keywords: Patch, TestCaseProvided
Reported: 2013-08-18 15:35 EDT by Dave Wysochanski
Modified: 2015-03-22 20:42 EDT
Fixed In Version: retrace-server-1.10-1.el6
Doc Type: Bug Fix
Last Closed: 2013-08-22 14:25:39 EDT
Type: Bug


Attachments
Sample shell script which implements the current requeue task steps (659 bytes, application/x-shellscript)
2013-08-18 15:35 EDT, Dave Wysochanski

Description Dave Wysochanski 2013-08-18 15:35:25 EDT
Created attachment 787821
Sample shell script which implements the current requeue task steps

Description of problem:
If a task fails, today we have the following workaround instructions:
https://docspace.corp.redhat.com/docs/DOC-128956#Requeueing_Failed_Tasks

"If a task fails before it completes:
    The failed task will already have created the required directory structure.
1. Download the vmcore and extract it to the crash/ sub-directory of the task.
2. Delete the status, started_time, finished_time, and progress files in the task directory. Also delete the files in the misc sub-directory.
3. retrace-server-worker <task_id>
4. Make sure the owner/group of the files and directories are set to values that allow others to access the task."

It would be good to put this into a 'requeue' command so people don't start messing with files in the directory or get the requeue steps wrong.  I created a shell script which seems to do the above:

$ cat ~/bin/retrace-server-task-requeue 
#!/bin/sh
# Automate the steps to requeue a retrace vmcore once it has been placed into
# <retrace>/tasks/<taskid>/crash
# https://docspace.corp.redhat.com/docs/DOC-128956#Workaround_Instructions
#
ARGS=1
E_BADARGS=65
if [ $# -ne "$ARGS" ]; then
        echo "Usage: $(basename "$0") <retrace-task-id>"
        exit $E_BADARGS
fi
TASK=$1
# Read the task directory from the SaveDir setting in the server configuration
RETRACEDIR=$(awk '/SaveDir =/ { print $3 }' /etc/retrace-server.conf)
# Refuse to continue if SaveDir was not found, so the rm calls below never run against "/"
if [ -z "$RETRACEDIR" ] || [ ! -d "$RETRACEDIR/$TASK" ]; then
        echo "Retrace task $TASK does not exist in $RETRACEDIR"
        exit $E_BADARGS
fi
# Remove the state files left behind by the failed run, then rerun the worker
for f in status started_time finished_time progress; do rm -f "$RETRACEDIR/$TASK/$f"; done
rm -rf "$RETRACEDIR/$TASK"/misc/*
retrace-server-worker "$TASK"
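
The script above covers steps 2 and 3 of the quoted workaround but not step 4 (ownership and permissions). A minimal sketch of that step follows. The function name fix_task_access is hypothetical, the owner and group are passed in by the caller because this report does not say which values the server expects, and "allow others to access" is interpreted here as read access for group and others, which is an assumption.

```shell
#!/bin/sh
# Sketch of step 4: make a task directory accessible to other users.
# fix_task_access is a hypothetical helper, not part of retrace-server.
# Usage: fix_task_access <task_dir> <owner> <group>
fix_task_access() {
    task_dir=$1
    owner=$2
    group=$3
    if [ ! -d "$task_dir" ]; then
        echo "No such task directory: $task_dir" >&2
        return 1
    fi
    # Set owner and group on the directory and everything below it.
    chown -R "$owner:$group" "$task_dir" || return 1
    # Grant read access to group and others. Capital X adds execute (search)
    # only to directories and to files that are already executable.
    chmod -R g+rX,o+rX "$task_dir"
}
```

It could be called as fix_task_access "$RETRACEDIR/$TASK" retrace retrace just before the worker is started, where the owner and group "retrace" are placeholder values.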



Version-Release number of selected component (if applicable):
retrace-server-1.9-6.el6.noarch


How reproducible:
Every time a vmcore fails.


Steps to Reproduce:
Queue a vmcore which fails (for example, maybe a split vmcore?).



Actual results:
You have to follow these manual instructions:
https://docspace.corp.redhat.com/docs/DOC-128956#Workaround_Instructions


Expected results:
Be able to run something like this:
$ retrace-server-requeue <taskid>


Additional info:
If possible, we may also be able to improve the handling of split vmcores. As I recall, though, there is a problem when someone replaces the file in 'crash/vmcore' with a new one.
Comment 2 Michal Toman 2013-08-20 09:34:16 EDT
Fixed in upstream. Just added --restart option to retrace-server-worker so that the restart needs to be explicitly mentioned.

commit 848988edf2f5f47ebec4ffb88ab887d4d19fbd5f
Author: Michal Toman <mtoman@redhat.com>
Date:   Tue Aug 20 15:32:43 2013 +0200

    worker: add --restart option
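
With that option available, the command requested under "Expected results" could be approximated by a thin wrapper. This is only a sketch: the function name requeue_task and the WORKER override are hypothetical, the argument order passed to retrace-server-worker is an assumption, and this report does not say whether --restart also removes the stale status files.

```shell
#!/bin/sh
# Hypothetical wrapper giving a "retrace-server-requeue <taskid>" style command.
# Assumes a retrace-server-worker that accepts --restart (added by the commit
# above); the argument order used here is an assumption.
requeue_task() {
    if [ $# -ne 1 ]; then
        echo "Usage: retrace-server-requeue <taskid>" >&2
        return 65
    fi
    # WORKER may be set to another command (for example, echo) for a dry run.
    "${WORKER:-retrace-server-worker}" --restart "$1"
}
```

Setting WORKER=echo before calling requeue_task prints the arguments that would be passed instead of starting the worker.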
Comment 3 Dave Wysochanski 2013-08-20 11:21:14 EDT
(In reply to Michal Toman from comment #2)
> Fixed in upstream. Just added --restart option to retrace-server-worker so
> that the restart needs to be explicitly mentioned.
> 
> commit 848988edf2f5f47ebec4ffb88ab887d4d19fbd5f
> Author: Michal Toman <mtoman@redhat.com>
> Date:   Tue Aug 20 15:32:43 2013 +0200
> 
>     worker: add --restart option

Thanks!
Comment 4 Fedora Update System 2013-08-21 07:37:23 EDT
retrace-server-1.10-1.el6 has been submitted as an update for Fedora EPEL 6.
https://admin.fedoraproject.org/updates/retrace-server-1.10-1.el6
Comment 5 Fedora Update System 2013-08-21 15:01:29 EDT
Package retrace-server-1.10-1.el6:
* should fix your issue,
* was pushed to the Fedora EPEL 6 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=epel-testing retrace-server-1.10-1.el6'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-EPEL-2013-11280/retrace-server-1.10-1.el6
then log in and leave karma (feedback).
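
To check whether an installed package already carries the fix, the installed version can be compared against 1.10, the version named in "Fixed In Version". The helper below is a sketch: has_restart_fix is a hypothetical name, and it relies on the -V (version sort) option of GNU sort.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given retrace-server version is 1.10 or newer.
# Relies on GNU sort -V (version sort).
has_restart_fix() {
    fixed="1.10"
    # If the fixed version sorts first (or is equal), the given version is >= 1.10.
    lowest=$(printf '%s\n%s\n' "$fixed" "$1" | sort -V | head -n 1)
    [ "$lowest" = "$fixed" ]
}
```

The version string can be obtained with rpm -q --queryformat '%{VERSION}' retrace-server and passed to has_restart_fix as its only argument.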
Comment 6 Fedora Update System 2013-08-22 14:25:39 EDT
retrace-server-1.10-1.el6 has been pushed to the Fedora EPEL 6 stable repository.  If problems still persist, please make note of it in this bug report.
