998271 – Add retrace-server-requeue-task command or something like it to requeue a failed task

Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora EPEL
Classification:	Fedora
Component:	retrace-server
Version:	el6
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Michal Toman
QA Contact:	Fedora Extras Quality Assurance

Reported:	2013-08-18 19:35 UTC by Dave Wysochanski
Modified:	2015-03-23 00:42 UTC
CC List:	4 users
Fixed In Version:	retrace-server-1.10-1.el6
Doc Type:	Bug Fix
Last Closed:	2013-08-22 18:25:39 UTC
Type:	Bug

Attachments
Sample shell script which implements the current requeue task steps (659 bytes, application/x-shellscript) 2013-08-18 19:35 UTC, Dave Wysochanski

Description Dave Wysochanski 2013-08-18 19:35:25 UTC

Created attachment 787821
Sample shell script which implements the current requeue task steps

Description of problem:
If a task fails, today we have the following workaround instructions:
https://docspace.corp.redhat.com/docs/DOC-128956#Requeueing_Failed_Tasks

"If a Task fails before it completes:
    The failed task would have created the required directory structure.
1. Download the vmcore and extract it to the crash/ sub-directory of the task.
2. Delete the status, started_time, finished_time, and progress files in the task directory. Also delete the files in the misc sub-directory.
3. retrace-server-worker <task_id>
4. Make sure the owner/group of the files and directories are set to values that allow others to access the Task."

It would be good to put this into a 'requeue' command so people don't start messing with files in the directory or get the requeue steps wrong.  I created a shell script which seems to do the above:

$ cat ~/bin/retrace-server-task-requeue
#!/bin/sh
# Automate some steps to requeue a retrace vmcore once it's placed into <retrace>/tasks/<taskid>/crash
# https://docspace.corp.redhat.com/docs/DOC-128956#Workaround_Instructions
#
ARGS=1
E_BADARGS=65
if [ $# -ne "$ARGS" ]; then
        echo "Usage: $(basename "$0") <retrace-task-id>"
        exit $E_BADARGS
fi
TASK=$1
# Read the task directory root from the "SaveDir = <path>" line of the config
RETRACEDIR=$(awk '/SaveDir =/ { print $3 }' /etc/retrace-server.conf)
# Bail out if SaveDir could not be read or the task directory is missing
if [ -z "$RETRACEDIR" ] || [ ! -d "$RETRACEDIR/$TASK" ]; then
        echo "Retrace task $TASK does not exist in $RETRACEDIR"
        exit $E_BADARGS
fi
# Remove the state files left by the failed run, then empty misc/
for f in status started_time finished_time progress; do rm -f "$RETRACEDIR/$TASK/$f"; done
rm -rf "$RETRACEDIR/$TASK/misc/"*
retrace-server-worker "$TASK"
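
The script above automates steps 2 and 3 of the quoted instructions but leaves out step 4 (ownership and permissions). A minimal sketch of that step follows. The helper name fix_task_perms and the TASK_OWNER variable are hypothetical and not part of retrace-server; which owner and group are appropriate depends on the local installation.

```shell
#!/bin/sh
# Sketch of step 4: make a task directory accessible to other users.
# fix_task_perms and TASK_OWNER are hypothetical names, not part of retrace-server.
fix_task_perms() {
    taskdir=$1
    if [ ! -d "$taskdir" ]; then
        echo "No such task directory: $taskdir" >&2
        return 1
    fi
    # Optional: set TASK_OWNER to "user:group" to change ownership as well
    # (usually requires root). Left unset, ownership is not touched.
    if [ -n "${TASK_OWNER:-}" ]; then
        chown -R "$TASK_OWNER" "$taskdir" || return 1
    fi
    # Capital X grants search permission on directories (and execute on files
    # that are already executable) without making plain files executable.
    chmod -R go+rX "$taskdir"
}
```

It would be called with the full task path once the worker has been started, for example fix_task_perms "$RETRACEDIR/$TASK".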



Version-Release number of selected component (if applicable):
retrace-server-1.9-6.el6.noarch


How reproducible:
Every time, if you have a vmcore which fails.


Steps to Reproduce:
Queue a vmcore which fails (for example, maybe a split vmcore?).



Actual results:
You have to follow these manual instructions:
https://docspace.corp.redhat.com/docs/DOC-128956#Workaround_Instructions


Expected results:
Be able to run something like this:
$ retrace-server-requeue <taskid>


Additional info:
If possible, we may be able to improve the handling of split vmcores as well.  As I recall, though, there's a problem with someone replacing the file in 'crash/vmcore' with a new one.

Comment 2 Michal Toman 2013-08-20 13:34:16 UTC

Fixed in upstream. Just added --restart option to retrace-server-worker so that the restart needs to be explicitly mentioned.

commit 848988edf2f5f47ebec4ffb88ab887d4d19fbd5f
Author: Michal Toman <mtoman>
Date:   Tue Aug 20 15:32:43 2013 +0200

    worker: add --restart option
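
With the --restart option in place, the manual file cleanup should no longer be needed. The following is only a sketch of how a requeue call might be wrapped. requeue_task and the WORKER override are hypothetical names, the argument order is assumed rather than taken from the worker's help output, and the check assumes task IDs are purely numeric.

```shell
#!/bin/sh
# requeue_task is a hypothetical wrapper, not a command shipped with retrace-server.
# WORKER can be overridden for a dry run, e.g. WORKER=echo.
WORKER=${WORKER:-retrace-server-worker}

requeue_task() {
    task=$1
    # Accept only non-empty, all-digit task IDs (assumption: IDs are numeric).
    case $task in
        ''|*[!0-9]*)
            echo "Usage: requeue_task <task_id>" >&2
            return 65
            ;;
    esac
    # --restart is the option named in the commit; the argument order is assumed.
    "$WORKER" --restart "$task"
}
```

Setting WORKER=echo before calling requeue_task prints the arguments that would be passed instead of starting the worker, which makes it easy to check the invocation first.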

Comment 3 Dave Wysochanski 2013-08-20 15:21:14 UTC

(In reply to Michal Toman from comment #2)
> Fixed in upstream. Just added --restart option to retrace-server-worker so
> that the restart needs to be explicitly mentioned.
> 
> commit 848988edf2f5f47ebec4ffb88ab887d4d19fbd5f
> Author: Michal Toman <mtoman>
> Date:   Tue Aug 20 15:32:43 2013 +0200
> 
>     worker: add --restart option

Thanks!

Comment 4 Fedora Update System 2013-08-21 11:37:23 UTC

retrace-server-1.10-1.el6 has been submitted as an update for Fedora EPEL 6.
https://admin.fedoraproject.org/updates/retrace-server-1.10-1.el6

Comment 5 Fedora Update System 2013-08-21 19:01:29 UTC

Package retrace-server-1.10-1.el6:
* should fix your issue,
* was pushed to the Fedora EPEL 6 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=epel-testing retrace-server-1.10-1.el6'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-EPEL-2013-11280/retrace-server-1.10-1.el6
then log in and leave karma (feedback).

Comment 6 Fedora Update System 2013-08-22 18:25:39 UTC

retrace-server-1.10-1.el6 has been pushed to the Fedora EPEL 6 stable repository.  If problems still persist, please make note of it in this bug report.
