Bug 998271

Summary:

Add a retrace-server-requeue-task command, or something like it, to requeue a failed task

Product:

[Fedora] Fedora EPEL

Reporter:

Dave Wysochanski <dwysocha>

Component:

retrace-server

Assignee:

Michal Toman <mtoman>

Status:

CLOSED ERRATA

QA Contact:

Fedora Extras Quality Assurance <extras-qa>

Severity:

unspecified

Priority:

unspecified

Version:

el6

CC:

jmoskovc, mtoman, pknirsch, rvokal

Keywords:

Patch, TestCaseProvided

Hardware:

Unspecified

OS:

Unspecified

Fixed In Version:

retrace-server-1.10-1.el6

Doc Type:

Bug Fix

Last Closed:

2013-08-22 18:25:39 UTC

Type:

Bug

Attachments:

Description	Flags
Sample shell script which implements the current requeue task steps	none

Description Dave Wysochanski 2013-08-18 19:35:25 UTC

Created attachment 787821
Sample shell script which implements the current requeue task steps

Description of problem:
If a task fails, today we have the following workaround instructions:
https://docspace.corp.redhat.com/docs/DOC-128956#Requeueing_Failed_Tasks

"If a Task fails before it completes:
    The failed task would have created the required directory structure.
1. Download the vmcore and extract it to the crash/ sub-directory of the task.
2. Delete the status, started_time, finished_time, and progress files in the task directory. Also delete the files in the misc sub-directory.
3. retrace-server-worker <task_id>
4. Make sure the owner/group of the files and directories are set to values that allow others to access the Task."

It would be good to put this into a 'requeue' command so people don't start messing with files in the directory or get the requeue steps wrong.  I created a shell script which seems to automate steps 2 and 3 above:

$ cat ~/bin/retrace-server-task-requeue 
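#!/bin/sh
# Automate some steps to requeue a retrace vmcore once it's placed into <retrace>/tasks/<taskid>/crash
# https://docspace.corp.redhat.com/docs/DOC-128956#Workaround_Instructions
#
ARGS=1
E_BADARGS=65
if [ $# -ne "$ARGS" ]; then
        echo "Usage: $(basename "$0") <retrace-task-id>" >&2
        exit $E_BADARGS
fi
TASK=$1
# Read the task directory from the SaveDir setting in the retrace-server config
RETRACEDIR=$(awk '/SaveDir =/ { print $3 }' /etc/retrace-server.conf)
# Refuse to continue with an empty path or task id so the rm commands below stay inside the task directory
if [ -z "$RETRACEDIR" ] || [ -z "$TASK" ] || [ ! -d "$RETRACEDIR/$TASK" ]; then
        echo "Retrace task $TASK does not exist in $RETRACEDIR" >&2
        exit $E_BADARGS
fi
# Remove the state files left behind by the failed run
for f in status started_time finished_time progress; do rm -f "$RETRACEDIR/$TASK/$f"; done
# Quote the path but keep the glob outside the quotes so it still expands
rm -rf "$RETRACEDIR/$TASK/misc/"*
# Re-run the worker on the cleaned-up task
retrace-server-worker "$TASK"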
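
The script does not cover step 4 of the quoted instructions (making the task accessible to others). A minimal sketch of that step could look like the following. The function name fix_task_perms is hypothetical, and the choice of mode bits (read access for "other", plus directory traversal) is an assumption on my part, not something the workaround document or retrace-server prescribes.

```shell
#!/bin/sh
# Sketch only: open up a task directory so other users can read it (step 4).
# fix_task_perms is a hypothetical helper; the o+rX mode is an assumption,
# not a documented retrace-server requirement.
fix_task_perms() {
        task_dir=$1
        if [ -z "$task_dir" ] || [ ! -d "$task_dir" ]; then
                echo "No such task directory: $task_dir" >&2
                return 1
        fi
        # o+r makes files readable by others; the capital X adds execute
        # (search) permission only on directories and on files that are
        # already executable for someone.
        chmod -R o+rX "$task_dir"
}
```

It would be called with the full path of the task directory, for example fix_task_perms "$RETRACEDIR/$TASK", after the worker has finished.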



Version-Release number of selected component (if applicable):
retrace-server-1.9-6.el6.noarch


How reproducible:
Every time, if you have a vmcore which fails.


Steps to Reproduce:
Queue a vmcore which fails (for example, maybe a split vmcore?).



Actual results:
You have to follow these manual instructions:
https://docspace.corp.redhat.com/docs/DOC-128956#Workaround_Instructions


Expected results:
Be able to run something like this:
$ retrace-server-requeue <taskid>


Additional info:
If possible, we may be able to improve the handling of split vmcores as well.  As I recall, though, there's a problem with someone replacing the file in 'crash/vmcore' with a new one.

Comment 2 Michal Toman 2013-08-20 13:34:16 UTC

Fixed upstream. I just added a --restart option to retrace-server-worker, so a restart has to be requested explicitly.

commit 848988edf2f5f47ebec4ffb88ab887d4d19fbd5f
Author: Michal Toman <mtoman>
Date:   Tue Aug 20 15:32:43 2013 +0200

    worker: add --restart option
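
For anyone wrapping this in a script, a small sketch of how the new option might be called. The wrapper name requeue_task and the RETRACE_WORKER override are hypothetical, and passing the task id directly after --restart is an assumption based on how the worker was invoked in the original workaround; check the worker's own help output for the exact syntax.

```shell
#!/bin/sh
# Sketch only: requeue a failed task through the worker's --restart option.
# requeue_task and RETRACE_WORKER are hypothetical names; the argument order
# (--restart followed by the task id) is an assumption.
RETRACE_WORKER=${RETRACE_WORKER:-retrace-server-worker}

requeue_task() {
        if [ $# -ne 1 ] || [ -z "$1" ]; then
                echo "Usage: requeue_task <task_id>" >&2
                return 65
        fi
        "$RETRACE_WORKER" --restart "$1"
}
```

Setting RETRACE_WORKER=echo before calling requeue_task prints the command line instead of running the worker, which is handy for a dry run.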

Comment 3 Dave Wysochanski 2013-08-20 15:21:14 UTC

(In reply to Michal Toman from comment #2)
> Fixed in upstream. Just added --restart option to retrace-server-worker so
> that the restart needs to be explicitly mentioned.
> 
> commit 848988edf2f5f47ebec4ffb88ab887d4d19fbd5f
> Author: Michal Toman <mtoman>
> Date:   Tue Aug 20 15:32:43 2013 +0200
> 
>     worker: add --restart option

Thanks!

Comment 4 Fedora Update System 2013-08-21 11:37:23 UTC

retrace-server-1.10-1.el6 has been submitted as an update for Fedora EPEL 6.
https://admin.fedoraproject.org/updates/retrace-server-1.10-1.el6

Comment 5 Fedora Update System 2013-08-21 19:01:29 UTC

Package retrace-server-1.10-1.el6:
* should fix your issue,
* was pushed to the Fedora EPEL 6 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=epel-testing retrace-server-1.10-1.el6'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-EPEL-2013-11280/retrace-server-1.10-1.el6
then log in and leave karma (feedback).

Comment 6 Fedora Update System 2013-08-22 18:25:39 UTC

retrace-server-1.10-1.el6 has been pushed to the Fedora EPEL 6 stable repository.  If problems still persist, please make note of it in this bug report.