Created attachment 787821
Sample shell script which implements the current requeue task steps

Description of problem:
If a task fails, today we have the following workaround instructions:
https://docspace.corp.redhat.com/docs/DOC-128956#Requeueing_Failed_Tasks

"If a Task fails before it completes:
The failed task would have created the required directory structure.
1. Download the vmcore and extract it to the crash/ sub-directory of the task.
2. Delete the status, started_time, finished_time, and progress files in the task directory. Also delete the files in the misc sub-directory.
3. retrace-server-worker <task_id>
4. Make sure the owner/group of the files and directories are set to values that allow others to access the Task."

It would be good to put this into a 'requeue' command so people don't start messing with files in the directory or get the requeue steps wrong.

I created a shell script which seems to do the above:

$ cat ~/bin/retrace-server-task-requeue
#!/bin/sh
# Automate some steps to requeue a retrace vmcore once it's placed into <retrace>/tasks/<taskid>/crash
# https://docspace.corp.redhat.com/docs/DOC-128956#Workaround_Instructions
#
ARGS=1
E_BADARGS=65

# Exactly one argument (the task ID) is required
if [ $# -ne "$ARGS" ]; then
    echo "Usage: $(basename "$0") [retrace-task-id]"
    exit $E_BADARGS
fi

TASK=$1
# Read the task directory from the "SaveDir = <path>" line of the config
RETRACEDIR=$(awk '/SaveDir =/ { print $3 }' /etc/retrace-server.conf)
if [ ! -d "$RETRACEDIR/$TASK" ]; then
    echo "Retrace task $TASK does not exist in $RETRACEDIR"
    exit $E_BADARGS
fi

# Remove the state files and misc contents, then rerun the worker
for f in status started_time finished_time progress; do
    rm -f "$RETRACEDIR/$TASK/$f"
done
rm -rf "$RETRACEDIR/$TASK"/misc/*
retrace-server-worker "$TASK"

Version-Release number of selected component (if applicable):
retrace-server-1.9-6.el6.noarch

How reproducible:
Every time, if you have a vmcore which fails.

Steps to Reproduce:
Queue a vmcore which fails (for example, maybe a split vmcore?).
Actual results:
You have to follow these manual instructions:
https://docspace.corp.redhat.com/docs/DOC-128956#Workaround_Instructions

Expected results:
Be able to run something like this:
$ retrace-server-requeue <taskid>

Additional info:
If possible, we may be able to improve the handling of split vmcores as well. As I recall, though, there's a problem with someone replacing the file in 'crash/vmcore' with a new one.
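Step 4 of the quoted instructions is not covered by the sample script. A minimal sketch of how it might be handled, assuming that "allow others to access the Task" means world-readable files and traversable directories. The function name and the chosen mode are my own assumptions, not something the instructions specify, and changing the owner/group is left out because the right values are site-specific:

```shell
#!/bin/sh
# Hypothetical helper: open up a task directory so other users can read it.
# Assumption: "accessible to others" means a+r on files and a+rx on directories.
make_task_readable() {
    task_dir=$1
    if [ ! -d "$task_dir" ]; then
        echo "No such task directory: $task_dir" >&2
        return 1
    fi
    # Capital X adds execute only to directories (and to files that are
    # already executable), so plain data files such as the vmcore stay
    # non-executable.
    chmod -R a+rX "$task_dir"
}
```

It would be called with the full task path, for example make_task_readable "$RETRACEDIR/$TASK" once the worker has finished.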
Fixed in upstream. I just added a --restart option to retrace-server-worker, so that a restart needs to be requested explicitly.

commit 848988edf2f5f47ebec4ffb88ab887d4d19fbd5f
Author: Michal Toman <mtoman>
Date: Tue Aug 20 15:32:43 2013 +0200

    worker: add --restart option
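A rough sketch of how the new option could be wrapped, assuming it is passed as "retrace-server-worker --restart <task_id>" (the exact argument order is an assumption here, so check the worker's --help output). The restart_task function and the WORKER variable are hypothetical names used only for illustration:

```shell
#!/bin/sh
# Hypothetical wrapper around the new option. WORKER can be overridden
# (e.g. with "echo" for a dry run); its default is the real worker command.
WORKER=${WORKER:-retrace-server-worker}

restart_task() {
    if [ $# -ne 1 ]; then
        echo "Usage: restart_task <task_id>" >&2
        return 65
    fi
    # Assumed invocation: option first, then the task ID.
    "$WORKER" --restart "$1"
}
```

Setting WORKER=echo before calling restart_task prints the command that would be run instead of executing it.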
(In reply to Michal Toman from comment #2)
> Fixed in upstream. Just added --restart option to retrace-server-worker so
> that the restart needs to be explicitly mentioned.
>
> commit 848988edf2f5f47ebec4ffb88ab887d4d19fbd5f
> Author: Michal Toman <mtoman>
> Date: Tue Aug 20 15:32:43 2013 +0200
>
> worker: add --restart option

Thanks!
retrace-server-1.10-1.el6 has been submitted as an update for Fedora EPEL 6. https://admin.fedoraproject.org/updates/retrace-server-1.10-1.el6
Package retrace-server-1.10-1.el6:
* should fix your issue,
* was pushed to the Fedora EPEL 6 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=epel-testing retrace-server-1.10-1.el6'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-EPEL-2013-11280/retrace-server-1.10-1.el6
then log in and leave karma (feedback).
retrace-server-1.10-1.el6 has been pushed to the Fedora EPEL 6 stable repository. If problems still persist, please make note of it in this bug report.