Bug 561958 - PREPARE hook invocation failure does not abort job execution
Summary: PREPARE hook invocation failure does not abort job execution
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor
Version: 1.2
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: 1.3
: ---
Assignee: Erik Erlandson
QA Contact: Luigi Toscano
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2010-02-04 19:36 UTC by Matthew Farrellee
Modified: 2010-10-14 16:11 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, a failure to invoke the PREPARE hook was not treated like a non-0 return value from the hook, so the job still ran. With this update, the job is put back to idle when invoking the PREPARE hook fails.
Clone Of:
Environment:
Last Closed: 2010-10-14 16:11:37 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2010:0773 0 normal SHIPPED_LIVE Moderate: Red Hat Enterprise MRG Messaging and Grid Version 1.3 2010-10-14 15:56:44 UTC

Description Matthew Farrellee 2010-02-04 19:36:12 UTC
Description of problem:

http://www.cs.wisc.edu/condor/manual/v7.4/4_4Job_Hooks.html

The prepare hook can return non-0, which indicates the job should not be run.

However, failure to invoke the prepare hook does not have the same effect.
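
The expected rule can be sketched in shell as follows. This is only an illustration of the behavior being asked for, not Condor's starter code; the function name prepare_then_decide and the "run"/"idle" strings are hypothetical. A hook that cannot be invoked is treated the same as a hook that exits non-0.

```shell
#!/bin/sh
# Illustrative sketch only, not Condor's implementation.
# prepare_then_decide is a hypothetical helper: it prints "run" if the
# job should start and "idle" if it should go back to the queue.
prepare_then_decide() {
    hook=$1
    if [ ! -f "$hook" ] || [ ! -x "$hook" ]; then
        # The hook could not be invoked at all: same outcome as a failure.
        echo "idle"
        return 0
    fi
    if "$hook" >/dev/null; then
        # The hook ran and exited 0.
        echo "run"
    else
        # The hook ran and exited non-0.
        echo "idle"
    fi
}

echo "decision: $(prepare_then_decide /opt/junk/prepare_hook.sh)"
```

With the reproducer's hook, which ends in "exit 1", this sketch reports idle whether or not the hook can be invoked.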


Version-Release number of selected component (if applicable):

At least...

02/04 14:07:16 ** $CondorVersion: 7.4.2 Jan 21 2010 BuildID: RH-7.4.2-0.5.el5 PRE-RELEASE $
02/04 14:07:16 ** $CondorPlatform: X86_64-LINUX_RHEL5 $


How reproducible:

100%


Steps to Reproduce:
1.

$ condor_config_val JUNK_HOOK_PREPARE_JOB
/opt/junk/prepare_hook.sh

2.

$ cat /opt/junk/prepare_hook.sh
#!/bin/sh

id > /tmp/prepare_hook.log
env >> /tmp/prepare_hook.log
ls -alR $PWD >> /tmp/prepare_hook.log

exit 1

3.

$ echo -e 'cmd=/bin/sleep\nargs=1m\n+hookkeyword="junk"\nqueue\n' |
condor_submit

4.

$ chmod a+rwx /opt/junk
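
Step 4 is what makes the hook impossible to invoke: the Starter refuses a hook whose directory is world-writable. A quick way to confirm that condition before submitting is sketched below. The helper name in_world_writable_dir is hypothetical and not part of Condor, and the check only approximates whatever test the Starter performs internally.

```shell
#!/bin/sh
# Hypothetical helper, not part of Condor: succeeds if the directory
# that holds the given path has the "other" write bit set.
in_world_writable_dir() {
    dir=$(dirname "$1")
    # find prints the directory itself only when it is world-writable;
    # -prune keeps find from descending into it.
    [ -n "$(find "$dir" -prune -perm -0002 -print 2>/dev/null)" ]
}

if in_world_writable_dir /opt/junk/prepare_hook.sh; then
    echo "hook directory is world-writable"
else
    echo "hook directory is not world-writable (or does not exist)"
fi
```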

  
Actual results:

From the Starter's log...

02/04 14:07:16 Submitting machine is "localhost.local"
02/04 14:07:16 ERROR: path specified for junk_HOOK_PREPARE_JOB (/opt/junk/prepare_hook.sh) is a world-writable directory (/opt/junk/)! Refusing to use.
02/04 14:07:16 setting the orig job name in starter
02/04 14:07:16 setting the orig job iwd in starter
02/04 14:07:16 Job 138781.0 set to execute immediately
02/04 14:07:16 Starting a VANILLA universe job with ID: 138781.0
02/04 14:07:16 IWD: /mnt/pool/gridmonkey
02/04 14:07:16 About to exec /bin/sleep 1m
02/04 14:07:16 Create_Process succeeded, pid=23814
02/04 14:08:16 Process exited, pid=23814, status=0
02/04 14:08:16 Got SIGQUIT.  Performing fast shutdown.
02/04 14:08:16 ShutdownFast all jobs.
02/04 14:08:16 **** condor_starter (condor_STARTER) pid 23798 EXITING WITH STATUS 0


Expected results:

From the Starter's log...

02/04 14:10:17 Submitting machine is "localhost.local"
02/04 14:10:17 setting the orig job name in starter
02/04 14:10:17 setting the orig job iwd in starter
02/04 14:10:17 ERROR in StarterHookMgr::tryHookPrepareJob: HOOK_PREPARE_JOB (/opt/junk/prepare_hook.sh) failed (exited with status 1)
02/04 14:10:17 ShutdownFast all jobs.
02/04 14:10:17 Got SIGQUIT.  Performing fast shutdown.
02/04 14:10:17 ShutdownFast all jobs.
02/04 14:10:17 **** condor_starter (condor_STARTER) pid 9005 EXITING WITH STATUS 0

Comment 1 Matthew Farrellee 2010-04-12 21:22:11 UTC
Suggested behavior is to kick the job back to the queue leaving it in the idle state.

Comment 2 Erik Erlandson 2010-05-06 23:41:37 UTC
Pushed branch V7_4-BZ561958-abort-on-prepare-hook-fail to grid repo.
Created upstream gt#1398, and submitted patch for review.

Comment 3 Luigi Toscano 2010-06-25 16:58:18 UTC
When the invocation of the PREPARE hook fails, the job is put back to idle.

Verified on RHEL4.8/5.5, i386/x86_64.
condor-7.4.3-0.21

Comment 4 Florian Nadge 2010-10-07 17:12:03 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Previously, failure to invoke the prepare did not return non-0 values to indicate that the job should not run. With this update, the job is put back to idle when the invocation of the PREPARE hook fails.

Comment 5 Florian Nadge 2010-10-07 17:13:12 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1 +1 @@
-Previously, failure to invoke the prepare did not return non-0 values to indicate that the job should not run. With this update, the job is put back to idle when the invocation of the PREPARE hook fails.
+Previously, failure to invoke the prepare did not return non-0 values to indicate that the job should not run. With this update, the job is put back to idle when invoking the PREPARE hook fails.

Comment 7 errata-xmlrpc 2010-10-14 16:11:37 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0773.html

