Description of problem:

http://www.cs.wisc.edu/condor/manual/v7.4/4_4Job_Hooks.html

The prepare hook can return non-0, which indicates the job should not be run. However, failure to invoke the prepare hook does not have the same effect.

Version-Release number of selected component (if applicable):

At least...

02/04 14:07:16 ** $CondorVersion: 7.4.2 Jan 21 2010 BuildID: RH-7.4.2-0.5.el5 PRE-RELEASE $
02/04 14:07:16 ** $CondorPlatform: X86_64-LINUX_RHEL5 $

How reproducible:

100%

Steps to Reproduce:

1. $ condor_config_val JUNK_HOOK_PREPARE_JOB
   /opt/junk/prepare_hook.sh
2. cat /opt/junk/prepare_hook.sh
   #!/bin/sh
   id > /tmp/prepare_hook.log
   env >> /tmp/prepare_hook.log
   ls -alR $PWD >> /tmp/prepare_hook.log
   exit 1
3. $ echo -e 'cmd=/bin/sleep\nargs=1m\n+hookkeyword="junk"\nqueue\n' | condor_submit
4. $ chmod a+rwx /opt/junk

Actual results:

From the Starter's log...

02/04 14:07:16 Submitting machine is "localhost.local"
02/04 14:07:16 ERROR: path specified for junk_HOOK_PREPARE_JOB (/opt/junk/prepare_hook.sh) is a world-writable directory (/opt/junk/)! Refusing to use.
02/04 14:07:16 setting the orig job name in starter
02/04 14:07:16 setting the orig job iwd in starter
02/04 14:07:16 Job 138781.0 set to execute immediately
02/04 14:07:16 Starting a VANILLA universe job with ID: 138781.0
02/04 14:07:16 IWD: /mnt/pool/gridmonkey
02/04 14:07:16 About to exec /bin/sleep 1m
02/04 14:07:16 Create_Process succeeded, pid=23814
02/04 14:08:16 Process exited, pid=23814, status=0
02/04 14:08:16 Got SIGQUIT. Performing fast shutdown.
02/04 14:08:16 ShutdownFast all jobs.
02/04 14:08:16 **** condor_starter (condor_STARTER) pid 23798 EXITING WITH STATUS 0

Expected results:

From the Starter's log...

02/04 14:10:17 Submitting machine is "localhost.local"
02/04 14:10:17 setting the orig job name in starter
02/04 14:10:17 setting the orig job iwd in starter
02/04 14:10:17 ERROR in StarterHookMgr::tryHookPrepareJob: HOOK_PREPARE_JOB (/opt/junk/prepare_hook.sh) failed (exited with status 1)
02/04 14:10:17 ShutdownFast all jobs.
02/04 14:10:17 Got SIGQUIT. Performing fast shutdown.
02/04 14:10:17 ShutdownFast all jobs.
02/04 14:10:17 **** condor_starter (condor_STARTER) pid 9005 EXITING WITH STATUS 0
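
To confirm the reproduction condition before submitting a job, the following sketch checks whether a hook path sits in a world-writable directory, which is the condition the Starter log complains about. This is not Condor's code; the function name in_world_writable_dir is hypothetical, and the Starter's real check may look at more than the immediate parent directory.

```python
import os
import stat


def in_world_writable_dir(hook_path):
    """Return True if the directory containing hook_path is world-writable.

    Hypothetical helper: it only inspects the immediate parent directory
    and is an approximation of the check reported in the Starter log.
    """
    parent = os.path.dirname(os.path.abspath(hook_path))
    mode = os.stat(parent).st_mode
    # S_IWOTH is the "other users may write" permission bit.
    return bool(mode & stat.S_IWOTH)
```

After step 4 above (chmod a+rwx /opt/junk), calling in_world_writable_dir("/opt/junk/prepare_hook.sh") would be expected to return True, matching the "Refusing to use" error.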
Suggested behavior is to kick the job back to the queue leaving it in the idle state.
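
The decision being asked for can be sketched as below: a hook that cannot be invoked at all is treated the same as a hook that exits non-0. This only illustrates the logic and is not the Starter's implementation; should_run_job and hook_argv_exiting_with are hypothetical names, and putting the job back in the queue is outside what the sketch models.

```python
import subprocess
import sys


def should_run_job(hook_argv):
    """Return True only if the prepare hook was invoked and exited 0."""
    try:
        result = subprocess.run(hook_argv)
    except OSError:
        # The hook could not be invoked (missing file, not executable, ...).
        # Treat this exactly like a non-0 exit: the job should not run.
        return False
    return result.returncode == 0


def hook_argv_exiting_with(code):
    """Build a stand-in hook command that exits with the given status."""
    return [sys.executable, "-c", "import sys; sys.exit(%d)" % code]
```

With these definitions, should_run_job(hook_argv_exiting_with(1)) and should_run_job(["/nonexistent/prepare_hook.sh"]) both return False, so a caller would handle the two cases identically.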
Pushed branch V7_4-BZ561958-abort-on-prepare-hook-fail to grid repo. Created upstream gt#1398, and submitted patch for review.
When the invocation of the PREPARE hook fails, the job is put back to idle.

Verified on RHEL4.8/5.5, i386/x86_64.
condor-7.4.3-0.21
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Previously, failure to invoke the prepare did not return non-0 values to indicate that the job should not run. With this update, the job is put back to idle when the invocation of the PREPARE hook fails.
Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1 @@
-Previously, failure to invoke the prepare did not return non-0 values to indicate that the job should not run. With this update, the job is put back to idle when the invocation of the PREPARE hook fails.
+Previously, failure to invoke the prepare did not return non-0 values to indicate that the job should not run. With this update, the job is put back to idle when invoking the PREPARE hook fails.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0773.html