Red Hat Bugzilla – Bug 474669
Low-latency configuration breaks normal job execution
Last modified: 2009-02-04 11:04:52 EST
Description of problem:
The low-latency configuration looks like this:
# startd hooks
LOW_LATENCY_HOOK_FETCH_WORK = ...
LOW_LATENCY_HOOK_REPLY_FETCH = ...
# starter hooks
LOW_LATENCY_HOOK_PREPARE_JOB = ...
LOW_LATENCY_HOOK_UPDATE_JOB_INFO = ...
LOW_LATENCY_HOOK_JOB_EXIT = ...
STARTD_JOB_HOOK_KEYWORD = LOW_LATENCY
STARTER_JOB_HOOK_KEYWORD = LOW_LATENCY
This means every job that reaches the starter is passed through the starter hooks, not just jobs that came in through the low-latency fetch hook, which is wrong. In practice, jobs go on hold with reasons like: "Error from starter on ...: HOOK_PREPARE_JOB (...) failed (exited with status 1)" - which is a separate bug.
Steps to Reproduce:
1. Configure low-latency (see the manual and the configuration above), with /bin/false as the PREPARE_JOB hook
2. Submit a job with condor_submit
Job goes on hold.
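For reference, the settings that matter for the reproduction can be sketched as follows. This is only a fragment based on the description above; the remaining hook definitions are assumed to be set as in the manual.

```
# starter hook: PREPARE_JOB always exits non-zero, so the failure is easy to see
LOW_LATENCY_HOOK_PREPARE_JOB = /bin/false
# this setting sends every job that reaches the starter through the starter hooks
STARTER_JOB_HOOK_KEYWORD = LOW_LATENCY
```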
The solution is to leave STARTER_JOB_HOOK_KEYWORD undefined, and to change the fetch hook so that it sets the hook keyword to LOW_LATENCY on the jobs it fetches. This way only jobs from the low-latency fetch hook will be processed by the low-latency hooks in the starter.
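A minimal sketch of the configuration after this change, assuming nothing else is altered. The elided hook paths are left as in the original report. The attribute name HookKeyword in the comment is an assumption taken from the Condor job hook documentation, not something this report confirms.

```
# startd hooks (unchanged)
LOW_LATENCY_HOOK_FETCH_WORK = ...
LOW_LATENCY_HOOK_REPLY_FETCH = ...
# starter hooks (unchanged)
LOW_LATENCY_HOOK_PREPARE_JOB = ...
LOW_LATENCY_HOOK_UPDATE_JOB_INFO = ...
LOW_LATENCY_HOOK_JOB_EXIT = ...
STARTD_JOB_HOOK_KEYWORD = LOW_LATENCY
# STARTER_JOB_HOOK_KEYWORD is deliberately not defined here.
# Instead, the fetch hook is expected to put the keyword into each job ad
# it returns (assumed attribute: HookKeyword), so that only fetched jobs
# run the starter hooks.
```

With this layout, a job submitted through condor_submit carries no hook keyword and should bypass the low-latency starter hooks entirely.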
carod will set the HOOK_KEYWORD to LOW_LATENCY_JOB so only jobs picked up with LOW_LATENCY will have the rest of the hooks run. Fixed in
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.