Bug 474669 - Low-latency configuration breaks normal job execution
Status: CLOSED ERRATA
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: grid
Version: 1.0
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: 1.1
Target Release: ---
Assigned To: Robert Rati
QA Contact: Jeff Needle
Reported: 2008-12-04 14:58 EST by Matthew Farrellee
Modified: 2009-02-04 11:04 EST (History)
Doc Type: Bug Fix
Last Closed: 2009-02-04 11:04:52 EST

Description Matthew Farrellee 2008-12-04 14:58:04 EST
Description of problem:

The low latency configuration looks like...

# startd hooks
LOW_LATENCY_HOOK_FETCH_WORK = ...
LOW_LATENCY_HOOK_REPLY_FETCH = ...

# starter hooks
LOW_LATENCY_HOOK_PREPARE_JOB = ...
LOW_LATENCY_HOOK_UPDATE_JOB_INFO = ...
LOW_LATENCY_HOOK_JOB_EXIT = ...

STARTD_JOB_HOOK_KEYWORD = LOW_LATENCY
STARTER_JOB_HOOK_KEYWORD = LOW_LATENCY

This means every job that reaches the starter, not only jobs fetched by the low-latency hook, is passed through the "starter hooks", which is wrong. In practice, jobs go on hold with reasons like: "Error from starter on ...: HOOK_PREPARE_JOB (...) failed (exited with status 1)", which is a separate bug.


Steps to Reproduce:
1. Configure low latency (see the manual and the configuration above), with /bin/false as the PREPARE_JOB hook.
2. Submit a job with condor_submit.
  
Actual results:

Job goes on hold.


Expected results:

Success!


Additional info:

The solution is to not define STARTER_JOB_HOOK_KEYWORD, and to change the fetch hook to define the hook keyword to be LOW_LATENCY. This way only jobs from the low-lat fetch hook will be processed by the low latency hooks in the starter.
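
The fetch-hook half of that change could look roughly like the sketch below. It is only an illustration, not the actual carod or condor-low-latency code: it assumes the fetched job is handed to the startd as plain "Name = value" ClassAd text on stdout, and that the job ad attribute carrying the keyword is HookKeyword (the name used in the Condor job hooks documentation). The helper name tag_job_ad is hypothetical.

```python
# Sketch only: tag a fetched job ClassAd with a hook keyword so that the
# starter runs the low-latency hooks for this job and for no others.
# Assumptions (not from the bug report): the job ad is plain
# "Name = value" text, and the attribute is called HookKeyword.

HOOK_KEYWORD = "LOW_LATENCY"  # keyword proposed in the report


def tag_job_ad(job_ad_text, keyword=HOOK_KEYWORD):
    """Return the job ad text with exactly one HookKeyword attribute."""
    kept = []
    for line in job_ad_text.splitlines():
        # ClassAd attribute names are case-insensitive.
        name = line.split("=", 1)[0].strip().lower()
        if not line.strip() or name == "hookkeyword":
            continue  # drop blank lines and any existing keyword
        kept.append(line.rstrip())
    kept.append('HookKeyword = "{}"'.format(keyword))
    return "\n".join(kept) + "\n"
```

A fetch hook would write the result of tag_job_ad(...) to stdout for each job it hands to the startd. With STARTER_JOB_HOOK_KEYWORD left undefined, jobs that arrive through normal matchmaking carry no keyword and so skip the low-latency starter hooks.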
Comment 1 Robert Rati 2008-12-04 16:27:49 EST
carod will set the HOOK_KEYWORD to LOW_LATENCY_JOB so that only jobs picked up with LOW_LATENCY will have the rest of the hooks run. Fixed in condor-low-latency-1.0-4.
Comment 4 errata-xmlrpc 2009-02-04 11:04:52 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0036.html
