Bug 489006 - Cannot distinguish between completion and other termination of AMQP submitted work
Status: CLOSED ERRATA
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: grid
Version: 1.1
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: 1.1.1
Target Release: ---
Assigned To: Robert Rati
QA Contact: Jan Sarenik
Depends On: 459615
Blocks:
Reported: 2009-03-06 12:37 EST by Matthew Farrellee
Modified: 2009-04-21 12:17 EDT (History)
Doc Type: Bug Fix
Last Closed: 2009-04-21 12:17:27 EDT
Description Matthew Farrellee 2009-03-06 12:37:42 EST
1) AMQP work message submitted
2) condor picks up work
3) condor restarted
4) condor picks up work (again)
5) work completes

In steps 3 and 5, a message with JobState="Exited" is sent to the submitter. There is no way to tell the two situations apart, at least in condor-low-latency-1.0-9.el5.

In condor-low-latency-1.0-10.el5, the JobStatus is also set, so the message in step (3) is JobState="Exited" and JobStatus=1 whereas in (5) it is JobState="Exited" and JobStatus=4.
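
The distinction described above could be checked on the submitter side roughly as follows. This is only a sketch, not code from condor-low-latency: it assumes the status message has already been decoded into a plain Python dict, and the function name classify_exit and its return labels are hypothetical. The numeric codes follow Condor's JobStatus convention (1 = Idle, 4 = Completed).

```python
# Condor JobStatus codes relevant here: 1 = Idle, 4 = Completed.
IDLE = 1
COMPLETED = 4


def classify_exit(msg):
    """Classify a status message (assumed to be a plain dict of attributes).

    Hypothetical helper: the attribute names JobState and JobStatus come
    from the bug report; the dict representation is an assumption.
    """
    if msg.get("JobState") != "Exited":
        return "not-exited"
    status = msg.get("JobStatus")
    if status is None:
        # No JobStatus in the message (as in condor-low-latency-1.0-9.el5):
        # completion and interruption cannot be told apart.
        return "ambiguous"
    status = int(status)
    if status == COMPLETED:
        return "completed"    # step 5: the work finished
    if status == IDLE:
        return "interrupted"  # step 3: job is idle again and will be re-run
    return "other"
```

With this sketch, the message from step (3) would be classified as "interrupted" and the one from step (5) as "completed", while a message without JobStatus stays "ambiguous".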
Comment 1 Robert Rati 2009-03-06 15:16:48 EST
Different symptom of BZ459615
Comment 3 Jan Sarenik 2009-03-19 10:20:38 EDT
Should I just verify that condor-low-latency-1.0-10 and higher
return the JobStatus as mentioned above?
Comment 4 Matthew Farrellee 2009-03-19 13:33:27 EDT
And the JobState. You may want to test a case where Condor runs the job without interruption, and cases where the job is interrupted by a restart and maybe a kill -9, including one aimed at the carod (service condor-low-latency) process.
Comment 5 Jan Sarenik 2009-04-01 09:54:43 EDT
Jobs submitted via AMQP do not get run. Condor's StartLog says:

Slot requirements not satisfied.
Job requirements not satisfied.

When I put the dump into a job.submit file, change Cmd to Executable
and '5' to 'vanilla', and add Queue at the end, the job runs flawlessly
with condor_submit (with just a few lines of WARNINGs, because I include
the full dump, including parameters that are probably unknown to
condor_submit).

This Condor installation runs all vanilla jobs submitted via
condor_submit with no problems. Low-latency is configured by adding
these lines to /etc/condor/condor_config:

--------------------------------------------------------------------------
LOW_LATENCY_HOOK_FETCH_WORK = $(LIBEXEC)/hooks/hook_fetch_work.py
LOW_LATENCY_HOOK_REPLY_FETCH = $(LIBEXEC)/hooks/hook_reply_fetch.py

# Starter hooks
LOW_LATENCY_JOB_HOOK_PREPARE_JOB = $(LIBEXEC)/hooks/hook_prepare_job.py
LOW_LATENCY_JOB_HOOK_UPDATE_JOB_INFO = $(LIBEXEC)/hooks/hook_update_job_status.py
LOW_LATENCY_JOB_HOOK_JOB_EXIT = $(LIBEXEC)/hooks/hook_job_exit.py

STARTD_JOB_HOOK_KEYWORD = LOW_LATENCY

FetchWorkDelay = 10 * (Activity == "Idle")
STARTER_UPDATE_INTERVAL = 30
--------------------------------------------------------------------------

condor-7.2.2-0.9.el5
condor-job-hooks-1.0-5.el5
condor-job-hooks-common-1.0-5.el5
condor-low-latency-1.0-12.el5

I was mainly using the cmd_args.py test from the low-latency branch of
the mrg-grid.git repo. Can you enlighten me, please?
Comment 6 Jan Sarenik 2009-04-02 09:59:28 EDT
Works as expected.
Comment 8 errata-xmlrpc 2009-04-21 12:17:27 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-0434.html
