1) AMQP work message submitted
2) condor picks up work
3) condor restarted
4) condor picks up work (again)
5) work completes

In steps 3 and 5 a message with JobState="Exited" is sent to the submitter. At least in condor-low-latency-1.0-9.el5, there is no way to tell the two situations apart. In condor-low-latency-1.0-10.el5, JobStatus is also set, so the message in step (3) has JobState="Exited" and JobStatus=1, whereas the message in step (5) has JobState="Exited" and JobStatus=4.
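A minimal sketch of how a submitter might tell the two "Exited" messages apart, based only on the values quoted above. It assumes the job attributes have already been extracted from the AMQP message into a plain dict; how they are actually carried in the message is not described here, and classify_exit is a hypothetical helper name, not part of condor-low-latency.

```python
# Condor JobStatus codes referenced in this report: 1 = Idle, 4 = Completed.
JOB_STATUS_IDLE = 1
JOB_STATUS_COMPLETED = 4


def classify_exit(attrs):
    """Classify a result message from its JobState/JobStatus attributes.

    attrs is assumed to be a dict of job attributes taken from the
    message (hypothetical layout, not confirmed by the report).
    """
    if attrs.get("JobState") != "Exited":
        return "not-exited"

    status = attrs.get("JobStatus")
    if status is None:
        # condor-low-latency-1.0-9.el5 behaviour: JobStatus is not set,
        # so a restart and a real completion look the same.
        return "ambiguous"

    status = int(status)
    if status == JOB_STATUS_IDLE:
        # Step (3): condor was restarted and the job will be picked up again.
        return "interrupted"
    if status == JOB_STATUS_COMPLETED:
        # Step (5): the work actually finished.
        return "completed"
    return "other"
```

With this, a message carrying JobState="Exited" and JobStatus=1 is treated as an interruption, while JobStatus=4 is treated as the final result.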
Different symptom of BZ459615
Should I just verify that condor-low-latency-1.0-10 and higher return the JobStatus as mentioned above?
And the JobState. You may want to test one run where Condor executes the job without interruption, and other runs where the job is interrupted by a restart and maybe by kill -9, including a kill of the carod (service condor-low-latency) process.
Jobs submitted via AMQP do not get run. Condor's StartLog says:

Slot requirements not satisfied. Job requirements not satisfied.

When I put the dump into a job.submit file, change Cmd to Executable and '5' to 'vanilla', and add Queue at the end, the job runs flawlessly with condor_submit (with just a few lines of WARNINGs, because I include the full dump, including parameters that are probably unknown to condor_submit). This condor runs all the vanilla jobs via condor_submit with no problems.

Low-latency is configured by adding these lines to /etc/condor/condor_config
--------------------------------------------------------------------------
LOW_LATENCY_HOOK_FETCH_WORK = $(LIBEXEC)/hooks/hook_fetch_work.py
LOW_LATENCY_HOOK_REPLY_FETCH = $(LIBEXEC)/hooks/hook_reply_fetch.py
# Starter hooks
LOW_LATENCY_JOB_HOOK_PREPARE_JOB = $(LIBEXEC)/hooks/hook_prepare_job.py
LOW_LATENCY_JOB_HOOK_UPDATE_JOB_INFO = $(LIBEXEC)/hooks/hook_update_job_status.py
LOW_LATENCY_JOB_HOOK_JOB_EXIT = $(LIBEXEC)/hooks/hook_job_exit.py
STARTD_JOB_HOOK_KEYWORD = LOW_LATENCY
FetchWorkDelay = 10 * (Activity == "Idle")
STARTER_UPDATE_INTERVAL = 30
--------------------------------------------------------------------------

condor-7.2.2-0.9.el5
condor-job-hooks-1.0-5.el5
condor-job-hooks-common-1.0-5.el5
condor-low-latency-1.0-12.el5

I was mainly using the cmd_args.py test from the mrg-grid.git repo's low-latency branch. Can you enlighten me, please?
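For anyone reproducing the condor_submit comparison, the manual conversion described above could be scripted roughly as follows. This is only a sketch: it assumes the dump is a list of "Name = value" lines, and the attribute name holding the universe is not given in the report, so JobUniverse is an assumption. dump_to_submit is a hypothetical helper.

```python
# Assumed name of the attribute whose value '5' is changed to 'vanilla';
# the report does not say which attribute this is.
UNIVERSE_ATTR = "JobUniverse"


def dump_to_submit(dump_text):
    """Turn a 'Name = value' job dump into condor_submit input (sketch)."""
    out = []
    for line in dump_text.splitlines():
        if "=" not in line:
            continue  # skip blank or malformed lines
        # Split on the first '=' only, so values containing '=' survive.
        name, value = (part.strip() for part in line.split("=", 1))
        if name == "Cmd":
            name = "Executable"
        elif name == UNIVERSE_ATTR and value.strip('"') == "5":
            value = "vanilla"
        out.append(f"{name} = {value}")
    out.append("Queue")  # condor_submit needs a Queue statement at the end
    return "\n".join(out) + "\n"
```

The output can be written to job.submit and passed to condor_submit; attributes that condor_submit does not know are left in place, which matches the WARNINGs mentioned above.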
Works as expected.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHEA-2009-0434.html