Bug 475865 - Periodic* race in JobRouter (and elsewhere)
Status: CLOSED ERRATA
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: grid
Version: 1.0
Hardware: All Linux
Priority: high
Severity: high
Target Milestone: 1.1.1
Assigned To: Matthew Farrellee
QA Contact: Jeff Needle
Reported: 2008-12-10 15:43 EST by Matthew Farrellee
Modified: 2009-04-21 12:18 EDT

Doc Type: Bug Fix
Last Closed: 2009-04-21 12:18:14 EDT


Description Matthew Farrellee 2008-12-10 15:43:05 EST
Specifically for the JobRouter, the configuration contains:

    set_PeriodicRemove = JobStatus == 5 || \
                         (JobStatus == 1 && (CurrentTime - QDate) > 3600*6); \

JobStatus 5 is the Hold state.

When a job is submitted to Condor, it is briefly put in the Hold state while its data is spooled. That brief hold can last long enough, or simply be timed poorly enough, for the Periodic* expressions to be evaluated. In this example, PeriodicRemove evaluates to true because JobStatus == 5, and the job is removed before it is completely spooled. Oops!

A temporary workaround, which is fragile, is to test (JobStatus == 5 && HoldReason =!= "Spooling input data files").
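
Applied to the expression above, the workaround might look like the sketch below. It assumes the hold reason string is exactly the one quoted in this report; any change to that string would silently break the test, which is why the workaround is fragile. The trailing continuation simply mirrors the original excerpt.

```
set_PeriodicRemove = (JobStatus == 5 && HoldReason =!= "Spooling input data files") || \
                     (JobStatus == 1 && (CurrentTime - QDate) > 3600*6); \
```

Using =!= rather than != keeps the comparison from evaluating to UNDEFINED when HoldReason is not set.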
Comment 2 Matthew Farrellee 2009-01-29 11:42:42 EST
This will be addressed in 7.2.1-0.2

commit d1763c3bc25c2dfb511a61048eb872d5f28fd2da
Author: Dan Bradley <dan>
Date:   Fri Dec 12 17:38:52 2008 -0600

    Added protection against periodic expressions messing up the 'spooling' hold state.
    This is a temporary solution for 7.2 only.  In 7.3, we will get rid of
    the spooling state.
Comment 4 Jan Sarenik 2009-03-09 08:27:07 EDT
I have reproduced the bug on RHEL5.3 i386,
condor-7.2.0-3.el5 (from MRG 1.1)

The bug is not present on condor-7.2.2-0.7.el5
(from MRG-candidate repo).
Comment 6 errata-xmlrpc 2009-04-21 12:18:14 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-0434.html
