Bug 475865

Summary: Periodic* race in JobRouter (and elsewhere)
Product: Red Hat Enterprise MRG
Reporter: Matthew Farrellee <matt>
Component: grid
Assignee: Matthew Farrellee <matt>
Status: CLOSED ERRATA
QA Contact: Jeff Needle <jneedle>
Severity: high
Priority: high
Version: 1.0
CC: dan, jsarenik, rrati
Target Milestone: 1.1.1   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2009-04-21 16:18:14 UTC

Description Matthew Farrellee 2008-12-10 20:43:05 UTC
Specifically for the JobRouter, the configuration contains:

    set_PeriodicRemove = JobStatus == 5 || \
                         (JobStatus == 1 && (CurrentTime - QDate) > 3600*6); \

JobStatus 5 is the Hold state.

When a job is submitted to Condor, it is briefly put in the Hold state while its data is spooled. That brief hold can last long enough, or simply be timed poorly enough, for the periodic expressions to be evaluated. In this example, PeriodicRemove evaluates to true because JobStatus == 5, and the job is removed before it is completely spooled. Oops!

A temporary, fragile workaround is to test (JobStatus == 5 && HoldReason =!= "Spooling input data files") in place of the bare JobStatus == 5.
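
Applied to the route above, the workaround might look like the sketch below. It only swaps the guarded hold test in for the bare JobStatus == 5 clause; the second clause and the trailing continuation are copied unchanged from the original excerpt. Since =!= is the ClassAd "is not identical to" operator, the test is also true when HoldReason is undefined. The workaround is presumably fragile because it depends on the exact text of the HoldReason string, so treat it as a stopgap, not a fix.

    set_PeriodicRemove = (JobStatus == 5 && \
                          HoldReason =!= "Spooling input data files") || \
                         (JobStatus == 1 && (CurrentTime - QDate) > 3600*6); \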

Comment 2 Matthew Farrellee 2009-01-29 16:42:42 UTC
This will be addressed in 7.2.1-0.2

commit d1763c3bc25c2dfb511a61048eb872d5f28fd2da
Author: Dan Bradley <dan>
Date:   Fri Dec 12 17:38:52 2008 -0600

    Added protection against periodic expressions messing up the 'spooling' hold state.
    This is a temporary solution for 7.2 only.  In 7.3, we will get rid of
    the spooling state.

Comment 4 Jan Sarenik 2009-03-09 12:27:07 UTC
I have reproduced the bug on RHEL5.3 i386,
condor-7.2.0-3.el5 (from MRG 1.1)

The bug is not present on condor-7.2.2-0.7.el5
(from MRG-candidate repo).

Comment 6 errata-xmlrpc 2009-04-21 16:18:14 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-0434.html