Bug 750078 - RFE: Condor CronTab scheduling and Grid Universe integration
Summary: RFE: Condor CronTab scheduling and Grid Universe integration
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor
Version: 2.0
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: grid-maint-list
QA Contact: MRG Quality Engineering
URL:
Whiteboard:
Depends On: 782552 876873
Blocks:
Reported: 2011-10-30 18:03 UTC by Matthew Farrellee
Modified: 2016-05-26 19:13 UTC (History)

Fixed In Version: condor-7.8.2-0.1
Doc Type: Enhancement
Doc Text:
C: The ability to run crondor ec2 jobs. C: The grid universe did not support crondor behavior. C: Added time deferral code to job initialization. R: Grid universe jobs now support cron submission parameters.
Clone Of:
Environment:
Last Closed: 2016-05-26 19:13:04 UTC
Target Upstream Version:


Links
System ID Private Priority Status Summary Last Updated
Condor 2833 0 None None None Never

Description Matthew Farrellee 2011-10-30 18:03:08 UTC
http://www.cs.wisc.edu/condor/manual/v7.6/2_12Time_Scheduling.html

Condor supports cron-style scheduling of jobs, allowing a submitter to specify the times at which Condor should attempt to start their job. This functionality must be available for jobs submitted to the Grid Universe; EC2 and Deltacloud are of specific interest.
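
For illustration only, here is a minimal Python sketch of what "cron-style" matching means: given the five cron fields, find the next minute that satisfies all of them. It is not Condor's implementation. The function names (parse_field, next_cron_time) are hypothetical, only `*`, single values, lists, ranges and `/step` are handled, the search starts at the minute after `now`, and day-of-month and day-of-week are simply ANDed together, which is an assumption of this sketch.

```python
from datetime import datetime, timedelta


def parse_field(spec, lo, hi):
    """Expand one cron field ('*', '5', '1-3', '*/15', '0,30') into a set of ints."""
    values = set()
    for part in spec.split(","):
        part = part.strip()
        step = 1
        if "/" in part:
            part, step_text = part.split("/", 1)
            step = int(step_text)
        if part == "*":
            start, end = lo, hi
        elif "-" in part:
            first, last = part.split("-", 1)
            start, end = int(first), int(last)
        else:
            start = end = int(part)
        values.update(range(start, end + 1, step))
    return values


def next_cron_time(now, minute="*", hour="*", day_of_month="*",
                   month="*", day_of_week="*"):
    """Return the first whole minute after `now` that matches every field."""
    minutes = parse_field(minute, 0, 59)
    hours = parse_field(hour, 0, 23)
    days = parse_field(day_of_month, 1, 31)
    months = parse_field(month, 1, 12)
    # Treat both 0 and 7 as Sunday (assumption of this sketch).
    weekdays = {d % 7 for d in parse_field(day_of_week, 0, 7)}

    candidate = now.replace(second=0, microsecond=0) + timedelta(minutes=1)
    for _ in range(366 * 24 * 60):  # give up after roughly one year
        # Python: Monday=0 .. Sunday=6; cron: Sunday=0 .. Saturday=6.
        cron_weekday = (candidate.weekday() + 1) % 7
        if (candidate.minute in minutes and candidate.hour in hours
                and candidate.day in days and candidate.month in months
                and cron_weekday in weekdays):
            return candidate
        candidate += timedelta(minutes=1)
    return None
```

With every field left at `*`, the next start time is simply the next minute, which is why an all-wildcard schedule behaves like "restart as soon as possible".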

Comment 2 Timothy St. Clair 2012-01-25 18:16:48 UTC
Initial test:

Procedure:
1.) Submit ec2 job with appended args 

submission appended args: 
####################################
# Testing of cron feature.
# The submission config below should 
# try to resubmit as fast as possible if the job goes 
# down for whatever reason. *note: great for services.
on_exit_remove = false
cron_minute = *
cron_hour = *
cron_day_of_month = *
cron_month = *
cron_day_of_week = *

2.) Wait for job to spin up, then shutdown job via amazon interface to force cron rescheduling 
3.) Observe behavior. 

Results:
It appears the schedd goes through all the right motions and attempts to spin up another instance after shutdown, but the attempt fails due to a key collision, which then puts the job on HOLD: 

HoldReason = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Response><Errors><Error><Code>InvalidKeyPair.Duplicate</Code><Message>The keypair 'SSH_192.168.1.104_tstclair.redhat#16.0#1327512310' already exists.</Message></Error></Errors><RequestID>4a5482c5-7e68-47dd-8440-19e5b1a4d934</RequestID></Response>"
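
When triaging holds like this one, it can help to pull the error code out of the XML instead of reading it by eye. The following is a rough Python sketch, assuming the HoldReason value has already been unescaped from its ClassAd string form and that the response carries no XML namespace, as in the sample above. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The HoldReason from this comment, with the ClassAd string escaping removed.
SAMPLE_HOLD_REASON = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<Response><Errors><Error><Code>InvalidKeyPair.Duplicate</Code>"
    "<Message>The keypair 'SSH_192.168.1.104_tstclair.redhat#16.0#1327512310' "
    "already exists.</Message></Error></Errors>"
    "<RequestID>4a5482c5-7e68-47dd-8440-19e5b1a4d934</RequestID></Response>"
)


def parse_ec2_hold_reason(hold_reason):
    """Return a list of (code, message) pairs found in an EC2 error response."""
    # Parse from bytes so the encoding declaration in the prolog is honoured.
    root = ET.fromstring(hold_reason.encode("utf-8"))
    return [(err.findtext("Code"), err.findtext("Message"))
            for err in root.iter("Error")]


def is_duplicate_keypair(hold_reason):
    """True if any reported error is the duplicate key pair collision."""
    return any(code == "InvalidKeyPair.Duplicate"
               for code, _ in parse_ec2_hold_reason(hold_reason))
```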

Comment 3 Timothy St. Clair 2012-01-25 18:18:55 UTC
Patched test:

With the upstream patch for BZ782552 applied, the job reschedules as expected.

Further testing is still required, but it appears shifting the keys may have solved a category of issues.

Comment 4 Timothy St. Clair 2012-01-25 21:42:04 UTC
More notes in testing:  

The fix outlined in comment #3 only fixes the (on_exit_remove = false) case and *does not* affect cron behavior in the grid universe.

It appears that even though the schedd goes through the motions of setting the attributes, the logic that acts on them lives inside the starter itself, which uses the deferral times.  

Essentially this means we will need to back propagate the logic into the grid universe.
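
As a rough picture of the kind of decision that would have to be made outside the starter, here is a small Python sketch. It is not the starter's code: the function name and the tuple it returns are hypothetical, times are plain epoch seconds, and the "window" parameter stands for how late a job may still start, which is an assumption about how the deferral attributes are interpreted.

```python
def deferral_action(now, deferral_time, window=0):
    """Decide what to do with a deferred job.

    Returns ('wait', seconds_until_start), ('run', 0),
    or ('missed', seconds_late) when the start window has passed.
    """
    if now < deferral_time:
        # Too early: the caller should come back after this many seconds.
        return ("wait", deferral_time - now)
    late = now - deferral_time
    if late <= window:
        # On time, or late but still inside the allowed window.
        return ("run", 0)
    return ("missed", late)
```

A grid-side caller would use the 'wait' result to decide how long to hold off before creating the remote job, instead of relying on a starter that grid universe jobs never reach.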

Comment 5 Timothy St. Clair 2012-01-26 16:26:21 UTC
Dev Notes: 

This work will require the following:

1.) pulling the deferral logic out of the starter and into utils.
2.) starter cleanup after deferral-job 
3.) Adding Timer registration+callback for job creation in the gridmanager 

Unfortunately the design of a "job" is not consistent within condor, and is defined differently based on its location.
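
Item 3 above (timer registration plus a callback for job creation) can be pictured with the following Python sketch. It uses the standard library `sched` module as a stand-in for the daemon's timer facility, so nothing here reflects the gridmanager's actual interfaces. FakeClock and schedule_job_creation are hypothetical names, and the fake clock exists only so the example runs instantly.

```python
import sched


class FakeClock:
    """Deterministic clock: 'sleeping' just advances the current time."""

    def __init__(self, start):
        self.now = start

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


def schedule_job_creation(scheduler, start_at, create_job, job_id):
    """Register a one-shot timer that calls create_job(job_id) at start_at."""
    return scheduler.enterabs(start_at, 1, create_job, argument=(job_id,))


# Usage: defer creation of job "16.0" by 60 seconds.
clock = FakeClock(1000)
timers = sched.scheduler(clock.time, clock.sleep)
created = []


def create_job(job_id):
    # Callback: record which job was created and when the timer fired.
    created.append((job_id, clock.time()))


schedule_job_creation(timers, 1060, create_job, "16.0")
timers.run()  # blocks (via the fake clock) until all timers have fired
```

The point of the design is that the deferred start is driven by a timer in the managing daemon, and the remote resource is only requested inside the callback.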

Comment 12 Timothy St. Clair 2012-03-19 17:31:40 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
C: The ability to run crondor ec2 jobs.
C: The grid universe did not support crondor behavior.
C: Added time deferral code to job initialization.
R: Grid universe jobs now support cron submission parameters.

Comment 17 Anne-Louise Tangring 2016-05-26 19:13:04 UTC
MRG-G is in maintenance only and only customer escalations will be addressed from this point forward. This issue can be re-opened if a customer escalation associated with this issue occurs.

