http://www.cs.wisc.edu/condor/manual/v7.6/2_12Time_Scheduling.html

Condor supports cron-style scheduling of jobs, allowing a submitter to specify the times at which Condor should attempt to start their job. This functionality must be available for jobs submitted to the grid universe; EC2 and Deltacloud are of specific interest.
Initial test:

Procedure:
1.) Submit an ec2 job with the args below appended to the submission.

submission appended args:
####################################
# Testing of cron feature.
# The submission config below should
# try to resubmit as fast as possible if the job goes
# down for whatever reason. *note: great for services.
on_exit_remove = false
cron_minute = *
cron_hour = *
cron_day_of_month = *
cron_month = *
cron_day_of_week = *

2.) Wait for the job to spin up, then shut it down via the amazon interface to force cron rescheduling.
3.) Observe behavior.

Results:
It appears the schedd is going through all the right motions: it attempts to spin up another instance after the shutdown, but the attempt fails due to a key pair collision, which then puts the job on HOLD:

HoldReason = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Response><Errors><Error><Code>InvalidKeyPair.Duplicate</Code><Message>The keypair 'SSH_192.168.1.104_tstclair.redhat#16.0#1327512310' already exists.</Message></Error></Errors><RequestID>4a5482c5-7e68-47dd-8440-19e5b1a4d934</RequestID></Response>"
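When triaging held jobs like the one above, it can help to pull the error code out of the HoldReason rather than read the raw XML. The following is a minimal Python sketch, not part of Condor. The function name parse_ec2_hold_reason is hypothetical, and it assumes the HoldReason string has already been unescaped (\" turned into " and \n into a newline) and that the response carries no XML namespace, as in the sample above.

```python
import xml.etree.ElementTree as ET


def parse_ec2_hold_reason(hold_reason):
    """Return (code, message) from an EC2 XML error response.

    Returns (None, None) when the text is not XML or has no <Error> element.
    Hypothetical helper for illustration; not a Condor API.
    """
    try:
        # Parse from bytes so the encoding declaration in the prolog is honoured.
        root = ET.fromstring(hold_reason.encode("utf-8"))
    except ET.ParseError:
        return None, None
    error = root.find("./Errors/Error")
    if error is None:
        return None, None
    return error.findtext("Code"), error.findtext("Message")


# The HoldReason from the test above, with the ClassAd string escapes removed.
HOLD_REASON = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<Response><Errors><Error><Code>InvalidKeyPair.Duplicate</Code>"
    "<Message>The keypair 'SSH_192.168.1.104_tstclair.redhat#16.0#1327512310' "
    "already exists.</Message></Error></Errors>"
    "<RequestID>4a5482c5-7e68-47dd-8440-19e5b1a4d934</RequestID></Response>"
)

code, message = parse_ec2_hold_reason(HOLD_REASON)
```

A code of InvalidKeyPair.Duplicate identifies the key pair collision seen in this test, as opposed to other reasons a resubmitted job might be held.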
Patched test:

In testing with the upstream patch for BZ782552, the job reschedules as expected. Further testing is still required, but it appears that shifting the keys may have solved a category of issues.
More notes from testing:

The fix outlined in comment #3 only addresses the (on_exit_remove = false) case and *does not* affect cron behavior in the grid universe. Although the schedd goes through the motions of setting the attributes, the logic that acts on them lives in the starter itself, which uses the deferral times. Essentially, this means we will need to back-propagate that logic into the grid universe.
Dev Notes:

This work will require the following:
1.) Pulling the deferral logic out of the starter and into utils.
2.) Starter cleanup after a deferral job.
3.) Adding Timer registration + callback for job creation in the gridmanager.

Unfortunately, the design of a "job" is not consistent within condor, and is defined differently based on its location.
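To make the shape of that work concrete, here is a rough Python sketch of the idea: compute the next time the cron_* parameters match, then register a timer whose callback creates the job. It is an illustration only and not the Condor implementation. The names expand_field, next_deferral_time, schedule_job_creation, register_timer and create_job are all hypothetical. It is also a simplification: every field must match (fields are ANDed), the search advances one minute at a time, and it gives up after one year.

```python
from datetime import datetime, timedelta

# Allowed value range (inclusive) for each cron_* submit parameter.
FIELD_RANGES = {
    "cron_minute": (0, 59),
    "cron_hour": (0, 23),
    "cron_day_of_month": (1, 31),
    "cron_month": (1, 12),
    "cron_day_of_week": (0, 7),  # 0 and 7 both mean Sunday
}


def expand_field(spec, low, high):
    """Expand a field such as '*', '5', '1-5', '*/15' or '0,30' into a set of ints."""
    values = set()
    for part in spec.replace(" ", "").split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/")
            step = int(step_text)
        if part == "*":
            start, end = low, high
        elif "-" in part:
            start, end = (int(x) for x in part.split("-"))
        else:
            start = end = int(part)
        values.update(range(start, end + 1, step))
    return values


def next_deferral_time(cron, now):
    """Return the first whole minute at or after `now` that matches every field."""
    allowed = {
        name: expand_field(cron.get(name, "*"), low, high)
        for name, (low, high) in FIELD_RANGES.items()
    }
    if 7 in allowed["cron_day_of_week"]:
        allowed["cron_day_of_week"].add(0)  # normalise Sunday to 0

    candidate = now.replace(second=0, microsecond=0)
    if candidate < now:
        candidate += timedelta(minutes=1)

    for _ in range(366 * 24 * 60):  # search at most one year ahead
        if (candidate.minute in allowed["cron_minute"]
                and candidate.hour in allowed["cron_hour"]
                and candidate.day in allowed["cron_day_of_month"]
                and candidate.month in allowed["cron_month"]
                and candidate.isoweekday() % 7 in allowed["cron_day_of_week"]):
            return candidate
        candidate += timedelta(minutes=1)
    raise ValueError("no matching time within one year")


def schedule_job_creation(cron, now, register_timer, create_job):
    """Register `create_job` to fire at the next matching time; return that time.

    `register_timer(delay_seconds, callback)` stands in for whatever timer
    facility the daemon provides (an assumption for this sketch).
    """
    run_at = next_deferral_time(cron, now)
    register_timer((run_at - now).total_seconds(), create_job)
    return run_at
```

In a standalone script, register_timer could be something like lambda delay, cb: threading.Timer(delay, cb).start(). Keeping the time computation separate from the timer registration mirrors item 1.) above: the same function could be shared by the starter and the gridmanager.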
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
C: The ability to run crondor ec2 jobs.
C: The grid universe did not support crondor behavior.
C: Added time deferral code to job initialization.
R: Grid universe jobs now support cron submission parameters.
MRG-G is in maintenance only and only customer escalations will be addressed from this point forward. This issue can be re-opened if a customer escalation associated with this issue occurs.