| Summary: | RFE: Condor CronTab scheduling and Grid Universe integration | ||
|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | Matthew Farrellee <matt> |
| Component: | condor | Assignee: | grid-maint-list <grid-maint-list> |
| Status: | CLOSED WONTFIX | QA Contact: | MRG Quality Engineering <mrgqe-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | low | ||
| Version: | 2.0 | CC: | ltoscano, matt, mkudlej, tstclair |
| Target Milestone: | --- | Keywords: | FutureFeature |
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | condor-7.8.2-0.1 | Doc Type: | Enhancement |
| Doc Text: |
C: The ability to run crondor ec2 jobs.
C: The grid universe did not support crondor behavior.
C: Added time deferral code to job initialization.
R: Grid universe jobs now support cron submission parameters.
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2016-05-26 19:13:04 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Bug Depends On: | 782552, 876873 | ||
| Bug Blocks: | |||
|
Description
Matthew Farrellee
2011-10-30 18:03:08 UTC
Initial test:

Procedure:
1.) Submit an ec2 job with the following arguments appended to the submission:

    ####################################
    # Testing of cron feature.
    # The submission config below should
    # try to resubmit as fast as possible if the job goes
    # down for whatever reason. *note: great for services.
    on_exit_remove = false
    cron_minute = *
    cron_hour = *
    cron_day_of_month = *
    cron_month = *
    cron_day_of_week = *

2.) Wait for the job to spin up, then shut down the job via the Amazon interface to force cron rescheduling.
3.) Observe behavior.

Results:
It appears that the schedd goes through all the right motions and attempts to spin up another after shutdown, but the attempt fails due to key collisions, which then puts the job on HOLD:

HoldReason = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Response><Errors><Error><Code>InvalidKeyPair.Duplicate</Code><Message>The keypair 'SSH_192.168.1.104_tstclair.redhat#16.0#1327512310' already exists.</Message></Error></Errors><RequestID>4a5482c5-7e68-47dd-8440-19e5b1a4d934</RequestID></Response>"

Patched test:
In testing with the upstream patch for BZ782552, the job reschedules as expected. Further testing is still required, but it appears shifting the keys may have solved a category of issues.

More notes in testing:
The fix outlined in comment #3 actually only fixes the (on_exit_remove = false) case and *does not* affect cron behavior in the grid universe. It appears that even though the schedd goes through the motions of setting attributes, the logic which controls the motions is inside the starter itself, which uses the deferraltimes. Essentially this means we will need to back-propagate the logic into the grid universe.

Dev Notes:
This work will require the following:
1.) Pulling the deferral logic out of the starter and into utils.
2.) Starter cleanup after deferral-job.
3.) Adding Timer registration+callback for job creation in the gridmanager.

Unfortunately the design of a "job" is not consistent within condor, and is defined differently based on its loc.
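
To make the deferral idea in the dev notes concrete, here is a minimal Python sketch of how a shared utility might turn the cron_* submit parameters into the next start time. It is not Condor's implementation: the names FIELD_RANGES, parse_field and next_run_time are hypothetical, only '*', single values, ranges, steps and comma lists are handled, and the sketch assumes all five fields must match at once (real cron implementations may combine the two day fields differently).

```python
from datetime import datetime, timedelta

# Allowed value range per cron_* submit parameter (hypothetical helper table).
FIELD_RANGES = {
    "cron_minute": (0, 59),
    "cron_hour": (0, 23),
    "cron_day_of_month": (1, 31),
    "cron_month": (1, 12),
    "cron_day_of_week": (0, 6),  # assumption: 0 = Sunday
}


def parse_field(spec, lo, hi):
    """Expand a field such as '*', '5', '1-5', '*/15' or '0,30' into a set of ints."""
    values = set()
    for part in spec.split(","):
        part = part.strip()
        step = 1
        if "/" in part:
            part, step_text = part.split("/", 1)
            step = int(step_text)
        if part == "*":
            start, end = lo, hi
        elif "-" in part:
            first, last = part.split("-", 1)
            start, end = int(first), int(last)
        else:
            start = end = int(part)
        if start < lo or end > hi or start > end or step < 1:
            raise ValueError(f"bad cron field: {spec!r}")
        values.update(range(start, end + 1, step))
    return values


def next_run_time(cron, now):
    """Return the first whole minute strictly after `now` that matches every field.

    Missing fields default to '*'. This is a brute-force minute-by-minute scan,
    chosen for clarity rather than speed.
    """
    allowed = {
        name: parse_field(cron.get(name, "*"), lo, hi)
        for name, (lo, hi) in FIELD_RANGES.items()
    }
    candidate = now.replace(second=0, microsecond=0) + timedelta(minutes=1)
    limit = candidate + timedelta(days=366 * 4)  # long enough to reach a Feb 29
    while candidate < limit:
        # datetime.weekday(): Monday = 0; convert so that Sunday = 0.
        day_of_week = (candidate.weekday() + 1) % 7
        if (candidate.month in allowed["cron_month"]
                and candidate.day in allowed["cron_day_of_month"]
                and day_of_week in allowed["cron_day_of_week"]
                and candidate.hour in allowed["cron_hour"]
                and candidate.minute in allowed["cron_minute"]):
            return candidate
        candidate += timedelta(minutes=1)
    raise ValueError("cron spec never matches")
```

With every field set to `*`, as in the test submission above, next_run_time returns the next whole minute, which is why such a job is rescheduled almost immediately. A gridmanager-side timer could then be registered for the difference between the returned time and the current time, but how that registration would look in Condor is outside the scope of this sketch.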
MRG-G is in maintenance mode, and only customer escalations will be addressed from this point forward. This issue can be re-opened if a customer escalation associated with it occurs.