Description of problem:

There are certain conditions where a deferred job that misses its window can fail to enter the 'held' state. For example, it may fail to be matched with a slot in time, or (more rarely) the startd might fail after matching. These conditions can arise because deferred jobs are only checked for holding in the startd.

A similar check could be added to the scheduler: the count() routine could include a scan for jobs with deferral, and if they have missed their window and are still idle, the scheduler can put them on hold. Adding this check would make the behavior of deferred jobs that miss their window more consistent.

Version-Release number of selected component (if applicable):
condor-7.6.1-0.10

How reproducible:
100%

Steps to Reproduce:
1. Submit a job with deferral_time set, e.g.

# job.submit
universe = vanilla
cmd = /bin/sleep
args = 1m
deferral_time = (CurrentTime + 60)
deferral_window = 30
queue 1

$ condor_submit job.submit
Submitting job(s).
1 job(s) submitted to cluster 1.

2. While the job is waiting for its deferral time, stop the startd daemon:

# condor_off -subsystem startd
Sent "Kill-Daemon" command for "startd" to local master

3. Wait until deferral time + deferral window has passed, then check the job status:

$ condor_q
-- Submitter: hostname : <IP:41815> : hostname
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
 1.0     condor_user     6/8  17:08   0+00:00:08 I  0   2.0  sleep 1m

Actual results:
The job status does not change; the job stays idle.

Expected results:
After deferral time + deferral window, the job status changes to held.

Additional info:
An upstream RFE exists:
https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2219
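
The scheduler-side check proposed in the description could look roughly like the sketch below. It is only an illustration of the logic, not condor code: the function names and the dict-based job records are hypothetical, and it assumes DeferralTime has already been evaluated to an integer epoch time. The JobStatus values 1 (idle) and 5 (held) follow the usual condor job status codes.

```python
IDLE = 1  # JobStatus value for an idle job
HELD = 5  # JobStatus value for a held job


def missed_window(job, now):
    """Return True if an idle deferred job can no longer start in its window.

    `job` is a hypothetical dict standing in for a job ad. Jobs without a
    DeferralTime are never treated as having missed a window.
    """
    deferral_time = job.get("DeferralTime")
    if deferral_time is None:
        return False
    window = job.get("DeferralWindow", 0)
    return job["JobStatus"] == IDLE and now > deferral_time + window


def hold_missed_deferrals(jobs, now):
    """Scan all jobs and hold the idle ones whose deferral window has passed.

    Returns the ids of the jobs that were put on hold. The hold reason text
    is an assumption made for this sketch.
    """
    held = []
    for job in jobs:
        if missed_window(job, now):
            job["JobStatus"] = HELD
            job["HoldReason"] = "Missed deferral window"
            held.append(job["Id"])
    return held
```

With the values from the reproducer (deferral time 60 seconds after submission, window of 30 seconds), a job that is still idle more than 90 seconds after submission would be held by such a scan, whether or not a startd ever saw it.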
At present, the fix for https://bugzilla.redhat.com/show_bug.cgi?id=712026 would invalidate this bug, because a job that is vacated (for whatever reason) will notify the shadow and try to re-run. Pushing this to the schedd user_policy is not a good solution in this case (or in general).
(In reply to comment #1)
> At present fix for https://bugzilla.redhat.com/show_bug.cgi?id=712026 would
> invalidate this bug.
>

It seems very probable. We will validate that in the next release.