Description of Problem:

1. Entries in the "=" queue are deleted after one hour, even if they are not technically stale, i.e. the job is still running. We haven't changed this since it does not affect correctness; the user will just see the job disappear from the "=" queue even if it's still running.
2. There is a race condition between at and atd that can delay job execution by one hour. We have reduced this to 5 minutes, since a proper fix would involve some serious changes.
3. There is a race condition that allows atd to miss starting jobs. We have fixed this by making atd check the job directory every time it is woken up. We are now testing this fix.

Version-Release number of selected component (if applicable):
3.1.8-23 with the patches that fixed the problems reported in #67414

How Reproducible:
Somewhat intermittent. Submitting jobs that take a long time (> 1 hr) to complete is the best way to reproduce these. An SMP system also seems to help.

Steps to Reproduce:
See above.

Actual Results:
See above.

Expected Results:
Jobs should complete properly and not be removed from the queue until actually complete.

Additional Information:
The attached patch applies cleanly to the 7.3 at package and should fix the problems, but a little more testing is needed.
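For illustration only, the idea behind the fix for problem 3 (scan the job directory on every wakeup instead of trusting a cached "nothing changed" state) might look roughly like the sketch below. This is not the attached patch. The helper names `job_is_due` and `count_due_jobs` are hypothetical, and the job file name layout (one queue character, five hex digits of job number, eight hex digits of run time in minutes since the epoch) is an assumption about the spool format.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: returns 1 if `name` looks like a pending job file
 * whose run time is at or before `now`, 0 otherwise.
 * Assumed name layout: queue char + 5 hex digits (job number)
 * + 8 hex digits (run time in minutes since the epoch). */
static int job_is_due(const char *name, time_t now)
{
    char queue;
    unsigned long jobno, run_minutes;

    if (strlen(name) != 14)
        return 0;                       /* skips ".", ".." and stray files */
    if (sscanf(name, "%c%5lx%8lx", &queue, &jobno, &run_minutes) != 3)
        return 0;
    if (queue == '=')
        return 0;                       /* assumed: "=" marks running jobs */
    return (time_t)(run_minutes * 60) <= now;
}

/* Hypothetical helper: scans `dir` unconditionally and counts due jobs.
 * Calling this on every wakeup avoids relying on a cached directory
 * state, which is where a newly submitted job could be missed.
 * Returns -1 if the directory cannot be opened. */
static int count_due_jobs(const char *dir, time_t now)
{
    DIR *d;
    struct dirent *ent;
    int due = 0;

    assert(dir != NULL);
    d = opendir(dir);
    if (d == NULL)
        return -1;
    while ((ent = readdir(d)) != NULL) {
        if (job_is_due(ent->d_name, now))
            due++;
    }
    closedir(d);
    return due;
}
```

The trade-off is a full directory scan per wakeup, which should be cheap for a typical spool directory and removes the window in which a job could be overlooked.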
Created attachment 66624: patch to fix the problems noted above.
This is fixed in 3.1.8-31.