Red Hat Bugzilla – Bug 69595
Atd fails to properly start jobs
Last modified: 2012-01-10 21:36:12 EST
Description of Problem:
1. atd deletes entries in the "=" queue after one hour, even if they are
not actually stale, i.e. the job is still running. We haven't changed
this, since it does not affect correctness; the user will just see the job
disappear from the "=" queue even though it is still running.
2. There is a race condition between at and atd that can delay job
execution by up to one hour. We have reduced this maximum delay to 5 minutes,
since a proper fix would require substantial changes.
3. There is a race condition that can cause atd to miss starting jobs. We
have fixed this by making atd check the job directory every time it is
woken up; we are now testing this fix.
Version-Release number of selected component (if applicable):
3.1.8-23, with the patches that fixed the problems reported in bug #67414
How Reproducible:
Somewhat intermittent; submitting jobs that take a long time (> 1 hr) to
complete is the best way to reproduce these problems. An SMP system also seems to help.
Steps to Reproduce:
1. See above.
Expected Results:
Jobs should complete properly and should not be removed from the queue until they have actually finished.
Additional Information:
The attached patch applies cleanly to the 7.3 at package and should fix these problems,
but a little more testing is needed.
Created attachment 66624
Patch to fix the problems noted above.
It is fixed in 3.1.8-31.