Description of problem:
Running the file_no_perms.py test results in the starter excepting without calling the exit hook. Carod's lease-checking thread doesn't expire a job that isn't being updated, so carod won't allow that thread to process more work. The only fix is to restart carod.

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1. Run carod with debug logging
2. Run file_no_perms.py
3. Watch CaroLog and see that the slot is continually thought to be doing work.

Actual results:

Expected results:

Additional info:
There were 2 issues with message expiration:
1) Messages were never being expired because the check for whether a slot was in use reset the access time.
2) Once a message was expired, it could not be removed from the work queue, which kept a slot "busy" that was actually empty.

Fixed on BZ718265-no-expiration
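The two expiration issues can be illustrated with a minimal sketch. This is not carod's actual code: the class `WorkQueue`, its methods, and the `clock` parameter are hypothetical names chosen for illustration. The sketch assumes a per-slot lease keyed on a last-access time, and shows the intended behaviour: the in-use check is read-only, and expired entries are actually removed so the slot is freed.

```python
import time


class WorkQueue:
    """Hypothetical lease-tracking work queue (illustration only, not carod's class)."""

    def __init__(self, lease_seconds, clock=time.monotonic):
        self._lease = lease_seconds
        self._clock = clock
        self._items = {}  # slot name -> (message, last access time)

    def add(self, slot, message):
        # Assign work to a slot and start its lease.
        self._items[slot] = (message, self._clock())

    def touch(self, slot):
        # Only a genuine update from the job renews the lease.
        message, _ = self._items[slot]
        self._items[slot] = (message, self._clock())

    def in_use(self, slot):
        # Read-only check: it must not reset the access time (issue 1).
        return slot in self._items

    def expire(self):
        # Remove every entry whose lease has lapsed so the slot is freed (issue 2).
        now = self._clock()
        expired = [s for s, (_, t) in self._items.items() if now - t >= self._lease]
        for slot in expired:
            del self._items[slot]
        return expired
```

With this shape, polling `in_use()` any number of times cannot keep a stale message alive, and a slot reported by `expire()` is immediately available for new work.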
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
C: When a job causes the condor_starter to exit quickly, such as a job whose program the starter is unable to execute, the low-latency daemon does not expire the low-latency job.
C: The slot running the job that should have been expired is not allowed to do any more work by the low-latency daemon until the daemon has been restarted.
F: Fixed issues with message expiration.
R: Messages are expired as expected.
Tested on RHEL5.6/6.1 x x86_64/i386 with condor-low-latency-1.1-3 and it doesn't work.
Job expired after some time and slot was released. (There is an issue with exit hook, see bug 726761.)

Tested with:
condor-low-latency-1.2-2

Tested on:
RHEL6 x86_64,i386
RHEL5 x86_64,i386

>>> VERIFIED
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2011-1249.html