Red Hat Bugzilla – Bug 489880
execute directory missing under carod, handle_get_work
Last modified: 2009-04-21 12:19:09 EDT
handle_get_work: Checking if slot 1 is known
Exception in thread Thread-1444:
Traceback (most recent call last):
File "/usr/lib64/python2.4/threading.py", line 442, in __bootstrap
File "/usr/lib64/python2.4/threading.py", line 422, in run
File "/usr/sbin/carod", line 575, in handle_exit
OSError: [Errno 2] No such file or directory: '/var/lib/condor/execute/dir_2219'
While processing a few hundred jobs, this error was seen once, and it has not been reproduced since over a run of a thousand jobs.
The error was actually in handle_exit, not handle_get_work. Added additional exception handling and reworked how the chdir is handled. There is now a check for the existence of work_cwd; if it doesn't exist, carod handles the case gracefully and releases the message so it can be consumed by another run/machine.
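A minimal sketch of the guarded chdir described above, for illustration only and not the actual carod code. The function name `enter_work_dir` and the `release_message` callback are hypothetical stand-ins for however carod releases the message back to the queue. It is written for a modern Python rather than the Python 2.4 shown in the traceback. The existence check alone is not enough, because the execute directory can be removed between the check and the chdir, so the call itself is also guarded, and only the "No such file or directory" case (errno 2, `ENOENT`) is treated as recoverable:

```python
import errno
import os


def enter_work_dir(work_cwd, release_message):
    """Change into work_cwd if it still exists.

    Returns True on success. If the directory is missing, calls the
    hypothetical release_message() callback so the message can be consumed
    by another run/machine, and returns False instead of raising.
    """
    # Check for the directory up front, as described in the fix.
    if not os.path.isdir(work_cwd):
        release_message()
        return False
    try:
        os.chdir(work_cwd)
    except OSError as e:
        # The directory may vanish after the check; only ENOENT
        # ("No such file or directory") is handled, anything else is re-raised.
        if e.errno != errno.ENOENT:
            raise
        release_message()
        return False
    return True
```

With a path like `/var/lib/condor/execute/dir_2219` that no longer exists, the call returns `False` and the message is released instead of the thread dying with an uncaught `OSError`.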
Ran 20k+ messages through the system and never saw this error.
I have already tested this with 40k messages under BZ489874, with no exceptions in the logs. See https://bugzilla.redhat.com/show_bug.cgi?id=489874#c4
An advisory has been issued which should help with the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.