Bug 489880 - execute directory missing under carod, handle_get_work
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: grid
Hardware: All, OS: Linux
Priority: medium, Severity: medium
: 1.1.1
: ---
Assigned To: Robert Rati
Martin Kudlej
Depends On:
Reported: 2009-03-12 09:26 EDT by Matthew Farrellee
Modified: 2009-04-21 12:19 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2009-04-21 12:19:09 EDT

Attachments: None
Description Matthew Farrellee 2009-03-12 09:26:54 EDT

handle_get_work: Checking if slot 1 is known
Exception in thread Thread-1444:
Traceback (most recent call last):
  File "/usr/lib64/python2.4/threading.py", line 442, in __bootstrap
  File "/usr/lib64/python2.4/threading.py", line 422, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/sbin/carod", line 575, in handle_exit
OSError: [Errno 2] No such file or directory: '/var/lib/condor/execute/dir_2219' 

This error was seen once while processing a few hundred jobs, and has not been reproduced in a run of over a thousand jobs.
Comment 1 Robert Rati 2009-03-13 14:41:09 EDT
The error was actually in handle_exit, not handle_get_work.  Added additional exception handling and reworked how the chdir is handled.  There is now a check for the existence of work_cwd; if it doesn't exist, the case is handled gracefully and the message is released so that it can be consumed by another run/machine.

Ran 20k+ messages through the system and never saw this error.
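
For illustration only, the check described above might look roughly like the sketch below. This is not the actual carod code: the function name enter_work_dir and the release_message callback are hypothetical, and only work_cwd is a name taken from this report. It checks that the directory exists before the chdir, and also catches OSError in case the directory disappears between the check and the chdir.

```python
import os


def enter_work_dir(work_cwd, release_message):
    """Change into work_cwd, or release the message if the directory is gone.

    Hypothetical helper: release_message stands in for whatever carod does
    to hand the message back so another run/machine can consume it.
    Returns True if the chdir succeeded, False if the message was released.
    """
    # Explicit existence check, as described in the comment above.
    if not os.path.isdir(work_cwd):
        release_message()
        return False
    try:
        os.chdir(work_cwd)
    except OSError:
        # The directory can still vanish between the check and the chdir
        # (for example, if the execute directory is cleaned up concurrently).
        release_message()
        return False
    return True
```

With a helper like this, a caller would skip its exit processing whenever the function returns False, instead of letting the OSError escape from the thread as in the traceback in the description.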

Fixed in:
Comment 3 Martin Kudlej 2009-04-07 03:29:13 EDT
I have already tested this with 40k messages in BZ489874 without any exceptions in the logs. See https://bugzilla.redhat.com/show_bug.cgi?id=489874#c4

Comment 5 errata-xmlrpc 2009-04-21 12:19:09 EDT
An advisory has been issued which should help with the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

