Bug 489880 - execute directory missing under carod, handle_get_work
Status: CLOSED ERRATA
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: grid
Version: 1.1
Hardware: All Linux
Priority: medium Severity: medium
Target Milestone: 1.1.1
Target Release: ---
Assigned To: Robert Rati
QA Contact: Martin Kudlej
Depends On:
Blocks:
 
Reported: 2009-03-12 09:26 EDT by Matthew Farrellee
Modified: 2009-04-21 12:19 EDT (History)

Doc Type: Bug Fix
Last Closed: 2009-04-21 12:19:09 EDT

Description Matthew Farrellee 2009-03-12 09:26:54 EDT
condor-low-latency-1.0-11.el5
condor-7.2.2-0.7.el5

handle_get_work: Checking if slot 1 is known
Exception in thread Thread-1444:
Traceback (most recent call last):
  File "/usr/lib64/python2.4/threading.py", line 442, in __bootstrap
    self.run()
  File "/usr/lib64/python2.4/threading.py", line 422, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/sbin/carod", line 575, in handle_exit
    os.chdir(work_cwd)
OSError: [Errno 2] No such file or directory: '/var/lib/condor/execute/dir_2219' 

This error was seen once while processing a few hundred jobs, and has not been reproduced across more than a thousand jobs.
Comment 1 Robert Rati 2009-03-13 14:41:09 EDT
The error was actually in handle_exit, not handle_get_work. Added additional exception handling and reworked how the chdir is handled. There is now a check for the existence of work_cwd; if it doesn't exist, carod handles the case gracefully and releases the message so that it can be consumed by another run/machine.
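
For readers following along, here is a minimal sketch of the kind of guard described above. It is not the actual carod change: the function name enter_work_dir and the release_message callback are hypothetical, and only work_cwd and the os.chdir call come from the traceback. It assumes that "release" means handing the message back so another run/machine can pick it up.

```python
import errno
import os


def enter_work_dir(work_cwd, release_message):
    """Try to chdir into work_cwd; release the message if the directory is gone.

    Hypothetical helper, not carod's real code. Returns True if the caller
    may continue processing the job exit, False if the message was released.
    """
    # Check for the existence of work_cwd before touching it.
    if not os.path.isdir(work_cwd):
        release_message()
        return False
    try:
        os.chdir(work_cwd)
    except OSError as exc:
        # The directory can still disappear between the check and the chdir,
        # so the same errors are handled here as well.
        if exc.errno in (errno.ENOENT, errno.ENOTDIR):
            release_message()
            return False
        raise
    return True
```

The existence check alone leaves a small window in which the execute directory can be removed, which is why the sketch also catches the OSError from os.chdir rather than relying on the check by itself.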

Ran 20k+ messages through the system and never saw this error.

Fixed in:
condor-low-latency-1.0-12
Comment 3 Martin Kudlej 2009-04-07 03:29:13 EDT
I have already tested this with 40k messages in BZ489874, with no exceptions in the logs. See https://bugzilla.redhat.com/show_bug.cgi?id=489874#c4

-->VERIFIED
Comment 5 errata-xmlrpc 2009-04-21 12:19:09 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-0434.html
