
Bug 572574

Summary: Error reported from execute node incomplete for IWD access failure
Product: Red Hat Enterprise MRG
Component: condor
Version: 1.0
Status: CLOSED ERRATA
Severity: medium
Priority: high
Reporter: Matthew Farrellee <matt>
Assignee: Erik Erlandson <eerlands>
QA Contact: Luigi Toscano <ltoscano>
CC: fnadge, ltoscano
Target Milestone: 1.3
Hardware: All
OS: Linux
Fixed In Version: 7.4.3-0.8
Doc Type: Bug Fix
Doc Text: Previously, confusing error messages were printed when accessing a job's IWD failed during execution. The messages are corrected and the issue is resolved.
Last Closed: 2010-10-14 16:06:46 UTC

Description Matthew Farrellee 2010-03-11 15:31:51 UTC
Description of problem:

A failure to access a job's IWD during execution reports a confusing error message in the job's HoldReason.


Version-Release number of selected component (if applicable):

All up to and including condor 7.4.3-0.4


How reproducible:

100%


Steps to Reproduce:
1. Two-machine setup: 1) schedd, 2) startd
2. On the schedd machine only: mkdir /tmp/wontexist
3. echo -e "cmd=/bin/true\niwd=/tmp/wontexist\nrequirements=Machine=!=\"$HOSTNAME\"\nqueue" | condor_submit
4. Wait for the job to go to H[eld] in condor_q
5. Run condor_q -l | grep ^HoldReason and observe: "Error from slot1@startd-machine: Failed to execute '/bin/true': No such file or directory"
6. On the startd machine, look in /var/log/condor/StarterLog.slot1 and observe:
 Create_Process: Cannot access specified cwd "/tmp/wontexist": errno = 2 (No such file or directory)
 ERROR "Create_Process(/bin/true,, ...) failed: No such file or directory" at line 530 in file os_proc.cpp
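
The one-line submission in step 3 can be hard to read. A minimal sketch of the same job as a submit description file follows; the file name iwd-repro.sub is a hypothetical choice, and the attributes are copied from step 3 unchanged.

```shell
#!/bin/sh
# Hypothetical file name; any writable path works.
submit_file="${TMPDIR:-/tmp}/iwd-repro.sub"

# Same attributes as step 3, one per line.
# The requirements line keeps the job off the submitting host,
# so it runs where /tmp/wontexist is absent.
cat > "$submit_file" <<EOF
cmd = /bin/true
iwd = /tmp/wontexist
requirements = Machine =!= "${HOSTNAME:-$(hostname)}"
queue
EOF

echo "Wrote $submit_file"
```

Passing the file to condor_submit on the schedd machine should queue the same job as the echo pipeline in step 3.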


Expected results:

The HoldReason should include information about the failure to access the cwd (and should call it the iwd!).
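
The distinction being asked for can be illustrated outside of Condor. The sketch below is not the starter's code and not the eventual fix; run_in_iwd is a hypothetical helper, and its message text and exit status 2 are assumptions. It only shows the idea of reporting a working-directory failure separately from a failure to execute the command.

```shell
# run_in_iwd DIR CMD [ARGS...]
# Hypothetical helper, not part of Condor: checks the working
# directory first so that a missing or inaccessible directory is
# reported as such, instead of as a failure to execute CMD.
run_in_iwd() {
    iwd=$1
    shift
    if [ ! -d "$iwd" ] || [ ! -x "$iwd" ]; then
        # Assumed wording; the real HoldReason text may differ.
        echo "Cannot access initial working directory '$iwd'" >&2
        return 2
    fi
    # Run in a subshell so the caller's working directory is unchanged.
    (cd "$iwd" && exec "$@")
}
```

With this shape, `run_in_iwd /tmp/wontexist /bin/true` on a machine without that directory names the directory in its error rather than blaming /bin/true.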

Comment 1 Matthew Farrellee 2010-03-11 15:41:12 UTC
A search of condor-wiki found #1015, which is related.

http://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=1015

"2. The error message should be improved."

Comment 2 Erik Erlandson 2010-03-26 20:21:23 UTC
Pushed a candidate fix to branch V7_4-BZ572574-misleading-create-process-iwd-err-msg

Comment 3 Erik Erlandson 2010-03-26 20:26:47 UTC
For my test setup (my local machine plus a reserved lab machine as the execute node),
I ended up needing the following command for submission:

% echo -e "cmd=/bin/true\nremoteiwd=/tmp\nshould_transfer_files=true\nwhen_to_transfer_output=ON_EXIT\ntransfer_executable=true\nqueue" | condor_submit

I also made the following edits to the local config on the execute machine:

CONDOR_HOST = <IP-of-my-local-machine-via-vpn>

CCB_ADDRESS = $(COLLECTOR_HOST)
PRIVATE_NETWORK_NAME = "network_name"
SEC_DEFAULT_AUTHENTICATION_METHODS = CLAIMTOBE

Comment 4 Erik Erlandson 2010-04-30 20:02:28 UTC
Pushed an alternative fix based on MyString: V7_4-BZ572574-iwd-err-msg-MyString
See also: gt#1361

Comment 5 Luigi Toscano 2010-06-25 17:49:39 UTC
The HoldReason now explicitly says that the directory specified as Iwd does not exist (see #1 and #3).

Verified on RHEL 4.8/5.5, i386/x86_64.
condor-7.4.3-0.21

Comment 6 Florian Nadge 2010-10-07 13:51:02 UTC
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Previously, confusing error messages were printed when accessing a job's IWD failed during execution. The messages are corrected and the issue is resolved.

Comment 8 errata-xmlrpc 2010-10-14 16:06:46 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0773.html