Bug 572574 - Error reported from execute node incomplete for IWD access failure
Status: CLOSED ERRATA
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor
Version: 1.0
Hardware: All
OS: Linux
Priority: high
Severity: medium
Target Milestone: 1.3
Target Release: ---
Assigned To: Erik Erlandson
QA Contact: Luigi Toscano
Depends On:
Blocks:
Reported: 2010-03-11 10:31 EST by Matthew Farrellee
Modified: 2010-10-14 12:06 EDT
CC List: 2 users

See Also:
Fixed In Version: 7.4.3-0.8
Doc Type: Bug Fix
Doc Text:
Previously, confusing error messages were printed when accessing a job's IWD failed during execution. The messages are corrected and the issue is resolved.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-10-14 12:06:46 EDT


External Trackers
Tracker ID: Red Hat Product Errata RHSA-2010:0773
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: Red Hat Enterprise MRG Messaging and Grid Version 1.3
Last Updated: 2010-10-14 11:56:44 EDT

Description Matthew Farrellee 2010-03-11 10:31:51 EST
Description of problem:

A failure to access a job's IWD during execution reports a confusing error message in the job's HoldReason.


Version-Release number of selected component (if applicable):

All up to and including condor 7.4.3-0.4


How reproducible:

100%


Steps to Reproduce:
1. Two-machine setup: (1) schedd, (2) startd
2. On the schedd machine: mkdir /tmp/wontexist (the directory does not exist on the startd machine)
3. echo -e "cmd=/bin/true\niwd=/tmp/wontexist\nrequirements=Machine=!=\"$HOSTNAME\"\nqueue" | condor_submit
4. Wait for the job to reach the H (Held) state in condor_q
5. Run condor_q -l | grep ^HoldReason and observe "Error from slot1@startd-machine: Failed to execute '/bin/true': No such file or directory"
6. On the startd machine, look in /var/log/condor/StarterLog.slot1 and observe:
 Create_Process: Cannot access specified cwd "/tmp/wontexist": errno = 2 (No such file or directory)
 ERROR "Create_Process(/bin/true,, ...) failed: No such file or directory" at line 530 in file os_proc.cpp
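The submission in step 3 can also be written as a submit description file, which avoids the escaping needed with echo -e. A minimal sketch, assuming a POSIX shell on the schedd machine; the file name repro.sub and the function write_submit_file are hypothetical, and $(hostname) stands in for $HOSTNAME on the assumption that it matches the Machine attribute of the submit host.

```shell
#!/bin/sh
# Hypothetical helper: write the step 3 submit description to a file.
#   $1 = IWD that exists on the submit host only
#   $2 = submit host name, excluded through the Requirements expression
#   $3 = output file
write_submit_file() {
    iwd="$1"
    host="$2"
    out="$3"
    cat > "$out" <<EOF
cmd = /bin/true
iwd = $iwd
requirements = Machine =!= "$host"
queue
EOF
}

# Step 2: the IWD must exist locally or condor_submit rejects the job.
mkdir -p /tmp/wontexist

# Assumption: hostname matches the Machine attribute of this host.
write_submit_file /tmp/wontexist "$(hostname)" repro.sub

# Submit only if the Condor tools are installed on this machine.
if command -v condor_submit >/dev/null 2>&1; then
    condor_submit repro.sub
fi
```

Because the job is steered away from the submit host, it lands on the startd machine, where /tmp/wontexist is missing and the job goes on hold as in steps 4 and 5.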


Expected results:

The HoldReason should include information about the failure to access the cwd (and should call it the IWD).
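During verification, the expected result can be checked by testing whether the HoldReason mentions the IWD at all. A minimal sketch, assuming condor_q -l prints attributes as "Name = value" lines; the function name hold_reason_mentions_iwd is hypothetical, and matching on the substring "iwd" (case-insensitive) is an assumption about how an improved message would be worded.

```shell
#!/bin/sh
# Hypothetical check: succeed only if a HoldReason line on stdin
# mentions the IWD (case-insensitive match on "iwd").
hold_reason_mentions_iwd() {
    grep '^HoldReason =' | grep -qi 'iwd'
}

# Usage on the submit machine once the job is held:
#   condor_q -l | hold_reason_mentions_iwd && echo "HoldReason names the IWD"
```

With the unfixed message from step 5 the check fails, since that message only mentions /bin/true.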
Comment 1 Matthew Farrellee 2010-03-11 10:41:12 EST
A search of condor-wiki found #1015, which is related.

http://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=1015

"2. The error message should be improved."
Comment 2 Erik Erlandson 2010-03-26 16:21:23 EDT
pushed candidate fix to branch V7_4-BZ572574-misleading-create-process-iwd-err-msg
Comment 3 Erik Erlandson 2010-03-26 16:26:47 EDT
For the test setup I used my local machine and a reserved lab machine as the execute node,
and ended up needing to use the following command for submission:

% echo -e "cmd=/bin/true\nremoteiwd=/tmp\nshould_transfer_files=true\nwhen_to_transfer_output=ON_EXIT\ntransfer_executable=true\nqueue" | condor_submit

I also used the following edits to the local config on the execute machine:

CONDOR_HOST = <IP-of-my-local-machine-via-vpn>

CCB_ADDRESS = $(COLLECTOR_HOST)
PRIVATE_NETWORK_NAME = "network_name"
SEC_DEFAULT_AUTHENTICATION_METHODS = CLAIMTOBE
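The execute-node settings above can be applied with a small script. A minimal sketch, assuming a POSIX shell; CONFIG_FILE and COLLECTOR_IP are hypothetical variable names, ./condor_config.local is a placeholder for whatever file the installation reads through LOCAL_CONFIG_FILE, and 192.0.2.10 is a documentation-only address standing in for <IP-of-my-local-machine-via-vpn>.

```shell
#!/bin/sh
# Hypothetical paths and values; override them from the environment.
CONFIG_FILE="${CONFIG_FILE:-./condor_config.local}"
COLLECTOR_IP="${COLLECTOR_IP:-192.0.2.10}"   # placeholder address

# Append the settings; \$ keeps $(COLLECTOR_HOST) as a Condor macro
# instead of letting the shell run it as a command substitution.
cat >> "$CONFIG_FILE" <<EOF
CONDOR_HOST = $COLLECTOR_IP
CCB_ADDRESS = \$(COLLECTOR_HOST)
PRIVATE_NETWORK_NAME = "network_name"
SEC_DEFAULT_AUTHENTICATION_METHODS = CLAIMTOBE
EOF

# Ask running daemons to re-read their configuration, if available.
if command -v condor_reconfig >/dev/null 2>&1; then
    condor_reconfig
fi
```

CLAIMTOBE trusts whatever identity the client claims, so this is only suitable for an isolated test setup like the one described here. Rerunning the script appends duplicate lines; for a repeated macro the later definition takes effect.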
Comment 4 Erik Erlandson 2010-04-30 16:02:28 EDT
Pushed an alternative fix based on MyString: V7_4-BZ572574-iwd-err-msg-MyString
See also: gt#1361
Comment 5 Luigi Toscano 2010-06-25 13:49:39 EDT
The HoldReason now explicitly says that the directory specified as Iwd does not exist (see #1 and #3).

Verified on RHEL 4.8/5.5, i386/x86_64.
condor-7.4.3-0.21
Comment 6 Florian Nadge 2010-10-07 09:51:02 EDT
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Previously, confusing error messages were printed when accessing a job's IWD failed during execution. The messages are corrected and the issue is resolved.
Comment 8 errata-xmlrpc 2010-10-14 12:06:46 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0773.html
