Bug 670132 - Unable to start job on remote machine if should_transfer_files = Yes
Summary: Unable to start job on remote machine if should_transfer_files = Yes
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor
Version: beta
Hardware: Unspecified
OS: Unspecified
medium
high
Target Milestone: 1.3.2
: ---
Assignee: Matthew Farrellee
QA Contact: MRG Quality Engineering
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-01-17 10:08 UTC by Lubos Trilety
Modified: 2011-01-18 08:49 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-01-18 08:49:30 UTC
Target Upstream Version:


Description Lubos Trilety 2011-01-17 10:08:02 UTC
Description of problem:
If "should_transfer_files = Yes" is set in the job submit file, the job fails to run on the remote machine with:
Error from slot3@remote_host: Failed to execute '/var/lib/condor/execute/dir_3901/condor_exec.exe' with arguments 1d: (errno=8: 'Exec format error')


Version-Release number of selected component (if applicable):
condor-7.4.5-0.6


How reproducible:
100%


Steps to Reproduce:
1. configure pool

2. submit a job for a remote machine with "should_transfer_files = Yes" set
# echo -e "cmd=/bin/sleep\nargs=1d\nshould_transfer_files = YES\nwhen_to_transfer_output = ON_EXIT\nRequirements=(TARGET.Arch =!= UNDEFINED) && (TARGET.FileSystemDomain =!= UNDEFINED)\nqueue 11" | runuser condor -s /bin/bash -c condor_submit
Submitting job(s)...........
11 job(s) submitted to cluster 10.

3. observe status of jobs with "condor_q"
# condor_q
-- Submitter: hostname : <IP:51346> : host
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
  10.0   condor          1/17 04:49   0+00:00:03 H  0   0.0  sleep 1d          
  10.1   condor          1/17 04:49   0+00:00:03 H  0   0.0  sleep 1d          
  10.2   condor          1/17 04:49   0+00:00:03 H  0   0.0  sleep 1d          
  10.3   condor          1/17 04:49   0+00:00:03 H  0   0.0  sleep 1d          
  10.4   condor          1/17 04:49   0+00:00:03 H  0   0.0  sleep 1d          
  10.5   condor          1/17 04:49   0+00:00:03 H  0   0.0  sleep 1d          
  10.6   condor          1/17 04:49   0+00:00:03 H  0   0.0  sleep 1d          
  10.7   condor          1/17 04:49   0+00:00:03 H  0   0.0  sleep 1d          
  10.8   condor          1/17 04:49   0+00:00:03 H  0   0.0  sleep 1d          
  10.9   condor          1/17 04:49   0+00:00:03 H  0   0.0  sleep 1d          
  10.10  condor          1/17 04:49   0+00:00:01 H  0   0.0  sleep 1d          
  11.0   condor          1/17 04:50   0+00:00:01 R  0   0.0  sleep 1d          
  11.1   condor          1/17 04:50   0+00:00:01 R  0   0.0  sleep 1d          
  11.2   condor          1/17 04:50   0+00:00:01 R  0   0.0  sleep 1d          
  11.3   condor          1/17 04:50   0+00:00:01 R  0   0.0  sleep 1d          
  11.4   condor          1/17 04:50   0+00:00:01 R  0   0.0  sleep 1d          
  11.5   condor          1/17 04:50   0+00:00:00 R  0   0.0  sleep 1d          
  11.6   condor          1/17 04:50   0+00:00:00 R  0   0.0  sleep 1d          
  11.7   condor          1/17 04:50   0+00:00:00 R  0   0.0  sleep 1d          
  11.8   condor          1/17 04:50   0+00:00:00 R  0   0.0  sleep 1d          
  11.9   condor          1/17 04:50   0+00:00:01 R  0   0.0  sleep 1d          
  11.10  condor          1/17 04:50   0+00:00:00 I  0   0.0  sleep 1d          
22 jobs; 1 idle, 10 running, 11 held

# condor_q -better
-- Submitter: host : <IP:51346> : host
---
010.000:  Request is held.

Hold reason: Error from slot1@remote_host: Failed to execute '/var/lib/condor/execute/dir_3842/condor_exec.exe' with arguments 1d: (errno=8: 'Exec format error')

---
010.001:  Request is held.

Hold reason: Error from slot2@remote_host: Failed to execute '/var/lib/condor/execute/dir_3845/condor_exec.exe' with arguments 1d: (errno=8: 'Exec format error')

...


Actual results:
All jobs go to the held state.


Expected results:
Jobs are running.

Comment 1 Matthew Farrellee 2011-01-17 13:55:30 UTC
Is the architecture the same on both the hosts?

It is possible you are transferring a 64-bit /bin/sleep to a 32-bit machine and trying to execute it.
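
One way to check this is to look at the ELF header of the executable that gets transferred. The following is a minimal sketch, not part of condor: the helper name elf_class is made up for illustration, and it assumes a POSIX od is available. It reads the first five bytes of the file, which are the ELF magic number followed by the EI_CLASS byte (1 for 32-bit, 2 for 64-bit).

```shell
# Hypothetical helper (not a condor tool): report the word size of an ELF file.
elf_class() {
    # First five header bytes as hex: 7f 45 4c 46 (ELF magic) plus EI_CLASS.
    header=$(od -An -tx1 -N5 "$1" 2>/dev/null | tr -d ' \n')
    case "$header" in
        7f454c4601) echo "32-bit" ;;
        7f454c4602) echo "64-bit" ;;
        *)          echo "not an ELF file"; return 1 ;;
    esac
}
```

Running "elf_class /bin/sleep" on the submit host and "uname -m" on the execute host should show whether the two disagree. An 'Exec format error' (errno=8) is what the kernel reports when it cannot execute a file in that format, which fits a 64-bit binary landing on a 32-bit machine.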

Comment 2 Lubos Trilety 2011-01-18 08:49:30 UTC
(In reply to comment #1)
> Is the architecture the same on both the hosts?
> 
> It is possible you are transferring a 64-bit /bin/sleep to a 32-bit machine and
> trying to execute it.

My bad, you're right. Sorry for that.
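
For anyone else hitting this: the Requirements expression in step 2 only asks that TARGET.Arch be defined, so a machine of any architecture can match while the local /bin/sleep is still transferred to it. A possible adjustment is sketched below. The file name sleep.sub is made up for illustration, and the Arch value "X86_64" assumes the submit host is 64-bit, which the report does not state; a 32-bit submit host would need "INTEL" instead.

```shell
# Hypothetical file name; the Arch value assumes a 64-bit submit host.
cat > sleep.sub <<'EOF'
cmd = /bin/sleep
args = 1d
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
# Match only execute machines whose architecture fits the transferred binary.
Requirements = (TARGET.Arch == "X86_64") && (TARGET.FileSystemDomain =!= UNDEFINED)
queue 11
EOF
# Then submit it as in step 2: condor_submit sleep.sub
```

Another option, if every execute machine has its own /bin/sleep, would be to set "transfer_executable = False" so that the remote copy is run instead of the transferred one.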

>>> CLOSED

