Bug 670132

Summary: Unable to start job on remote machine if should_transfer_files = Yes
Product: Red Hat Enterprise MRG
Reporter: Lubos Trilety <ltrilety>
Component: condor
Assignee: Matthew Farrellee <matt>
Status: CLOSED NOTABUG
QA Contact: MRG Quality Engineering <mrgqe-bugs>
Severity: high
Docs Contact:
Priority: medium
Version: beta
CC: matt
Target Milestone: 1.3.2
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-01-18 08:49:30 UTC
Type: ---

Description Lubos Trilety 2011-01-17 10:08:02 UTC
Description of problem:
If "should_transfer_files = Yes" is set in the job submit file, the job fails to run on the remote machine with:
Error from slot3@remote_host: Failed to execute '/var/lib/condor/execute/dir_3901/condor_exec.exe' with arguments 1d: (errno=8: 'Exec format error')


Version-Release number of selected component (if applicable):
condor-7.4.5-0.6


How reproducible:
100%


Steps to Reproduce:
1. Configure the pool.

2. Submit a job to the remote machine with "should_transfer_files = Yes" set:
# echo -e "cmd=/bin/sleep\nargs=1d\nshould_transfer_files = YES\nwhen_to_transfer_output = ON_EXIT\nRequirements=(TARGET.Arch =!= UNDEFINED) && (TARGET.FileSystemDomain =!= UNDEFINED)\nqueue 11" | runuser condor -s /bin/bash -c condor_submit
Submitting job(s)...........
11 job(s) submitted to cluster 10.

3. Observe the status of the jobs with "condor_q":
# condor_q
-- Submitter: hostname : <IP:51346> : host
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
  10.0   condor          1/17 04:49   0+00:00:03 H  0   0.0  sleep 1d          
  10.1   condor          1/17 04:49   0+00:00:03 H  0   0.0  sleep 1d          
  10.2   condor          1/17 04:49   0+00:00:03 H  0   0.0  sleep 1d          
  10.3   condor          1/17 04:49   0+00:00:03 H  0   0.0  sleep 1d          
  10.4   condor          1/17 04:49   0+00:00:03 H  0   0.0  sleep 1d          
  10.5   condor          1/17 04:49   0+00:00:03 H  0   0.0  sleep 1d          
  10.6   condor          1/17 04:49   0+00:00:03 H  0   0.0  sleep 1d          
  10.7   condor          1/17 04:49   0+00:00:03 H  0   0.0  sleep 1d          
  10.8   condor          1/17 04:49   0+00:00:03 H  0   0.0  sleep 1d          
  10.9   condor          1/17 04:49   0+00:00:03 H  0   0.0  sleep 1d          
  10.10  condor          1/17 04:49   0+00:00:01 H  0   0.0  sleep 1d          
  11.0   condor          1/17 04:50   0+00:00:01 R  0   0.0  sleep 1d          
  11.1   condor          1/17 04:50   0+00:00:01 R  0   0.0  sleep 1d          
  11.2   condor          1/17 04:50   0+00:00:01 R  0   0.0  sleep 1d          
  11.3   condor          1/17 04:50   0+00:00:01 R  0   0.0  sleep 1d          
  11.4   condor          1/17 04:50   0+00:00:01 R  0   0.0  sleep 1d          
  11.5   condor          1/17 04:50   0+00:00:00 R  0   0.0  sleep 1d          
  11.6   condor          1/17 04:50   0+00:00:00 R  0   0.0  sleep 1d          
  11.7   condor          1/17 04:50   0+00:00:00 R  0   0.0  sleep 1d          
  11.8   condor          1/17 04:50   0+00:00:00 R  0   0.0  sleep 1d          
  11.9   condor          1/17 04:50   0+00:00:01 R  0   0.0  sleep 1d          
  11.10  condor          1/17 04:50   0+00:00:00 I  0   0.0  sleep 1d          
22 jobs; 1 idle, 10 running, 11 held

# condor_q -better
-- Submitter: host : <IP:51346> : host
---
010.000:  Request is held.

Hold reason: Error from slot1@remote_host: Failed to execute '/var/lib/condor/execute/dir_3842/condor_exec.exe' with arguments 1d: (errno=8: 'Exec format error')

---
010.001:  Request is held.

Hold reason: Error from slot2@remote_host: Failed to execute '/var/lib/condor/execute/dir_3845/condor_exec.exe' with arguments 1d: (errno=8: 'Exec format error')

...
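
The hold reasons above all end in the same "(errno=N: 'message')" suffix. As a rough sketch, that suffix can be decoded into its symbolic errno name with the Python standard library. The helper name decode_hold_reason and the regular expression are illustrative assumptions based only on the message format shown in this report, not part of any Condor tool.

```python
import errno
import os
import re

# Matches the "(errno=N: 'message')" suffix seen in the hold reasons above.
# The pattern is an assumption derived from this report's output only.
ERRNO_RE = re.compile(r"\(errno=(\d+): '([^']*)'\)")


def decode_hold_reason(reason):
    """Return (number, symbolic name, local message) for a hold reason, or None."""
    match = ERRNO_RE.search(reason)
    if match is None:
        return None
    number = int(match.group(1))
    # errno.errorcode maps numbers to names for the platform running this script.
    name = errno.errorcode.get(number, "UNKNOWN")
    return number, name, os.strerror(number)


if __name__ == "__main__":
    reason = (
        "Error from slot1@remote_host: Failed to execute "
        "'/var/lib/condor/execute/dir_3842/condor_exec.exe' "
        "with arguments 1d: (errno=8: 'Exec format error')"
    )
    print(decode_hold_reason(reason))
```

On Linux, errno 8 is ENOEXEC, which the kernel returns when it does not recognise the file as an executable it can run on that machine.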


Actual results:
All jobs go into the held state.


Expected results:
Jobs are running.

Comment 1 Matthew Farrellee 2011-01-17 13:55:30 UTC
Is the architecture the same on both the hosts?

It is possible you are transferring a 64-bit /bin/sleep to a 32-bit machine and trying to execute it.
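
One way to check this is to read the ELF header of the executable on the submit host and compare it with the architecture of the execute host. The following is a minimal sketch, assuming the executable is an ELF binary. The function names are hypothetical, and only the x86 and x86_64 machine codes are mapped to names.

```python
import os
import struct

ELF_MAGIC = b"\x7fELF"
# EI_CLASS (byte 4 of the ELF identification): 1 = 32-bit, 2 = 64-bit.
ELF_CLASSES = {1: "32-bit", 2: "64-bit"}
# Only two e_machine values are named here; others are reported numerically.
ELF_MACHINES = {3: "x86", 62: "x86_64"}


def describe_elf_header(header):
    """Describe the first 20 bytes of a file, or return None if it is not ELF."""
    if len(header) < 20 or header[:4] != ELF_MAGIC:
        return None
    bits = ELF_CLASSES.get(header[4], "unknown class")
    # EI_DATA (byte 5): 1 = little-endian, 2 = big-endian.
    byte_order = ">" if header[5] == 2 else "<"
    # e_machine is a 16-bit field at offset 18, after e_ident and e_type.
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return bits, ELF_MACHINES.get(machine, "e_machine=%d" % machine)


def describe_executable(path):
    """Read just enough of the file at path to describe its ELF header."""
    with open(path, "rb") as handle:
        return describe_elf_header(handle.read(20))


if __name__ == "__main__":
    path = "/bin/sleep"
    if os.path.isfile(path):
        print(path, describe_executable(path))
```

Running it on the submit host and comparing the result with "uname -m" on the execute host shows whether the transferred binary can run there.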

Comment 2 Lubos Trilety 2011-01-18 08:49:30 UTC
(In reply to comment #1)
> Is the architecture the same on both the hosts?
> 
> It is possible you are transferring a 64-bit /bin/sleep to a 32-bit machine and
> trying to execute it.

My bad, you're right. Sorry for that.

>>> CLOSED