Description of problem:
If "should_transfer_files = Yes" is set in the job submission file, the job fails to run on the remote machine with:

Error from slot3@remote_host: Failed to execute '/var/lib/condor/execute/dir_3901/condor_exec.exe' with arguments 1d: (errno=8: 'Exec format error')

Version-Release number of selected component (if applicable):
condor-7.4.5-0.6

How reproducible:
100%

Steps to Reproduce:
1. configure pool
2. submit a job for the remote machine with "should_transfer_files = Yes" set

# echo -e "cmd=/bin/sleep\nargs=1d\nshould_transfer_files = YES\nwhen_to_transfer_output = ON_EXIT\nRequirements=(TARGET.Arch =!= UNDEFINED) && (TARGET.FileSystemDomain =!= UNDEFINED)\nqueue 11" | runuser condor -s /bin/bash -c condor_submit
Submitting job(s)...........
11 job(s) submitted to cluster 10.

3. observe status of jobs with "condor_q"

# condor_q

-- Submitter: hostname : <IP:51346> : host
 ID      OWNER    SUBMITTED     RUN_TIME    ST PRI SIZE CMD
 10.0    condor   1/17 04:49    0+00:00:03  H  0   0.0  sleep 1d
 10.1    condor   1/17 04:49    0+00:00:03  H  0   0.0  sleep 1d
 10.2    condor   1/17 04:49    0+00:00:03  H  0   0.0  sleep 1d
 10.3    condor   1/17 04:49    0+00:00:03  H  0   0.0  sleep 1d
 10.4    condor   1/17 04:49    0+00:00:03  H  0   0.0  sleep 1d
 10.5    condor   1/17 04:49    0+00:00:03  H  0   0.0  sleep 1d
 10.6    condor   1/17 04:49    0+00:00:03  H  0   0.0  sleep 1d
 10.7    condor   1/17 04:49    0+00:00:03  H  0   0.0  sleep 1d
 10.8    condor   1/17 04:49    0+00:00:03  H  0   0.0  sleep 1d
 10.9    condor   1/17 04:49    0+00:00:03  H  0   0.0  sleep 1d
 10.10   condor   1/17 04:49    0+00:00:01  H  0   0.0  sleep 1d
 11.0    condor   1/17 04:50    0+00:00:01  R  0   0.0  sleep 1d
 11.1    condor   1/17 04:50    0+00:00:01  R  0   0.0  sleep 1d
 11.2    condor   1/17 04:50    0+00:00:01  R  0   0.0  sleep 1d
 11.3    condor   1/17 04:50    0+00:00:01  R  0   0.0  sleep 1d
 11.4    condor   1/17 04:50    0+00:00:01  R  0   0.0  sleep 1d
 11.5    condor   1/17 04:50    0+00:00:00  R  0   0.0  sleep 1d
 11.6    condor   1/17 04:50    0+00:00:00  R  0   0.0  sleep 1d
 11.7    condor   1/17 04:50    0+00:00:00  R  0   0.0  sleep 1d
 11.8    condor   1/17 04:50    0+00:00:00  R  0   0.0  sleep 1d
 11.9    condor   1/17 04:50    0+00:00:01  R  0   0.0  sleep 1d
 11.10   condor   1/17 04:50    0+00:00:00  I  0   0.0  sleep 1d

22 jobs; 1 idle, 10 running, 11 held

# condor_q -better

-- Submitter: host : <IP:51346> : host
---
010.000:  Request is held.

Hold reason: Error from slot1@remote_host: Failed to execute '/var/lib/condor/execute/dir_3842/condor_exec.exe' with arguments 1d: (errno=8: 'Exec format error')

---
010.001:  Request is held.

Hold reason: Error from slot2@remote_host: Failed to execute '/var/lib/condor/execute/dir_3845/condor_exec.exe' with arguments 1d: (errno=8: 'Exec format error')
...

Actual results:
All jobs go to the held state.

Expected results:
Jobs are running.
Is the architecture the same on both hosts? It is possible you are transferring a 64-bit /bin/sleep to a 32-bit machine and trying to execute it there.
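One way to check this, sketched here as an illustration rather than anything taken from the report: read the EI_CLASS byte of the ELF header of the executable being transferred and compare the result with what `uname -m` prints on the execute host. The helper name `elf_class` is hypothetical, and the sketch assumes a POSIX shell with `od` and `tr` available.

```shell
# Hypothetical helper (not part of condor): print whether an ELF
# executable is 32-bit or 64-bit by inspecting its header.
elf_class() {
    # The first four bytes of an ELF file are 0x7f 'E' 'L' 'F'.
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo "not-elf"
        return 1
    fi
    # Byte at offset 4 is EI_CLASS: 1 = ELFCLASS32, 2 = ELFCLASS64.
    case $(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n') in
        1) echo "32-bit" ;;
        2) echo "64-bit" ;;
        *) echo "unknown" ;;
    esac
}

# Usage: run on the submit host against the job's executable, then
# compare with the output of `uname -m` on the execute host.
#   elf_class /bin/sleep
#   uname -m
```

If the submit host reports a 64-bit binary while the execute host is a 32-bit machine, the transferred copy cannot be executed there, which would be consistent with errno=8 ('Exec format error').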
(In reply to comment #1)
> Is the architecture the same on both the hosts?
>
> It is possible you are transferring a 64-bit /bin/sleep to a 32-bit machine and
> trying to execute it.

My bad, you're right. Sorry for that.

>>> CLOSED