Description of problem:
An OpenMPI job submitted to the parallel universe fails when condor_chirp puts the identity keys back.

Version-Release number of selected component (if applicable):
condor-7.6.5-0.8.el5.i386

How reproducible:
100%

Steps to Reproduce:
1. Set up the parallel universe (see the configuration file in the attachment).
2. Submit the OpenMPI job included in the attachment (it is the same as in bug 537232 comment 2). The openmpiscript is customised from the current version of /usr/share/doc/condor-7.6.5/examples/openmpiscript.
3. After the job finishes, check the output and error files of the job.

Actual results:
# cat /tmp/mpi_outfile.0
error 0 chirp putting identity keys back
# cat /tmp/mpi_errfile.0
chirp: couldn't putfile: No such file or directory
/usr/libexec/condor/sshd.sh: line 69: 3991 Aborted $CONDOR_CHIRP put -perm 0700 $idkey $_CONDOR_REMOTE_SPOOL_DIR/$_CONDOR_PROCNO.key

Expected results:
No errors in the mentioned files, and a correctly launched OpenMPI job.

Additional info:
The 0 printed as the error code in the output message is bug 759154.
SELinux disallowing ssh key generation is bug 759403.

Am I doing anything wrong?
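For step 3, a quick way to check the error file for this specific failure could look like the sketch below. The function name has_chirp_put_error is hypothetical (it is not part of Condor); it only greps for the message shown in the actual results.

```shell
# Hypothetical helper, not shipped with Condor: returns 0 if the given
# job error file contains the chirp putfile failure from this report.
has_chirp_put_error() {
    grep -q "chirp: couldn't putfile" "$1"
}

# Example usage with the error file path from the report:
# has_chirp_put_error /tmp/mpi_errfile.0 && echo "chirp put failed"
```

The output file can be checked the same way by grepping for "chirp putting identity keys back".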
After some probing, it looks like condor_chirp doesn't like an absolute path for the remote file. If I change this line in /usr/libexec/condor/sshd.sh (around line 69):
$CONDOR_CHIRP put -perm 0700 $idkey $_CONDOR_REMOTE_SPOOL_DIR/$_CONDOR_PROCNO.key
to:
$CONDOR_CHIRP put -perm 0700 $idkey $_CONDOR_PROCNO.key
the key is correctly put on the central manager machine (in /var/lib/condor/0.key).
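The change described above can be sketched as a pair of small shell functions. This is only an illustration of the workaround, not the upstream fix: remote_key_name and put_identity_key are hypothetical names, and the put invocation simply mirrors the line quoted from sshd.sh.

```shell
# Hypothetical helper: build the remote file name relative to the spool
# directory instead of prefixing $_CONDOR_REMOTE_SPOOL_DIR.
remote_key_name() {
    # $1 is the process number, e.g. the value of $_CONDOR_PROCNO
    printf '%s.key\n' "$1"
}

# Hypothetical wrapper around the put call quoted from sshd.sh.
# Assumes $CONDOR_CHIRP holds the path to the condor_chirp binary.
put_identity_key() {
    idkey=$1
    procno=$2
    "$CONDOR_CHIRP" put -perm 0700 "$idkey" "$(remote_key_name "$procno")"
}
```

Inside sshd.sh this would be called as put_identity_key "$idkey" "$_CONDOR_PROCNO"; quoting the variables also guards against paths that contain spaces.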
Created attachment 539618: Configuration and OpenMPI job (to comment 0)
Could you verify whether this exists in condor-7.6.5-0.9? This could be related to https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2630, which should be in the aforementioned build.
On RHEL 5.7 i386 with condor-7.6.5-0.9.el5.i386 it is OK (the ssh keys are correctly put on the CM).
Fixed upstream.
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.
New Contents:
C: Run an OpenMPI/parallel universe job
C: condor_chirp will fail to write file
F: condor_chirp was using relative paths vs. absolute
R: Parallel universe jobs run to completion
Verified on all platforms, RHEL 5.7 and RHEL 6.2, i386 and x86_64:
- the identity keys are correctly put back,
- the output and error files contain no errors (relevant to this BZ).
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2012-0100.html