Bug 759433 - OpenMPI job fails when sshd.sh putting identity keys back.
Status: CLOSED ERRATA
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor
Version: 2.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: 2.1.1
Target Release: ---
Assigned To: Timothy St. Clair
QA Contact: Daniel Horák
Depends On:
Blocks: 765607
Reported: 2011-12-02 06:33 EST by Daniel Horák
Modified: 2012-03-02 09:17 EST
CC List: 4 users

See Also:
Fixed In Version: condor-7.6.5-0.9
Doc Type: Bug Fix
Doc Text:
  C: Run an OpenMPI/parallel universe job
  C: condor_chirp will fail to write file
  F: condor_chirp was using relative paths vs. absolute
  R: Parallel universe jobs run to completion
Last Closed: 2012-02-06 13:18:29 EST

Attachments
Configuration and OpenMPI job (to comment 0) (2.59 KB, application/x-gzip)
2011-12-02 07:58 EST, Daniel Horák


External Trackers
Tracker                  ID              Priority  Status        Summary                                                        Last Updated
Condor                   2630            None      None          None                                                           Never
Red Hat Product Errata   RHSA-2012:0100  normal    SHIPPED_LIVE  Moderate: MRG Grid security, bug fix, and enhancement update   2012-02-06 18:15:47 EST

Description Daniel Horák 2011-12-02 06:33:31 EST
Description of problem:
  An OpenMPI job submitted to the parallel universe fails when condor_chirp puts the identity keys back.

Version-Release number of selected component (if applicable):
  condor-7.6.5-0.8.el5.i386

How reproducible:
  100%

Steps to Reproduce:
1. Set up the parallel universe (see the configuration file in the attachment).
2. Submit the OpenMPI job included in the attachment
    (it is the same as in bug 537232 comment 2).
  - openmpiscript is customised from the current version of
      /usr/share/doc/condor-7.6.5/examples/openmpiscript
3. After the job finishes, check the output and error files of the job.
  
Actual results:
  # cat /tmp/mpi_outfile.0 
    error 0 chirp putting identity keys back
  # cat /tmp/mpi_errfile.0
    chirp: couldn't putfile: No such file or directory
    /usr/libexec/condor/sshd.sh: line 69:  3991 Aborted                 $CONDOR_CHIRP put -perm 0700 $idkey $_CONDOR_REMOTE_SPOOL_DIR/$_CONDOR_PROCNO.key
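
The check in step 3 can be scripted. The following is a minimal sketch that looks for the two chirp error strings shown above; `job_logs_clean` is a hypothetical helper name and not part of Condor.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Condor): succeed only if none of the
# given job log files contains one of the chirp failures from this report.
job_logs_clean() {
    for f in "$@"; do
        # An unreadable log cannot be judged clean, so report it separately.
        if [ ! -r "$f" ]; then
            echo "cannot read $f"
            return 2
        fi
        # grep -q exits 0 on a match, so a match means the log is not clean.
        if grep -q -e 'chirp putting identity keys back' \
                   -e "chirp: couldn't putfile" "$f"; then
            echo "chirp failure found in $f"
            return 1
        fi
    done
    return 0
}
```

With the file names used in this report, it would be called as `job_logs_clean /tmp/mpi_outfile.0 /tmp/mpi_errfile.0`.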

Expected results:
  No errors in the files mentioned above, and the OpenMPI job launches correctly.

Additional info:
  The 0 printed as the error code in the output message is tracked in bug 759154.
  SELinux disallowing ssh key generation is tracked in bug 759403.

Am I doing anything wrong?
Comment 1 Daniel Horák 2011-12-02 06:35:45 EST
After a little probing, it looks like condor_chirp doesn't like an absolute path for the remote file.
If I change this line in /usr/libexec/condor/sshd.sh (around line 69):
  $CONDOR_CHIRP put -perm 0700 $idkey $_CONDOR_REMOTE_SPOOL_DIR/$_CONDOR_PROCNO.key
to:
  $CONDOR_CHIRP put -perm 0700 $idkey $_CONDOR_PROCNO.key
the key is correctly put on the central manager machine (in /var/lib/condor/0.key).
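
The change described in this comment can be sketched as a small wrapper. This only illustrates the workaround and is not the upstream fix; `remote_key_name` and `put_identity_key` are hypothetical names, and falling back to a bare `condor_chirp` when `CONDOR_CHIRP` is unset is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: build the remote name for a node's identity key
# without the $_CONDOR_REMOTE_SPOOL_DIR prefix, i.e. as a relative path.
remote_key_name() {
    # $1 is the node number (sshd.sh takes it from _CONDOR_PROCNO)
    printf '%s.key\n' "$1"
}

# Hypothetical wrapper: upload the key under the relative remote name.
# The "put -perm 0700" invocation is the one quoted from sshd.sh above.
put_identity_key() {
    idkey=$1
    procno=$2
    # Assumption: use condor_chirp from PATH if CONDOR_CHIRP is not set.
    "${CONDOR_CHIRP:-condor_chirp}" put -perm 0700 "$idkey" \
        "$(remote_key_name "$procno")"
}
```

Inside sshd.sh the call would look like `put_identity_key "$idkey" "$_CONDOR_PROCNO"`, and its exit status is that of condor_chirp.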
Comment 2 Daniel Horák 2011-12-02 07:58:39 EST
Created attachment 539618
Configuration and OpenMPI job (to comment 0)
Comment 3 Timothy St. Clair 2011-12-12 14:44:06 EST
Could you verify whether this still exists in condor-7.6.5-0.9?

This could be related to https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2630, the fix for which should be in the aforementioned build.
Comment 4 Daniel Horák 2011-12-13 03:51:19 EST
On RHEL 5.7 i386 with condor-7.6.5-0.9.el5.i386 it is OK (ssh keys are correctly put on the CM).
Comment 5 Timothy St. Clair 2011-12-13 10:08:19 EST
Fixed upstream.
Comment 7 Timothy St. Clair 2011-12-14 13:07:59 EST
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
C: Run an OpenMPI/parallel universe job
C: condor_chirp will fail to write file 
F: condor_chirp was using relative paths vs. absolute
R: Parallel universe jobs run to completion
Comment 9 Daniel Horák 2012-01-10 08:45:34 EST
Verified on all platforms: RHEL 5.7 and RHEL 6.2, i386 and x86_64:
  - identity keys are correctly put back,
  - the output and error files contain no errors (relevant to this BZ).
Comment 10 errata-xmlrpc 2012-02-06 13:18:29 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-0100.html
