Bug 759433 - OpenMPI job fails when sshd.sh puts identity keys back.
Summary: OpenMPI job fails when sshd.sh puts identity keys back.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor
Version: 2.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: 2.1.1
Target Release: ---
Assignee: Timothy St. Clair
QA Contact: Daniel Horák
URL:
Whiteboard:
Depends On:
Blocks: 765607
Reported: 2011-12-02 11:33 UTC by Daniel Horák
Modified: 2012-03-02 14:17 UTC
CC List: 4 users

Fixed In Version: condor-7.6.5-0.9
Doc Type: Bug Fix
Doc Text:
C: Run an OpenMPI/parallel universe job
C: condor_chirp will fail to write file
F: condor_chirp was using relative paths vs. absolute
R: Parallel universe jobs run to completion
Clone Of:
Environment:
Last Closed: 2012-02-06 18:18:29 UTC
Target Upstream Version:
Embargoed:


Attachments
Configuration and OpenMPI job (to comment 0) (2.59 KB, application/x-gzip)
2011-12-02 12:58 UTC, Daniel Horák
no flags


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Condor | 2630 | 0 | None | None | None | Never
Red Hat Product Errata | RHSA-2012:0100 | 0 | normal | SHIPPED_LIVE | Moderate: MRG Grid security, bug fix, and enhancement update | 2012-02-06 23:15:47 UTC

Description Daniel Horák 2011-12-02 11:33:31 UTC
Description of problem:
  An OpenMPI job submitted to the parallel universe fails when condor_chirp puts the identity keys back.

Version-Release number of selected component (if applicable):
  condor-7.6.5-0.8.el5.i386

How reproducible:
  100%

Steps to Reproduce:
1. Set up the parallel universe (see the configuration file in the attachment).
2. Submit the OpenMPI job included in the attachment
    (it is the same as in bug 537232 comment 2).
  - openmpiscript is customised from the current version of
      /usr/share/doc/condor-7.6.5/examples/openmpiscript
3. After the job finishes, check its output and error files.
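Step 3 can be scripted. The following is a minimal sketch, assuming the output and error files are named mpi_outfile.<N> and mpi_errfile.<N> as in the results below, and that a successful run writes nothing mentioning chirp to them; the function name check_mpi_logs is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper for step 3: report chirp-related errors in the
# job's output/error files. The mpi_outfile.<N>/mpi_errfile.<N> names
# follow the paths used in this report; adjust them to your submit file.
check_mpi_logs() (
    dir=${1:-/tmp}   # directory holding the job's output and error files
    node=${2:-0}     # parallel universe node number (the ".0" suffix)
    status=0
    for f in "$dir/mpi_outfile.$node" "$dir/mpi_errfile.$node"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f"
            status=1
        elif grep -qi 'chirp' "$f"; then
            # Show the offending lines, e.g. "chirp: couldn't putfile ..."
            grep -i 'chirp' "$f"
            status=1
        fi
    done
    exit "$status"   # the body runs in a subshell, so exit only leaves it
)
```

With the files shown under "Actual results", `check_mpi_logs /tmp 0` would print the chirp lines and return non-zero; on a fixed build it should return 0.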
  
Actual results:
  # cat /tmp/mpi_outfile.0 
    error 0 chirp putting identity keys back
  # cat /tmp/mpi_errfile.0
    chirp: couldn't putfile: No such file or directory
    /usr/libexec/condor/sshd.sh: line 69:  3991 Aborted                 $CONDOR_CHIRP put -perm 0700 $idkey $_CONDOR_REMOTE_SPOOL_DIR/$_CONDOR_PROCNO.key

Expected results:
  No errors in the files mentioned above; the OpenMPI job launches correctly.

Additional info:
  The 0 printed as the error code in the output message is bug 759154.
  SELinux preventing SSH key generation is bug 759403.

Am I doing anything wrong?

Comment 1 Daniel Horák 2011-12-02 11:35:45 UTC
After some probing, it looks like condor_chirp does not accept an absolute path for the remote file.
If I change this line in /usr/libexec/condor/sshd.sh (around line 69):
  $CONDOR_CHIRP put -perm 0700 $idkey $_CONDOR_REMOTE_SPOOL_DIR/$_CONDOR_PROCNO.key
to:
  $CONDOR_CHIRP put -perm 0700 $idkey $_CONDOR_PROCNO.key
the key is correctly put on the central manager machine (in /var/lib/condor/0.key).
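On builds older than condor-7.6.5-0.9, the workaround above could be wrapped as a small shell function. This is a sketch, assuming CONDOR_CHIRP, idkey and _CONDOR_PROCNO are already set as they are in sshd.sh; the function name put_identity_key is hypothetical and not part of sshd.sh. It also stores chirp's exit status before testing it, so the error message carries the actual code.

```shell
#!/bin/sh
# Sketch of the comment 1 workaround. Assumes CONDOR_CHIRP, idkey and
# _CONDOR_PROCNO are already set, as they are in sshd.sh; the function
# name put_identity_key is hypothetical, not part of sshd.sh.
put_identity_key() {
    # Relative remote name instead of
    # "$_CONDOR_REMOTE_SPOOL_DIR/$_CONDOR_PROCNO.key", the absolute path
    # that condor_chirp rejected in condor-7.6.5-0.8.
    remote_key="$_CONDOR_PROCNO.key"
    # $CONDOR_CHIRP is left unquoted, as in sshd.sh.
    $CONDOR_CHIRP put -perm 0700 "$idkey" "$remote_key"
    rc=$?   # capture the status immediately, before another command resets $?
    if [ "$rc" -ne 0 ]; then
        echo "error $rc chirp putting identity keys back" >&2
        return "$rc"
    fi
    return 0
}
```

Since comment 4 reports that condor-7.6.5-0.9 puts the keys correctly, this is only of interest on older builds.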

Comment 2 Daniel Horák 2011-12-02 12:58:39 UTC
Created attachment 539618
Configuration and OpenMPI job (to comment 0)

Comment 3 Timothy St. Clair 2011-12-12 19:44:06 UTC
Could you verify whether this still occurs in condor-7.6.5-0.9?

This could be related to https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2630, which should be in the aforementioned build.

Comment 4 Daniel Horák 2011-12-13 08:51:19 UTC
On RHEL 5.7 i386 with condor-7.6.5-0.9.el5.i386 it works (SSH keys are correctly put on the CM).

Comment 5 Timothy St. Clair 2011-12-13 15:08:19 UTC
Fixed upstream.

Comment 7 Timothy St. Clair 2011-12-14 18:07:59 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
C: Run an OpenMPI/parallel universe job
C: condor_chirp will fail to write file 
F: condor_chirp was using relative paths vs. absolute
R: Parallel universe jobs run to completion

Comment 9 Daniel Horák 2012-01-10 13:45:34 UTC
Verified on all platforms: RHEL 5.7 and RHEL 6.2, i386 and x86_64:
  - identity keys are correctly put back,
  - the output and error files contain no errors (relevant to this BZ).

Comment 10 errata-xmlrpc 2012-02-06 18:18:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-0100.html

