Bug 835923

Summary: OpenMPI problem with SELinux (Grid - parallel universe)
Product: Red Hat Enterprise Linux 6
Reporter: Daniel Horák <dahorak>
Component: selinux-policy
Assignee: Miroslav Grepl <mgrepl>
Status: CLOSED ERRATA
QA Contact: Daniel Horák <dahorak>
Severity: high
Priority: urgent
Version: 6.3
CC: cww, dwalsh, iboverma, matt, mkudlej, mmalik, mtruneck
Target Milestone: rc
Keywords: ZStream
Hardware: All
OS: Linux
Fixed In Version: selinux-policy-3.7.19-159.el6
Doc Type: Bug Fix
Last Closed: 2013-02-21 08:24:40 UTC
Type: Bug
Bug Blocks: 435010, 852456

Description Daniel Horák 2012-06-27 14:40:56 UTC
Description of problem:
  An OpenMPI job (parallel universe in MRG/Grid) fails because of SELinux denials.

Version-Release number of selected component (if applicable):
  # rpm -qa selinux-policy
    selinux-policy-3.7.19-155.el6_3.noarch
  # rpm -q condor
    condor-7.6.5-0.15.el6.x86_64
  
How reproducible:
  100%

Steps to Reproduce:
1. Install condor and OpenMPI and configure them for the parallel universe.
2. Prepare and submit an OpenMPI job (see bug 537232 comment 2).
3. After the job finishes, check the job's output files and the SELinux messages:

# getenforce 
  Permissive

# START_DATE_TIME=$(date "+%m/%d/%Y %T")

<< submit the job and wait for it to finish >>

The output file is empty and the error file contains the following errors (the first warning, about plm_rsh_agent, is related to bug 772587):
# cat /tmp/openmpi_errfile.9.0-0 
  --------------------------------------------------------------------------
  A deprecated MCA parameter value was specified in the environment or
  on the command line.  Deprecated MCA parameters should be avoided;
  they may disappear in future releases.

    Deprecated parameter: plm_rsh_agent
  --------------------------------------------------------------------------
  --------------------------------------------------------------------------
  A daemon (pid 29361) died unexpectedly with status 255 while attempting
  to launch so we are aborting.

  There may be more information reported by the environment (see above).

  This may be because the daemon was unable to find all the needed shared
  libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
  location of the shared libraries on the remote nodes and this will
  automatically be forwarded to the remote nodes.
  --------------------------------------------------------------------------
  --------------------------------------------------------------------------
  mpirun noticed that the job aborted, but has no info as to the process
  that caused that situation.
  --------------------------------------------------------------------------


# ausearch -m AVC -m USER_AVC -m SELINUX_ERR -ts ${START_DATE_TIME}
  ----
  time->Wed Jun 27 10:17:29 2012
  type=SYSCALL msg=audit(1340806649.529:1575): arch=c000003e syscall=59 success=yes exit=0 a0=26e6670 a1=26e0700 a2=26dfbe0 a3=188 items=0 ppid=12621 pid=12651 auid=0 uid=500 gid=500 euid=500 suid=500 fsuid=500 egid=500 sgid=500 fsgid=500 tty=(none) ses=80 comm="ssh" exe="/usr/bin/ssh" subj=unconfined_u:system_r:condor_startd_ssh_t:s0 key=(null)
  type=AVC msg=audit(1340806649.529:1575): avc:  denied  { append } for  pid=12651 comm="ssh" path="/var/lib/condor/execute/dir_12475/_condor_stdout" dev=dm-0 ino=525574 scontext=unconfined_u:system_r:condor_startd_ssh_t:s0 tcontext=unconfined_u:object_r:condor_var_lib_t:s0 tclass=file
  ----
  time->Wed Jun 27 10:17:29 2012
  type=SYSCALL msg=audit(1340806649.537:1576): arch=c000003e syscall=4 success=yes exit=0 a0=7fffe24e1195 a1=7fffe24ddf80 a2=7fffe24ddf80 a3=13 items=0 ppid=12622 pid=12652 auid=0 uid=500 gid=500 euid=500 suid=500 fsuid=500 egid=500 sgid=500 fsgid=500 tty=(none) ses=80 comm="ssh" exe="/usr/bin/ssh" subj=unconfined_u:system_r:condor_startd_ssh_t:s0 key=(null)
  type=AVC msg=audit(1340806649.537:1576): avc:  denied  { getattr } for  pid=12652 comm="ssh" path="/var/lib/condor/execute/dir_12475/tmp/2.key" dev=dm-0 ino=525613 scontext=unconfined_u:system_r:condor_startd_ssh_t:s0 tcontext=unconfined_u:object_r:condor_var_lib_t:s0 tclass=file
  ----
  time->Wed Jun 27 10:17:29 2012
  type=SYSCALL msg=audit(1340806649.540:1578): arch=c000003e syscall=42 success=yes exit=0 a0=3 a1=2b646c42aff0 a2=10 a3=fffffffffffffee0 items=0 ppid=12621 pid=12651 auid=0 uid=500 gid=500 euid=500 suid=500 fsuid=500 egid=500 sgid=500 fsgid=500 tty=(none) ses=80 comm="ssh" exe="/usr/bin/ssh" subj=unconfined_u:system_r:condor_startd_ssh_t:s0 key=(null)
  type=AVC msg=audit(1340806649.540:1578): avc:  denied  { name_connect } for  pid=12651 comm="ssh" dest=4444 scontext=unconfined_u:system_r:condor_startd_ssh_t:s0 tcontext=system_u:object_r:kerberos_master_port_t:s0 tclass=tcp_socket
  ----
  time->Wed Jun 27 10:17:29 2012
  type=SYSCALL msg=audit(1340806649.540:1577): arch=c000003e syscall=42 success=yes exit=0 a0=3 a1=2aeb6520eff0 a2=10 a3=fffffffffffffee0 items=0 ppid=12622 pid=12652 auid=0 uid=500 gid=500 euid=500 suid=500 fsuid=500 egid=500 sgid=500 fsgid=500 tty=(none) ses=80 comm="ssh" exe="/usr/bin/ssh" subj=unconfined_u:system_r:condor_startd_ssh_t:s0 key=(null)
  type=AVC msg=audit(1340806649.540:1577): avc:  denied  { name_connect } for  pid=12652 comm="ssh" dest=4444 scontext=unconfined_u:system_r:condor_startd_ssh_t:s0 tcontext=system_u:object_r:kerberos_master_port_t:s0 tclass=tcp_socket
  ----
  time->Wed Jun 27 10:17:29 2012
  type=SYSCALL msg=audit(1340806649.541:1579): arch=c000003e syscall=2 success=yes exit=4 a0=2aeb6520e660 a1=0 a2=0 a3=2c items=0 ppid=12622 pid=12652 auid=0 uid=500 gid=500 euid=500 suid=500 fsuid=500 egid=500 sgid=500 fsgid=500 tty=(none) ses=80 comm="ssh" exe="/usr/bin/ssh" subj=unconfined_u:system_r:condor_startd_ssh_t:s0 key=(null)
  type=AVC msg=audit(1340806649.541:1579): avc:  denied  { open } for  pid=12652 comm="ssh" name="2.key" dev=dm-0 ino=525613 scontext=unconfined_u:system_r:condor_startd_ssh_t:s0 tcontext=unconfined_u:object_r:condor_var_lib_t:s0 tclass=file
  type=AVC msg=audit(1340806649.541:1579): avc:  denied  { read } for  pid=12652 comm="ssh" name="2.key" dev=dm-0 ino=525613 scontext=unconfined_u:system_r:condor_startd_ssh_t:s0 tcontext=unconfined_u:object_r:condor_var_lib_t:s0 tclass=file
  ----
  time->Wed Jun 27 10:17:29 2012
  type=SYSCALL msg=audit(1340806649.562:1580): arch=c000003e syscall=2 success=yes exit=4 a0=2aeb6520efc0 a1=441 a2=1b6 a3=0 items=0 ppid=12622 pid=12652 auid=0 uid=500 gid=500 euid=500 suid=500 fsuid=500 egid=500 sgid=500 fsgid=500 tty=(none) ses=80 comm="ssh" exe="/usr/bin/ssh" subj=unconfined_u:system_r:condor_startd_ssh_t:s0 key=(null)
  type=AVC msg=audit(1340806649.562:1580): avc:  denied  { create } for  pid=12652 comm="ssh" name=".ssh_host_rsa_key." scontext=unconfined_u:system_r:condor_startd_ssh_t:s0 tcontext=unconfined_u:object_r:condor_var_lib_t:s0 tclass=file
  type=AVC msg=audit(1340806649.562:1580): avc:  denied  { add_name } for  pid=12652 comm="ssh" name=".ssh_host_rsa_key." scontext=unconfined_u:system_r:condor_startd_ssh_t:s0 tcontext=unconfined_u:object_r:condor_var_lib_t:s0 tclass=dir
  type=AVC msg=audit(1340806649.562:1580): avc:  denied  { write } for  pid=12652 comm="ssh" name="dir_12475" dev=dm-0 ino=525569 scontext=unconfined_u:system_r:condor_startd_ssh_t:s0 tcontext=unconfined_u:object_r:condor_var_lib_t:s0 tclass=dir
  ----
  time->Wed Jun 27 10:17:29 2012
  type=SYSCALL msg=audit(1340806649.616:1581): arch=c000003e syscall=16 success=no exit=-25 a0=5 a1=5401 a2=7fffe24dde00 a3=3df914aa items=0 ppid=12622 pid=12652 auid=0 uid=500 gid=500 euid=500 suid=500 fsuid=500 egid=500 sgid=500 fsgid=500 tty=(none) ses=80 comm="ssh" exe="/usr/bin/ssh" subj=unconfined_u:system_r:condor_startd_ssh_t:s0 key=(null)
  type=AVC msg=audit(1340806649.616:1581): avc:  denied  { ioctl } for  pid=12652 comm="ssh" path="/var/lib/condor/execute/dir_12475/_condor_stdout" dev=dm-0 ino=525574 scontext=unconfined_u:system_r:condor_startd_ssh_t:s0 tcontext=unconfined_u:object_r:condor_var_lib_t:s0 tclass=file

This output is from one node; the other nodes log only one message:

# ausearch -m AVC -m USER_AVC -m SELINUX_ERR -ts ${START_DATE_TIME}
  ----
  time->Wed Jun 27 10:17:29 2012
  type=SYSCALL msg=audit(1340806649.727:1179): arch=c000003e syscall=1 success=yes exit=54 a0=5 a1=2b8c78b762a0 a2=36 a3=65727275632f7274 items=0 ppid=14382 pid=14388 auid=0 uid=500 gid=500 euid=500 suid=500 fsuid=500 egid=500 sgid=500 fsgid=500 tty=(none) ses=80 comm="sshd" exe="/usr/sbin/sshd" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=(null)
  type=AVC msg=audit(1340806649.727:1179): avc:  denied  { dyntransition } for  pid=14388 comm="sshd" scontext=unconfined_u:system_r:sshd_t:s0 tcontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tclass=process
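
The denials above can be condensed to one line per source type, target type, object class and permission, which makes repeated entries easier to read. This is a minimal sketch, assuming a POSIX shell with awk and sort. The function name summarize_avc is hypothetical and is not part of the audit tools.

```shell
# Hypothetical helper (not part of the audit package): reads audit records on
# stdin and prints each distinct denial once as
#   <source type> <target type> <class> <permissions>
summarize_avc() {
    awk '
    /avc:/ && /denied/ {
        perm = ""; src = ""; tgt = ""; cls = ""
        for (i = 1; i <= NF; i++) {
            if ($i == "{") {
                # Collect every permission listed between the braces.
                for (j = i + 1; j <= NF && $j != "}"; j++)
                    perm = perm (perm == "" ? "" : ",") $j
            } else if ($i ~ /^scontext=/) {
                # A context is user:role:type:level, so the type is field 3.
                split($i, a, ":"); src = a[3]
            } else if ($i ~ /^tcontext=/) {
                split($i, a, ":"); tgt = a[3]
            } else if ($i ~ /^tclass=/) {
                cls = substr($i, 8)
            }
        }
        print src, tgt, cls, perm
    }' | sort -u
}
```

It could be used as `ausearch -m AVC -ts ${START_DATE_TIME} | summarize_avc`. On the first node every denial listed above has condor_startd_ssh_t as its source type.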

  
Actual results:
  There are many AVC denials and the OpenMPI job fails.

Expected results:
  There are no AVC denials and the OpenMPI job passes.

Additional info:
  This bug is probably related to bug 759403.

  After additional investigation I found that the problem appeared somewhere between selinux-policy-3.7.19-150.el6 and selinux-policy-3.7.19-153.el6.

Comment 22 Daniel Horák 2012-12-07 15:38:45 UTC
Tested with an automated test on RHEL 6.4 i386/x86_64 with two versions of condor:
  condor-7.6.5-0.22.el6.i686
  condor-7.8.7-0.6.el6_3.i686

# cat /etc/redhat-release 
  Red Hat Enterprise Linux Server release 6.4 Beta (Santiago)

# rpm -q selinux-policy
  selinux-policy-3.7.19-185.el6.noarch

No SELinux problems found, proposing VERIFIED.
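
A quick way to confirm that a node already carries the fix is to compare its installed policy package with the Fixed In Version value, selinux-policy-3.7.19-159.el6. This is a minimal sketch, assuming GNU sort with -V is available. The function name policy_at_least is hypothetical, and sort -V only approximates RPM's own version comparison, so the result is a hint rather than an authoritative check.

```shell
# Hypothetical helper: succeeds when the installed name-version-release string
# sorts at or after the required one. sort -V only approximates RPM's
# version comparison rules.
policy_at_least() {
    required=$1
    installed=$2
    # With version sort, the required string comes first (or ties) exactly
    # when the installed package is at least as new.
    lowest=$(printf '%s\n%s\n' "$required" "$installed" | sort -V | head -n 1)
    [ "$lowest" = "$required" ]
}
```

For example, `policy_at_least selinux-policy-3.7.19-159.el6 "$(rpm -q --qf '%{NAME}-%{VERSION}-%{RELEASE}\n' selinux-policy)"` returns success on a system at or beyond the fixed version.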

Comment 23 errata-xmlrpc 2013-02-21 08:24:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0314.html