Bug 835923
| Summary: | OpenMPI problem with SELinux (Grid - parallel universe) | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Daniel Horák <dahorak> |
| Component: | selinux-policy | Assignee: | Miroslav Grepl <mgrepl> |
| Status: | CLOSED ERRATA | QA Contact: | Daniel Horák <dahorak> |
| Severity: | high | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 6.3 | CC: | cww, dwalsh, iboverma, matt, mkudlej, mmalik, mtruneck |
| Target Milestone: | rc | Keywords: | ZStream |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | selinux-policy-3.7.19-159.el6 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2013-02-21 08:24:40 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 435010, 852456 | | |
Tested via an automated test on RHEL 6.4 (i386 and x86_64) with two versions of condor:
condor-7.6.5-0.22.el6.i686
condor-7.8.7-0.6.el6_3.i686

# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.4 Beta (Santiago)
# rpm -q selinux-policy
selinux-policy-3.7.19-185.el6.noarch

No SELinux problem found, proposing to VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0314.html

Description of problem:
An OpenMPI job (parallel universe in MRG/Grid) fails because of an SELinux problem.

Version-Release number of selected component (if applicable):
# rpm -qa selinux-policy
selinux-policy-3.7.19-155.el6_3.noarch
# rpm -q condor
condor-7.6.5-0.15.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Install condor and OpenMPI and configure them to use the parallel universe.
2. Prepare and submit an OpenMPI job (see bug 537232 comment 2).
3. After the job finishes, check the job's output files and the SELinux messages:

# getenforce
Permissive
# START_DATE_TIME=$(date "+%m/%d/%Y %T")
<< submit job and wait for finish >>

The output file is empty and the error file contains the following error (the first warning, about plm_rsh_agent, is related to bug 772587):

# cat /tmp/openmpi_errfile.9.0-0
--------------------------------------------------------------------------
A deprecated MCA parameter value was specified in the environment or
on the command line. Deprecated MCA parameters should be avoided;
they may disappear in future releases.

Deprecated parameter: plm_rsh_agent
--------------------------------------------------------------------------
--------------------------------------------------------------------------
A daemon (pid 29361) died unexpectedly with status 255 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------

# ausearch -m AVC -m USER_AVC -m SELINUX_ERR -ts ${START_DATE_TIME}
----
time->Wed Jun 27 10:17:29 2012
type=SYSCALL msg=audit(1340806649.529:1575): arch=c000003e syscall=59 success=yes exit=0 a0=26e6670 a1=26e0700 a2=26dfbe0 a3=188 items=0 ppid=12621 pid=12651 auid=0 uid=500 gid=500 euid=500 suid=500 fsuid=500 egid=500 sgid=500 fsgid=500 tty=(none) ses=80 comm="ssh" exe="/usr/bin/ssh" subj=unconfined_u:system_r:condor_startd_ssh_t:s0 key=(null)
type=AVC msg=audit(1340806649.529:1575): avc: denied { append } for pid=12651 comm="ssh" path="/var/lib/condor/execute/dir_12475/_condor_stdout" dev=dm-0 ino=525574 scontext=unconfined_u:system_r:condor_startd_ssh_t:s0 tcontext=unconfined_u:object_r:condor_var_lib_t:s0 tclass=file
----
time->Wed Jun 27 10:17:29 2012
type=SYSCALL msg=audit(1340806649.537:1576): arch=c000003e syscall=4 success=yes exit=0 a0=7fffe24e1195 a1=7fffe24ddf80 a2=7fffe24ddf80 a3=13 items=0 ppid=12622 pid=12652 auid=0 uid=500 gid=500 euid=500 suid=500 fsuid=500 egid=500 sgid=500 fsgid=500 tty=(none) ses=80 comm="ssh" exe="/usr/bin/ssh" subj=unconfined_u:system_r:condor_startd_ssh_t:s0 key=(null)
type=AVC msg=audit(1340806649.537:1576): avc: denied { getattr } for pid=12652 comm="ssh" path="/var/lib/condor/execute/dir_12475/tmp/2.key" dev=dm-0 ino=525613 scontext=unconfined_u:system_r:condor_startd_ssh_t:s0 tcontext=unconfined_u:object_r:condor_var_lib_t:s0 tclass=file
----
time->Wed Jun 27 10:17:29 2012
type=SYSCALL msg=audit(1340806649.540:1578): arch=c000003e syscall=42 success=yes exit=0 a0=3 a1=2b646c42aff0 a2=10 a3=fffffffffffffee0 items=0 ppid=12621 pid=12651 auid=0 uid=500 gid=500 euid=500 suid=500 fsuid=500 egid=500 sgid=500 fsgid=500 tty=(none) ses=80 comm="ssh" exe="/usr/bin/ssh" subj=unconfined_u:system_r:condor_startd_ssh_t:s0 key=(null)
type=AVC msg=audit(1340806649.540:1578): avc: denied { name_connect } for pid=12651 comm="ssh" dest=4444 scontext=unconfined_u:system_r:condor_startd_ssh_t:s0 tcontext=system_u:object_r:kerberos_master_port_t:s0 tclass=tcp_socket
----
time->Wed Jun 27 10:17:29 2012
type=SYSCALL msg=audit(1340806649.540:1577): arch=c000003e syscall=42 success=yes exit=0 a0=3 a1=2aeb6520eff0 a2=10 a3=fffffffffffffee0 items=0 ppid=12622 pid=12652 auid=0 uid=500 gid=500 euid=500 suid=500 fsuid=500 egid=500 sgid=500 fsgid=500 tty=(none) ses=80 comm="ssh" exe="/usr/bin/ssh" subj=unconfined_u:system_r:condor_startd_ssh_t:s0 key=(null)
type=AVC msg=audit(1340806649.540:1577): avc: denied { name_connect } for pid=12652 comm="ssh" dest=4444 scontext=unconfined_u:system_r:condor_startd_ssh_t:s0 tcontext=system_u:object_r:kerberos_master_port_t:s0 tclass=tcp_socket
----
time->Wed Jun 27 10:17:29 2012
type=SYSCALL msg=audit(1340806649.541:1579): arch=c000003e syscall=2 success=yes exit=4 a0=2aeb6520e660 a1=0 a2=0 a3=2c items=0 ppid=12622 pid=12652 auid=0 uid=500 gid=500 euid=500 suid=500 fsuid=500 egid=500 sgid=500 fsgid=500 tty=(none) ses=80 comm="ssh" exe="/usr/bin/ssh" subj=unconfined_u:system_r:condor_startd_ssh_t:s0 key=(null)
type=AVC msg=audit(1340806649.541:1579): avc: denied { open } for pid=12652 comm="ssh" name="2.key" dev=dm-0 ino=525613 scontext=unconfined_u:system_r:condor_startd_ssh_t:s0 tcontext=unconfined_u:object_r:condor_var_lib_t:s0 tclass=file
type=AVC msg=audit(1340806649.541:1579): avc: denied { read } for pid=12652 comm="ssh" name="2.key" dev=dm-0 ino=525613 scontext=unconfined_u:system_r:condor_startd_ssh_t:s0 tcontext=unconfined_u:object_r:condor_var_lib_t:s0 tclass=file
----
time->Wed Jun 27 10:17:29 2012
type=SYSCALL msg=audit(1340806649.562:1580): arch=c000003e syscall=2 success=yes exit=4 a0=2aeb6520efc0 a1=441 a2=1b6 a3=0 items=0 ppid=12622 pid=12652 auid=0 uid=500 gid=500 euid=500 suid=500 fsuid=500 egid=500 sgid=500 fsgid=500 tty=(none) ses=80 comm="ssh" exe="/usr/bin/ssh" subj=unconfined_u:system_r:condor_startd_ssh_t:s0 key=(null)
type=AVC msg=audit(1340806649.562:1580): avc: denied { create } for pid=12652 comm="ssh" name=".ssh_host_rsa_key." scontext=unconfined_u:system_r:condor_startd_ssh_t:s0 tcontext=unconfined_u:object_r:condor_var_lib_t:s0 tclass=file
type=AVC msg=audit(1340806649.562:1580): avc: denied { add_name } for pid=12652 comm="ssh" name=".ssh_host_rsa_key." scontext=unconfined_u:system_r:condor_startd_ssh_t:s0 tcontext=unconfined_u:object_r:condor_var_lib_t:s0 tclass=dir
type=AVC msg=audit(1340806649.562:1580): avc: denied { write } for pid=12652 comm="ssh" name="dir_12475" dev=dm-0 ino=525569 scontext=unconfined_u:system_r:condor_startd_ssh_t:s0 tcontext=unconfined_u:object_r:condor_var_lib_t:s0 tclass=dir
----
time->Wed Jun 27 10:17:29 2012
type=SYSCALL msg=audit(1340806649.616:1581): arch=c000003e syscall=16 success=no exit=-25 a0=5 a1=5401 a2=7fffe24dde00 a3=3df914aa items=0 ppid=12622 pid=12652 auid=0 uid=500 gid=500 euid=500 suid=500 fsuid=500 egid=500 sgid=500 fsgid=500 tty=(none) ses=80 comm="ssh" exe="/usr/bin/ssh" subj=unconfined_u:system_r:condor_startd_ssh_t:s0 key=(null)
type=AVC msg=audit(1340806649.616:1581): avc: denied { ioctl } for pid=12652 comm="ssh" path="/var/lib/condor/execute/dir_12475/_condor_stdout" dev=dm-0 ino=525574 scontext=unconfined_u:system_r:condor_startd_ssh_t:s0 tcontext=unconfined_u:object_r:condor_var_lib_t:s0 tclass=file

This output is from one node; the other nodes log only one message:

# ausearch -m AVC -m USER_AVC -m SELINUX_ERR -ts ${START_DATE_TIME}
----
time->Wed Jun 27 10:17:29 2012
type=SYSCALL msg=audit(1340806649.727:1179): arch=c000003e syscall=1 success=yes exit=54 a0=5 a1=2b8c78b762a0 a2=36 a3=65727275632f7274 items=0 ppid=14382 pid=14388 auid=0 uid=500 gid=500 euid=500 suid=500 fsuid=500 egid=500 sgid=500 fsgid=500 tty=(none) ses=80 comm="sshd" exe="/usr/sbin/sshd" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=(null)
type=AVC msg=audit(1340806649.727:1179): avc: denied { dyntransition } for pid=14388 comm="sshd" scontext=unconfined_u:system_r:sshd_t:s0 tcontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tclass=process

Actual results:
There are many AVC errors and the OpenMPI job fails.

Expected results:
There are no AVC errors and the OpenMPI job passes.

Additional info:
This bug is probably connected with bug 759403.

After additional investigation I found that the problem appeared somewhere between selinux-policy-3.7.19-150.el6 and selinux-policy-3.7.19-153.el6.
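
For triage, the AVC denials in this report can be grouped by source type, target type, and object class, which makes it easier to see which permissions the policy is missing. The following is only a minimal sketch, assuming each AVC record sits on a single line in the format shown in this report; the names `AVC_RE`, `context_type`, `summarize_denials`, and `SAMPLE` are hypothetical and not part of any audit tool, and the sample holds three records copied from this report.

```python
import re
from collections import defaultdict

# Matches the permission list and the scontext/tcontext/tclass fields of an
# "avc: denied" record, as they appear in the ausearch output in this report.
AVC_RE = re.compile(
    r"avc:\s+denied\s+\{\s*(?P<perms>[^}]*?)\s*\}.*?"
    r"scontext=(?P<scontext>\S+)\s+tcontext=(?P<tcontext>\S+)\s+tclass=(?P<tclass>\S+)"
)


def context_type(context):
    # An SELinux context has the form user:role:type:level; return the type.
    return context.split(":")[2]


def summarize_denials(text):
    # Map (source type, target type, object class) to the set of denied permissions.
    summary = defaultdict(set)
    for match in AVC_RE.finditer(text):
        key = (
            context_type(match.group("scontext")),
            context_type(match.group("tcontext")),
            match.group("tclass"),
        )
        summary[key].update(match.group("perms").split())
    return dict(summary)


# Three AVC records taken from this report (illustrative input only).
SAMPLE = """\
type=AVC msg=audit(1340806649.529:1575): avc: denied { append } for pid=12651 comm="ssh" path="/var/lib/condor/execute/dir_12475/_condor_stdout" dev=dm-0 ino=525574 scontext=unconfined_u:system_r:condor_startd_ssh_t:s0 tcontext=unconfined_u:object_r:condor_var_lib_t:s0 tclass=file
type=AVC msg=audit(1340806649.540:1578): avc: denied { name_connect } for pid=12651 comm="ssh" dest=4444 scontext=unconfined_u:system_r:condor_startd_ssh_t:s0 tcontext=system_u:object_r:kerberos_master_port_t:s0 tclass=tcp_socket
type=AVC msg=audit(1340806649.616:1581): avc: denied { ioctl } for pid=12652 comm="ssh" path="/var/lib/condor/execute/dir_12475/_condor_stdout" dev=dm-0 ino=525574 scontext=unconfined_u:system_r:condor_startd_ssh_t:s0 tcontext=unconfined_u:object_r:condor_var_lib_t:s0 tclass=file
"""

if __name__ == "__main__":
    for (source, target, tclass), perms in sorted(summarize_denials(SAMPLE).items()):
        print(f"{source} -> {target}:{tclass} {{ {' '.join(sorted(perms))} }}")
```

This is not a replacement for audit2allow, which reads the same records and generates policy rules; the sketch only illustrates how the denials above cluster around condor_startd_ssh_t.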