Bug 1408316
Summary: openmpi hfi_wait_for_device causes 15s delay
Product: Red Hat Enterprise Linux 7
Reporter: Chris Schanzle <bugzilla>
Component: libfabric
Assignee: Honggang LI <honli>
Status: CLOSED ERRATA
QA Contact: zguo <zguo>
Severity: unspecified
Priority: unspecified
Version: 7.3
CC: anto.trande, bugreports2005, bugzilla, honli, jcastran, jedicker, jshortt, mplaneta, ngaywood, orion, pedemonte, rdma-dev-team, surak, yizhan, zguo
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Fixed In Version: libfabric-1.4.1-1.el7
Doc Type: Bug Fix
Doc Text:
Cause: The function psm2_ep_num_devunits may wait for 15 seconds before returning when /dev/hfi1_0 is absent.
Consequence: This introduces an unnecessary delay on setups that have the PSM2 library installed but no OPA/HFI hardware.
Fix: Check for the existence of the OPA/HFI hardware before checking the number of available device units.
Result: When OPA/HFI hardware is absent, libfabric skips psm2_ep_num_devunits and avoids the delay.
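The fix above amounts to guarding a potentially blocking query with a cheap, non-blocking existence check. The C sketch below illustrates that pattern under stated assumptions: it is not the upstream libfabric patch, the device path /dev/hfi1_0 is taken from the error message, and `count_units_fn`, `hfi_num_units_guarded`, `demo_query` and `demo_calls` are hypothetical names. The callback only stands in for psm2_ep_num_devunits, which is never called here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical callback type standing in for a slow query such as
 * psm2_ep_num_devunits(), which may block when the device is absent. */
typedef int (*count_units_fn)(void);

/* Return the number of device units, skipping the slow query (and
 * returning 0) when the device node does not exist. */
static int hfi_num_units_guarded(const char *dev_path, count_units_fn query)
{
    assert(dev_path != NULL && query != NULL);

    /* access(F_OK) only tests whether the path exists; it never waits. */
    if (access(dev_path, F_OK) != 0)
        return 0; /* no OPA/HFI hardware: do not enter the slow path */

    return query();
}

/* Hypothetical demo stub that records how often it was called. */
static int demo_calls = 0;

static int demo_query(void)
{
    demo_calls++;
    return 1; /* pretend one device unit was found */
}
```

For example, hfi_num_units_guarded("/dev/hfi1_0", demo_query) returns 0 immediately on a host without that device node and leaves demo_calls at 0, so the blocking path is never entered.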
Last Closed: 2017-08-01 16:55:15 UTC
Type: Bug
Bug Blocks: 1353018
Created attachment 1234882
Fortran90 MPI source for hello-world
Forgot to mention: this bug was submitted by a community user on a CentOS 7.3 system. Thank you!

This is not an openmpi issue. It is a libfabric and libpsm2 issue.

[root@rdma-dev-02 ~]$ fi_info
rdma-dev-02.3246hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out
verbs: IB-0x80fe version: 1.0 type: FI_EP_MSG protocol: FI_PROTO_RDMA_CM_IB_RC
verbs: IB-0x80fe version: 1.0 type: FI_EP_RDM protocol: FI_PROTO_IB_RDM
UDP: UDP-IP version: 1.0 type: FI_EP_DGRAM protocol: FI_PROTO_UDP
sockets: IP version: 1.0 type: FI_EP_MSG protocol: FI_PROTO_SOCK_TCP
sockets: IP version: 1.0 type: FI_EP_DGRAM protocol: FI_PROTO_SOCK_TCP
sockets: IP version: 1.0 type: FI_EP_RDM protocol: FI_PROTO_SOCK_TCP
[root@rdma-dev-02 ~]$ rpm -q libfabric libpsm2 openmpi
libfabric-1.3.0-3.el7.x86_64
libpsm2-10.2.33-1.el7.x86_64
package openmpi is not installed

Hi, Chris

Could you please update libfabric to libfabric-1.4.0 and try again? Please download the SRPM from the following link; you need to rebuild it with the rpmbuild tool.
https://koji.fedoraproject.org/koji/packageinfo?packageID=20963 [root@rdma-dev-02 tmp]$ rpm -qf $(which fi_info) libfabric-1.4.0-1.el7.x86_64 [root@rdma-dev-02 tmp]$ time fi_info provider: verbs fabric: IB-0x80fe domain: mlx5_0 version: 1.0 type: FI_EP_MSG protocol: FI_PROTO_RDMA_CM_IB_RC provider: UDP fabric: UDP-IP domain: udp version: 1.0 type: FI_EP_DGRAM protocol: FI_PROTO_UDP provider: UDP fabric: UDP-IP domain: udp version: 1.0 type: FI_EP_DGRAM protocol: FI_PROTO_UDP provider: UDP fabric: UDP-IP domain: udp version: 1.0 type: FI_EP_DGRAM protocol: FI_PROTO_UDP provider: UDP fabric: UDP-IP domain: udp version: 1.0 type: FI_EP_DGRAM protocol: FI_PROTO_UDP provider: UDP fabric: UDP-IP domain: udp version: 1.0 type: FI_EP_DGRAM protocol: FI_PROTO_UDP provider: UDP fabric: UDP-IP domain: udp version: 1.0 type: FI_EP_DGRAM protocol: FI_PROTO_UDP provider: UDP fabric: UDP-IP domain: udp version: 1.0 type: FI_EP_DGRAM protocol: FI_PROTO_UDP provider: UDP fabric: UDP-IP domain: udp version: 1.0 type: FI_EP_DGRAM protocol: FI_PROTO_UDP provider: UDP fabric: UDP-IP domain: udp version: 1.0 type: FI_EP_DGRAM protocol: FI_PROTO_UDP provider: UDP fabric: UDP-IP domain: udp version: 1.0 type: FI_EP_DGRAM protocol: FI_PROTO_UDP provider: UDP fabric: UDP-IP domain: udp version: 1.0 type: FI_EP_DGRAM protocol: FI_PROTO_UDP provider: UDP fabric: UDP-IP domain: udp version: 1.0 type: FI_EP_DGRAM protocol: FI_PROTO_UDP provider: UDP fabric: UDP-IP domain: udp version: 1.0 type: FI_EP_DGRAM protocol: FI_PROTO_UDP provider: UDP fabric: UDP-IP domain: udp version: 1.0 type: FI_EP_DGRAM protocol: FI_PROTO_UDP provider: UDP fabric: UDP-IP domain: udp version: 1.0 type: FI_EP_DGRAM protocol: FI_PROTO_UDP provider: UDP fabric: UDP-IP domain: udp version: 1.0 type: FI_EP_DGRAM protocol: FI_PROTO_UDP provider: UDP fabric: UDP-IP domain: udp version: 1.0 type: FI_EP_DGRAM protocol: FI_PROTO_UDP provider: UDP fabric: UDP-IP domain: udp version: 1.0 type: FI_EP_DGRAM protocol: FI_PROTO_UDP 
provider: UDP fabric: UDP-IP domain: udp version: 1.0 type: FI_EP_DGRAM protocol: FI_PROTO_UDP provider: UDP fabric: UDP-IP domain: udp version: 1.0 type: FI_EP_DGRAM protocol: FI_PROTO_UDP provider: UDP fabric: UDP-IP domain: udp version: 1.0 type: FI_EP_DGRAM protocol: FI_PROTO_UDP provider: sockets fabric: 10.16.40.0/24 domain: lom_1 version: 2.0 type: FI_EP_MSG protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 10.16.40.0/24 domain: lom_1 version: 2.0 type: FI_EP_DGRAM protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 10.16.40.0/24 domain: lom_1 version: 2.0 type: FI_EP_RDM protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 172.31.0.0/24 domain: mlx5_ib0 version: 2.0 type: FI_EP_MSG protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 172.31.0.0/24 domain: mlx5_ib0 version: 2.0 type: FI_EP_DGRAM protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 172.31.0.0/24 domain: mlx5_ib0 version: 2.0 type: FI_EP_RDM protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 172.31.1.0/24 domain: mlx5_ib1 version: 2.0 type: FI_EP_MSG protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 172.31.1.0/24 domain: mlx5_ib1 version: 2.0 type: FI_EP_DGRAM protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 172.31.1.0/24 domain: mlx5_ib1 version: 2.0 type: FI_EP_RDM protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 172.31.2.0/24 domain: mlx5_ib0.8002 version: 2.0 type: FI_EP_MSG protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 172.31.2.0/24 domain: mlx5_ib0.8002 version: 2.0 type: FI_EP_DGRAM protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 172.31.2.0/24 domain: mlx5_ib0.8002 version: 2.0 type: FI_EP_RDM protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 172.31.4.0/24 domain: mlx5_ib0.8004 version: 2.0 type: FI_EP_MSG protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 172.31.4.0/24 domain: mlx5_ib0.8004 version: 2.0 type: FI_EP_DGRAM protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 172.31.4.0/24 domain: mlx5_ib0.8004 version: 2.0 type: FI_EP_RDM 
protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 172.31.6.0/24 domain: mlx5_ib0.8006 version: 2.0 type: FI_EP_MSG protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 172.31.6.0/24 domain: mlx5_ib0.8006 version: 2.0 type: FI_EP_DGRAM protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 172.31.6.0/24 domain: mlx5_ib0.8006 version: 2.0 type: FI_EP_RDM protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 172.31.3.0/24 domain: mlx5_ib1.8003 version: 2.0 type: FI_EP_MSG protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 172.31.3.0/24 domain: mlx5_ib1.8003 version: 2.0 type: FI_EP_DGRAM protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 172.31.3.0/24 domain: mlx5_ib1.8003 version: 2.0 type: FI_EP_RDM protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 172.31.5.0/24 domain: mlx5_ib1.8005 version: 2.0 type: FI_EP_MSG protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 172.31.5.0/24 domain: mlx5_ib1.8005 version: 2.0 type: FI_EP_DGRAM protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 172.31.5.0/24 domain: mlx5_ib1.8005 version: 2.0 type: FI_EP_RDM protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 172.31.7.0/24 domain: mlx5_ib1.8007 version: 2.0 type: FI_EP_MSG protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 172.31.7.0/24 domain: mlx5_ib1.8007 version: 2.0 type: FI_EP_DGRAM protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 172.31.7.0/24 domain: mlx5_ib1.8007 version: 2.0 type: FI_EP_RDM protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 127.0.0.0/8 domain: lo version: 2.0 type: FI_EP_MSG protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 127.0.0.0/8 domain: lo version: 2.0 type: FI_EP_DGRAM protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 127.0.0.0/8 domain: lo version: 2.0 type: FI_EP_RDM protocol: FI_PROTO_SOCK_TCP
real 0m0.245s
user 0m0.005s
sys 0m0.019s
[root@rdma-dev-02 tmp]$

This bug could be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1354417

*** Bug 1354417 has been marked as a duplicate of this bug. ***

Confirmed: updating to libfabric-1.4.0-1.el7.centos.x86_64 resolves the issue. Thank you!

[schanzle@spud src]$ rpm -q libfabric
libfabric-1.4.0-1.el7.centos.x86_64
[schanzle@spud src]$ time mpirun -np 4 mpi-hello-c
Hello world from process 0 of 4
Hello world from process 1 of 4
Hello world from process 2 of 4
Hello world from process 3 of 4
real 0m0.262s
user 0m0.067s
sys 0m0.081s

This bug is still on RHEL7 with libfabric-1.3 libfabric-1.3.0-3.el7.x86_64, and the link does not show a more recent version for RHEL7, only for Fedora.

(In reply to Alexandre Strube from comment #9)
> This bug is still on RHEL7 with libfabric-1.3 libfabric-1.3.0-3.el7.x86_64,
> and the link does not show a more recent version for RHEL7, only for Fedora

This bug will be fixed for RHEL-7.4.

It looks like the upstream fix is at https://github.com/ofiwg/libfabric/commit/31384811a549cb7c3c7f8fba6f326a1850a8a5b1

Is there a workaround for this with 1.3.0?

(In reply to Orion Poplawski from comment #12)
> Is there a workaround for this with 1.3.0?

No.
Reproducer: [root@rdma-qe-06 ~]$ rpm -qf $(which fi_info) libfabric-1.3.0-3.el7.x86_64 [root@rdma-qe-06 ~]$ fi_info rdma-qe-06.56339hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out verbs: IB-0x80fe version: 1.0 type: FI_EP_MSG protocol: FI_PROTO_RDMA_CM_IB_RC verbs: IB-0x80fe version: 1.0 type: FI_EP_RDM protocol: FI_PROTO_IB_RDM UDP: UDP-IP version: 1.0 type: FI_EP_DGRAM protocol: FI_PROTO_UDP sockets: IP version: 1.0 type: FI_EP_MSG protocol: FI_PROTO_SOCK_TCP sockets: IP version: 1.0 type: FI_EP_DGRAM protocol: FI_PROTO_SOCK_TCP sockets: IP version: 1.0 type: FI_EP_RDM protocol: FI_PROTO_SOCK_TCP Verification: [root@rdma-qe-06 ~]$ rpm -q libfabric libfabric-1.4.1-1.el7.x86_64 [root@rdma-qe-06 ~]$ time fi_info provider: verbs fabric: IB-0x80fe domain: mlx5_0 version: 1.0 type: FI_EP_MSG protocol: FI_PROTO_RDMA_CM_IB_RC provider: UDP fabric: UDP-IP domain: udp version: 1.0 type: FI_EP_DGRAM protocol: FI_PROTO_UDP provider: UDP fabric: UDP-IP domain: udp version: 1.0 type: FI_EP_DGRAM protocol: FI_PROTO_UDP provider: UDP fabric: UDP-IP domain: udp version: 1.0 type: FI_EP_DGRAM protocol: FI_PROTO_UDP provider: UDP fabric: UDP-IP domain: udp version: 1.0 type: FI_EP_DGRAM protocol: FI_PROTO_UDP provider: UDP fabric: UDP-IP domain: udp version: 1.0 type: FI_EP_DGRAM protocol: FI_PROTO_UDP provider: UDP fabric: UDP-IP domain: udp version: 1.0 type: FI_EP_DGRAM protocol: FI_PROTO_UDP provider: UDP fabric: UDP-IP domain: udp version: 1.0 type: FI_EP_DGRAM protocol: FI_PROTO_UDP provider: UDP fabric: UDP-IP domain: udp version: 1.0 type: FI_EP_DGRAM protocol: FI_PROTO_UDP provider: UDP fabric: UDP-IP domain: udp version: 1.0 type: FI_EP_DGRAM protocol: FI_PROTO_UDP provider: UDP fabric: UDP-IP domain: udp version: 1.0 type: FI_EP_DGRAM protocol: FI_PROTO_UDP provider: UDP fabric: UDP-IP domain: udp version: 1.0 type: FI_EP_DGRAM protocol: FI_PROTO_UDP provider: UDP fabric: UDP-IP domain: udp version: 1.0 type: 
FI_EP_DGRAM protocol: FI_PROTO_UDP provider: UDP fabric: UDP-IP domain: udp version: 1.0 type: FI_EP_DGRAM protocol: FI_PROTO_UDP provider: UDP fabric: UDP-IP domain: udp version: 1.0 type: FI_EP_DGRAM protocol: FI_PROTO_UDP provider: UDP fabric: UDP-IP domain: udp version: 1.0 type: FI_EP_DGRAM protocol: FI_PROTO_UDP provider: UDP fabric: UDP-IP domain: udp version: 1.0 type: FI_EP_DGRAM protocol: FI_PROTO_UDP provider: UDP fabric: UDP-IP domain: udp version: 1.0 type: FI_EP_DGRAM protocol: FI_PROTO_UDP provider: UDP fabric: UDP-IP domain: udp version: 1.0 type: FI_EP_DGRAM protocol: FI_PROTO_UDP provider: UDP fabric: UDP-IP domain: udp version: 1.0 type: FI_EP_DGRAM protocol: FI_PROTO_UDP provider: UDP fabric: UDP-IP domain: udp version: 1.0 type: FI_EP_DGRAM protocol: FI_PROTO_UDP provider: UDP fabric: UDP-IP domain: udp version: 1.0 type: FI_EP_DGRAM protocol: FI_PROTO_UDP provider: sockets fabric: 10.16.40.0/24 domain: lom_1 version: 2.0 type: FI_EP_MSG protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 10.16.40.0/24 domain: lom_1 version: 2.0 type: FI_EP_DGRAM protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 10.16.40.0/24 domain: lom_1 version: 2.0 type: FI_EP_RDM protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 172.31.0.0/24 domain: mlx5_ib0 version: 2.0 type: FI_EP_MSG protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 172.31.0.0/24 domain: mlx5_ib0 version: 2.0 type: FI_EP_DGRAM protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 172.31.0.0/24 domain: mlx5_ib0 version: 2.0 type: FI_EP_RDM protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 172.31.1.0/24 domain: mlx5_ib1 version: 2.0 type: FI_EP_MSG protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 172.31.1.0/24 domain: mlx5_ib1 version: 2.0 type: FI_EP_DGRAM protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 172.31.1.0/24 domain: mlx5_ib1 version: 2.0 type: FI_EP_RDM protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 172.31.2.0/24 domain: mlx5_ib0.8002 version: 2.0 type: 
FI_EP_MSG protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 172.31.2.0/24 domain: mlx5_ib0.8002 version: 2.0 type: FI_EP_DGRAM protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 172.31.2.0/24 domain: mlx5_ib0.8002 version: 2.0 type: FI_EP_RDM protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 172.31.6.0/24 domain: mlx5_ib0.8006 version: 2.0 type: FI_EP_MSG protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 172.31.6.0/24 domain: mlx5_ib0.8006 version: 2.0 type: FI_EP_DGRAM protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 172.31.6.0/24 domain: mlx5_ib0.8006 version: 2.0 type: FI_EP_RDM protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 172.31.4.0/24 domain: mlx5_ib0.8004 version: 2.0 type: FI_EP_MSG protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 172.31.4.0/24 domain: mlx5_ib0.8004 version: 2.0 type: FI_EP_DGRAM protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 172.31.4.0/24 domain: mlx5_ib0.8004 version: 2.0 type: FI_EP_RDM protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 172.31.5.0/24 domain: mlx5_ib1.8005 version: 2.0 type: FI_EP_MSG protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 172.31.5.0/24 domain: mlx5_ib1.8005 version: 2.0 type: FI_EP_DGRAM protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 172.31.5.0/24 domain: mlx5_ib1.8005 version: 2.0 type: FI_EP_RDM protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 172.31.3.0/24 domain: mlx5_ib1.8003 version: 2.0 type: FI_EP_MSG protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 172.31.3.0/24 domain: mlx5_ib1.8003 version: 2.0 type: FI_EP_DGRAM protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 172.31.3.0/24 domain: mlx5_ib1.8003 version: 2.0 type: FI_EP_RDM protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 172.31.7.0/24 domain: mlx5_ib1.8007 version: 2.0 type: FI_EP_MSG protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 172.31.7.0/24 domain: mlx5_ib1.8007 version: 2.0 type: FI_EP_DGRAM protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 172.31.7.0/24 
domain: mlx5_ib1.8007 version: 2.0 type: FI_EP_RDM protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 127.0.0.0/8 domain: lo version: 2.0 type: FI_EP_MSG protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 127.0.0.0/8 domain: lo version: 2.0 type: FI_EP_DGRAM protocol: FI_PROTO_SOCK_TCP provider: sockets fabric: 127.0.0.0/8 domain: lo version: 2.0 type: FI_EP_RDM protocol: FI_PROTO_SOCK_TCP
real 0m0.247s
user 0m0.001s
sys 0m0.026s

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:2011

(In reply to Orion Poplawski from comment #12)
> Is there a workaround for this with 1.3.0?

Try using "--mca pml ob1 --mca btl self,tcp"; this forces Open MPI to use the TCP interface from the very beginning.
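Open MPI also reads any MCA parameter from an environment variable named OMPI_MCA_<param>, so the command-line workaround above can be expressed as environment settings that mpirun and its child processes inherit. A minimal C sketch of that equivalence; `use_tcp_only_transports` and `env_equals` are hypothetical helper names, and whether this avoids the delay on a given libfabric-1.3.0 system is an assumption carried over from the comment above, not something verified here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Export the same settings as "mpirun --mca pml ob1 --mca btl self,tcp"
 * via Open MPI's OMPI_MCA_<param> environment variables. Processes
 * started afterwards (e.g. mpirun via execvp) inherit them.
 * Returns 0 on success, -1 if setenv fails. */
static int use_tcp_only_transports(void)
{
    if (setenv("OMPI_MCA_pml", "ob1", 1) != 0)      /* 1 = overwrite */
        return -1;
    if (setenv("OMPI_MCA_btl", "self,tcp", 1) != 0)
        return -1;
    return 0;
}

/* Report whether environment variable `name` currently equals `expected`. */
static int env_equals(const char *name, const char *expected)
{
    assert(name != NULL && expected != NULL);
    const char *value = getenv(name);
    return value != NULL && strcmp(value, expected) == 0;
}
```

From a shell, the equivalent is exporting OMPI_MCA_pml=ob1 and OMPI_MCA_btl=self,tcp before running mpirun.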
Created attachment 1234881
C MPI source for hello-world

Description of problem:
MPI programs take 15s to start execution while waiting for the /dev/hfi1_0 device.

Version-Release number of selected component (if applicable):
openmpi-1.10.3-3.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
See the attached C and Fortran90 source.
1. module load mpi/openmpi-x86_64
2. Compile either the Fortran or the C "hello world" program:
2a. mpicc mpi-hello.c -o mpi-hello-c
2b. mpifort mpi-hello.f90 -o mpi-hello-f
3. time mpirun -np 4 mpi-hello-c
   time mpirun -np 4 mpi-hello-f

Actual results:
spud.cam.nist.gov.15211hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out
spud.cam.nist.gov.15212hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out
spud.cam.nist.gov.15214hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out
spud.cam.nist.gov.15213hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out
Hello world from process 1 of 4
Hello world from process 2 of 4
Hello world from process 0 of 4
Hello world from process 3 of 4
real 0m15.270s
user 0m0.088s
sys 0m0.069s

Expected results:
No errors; execution time < 1s.

Additional info:
After "sudo modprobe hfi1":

$ time mpirun -np 4 mpi-hello-c
spud.cam.nist.gov.15358hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out
spud.cam.nist.gov.15359hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out
--------------------------------------------------------------------------
[[18561,1],0]: A high-performance Open MPI point-to-point messaging module was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
Host: spud

Another transport will be used instead, although this may result in lower performance.
--------------------------------------------------------------------------
spud.cam.nist.gov.15361hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out
spud.cam.nist.gov.15360hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out
Hello world from process 2 of 4
Hello world from process 0 of 4
Hello world from process 3 of 4
Hello world from process 1 of 4
[spud.cam.nist.gov:15356] 3 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[spud.cam.nist.gov:15356] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
real 0m15.269s
user 0m0.091s
sys 0m0.075s
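The repeated "failed to appear after 15.0 seconds: Connection timed out" lines suggest a bounded wait: each process polls for the device node and gives up only when a timeout expires, so a host without the HFI device pays the full timeout at start-up. The C sketch below illustrates that pattern only; it is not libpsm2's hfi_wait_for_device code, and `wait_for_device` and its millisecond parameters are hypothetical. "Connection timed out" is the glibc strerror() text for ETIMEDOUT, which is why the sketch returns -ETIMEDOUT on expiry.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>
#include <unistd.h>

/* Illustrative sketch (not libpsm2's code): poll for a device node until
 * it exists or `timeout_ms` elapses, sleeping `poll_ms` between checks.
 * Returns 0 when the path appears, -ETIMEDOUT when the timeout expires. */
static int wait_for_device(const char *path, long timeout_ms, long poll_ms)
{
    struct timespec start, now, nap;

    assert(path != NULL && timeout_ms >= 0 && poll_ms > 0);
    nap.tv_sec = poll_ms / 1000;
    nap.tv_nsec = (poll_ms % 1000) * 1000000L;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (;;) {
        if (access(path, F_OK) == 0)
            return 0; /* device node is present */

        /* Elapsed time since the first check, in milliseconds. */
        clock_gettime(CLOCK_MONOTONIC, &now);
        long elapsed_ms = (long)(now.tv_sec - start.tv_sec) * 1000L
                        + (now.tv_nsec - start.tv_nsec) / 1000000L;
        if (elapsed_ms >= timeout_ms)
            return -ETIMEDOUT; /* the whole timeout was spent waiting */

        nanosleep(&nap, NULL);
    }
}
```

With timeout_ms set to 15000 and no /dev/hfi1_0 present, a call like this would block for roughly 15 seconds, in line with the real times reported above; the fixed libfabric avoids entering such a wait at all when the hardware is absent.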