Bug 1408316

Summary: openmpi hfi_wait_for_device causes 15s delay
Product: Red Hat Enterprise Linux 7
Reporter: Chris Schanzle <bugzilla>
Component: libfabric
Assignee: Honggang LI <honli>
Status: CLOSED ERRATA
QA Contact: zguo <zguo>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 7.3
CC: anto.trande, bugreports2005, bugzilla, honli, jcastran, jedicker, jshortt, mplaneta, ngaywood, orion, pedemonte, rdma-dev-team, surak, yizhan, zguo
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: libfabric-1.4.1-1.el7
Doc Type: Bug Fix
Doc Text:
Cause: The function psm2_ep_num_devunits may wait for 15 seconds before returning when /dev/hfi1_0 is absent.
Consequence: This introduces an unnecessary delay on systems that have the PSM2 library installed but no OPA/HFI hardware.
Fix: libfabric checks for the existence of OPA/HFI hardware before checking the number of available device units.
Result: When OPA/HFI hardware is absent, libfabric skips psm2_ep_num_devunits and avoids the unnecessary delay.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-08-01 16:55:15 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1353018    
Attachments:
Description                              Flags
C mpi source for hello-world             none
Fortran90 mpi source for hello-world     none

Description Chris Schanzle 2016-12-22 21:39:25 UTC
Created attachment 1234881
C mpi source for hello-world

Description of problem:
MPI programs take 15 s to start executing because they wait for the /dev/hfi1_0 device.

Version-Release number of selected component (if applicable):
openmpi-1.10.3-3.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
see attached C and Fortran90 source

1. module load mpi/openmpi-x86_64
2. compile either fortran or c "hello world" program
2a.  mpicc mpi-hello.c -o mpi-hello-c
2b.  mpifort mpi-hello.f90 -o mpi-hello-f
3.  time mpirun -np 4 mpi-hello-c
    time mpirun -np 4 mpi-hello-f

Actual results:
spud.cam.nist.gov.15211hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out
spud.cam.nist.gov.15212hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out
spud.cam.nist.gov.15214hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out
spud.cam.nist.gov.15213hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out
Hello world from process 1 of 4
Hello world from process 2 of 4
Hello world from process 0 of 4
Hello world from process 3 of 4

real	0m15.270s
user	0m0.088s
sys	0m0.069s


Expected results:
no errors, execution time < 1s

Additional info:
The delay persists after loading the hfi1 kernel module:
  sudo modprobe hfi1

$ time mpirun -np 4 mpi-hello-c
spud.cam.nist.gov.15358hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out
spud.cam.nist.gov.15359hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out
--------------------------------------------------------------------------
[[18561,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: spud

Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
spud.cam.nist.gov.15361hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out
spud.cam.nist.gov.15360hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out
Hello world from process 2 of 4
Hello world from process 0 of 4
Hello world from process 3 of 4
Hello world from process 1 of 4
[spud.cam.nist.gov:15356] 3 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[spud.cam.nist.gov:15356] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

real	0m15.269s
user	0m0.091s
sys	0m0.075s

Comment 1 Chris Schanzle 2016-12-22 21:41:24 UTC
Created attachment 1234882
Fortran90 mpi source for hello-world

Comment 2 Chris Schanzle 2016-12-22 21:42:53 UTC
Forgot to mention this is a bug submitted by a community user using a CentOS 7.3 system.  Thank you!

Comment 4 Honggang LI 2016-12-23 03:39:15 UTC
This is not an openmpi issue. It is a libfabric and libpsm2 issue.

[root@rdma-dev-02 ~]$ fi_info
rdma-dev-02.3246hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out
verbs: IB-0x80fe
    version: 1.0
    type: FI_EP_MSG
    protocol: FI_PROTO_RDMA_CM_IB_RC
verbs: IB-0x80fe
    version: 1.0
    type: FI_EP_RDM
    protocol: FI_PROTO_IB_RDM
UDP: UDP-IP
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
sockets: IP
    version: 1.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
sockets: IP
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
sockets: IP
    version: 1.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP
[root@rdma-dev-02 ~]$ rpm -q libfabric libpsm2 openmpi
libfabric-1.3.0-3.el7.x86_64
libpsm2-10.2.33-1.el7.x86_64
package openmpi is not installed

Comment 5 Honggang LI 2016-12-23 03:49:42 UTC
Hi, Chris

Could you please update libfabric to libfabric-1.4.0 and try again? Please download the SRPM from the following link. You will need to rebuild it with the rpmbuild tool.

https://koji.fedoraproject.org/koji/packageinfo?packageID=20963

[root@rdma-dev-02 tmp]$ rpm -qf $(which fi_info)
libfabric-1.4.0-1.el7.x86_64
[root@rdma-dev-02 tmp]$ time fi_info
provider: verbs
    fabric: IB-0x80fe
    domain: mlx5_0
    version: 1.0
    type: FI_EP_MSG
    protocol: FI_PROTO_RDMA_CM_IB_RC
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: sockets
    fabric: 10.16.40.0/24
    domain: lom_1
    version: 2.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 10.16.40.0/24
    domain: lom_1
    version: 2.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 10.16.40.0/24
    domain: lom_1
    version: 2.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.0.0/24
    domain: mlx5_ib0
    version: 2.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.0.0/24
    domain: mlx5_ib0
    version: 2.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.0.0/24
    domain: mlx5_ib0
    version: 2.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.1.0/24
    domain: mlx5_ib1
    version: 2.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.1.0/24
    domain: mlx5_ib1
    version: 2.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.1.0/24
    domain: mlx5_ib1
    version: 2.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.2.0/24
    domain: mlx5_ib0.8002
    version: 2.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.2.0/24
    domain: mlx5_ib0.8002
    version: 2.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.2.0/24
    domain: mlx5_ib0.8002
    version: 2.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.4.0/24
    domain: mlx5_ib0.8004
    version: 2.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.4.0/24
    domain: mlx5_ib0.8004
    version: 2.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.4.0/24
    domain: mlx5_ib0.8004
    version: 2.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.6.0/24
    domain: mlx5_ib0.8006
    version: 2.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.6.0/24
    domain: mlx5_ib0.8006
    version: 2.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.6.0/24
    domain: mlx5_ib0.8006
    version: 2.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.3.0/24
    domain: mlx5_ib1.8003
    version: 2.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.3.0/24
    domain: mlx5_ib1.8003
    version: 2.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.3.0/24
    domain: mlx5_ib1.8003
    version: 2.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.5.0/24
    domain: mlx5_ib1.8005
    version: 2.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.5.0/24
    domain: mlx5_ib1.8005
    version: 2.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.5.0/24
    domain: mlx5_ib1.8005
    version: 2.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.7.0/24
    domain: mlx5_ib1.8007
    version: 2.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.7.0/24
    domain: mlx5_ib1.8007
    version: 2.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.7.0/24
    domain: mlx5_ib1.8007
    version: 2.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 127.0.0.0/8
    domain: lo
    version: 2.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 127.0.0.0/8
    domain: lo
    version: 2.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 127.0.0.0/8
    domain: lo
    version: 2.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP

real	0m0.245s
user	0m0.005s
sys	0m0.019s
[root@rdma-dev-02 tmp]$

Comment 6 zguo 2016-12-23 04:01:28 UTC
This bug could be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1354417

Comment 7 Honggang LI 2016-12-23 05:37:58 UTC
*** Bug 1354417 has been marked as a duplicate of this bug. ***

Comment 8 Chris Schanzle 2016-12-23 14:01:31 UTC
Confirmed - updating to libfabric-1.4.0-1.el7.centos.x86_64 resolves the issue.  Thank you!


[schanzle@spud src]$ rpm -q libfabric
libfabric-1.4.0-1.el7.centos.x86_64

[schanzle@spud src]$ time mpirun -np 4 mpi-hello-c 
Hello world from process 0 of 4
Hello world from process 1 of 4
Hello world from process 2 of 4
Hello world from process 3 of 4

real	0m0.262s
user	0m0.067s
sys	0m0.081s
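
For anyone checking other hosts, the version comparison above can be scripted. This is only a rough sketch: the helper name libfabric_predates_fix is made up for this example, and the 1.4.0 threshold is inferred from this thread (1.3.0 shows the delay, 1.4.0 does not), not from any official statement. It also assumes GNU sort with the -V option.

```shell
# Hypothetical helper (not part of any package): succeeds when the version
# given as $1 sorts strictly before 1.4.0, the first version reported as
# free of the delay in this bug. The threshold is an assumption.
libfabric_predates_fix() {
    [ "$1" != "1.4.0" ] &&
    [ "$(printf '%s\n' "$1" 1.4.0 | sort -V | head -n 1)" = "$1" ]
}

# Ask rpm for the bare version string; leave it empty if rpm is missing
# or the package is not installed.
installed=$(rpm -q --qf '%{VERSION}' libfabric 2>/dev/null) || installed=""

if [ -z "$installed" ]; then
    echo "libfabric is not installed (or rpm is unavailable)"
elif libfabric_predates_fix "$installed"; then
    echo "libfabric $installed predates 1.4.0 and may show the 15s delay"
else
    echo "libfabric $installed is 1.4.0 or later"
fi
```

Running it on a host with libfabric-1.3.0-3.el7 should take the second branch; it does not check whether OPA/HFI hardware is actually present.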

Comment 9 Alexandre Strube 2017-01-24 18:19:25 UTC
This bug is still present on RHEL 7 with libfabric 1.3 (libfabric-1.3.0-3.el7.x86_64), and the link does not show a more recent version for RHEL 7, only for Fedora.

Comment 10 Honggang LI 2017-01-25 03:27:32 UTC
(In reply to Alexandre Strube from comment #9)
> This bug is still present on RHEL 7 with libfabric 1.3 (libfabric-1.3.0-3.el7.x86_64),
> and the link does not show a more recent version for RHEL 7, only for Fedora.

This bug will be fixed for RHEL-7.4.

Comment 11 bugreports2005 2017-01-25 08:49:57 UTC
It looks like the upstream fix is at

https://github.com/ofiwg/libfabric/commit/31384811a549cb7c3c7f8fba6f326a1850a8a5b1
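
The Doc Text of this bug describes the fix as checking for OPA/HFI hardware before asking PSM2 for the number of device units. The sketch below illustrates that idea at the shell level only. It is not the upstream patch, and the helper name hfi_device_present is made up for this example. It assumes the hardware appears as /dev/hfi1_<N> nodes, as the hfi_wait_for_device message suggests.

```shell
# Hypothetical helper, not part of libfabric or PSM2.
# Returns 0 if at least one hfi1_<N> node exists under the given
# directory (default: /dev), and 1 otherwise.
hfi_device_present() {
    dev_dir="${1:-/dev}"
    for node in "$dev_dir"/hfi1_[0-9]*; do
        # When nothing matches, the pattern is left as a literal string,
        # so confirm that the path really exists.
        if [ -e "$node" ]; then
            return 0
        fi
    done
    return 1
}

if hfi_device_present; then
    echo "HFI device node found"
else
    echo "no /dev/hfi1_* node: libfabric 1.3.0 may stall for 15s here"
fi
```

The optional directory argument is there only so the check can be tried against a scratch directory instead of /dev.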

Comment 12 Orion Poplawski 2017-02-27 21:33:40 UTC
Is there a workaround for this with 1.3.0?

Comment 13 Honggang LI 2017-03-02 00:40:41 UTC
(In reply to Orion Poplawski from comment #12)
> Is there a workaround for this with 1.3.0?

No.

Comment 15 zguo 2017-03-07 01:37:33 UTC
Reproducer:
[root@rdma-qe-06 ~]$ rpm -qf $(which fi_info)
libfabric-1.3.0-3.el7.x86_64
[root@rdma-qe-06 ~]$ fi_info
rdma-qe-06.56339hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out
verbs: IB-0x80fe
    version: 1.0
    type: FI_EP_MSG
    protocol: FI_PROTO_RDMA_CM_IB_RC
verbs: IB-0x80fe
    version: 1.0
    type: FI_EP_RDM
    protocol: FI_PROTO_IB_RDM
UDP: UDP-IP
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
sockets: IP
    version: 1.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
sockets: IP
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
sockets: IP
    version: 1.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP

Verification:
[root@rdma-qe-06 ~]$ rpm -q libfabric
libfabric-1.4.1-1.el7.x86_64
[root@rdma-qe-06 ~]$ time fi_info
provider: verbs
    fabric: IB-0x80fe
    domain: mlx5_0
    version: 1.0
    type: FI_EP_MSG
    protocol: FI_PROTO_RDMA_CM_IB_RC
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: sockets
    fabric: 10.16.40.0/24
    domain: lom_1
    version: 2.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 10.16.40.0/24
    domain: lom_1
    version: 2.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 10.16.40.0/24
    domain: lom_1
    version: 2.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.0.0/24
    domain: mlx5_ib0
    version: 2.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.0.0/24
    domain: mlx5_ib0
    version: 2.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.0.0/24
    domain: mlx5_ib0
    version: 2.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.1.0/24
    domain: mlx5_ib1
    version: 2.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.1.0/24
    domain: mlx5_ib1
    version: 2.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.1.0/24
    domain: mlx5_ib1
    version: 2.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.2.0/24
    domain: mlx5_ib0.8002
    version: 2.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.2.0/24
    domain: mlx5_ib0.8002
    version: 2.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.2.0/24
    domain: mlx5_ib0.8002
    version: 2.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.6.0/24
    domain: mlx5_ib0.8006
    version: 2.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.6.0/24
    domain: mlx5_ib0.8006
    version: 2.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.6.0/24
    domain: mlx5_ib0.8006
    version: 2.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.4.0/24
    domain: mlx5_ib0.8004
    version: 2.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.4.0/24
    domain: mlx5_ib0.8004
    version: 2.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.4.0/24
    domain: mlx5_ib0.8004
    version: 2.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.5.0/24
    domain: mlx5_ib1.8005
    version: 2.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.5.0/24
    domain: mlx5_ib1.8005
    version: 2.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.5.0/24
    domain: mlx5_ib1.8005
    version: 2.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.3.0/24
    domain: mlx5_ib1.8003
    version: 2.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.3.0/24
    domain: mlx5_ib1.8003
    version: 2.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.3.0/24
    domain: mlx5_ib1.8003
    version: 2.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.7.0/24
    domain: mlx5_ib1.8007
    version: 2.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.7.0/24
    domain: mlx5_ib1.8007
    version: 2.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.7.0/24
    domain: mlx5_ib1.8007
    version: 2.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 127.0.0.0/8
    domain: lo
    version: 2.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 127.0.0.0/8
    domain: lo
    version: 2.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 127.0.0.0/8
    domain: lo
    version: 2.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP

real	0m0.247s
user	0m0.001s
sys	0m0.026s

Comment 18 errata-xmlrpc 2017-08-01 16:55:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:2011

Comment 19 Maksym Planeta 2017-08-17 15:11:56 UTC
(In reply to Orion Poplawski from comment #12)
> Is there a workaround for this with 1.3.0?

Try using "--mca pml ob1 --mca btl self,tcp". This forces Open MPI to use the TCP interface from the very beginning.
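
The same MCA settings can also be supplied through the environment, since Open MPI reads parameters from variables named OMPI_MCA_<param>. A minimal sketch, assuming the mpi-hello-c binary from the reproduction steps sits in the current directory; whether this avoids the delay on a given system has not been verified here.

```shell
# Equivalent to passing "--mca pml ob1 --mca btl self,tcp" to mpirun.
export OMPI_MCA_pml=ob1
export OMPI_MCA_btl=self,tcp

# Run the reproducer only if both mpirun and the binary are available;
# ./mpi-hello-c is the program built in the original description.
if command -v mpirun >/dev/null 2>&1 && [ -x ./mpi-hello-c ]; then
    mpirun -np 4 ./mpi-hello-c
fi
```

Setting the variables in a module file or job script keeps the workaround in one place until the libfabric update is installed.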