Bug 1408316 - openmpi hfi_wait_for_device causes 15s delay
Summary: openmpi hfi_wait_for_device causes 15s delay
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libfabric
Version: 7.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Honggang LI
QA Contact: zguo
URL:
Whiteboard:
Duplicates: 1354417
Depends On:
Blocks: 1353018
 
Reported: 2016-12-22 21:39 UTC by Chris Schanzle
Modified: 2020-09-10 10:04 UTC
CC List: 15 users

Fixed In Version: libfabric-1.4.1-1.el7
Doc Type: Bug Fix
Doc Text:
Cause: The function psm2_ep_num_devunits may wait for 15 seconds before returning when /dev/hfi1_0 is absent. Consequence: This introduces an unnecessary delay on systems that have the PSM2 library installed but no OPA/HFI hardware. Fix: Check for the existence of OPA/HFI hardware before checking the number of available device units. Result: When OPA/HFI hardware is absent, libfabric skips psm2_ep_num_devunits and avoids the unnecessary delay.
Clone Of:
Environment:
Last Closed: 2017-08-01 16:55:15 UTC
Target Upstream Version:
Embargoed:
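
The fix described in Doc Text (look for the hardware first, and only then run the slow probe) can be sketched in shell. This is an illustration of the idea only, not the libfabric patch, which is C code. The function name hfi_device_present, its directory parameter, and the /dev/hfi1_* pattern are assumptions made for this sketch.

```shell
#!/bin/sh
# Sketch only: illustrates "check for the device before the slow probe".
# hfi_device_present [DIR]
#   DIR is a hypothetical parameter (default: /dev) so the check can be
#   exercised against a scratch directory.
hfi_device_present() {
    dir=${1:-/dev}
    for node in "$dir"/hfi1_*; do
        # An unmatched glob stays literal, so confirm the path really exists.
        [ -e "$node" ] && return 0
    done
    return 1
}

# Guard the slow step: only probe PSM2 when a device node is present.
if hfi_device_present; then
    echo "HFI device node found: PSM2 probe would run"
else
    echo "no HFI device node: PSM2 probe would be skipped"
fi
```

On a machine without OPA/HFI hardware the guard takes the second branch immediately instead of waiting for a device that never appears.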


Attachments
C mpi source for hello-world (372 bytes, text/x-csrc)
2016-12-22 21:39 UTC, Chris Schanzle
Fortran90 mpi source for hello-world (307 bytes, text/plain)
2016-12-22 21:41 UTC, Chris Schanzle


Links
System: Red Hat Product Errata
ID: RHEA-2017:2011
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: RDMA stack bug fix and enhancement update
Last Updated: 2017-08-01 17:59:05 UTC

Description Chris Schanzle 2016-12-22 21:39:25 UTC
Created attachment 1234881
C mpi source for hello-world

Description of problem:
MPI programs take 15 seconds to start execution because they wait for the /dev/hfi1_0 device.

Version-Release number of selected component (if applicable):
openmpi-1.10.3-3.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
See the attached C and Fortran90 sources.

1. module load mpi/openmpi-x86_64
2. Compile either the Fortran or the C "hello world" program:
2a.  mpicc mpi-hello.c -o mpi-hello-c
2b.  mpifort mpi-hello.f90 -o mpi-hello-f
3.  time mpirun -np 4 mpi-hello-c
    time mpirun -np 4 mpi-hello-f

Actual results:
spud.cam.nist.gov.15211hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out
spud.cam.nist.gov.15212hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out
spud.cam.nist.gov.15214hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out
spud.cam.nist.gov.15213hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out
Hello world from process 1 of 4
Hello world from process 2 of 4
Hello world from process 0 of 4
Hello world from process 3 of 4

real	0m15.270s
user	0m0.088s
sys	0m0.069s


Expected results:
no errors, execution time < 1s

Additional info:
The delay persists after loading the hfi1 kernel module:

  sudo modprobe hfi1

$ time mpirun -np 4 mpi-hello-c
spud.cam.nist.gov.15358hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out
spud.cam.nist.gov.15359hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out
--------------------------------------------------------------------------
[[18561,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: spud

Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
spud.cam.nist.gov.15361hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out
spud.cam.nist.gov.15360hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out
Hello world from process 2 of 4
Hello world from process 0 of 4
Hello world from process 3 of 4
Hello world from process 1 of 4
[spud.cam.nist.gov:15356] 3 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[spud.cam.nist.gov:15356] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

real	0m15.269s
user	0m0.091s
sys	0m0.075s
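
To check quickly whether a given run hits this problem, the timeout messages shown above can be counted from the command's combined output. A minimal sketch, assuming the message wording matches the logs in this report. The function name count_hfi_timeouts is hypothetical.

```shell
#!/bin/sh
# count_hfi_timeouts: read program output on stdin and print how many
# hfi_wait_for_device timeout lines it contains.
count_hfi_timeouts() {
    # grep -c prints 0 and exits non-zero when nothing matches;
    # "|| true" keeps that case from being treated as a failure.
    grep -c 'hfi_wait_for_device: The /dev/hfi1_[0-9]* device failed to appear' || true
}
```

For example, `mpirun -np 4 mpi-hello-c 2>&1 | count_hfi_timeouts` should print one count per affected process on a system showing the delay, and 0 once the problem is gone.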

Comment 1 Chris Schanzle 2016-12-22 21:41:24 UTC
Created attachment 1234882
Fortran90 mpi source for hello-world

Comment 2 Chris Schanzle 2016-12-22 21:42:53 UTC
Forgot to mention this is a bug submitted by a community user using a CentOS 7.3 system.  Thank you!

Comment 4 Honggang LI 2016-12-23 03:39:15 UTC
This is not an openmpi issue. It is a libfabric and libpsm2 issue.

[root@rdma-dev-02 ~]$ fi_info
rdma-dev-02.3246hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out
verbs: IB-0x80fe
    version: 1.0
    type: FI_EP_MSG
    protocol: FI_PROTO_RDMA_CM_IB_RC
verbs: IB-0x80fe
    version: 1.0
    type: FI_EP_RDM
    protocol: FI_PROTO_IB_RDM
UDP: UDP-IP
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
sockets: IP
    version: 1.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
sockets: IP
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
sockets: IP
    version: 1.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP
[root@rdma-dev-02 ~]$ rpm -q libfabric libpsm2 openmpi
libfabric-1.3.0-3.el7.x86_64
libpsm2-10.2.33-1.el7.x86_64
package openmpi is not installed

Comment 5 Honggang LI 2016-12-23 03:49:42 UTC
Hi, Chris

Could you please update libfabric to libfabric-1.4.0 and try again? Please download the SRPM from the following link. You will need to rebuild it with the rpmbuild tool.

https://koji.fedoraproject.org/koji/packageinfo?packageID=20963
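
Rebuilding a downloaded source package is typically done with rpmbuild --rebuild. The wrapper below is only a sketch: the function name rebuild_srpm is hypothetical, and the path of the downloaded SRPM is left to the caller because the exact file name depends on the build chosen on the Koji page.

```shell
#!/bin/sh
# rebuild_srpm FILE
#   Rebuild binary RPMs from a source RPM after two basic sanity checks.
rebuild_srpm() {
    srpm=$1
    case $srpm in
        *.src.rpm) ;;
        *) echo "expected a .src.rpm file, got: $srpm" >&2; return 2 ;;
    esac
    [ -f "$srpm" ] || { echo "no such file: $srpm" >&2; return 1; }
    # rpmbuild --rebuild installs the SRPM and builds binary packages from it.
    rpmbuild --rebuild "$srpm"
}
```

The resulting binary packages can then be installed in place of libfabric-1.3.0.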

[root@rdma-dev-02 tmp]$ rpm -qf $(which fi_info)
libfabric-1.4.0-1.el7.x86_64
[root@rdma-dev-02 tmp]$ time fi_info
provider: verbs
    fabric: IB-0x80fe
    domain: mlx5_0
    version: 1.0
    type: FI_EP_MSG
    protocol: FI_PROTO_RDMA_CM_IB_RC
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: sockets
    fabric: 10.16.40.0/24
    domain: lom_1
    version: 2.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 10.16.40.0/24
    domain: lom_1
    version: 2.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 10.16.40.0/24
    domain: lom_1
    version: 2.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.0.0/24
    domain: mlx5_ib0
    version: 2.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.0.0/24
    domain: mlx5_ib0
    version: 2.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.0.0/24
    domain: mlx5_ib0
    version: 2.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.1.0/24
    domain: mlx5_ib1
    version: 2.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.1.0/24
    domain: mlx5_ib1
    version: 2.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.1.0/24
    domain: mlx5_ib1
    version: 2.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.2.0/24
    domain: mlx5_ib0.8002
    version: 2.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.2.0/24
    domain: mlx5_ib0.8002
    version: 2.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.2.0/24
    domain: mlx5_ib0.8002
    version: 2.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.4.0/24
    domain: mlx5_ib0.8004
    version: 2.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.4.0/24
    domain: mlx5_ib0.8004
    version: 2.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.4.0/24
    domain: mlx5_ib0.8004
    version: 2.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.6.0/24
    domain: mlx5_ib0.8006
    version: 2.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.6.0/24
    domain: mlx5_ib0.8006
    version: 2.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.6.0/24
    domain: mlx5_ib0.8006
    version: 2.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.3.0/24
    domain: mlx5_ib1.8003
    version: 2.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.3.0/24
    domain: mlx5_ib1.8003
    version: 2.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.3.0/24
    domain: mlx5_ib1.8003
    version: 2.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.5.0/24
    domain: mlx5_ib1.8005
    version: 2.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.5.0/24
    domain: mlx5_ib1.8005
    version: 2.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.5.0/24
    domain: mlx5_ib1.8005
    version: 2.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.7.0/24
    domain: mlx5_ib1.8007
    version: 2.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.7.0/24
    domain: mlx5_ib1.8007
    version: 2.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.7.0/24
    domain: mlx5_ib1.8007
    version: 2.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 127.0.0.0/8
    domain: lo
    version: 2.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 127.0.0.0/8
    domain: lo
    version: 2.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 127.0.0.0/8
    domain: lo
    version: 2.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP

real	0m0.245s
user	0m0.005s
sys	0m0.019s
[root@rdma-dev-02 tmp]$

Comment 6 zguo 2016-12-23 04:01:28 UTC
This bug may be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1354417

Comment 7 Honggang LI 2016-12-23 05:37:58 UTC
*** Bug 1354417 has been marked as a duplicate of this bug. ***

Comment 8 Chris Schanzle 2016-12-23 14:01:31 UTC
Confirmed - updating to libfabric-1.4.0-1.el7.centos.x86_64 resolves the issue.  Thank you!


[schanzle@spud src]$ rpm -q libfabric
libfabric-1.4.0-1.el7.centos.x86_64

[schanzle@spud src]$ time mpirun -np 4 mpi-hello-c 
Hello world from process 0 of 4
Hello world from process 1 of 4
Hello world from process 2 of 4
Hello world from process 3 of 4

real	0m0.262s
user	0m0.067s
sys	0m0.081s

Comment 9 Alexandre Strube 2017-01-24 18:19:25 UTC
This bug is still present on RHEL7 with libfabric-1.3 (libfabric-1.3.0-3.el7.x86_64), and the link does not show a more recent version for RHEL7, only for Fedora.

Comment 10 Honggang LI 2017-01-25 03:27:32 UTC
(In reply to Alexandre Strube from comment #9)
> This bug is still present on RHEL7 with libfabric-1.3
> (libfabric-1.3.0-3.el7.x86_64), and the link does not show a more recent
> version for RHEL7, only for Fedora.

This bug will be fixed for RHEL-7.4.

Comment 11 bugreports2005 2017-01-25 08:49:57 UTC
It looks like the upstream fix is at

https://github.com/ofiwg/libfabric/commit/31384811a549cb7c3c7f8fba6f326a1850a8a5b1

Comment 12 Orion Poplawski 2017-02-27 21:33:40 UTC
Is there a workaround for this with 1.3.0?

Comment 13 Honggang LI 2017-03-02 00:40:41 UTC
(In reply to Orion Poplawski from comment #12)
> Is there a workaround for this with 1.3.0?

No.

Comment 15 zguo 2017-03-07 01:37:33 UTC
Reproducer:
[root@rdma-qe-06 ~]$ rpm -qf $(which fi_info)
libfabric-1.3.0-3.el7.x86_64
[root@rdma-qe-06 ~]$ fi_info
rdma-qe-06.56339hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out
verbs: IB-0x80fe
    version: 1.0
    type: FI_EP_MSG
    protocol: FI_PROTO_RDMA_CM_IB_RC
verbs: IB-0x80fe
    version: 1.0
    type: FI_EP_RDM
    protocol: FI_PROTO_IB_RDM
UDP: UDP-IP
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
sockets: IP
    version: 1.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
sockets: IP
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
sockets: IP
    version: 1.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP

Verification:
[root@rdma-qe-06 ~]$ rpm -q libfabric
libfabric-1.4.1-1.el7.x86_64
[root@rdma-qe-06 ~]$ time fi_info
provider: verbs
    fabric: IB-0x80fe
    domain: mlx5_0
    version: 1.0
    type: FI_EP_MSG
    protocol: FI_PROTO_RDMA_CM_IB_RC
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: UDP
    fabric: UDP-IP
    domain: udp
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_UDP
provider: sockets
    fabric: 10.16.40.0/24
    domain: lom_1
    version: 2.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 10.16.40.0/24
    domain: lom_1
    version: 2.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 10.16.40.0/24
    domain: lom_1
    version: 2.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.0.0/24
    domain: mlx5_ib0
    version: 2.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.0.0/24
    domain: mlx5_ib0
    version: 2.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.0.0/24
    domain: mlx5_ib0
    version: 2.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.1.0/24
    domain: mlx5_ib1
    version: 2.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.1.0/24
    domain: mlx5_ib1
    version: 2.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.1.0/24
    domain: mlx5_ib1
    version: 2.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.2.0/24
    domain: mlx5_ib0.8002
    version: 2.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.2.0/24
    domain: mlx5_ib0.8002
    version: 2.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.2.0/24
    domain: mlx5_ib0.8002
    version: 2.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.6.0/24
    domain: mlx5_ib0.8006
    version: 2.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.6.0/24
    domain: mlx5_ib0.8006
    version: 2.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.6.0/24
    domain: mlx5_ib0.8006
    version: 2.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.4.0/24
    domain: mlx5_ib0.8004
    version: 2.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.4.0/24
    domain: mlx5_ib0.8004
    version: 2.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.4.0/24
    domain: mlx5_ib0.8004
    version: 2.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.5.0/24
    domain: mlx5_ib1.8005
    version: 2.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.5.0/24
    domain: mlx5_ib1.8005
    version: 2.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.5.0/24
    domain: mlx5_ib1.8005
    version: 2.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.3.0/24
    domain: mlx5_ib1.8003
    version: 2.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.3.0/24
    domain: mlx5_ib1.8003
    version: 2.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.3.0/24
    domain: mlx5_ib1.8003
    version: 2.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.7.0/24
    domain: mlx5_ib1.8007
    version: 2.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.7.0/24
    domain: mlx5_ib1.8007
    version: 2.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 172.31.7.0/24
    domain: mlx5_ib1.8007
    version: 2.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 127.0.0.0/8
    domain: lo
    version: 2.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 127.0.0.0/8
    domain: lo
    version: 2.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
provider: sockets
    fabric: 127.0.0.0/8
    domain: lo
    version: 2.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP

real	0m0.247s
user	0m0.001s
sys	0m0.026s

Comment 18 errata-xmlrpc 2017-08-01 16:55:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:2011

Comment 19 Maksym Planeta 2017-08-17 15:11:56 UTC
(In reply to Orion Poplawski from comment #12)
> Is there a workaround for this with 1.3.0?

Try using "--mca pml ob1 --mca btl self,tcp". This forces Open MPI to use the TCP interface from the very beginning.
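
The same two MCA parameters can also be supplied through the environment, because Open MPI reads variables named OMPI_MCA_<parameter>. A minimal sketch, where the wrapper name with_tcp_only is hypothetical:

```shell
#!/bin/sh
# with_tcp_only CMD [ARGS...]
#   Run CMD with the MCA settings from the workaround exported as
#   OMPI_MCA_* environment variables, for that one command only.
with_tcp_only() {
    env OMPI_MCA_pml=ob1 OMPI_MCA_btl=self,tcp "$@"
}
```

For example, `with_tcp_only mpirun -np 4 mpi-hello-c` is intended to behave like passing the two --mca options on the mpirun command line.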

