Bug 1791483 - openmpi mpirun command displays undefined symbol: uct_ep_create_connected (ignored)
Summary: openmpi mpirun command displays undefined symbol: uct_ep_create_connected (ignored)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: openmpi
Version: 8.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 8.2
Assignee: Honggang LI
QA Contact: Afom T. Michael
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-01-15 23:06 UTC by Afom T. Michael
Modified: 2020-04-28 16:57 UTC
CC List: 1 user

Fixed In Version: openmpi-4.0.2-2.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-04-28 16:57:33 UTC
Type: Bug
Target Upstream Version:




Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2020:1865 None None None 2020-04-28 16:57:51 UTC

Description Afom T. Michael 2020-01-15 23:06:50 UTC
Description of problem:
On RHEL-8.2 (4.18.0-167.el8.x86_64), running the openmpi mpirun command given in step 3 of the Steps to Reproduce section displays "mca_base_component_repository_open: unable to open mca_btl_uct: /usr/lib64/openmpi/lib/openmpi/mca_btl_uct.so: undefined symbol: uct_ep_create_connected (ignored)".


Version-Release number of selected component (if applicable):
[root@rdma-qe-25 ~]$ cat /etc/redhat-release 
Red Hat Enterprise Linux release 8.2 Beta (Ootpa)
[root@rdma-qe-25 ~]$ uname -r
4.18.0-167.el8.x86_64
[root@rdma-qe-25 ~]$ rpm -qa | grep -E "rdma|uverb|openmpi"
rdma-core-26.0-7.el8.x86_64
rdma-core-devel-26.0-7.el8.x86_64
openmpi-4.0.2-1.el8.x86_64
librdmacm-utils-26.0-7.el8.x86_64
mpitests-openmpi-5.4.2-4.el8.x86_64
librdmacm-26.0-7.el8.x86_64
kernel-kernel-infiniband-mpi-openmpi-1.0-4.noarch
[root@rdma-qe-25 ~]$ ibstatus bnxt_re2
Infiniband device 'bnxt_re2' port 1 status:
	default gid:	 fe80:0000:0000:0000:020a:f7ff:fec5:a3a0
	base lid:	 0x0
	sm lid:		 0x0
	state:		 4: ACTIVE
	phys state:	 5: LinkUp
	rate:		 25 Gb/sec (1X EDR)
	link_layer:	 Ethernet

[root@rdma-qe-25 ~]$


How reproducible:
Always

Steps to Reproduce:
1. On hosts like rdma-qe-24/25, install RHEL-8.2 (DISTRO=RHEL-8.2.0-20191219.0)
2. Configure the HCAs with IP addresses
3. Run the following commands:
   $ cat hfile_one_core 
    172.31.40.24
    172.31.40.25
   $ timeout 3m /usr/lib64/openmpi/bin/mpirun --allow-run-as-root --map-by node -mca btl_openib_warn_nonexistent_if 0 -mca btl_openib_if_include bnxt_re2:1 -mca mtl '^psm2,psm,ofi' -mca btl '^openib,usnic' -hostfile /root/hfile_one_core -np 2 /usr/lib64/openmpi/bin/mpitests-IMB-MPI1 PingPong

Actual results:
In addition to other output, the following is observed:
[root@rdma-qe-25 ~]$ timeout 3m /usr/lib64/openmpi/bin/mpirun --allow-run-as-root --map-by node -mca btl_openib_warn_nonexistent_if 0 -mca btl_openib_if_include bnxt_re2:1 -mca mtl '^psm2,psm,ofi' -mca btl '^openib,usnic' -hostfile /root/hfile_one_core_bnxt -np $NP /usr/lib64/openmpi/bin/mpitests-IMB-MPI1 PingPong
[rdma-qe-24.lab.bos.redhat.com:62841] mca_base_component_repository_open: unable to open mca_btl_uct: /usr/lib64/openmpi/lib/openmpi/mca_btl_uct.so: undefined symbol: uct_ep_create_connected (ignored)
[rdma-qe-25.lab.bos.redhat.com:36191] mca_base_component_repository_open: unable to open mca_btl_uct: /usr/lib64/openmpi/lib/openmpi/mca_btl_uct.so: undefined symbol: uct_ep_create_connected (ignored)
[...snip...]


Expected results:


Additional info:
Excluding the uct BTL as well, i.e. changing the argument from -mca btl '^openib,usnic' to -mca btl '^openib,usnic,uct', removes the above undefined symbol message.
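
For reference, the leading '^' in an -mca btl value is Open MPI's exclusion syntax: it tells the MCA framework to skip the listed BTL components instead of selecting them, so adding uct to the list keeps mca_btl_uct.so from being dlopen'd at all. A trimmed sketch of the workaround invocation (same host file and binary as step 3; the btl_openib_* and mtl options are omitted for brevity):

   $ /usr/lib64/openmpi/bin/mpirun --allow-run-as-root --map-by node \
       -mca btl '^openib,usnic,uct' \
       -hostfile /root/hfile_one_core -np 2 \
       /usr/lib64/openmpi/bin/mpitests-IMB-MPI1 PingPong

Note this only hides the message by never loading the plugin; the underlying build/runtime UCX mismatch is still present, which is why the real fix is the rebuild described in comment 1.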

Comment 1 Honggang LI 2020-01-19 14:49:26 UTC
(In reply to Afom T. Michael from comment #0)
> Description of problem:
> On RHEL-8.2 (4.18.0-167.el8.x86_64), running openmpi mpirun command given in
> step 3 of reproduce section displays "mca_base_component_repository_open:
> unable to open mca_btl_uct: /usr/lib64/openmpi/lib/openmpi/mca_btl_uct.so:
> undefined symbol: uct_ep_create_connected (ignored)".
                    ^^^^^^^^^^^^^^^^^^^^^^^

/* UCX 1.6 replaced uct_ep_create_connected() with uct_ep_create() taking a
 * uct_ep_params_t, so this shim selects the right call at compile time: */
static inline ucs_status_t mca_btl_uct_ep_create_connected_compat (uct_iface_h iface, uct_device_addr_t *device_addr,
                                                                   uct_iface_addr_t *iface_addr, uct_ep_h *uct_ep)
{
#if UCT_API >= UCT_VERSION(1, 6)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

    uct_ep_params_t ep_params = {.field_mask = UCT_EP_PARAM_FIELD_IFACE | UCT_EP_PARAM_FIELD_DEV_ADDR | UCT_EP_PARAM_FIELD_IFACE_ADDR,
                                 .iface = iface, .dev_addr = device_addr, .iface_addr = iface_addr};
    return uct_ep_create (&ep_params, uct_ep);
#else
    return uct_ep_create_connected (iface, device_addr, iface_addr, uct_ep);
#endif
}

The error message says the symbol uct_ep_create_connected was needed. That means openmpi had been
compiled with UCT_API < UCT_VERSION(1, 6), i.e. against a pre-1.6 UCX, while the UCX installed at
runtime no longer exports that symbol.
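
One way to see the mismatch directly on an affected host is to compare the plugin's undefined dynamic symbols against what the installed libuct exports. A diagnostic sketch (the libuct path is an assumption; confirm it with ldd on the plugin):

   # symbols mca_btl_uct.so needs but does not define itself
   $ nm -D --undefined-only /usr/lib64/openmpi/lib/openmpi/mca_btl_uct.so | grep uct_ep_create
   # symbols the installed libuct actually exports (path assumed)
   $ nm -D --defined-only /usr/lib64/libuct.so.0 | grep uct_ep_create

If the first command lists uct_ep_create_connected while the second shows only uct_ep_create, the runtime UCX is 1.6 or newer while the plugin was built against an older UCT API, and dlopen fails exactly as logged above.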


The build log confirms openmpi was built with ucx-1.5.

https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=988928 openmpi-4.0.2-1.el8

http://download.eng.bos.redhat.com/brewroot/vol/rhel-8/packages/openmpi/4.0.2/1.el8/data/logs/x86_64/root.log
DEBUG util.py:439:   ucx-devel                  x86_64 1.5.2-1.el8                      build 106 k


Rebuilding openmpi against the in-box ucx >= 1.6 will get rid of the error message.
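
As a quick post-rebuild check (a sketch, assuming the in-box ucx package that openmpi pulls in is installed), the package versions and the libuct the plugin binds to can be inspected with:

   # runtime ucx version vs. the rebuilt openmpi
   $ rpm -q ucx openmpi
   # which libuct the dynamic linker resolves for the plugin
   $ ldd /usr/lib64/openmpi/lib/openmpi/mca_btl_uct.so | grep libuct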

Comment 3 Afom T. Michael 2020-01-24 18:46:52 UTC
After updating to openmpi-4.0.2-2.el8.x86_64, the '...mca_base_component_repository_open: ...' message is gone. Marking verified.

[root@rdma-dev-26 ~]$ rpm -q openmpi mpitests-openmpi
openmpi-4.0.2-2.el8.x86_64
mpitests-openmpi-5.4.2-4.el8.x86_64
[root@rdma-dev-26 ~]$ timeout 3m /usr/lib64/openmpi/bin/mpirun --allow-run-as-root --map-by node -mca btl_openib_warn_nonexistent_if 0 -mca btl_openib_if_include bnxt_re0:1 -mca mtl '^psm2,psm,ofi' -mca btl '^openib,usnic' -hostfile /root/hfile_one_core -np 2 /usr/lib64/openmpi/bin/mpitests-IMB-MPI1 PingPong
#------------------------------------------------------------
#    Intel (R) MPI Benchmarks 2018 Update 1, MPI-1 part    
#------------------------------------------------------------
# Date                  : Fri Jan 24 13:37:51 2020
# Machine               : x86_64
# System                : Linux
# Release               : 4.18.0-167.el8.x86_64
# Version               : #1 SMP Sun Dec 15 01:24:23 UTC 2019
# MPI Version           : 3.1
# MPI Thread Environment: 


# Calling sequence was: 

# /usr/lib64/openmpi/bin/mpitests-IMB-MPI1 PingPong

# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   :   MPI_BYTE 
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM  
#
#

# List of Benchmarks to run:

# PingPong

#---------------------------------------------------
# Benchmarking PingPong 
# #processes = 2 
#---------------------------------------------------
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         9.49         0.00
            1         1000         9.50         0.11
      [...snip...]
      4194304           10      1893.83      2214.72


# All processes entering MPI_Finalize

[root@rdma-dev-26 ~]$


With openmpi-4.0.2-1.el8.x86_64, below is what was seen.
[root@rdma-dev-26 ~]$ rpm -q openmpi mpitests-openmpi
openmpi-4.0.2-1.el8.x86_64
mpitests-openmpi-5.4.2-4.el8.x86_64
[root@rdma-dev-26 ~]$ timeout 3m /usr/lib64/openmpi/bin/mpirun --allow-run-as-root --map-by node -mca btl_openib_warn_nonexistent_if 0 -mca btl_openib_if_include bnxt_re0:1 -mca mtl '^psm2,psm,ofi' -mca btl '^openib,usnic' -hostfile /root/hfile_one_core -np 2 /usr/lib64/openmpi/bin/mpitests-IMB-MPI1 PingPong
[rdma-dev-26.lab.bos.redhat.com:20922] mca_base_component_repository_open: unable to open mca_btl_uct: /usr/lib64/openmpi/lib/openmpi/mca_btl_uct.so: undefined symbol: uct_ep_create_connected (ignored)
[rdma-dev-25.lab.bos.redhat.com:17345] mca_base_component_repository_open: unable to open mca_btl_uct: /usr/lib64/openmpi/lib/openmpi/mca_btl_uct.so: undefined symbol: uct_ep_create_connected (ignored)
#------------------------------------------------------------
#    Intel (R) MPI Benchmarks 2018 Update 1, MPI-1 part    
#------------------------------------------------------------
# Date                  : Fri Jan 24 13:36:09 2020
# Machine               : x86_64
# System                : Linux
# Release               : 4.18.0-167.el8.x86_64
# Version               : #1 SMP Sun Dec 15 01:24:23 UTC 2019
# MPI Version           : 3.1
# MPI Thread Environment: 


# Calling sequence was: 

# /usr/lib64/openmpi/bin/mpitests-IMB-MPI1 PingPong

# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   :   MPI_BYTE 
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM  
#
#

# List of Benchmarks to run:

# PingPong

#---------------------------------------------------
# Benchmarking PingPong 
# #processes = 2 
#---------------------------------------------------
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000        10.13         0.00
            1         1000         9.91         0.10
      [...snip...]
      4194304           10      2170.53      1932.39


# All processes entering MPI_Finalize

[root@rdma-dev-26 ~]$

Comment 5 errata-xmlrpc 2020-04-28 16:57:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:1865

