Description of problem:

All OpenMPI benchmarks fail due to a PML error like the following:

--------------------------------------------------------------------------
No components were able to be opened in the pml framework.

This typically means that either no components of this type were
installed, or none of the installed components can be loaded.
Sometimes this means that shared libraries required by these
components are unable to be found/loaded.

  Host:      rdma-qe-24
  Framework: pml
--------------------------------------------------------------------------
[rdma-qe-24.lab.bos.redhat.com:73548] PML ucx cannot be selected

This failure takes place on all HCAs - MLX4/5 IB/ROCE, BNXT ROCE, CXGB4 IW, QEDR IW...
This is a REGRESSION from the RHEL-8.5.0-20210521.n.1 nightly image.

Version-Release number of selected component (if applicable):

DISTRO=RHEL-8.5.0-20210611.n.0
+ [21-06-14 13:30:08] cat /etc/redhat-release
Red Hat Enterprise Linux release 8.5 Beta (Ootpa)
+ [21-06-14 13:30:08] uname -a
Linux rdma-qe-25.lab.bos.redhat.com 4.18.0-312.el8.x86_64 #1 SMP Wed Jun 2 16:30:46 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux
+ [21-06-14 13:30:08] cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt2)/vmlinuz-4.18.0-312.el8.x86_64 root=UUID=0a85fc97-13e8-41ff-a1d4-6669f3058baa ro crashkernel=auto resume=UUID=2262ea47-8caf-4160-8328-1dcd72c645d4 console=ttyS0,115200n81
+ [21-06-14 13:30:08] rpm -q rdma-core linux-firmware
rdma-core-35.0-1.el8.x86_64
linux-firmware-20201218-102.git05789708.el8.noarch

Installed:
mpitests-openmpi-5.7-2.el8.x86_64
openmpi-4.1.1-1.el8.x86_64
openmpi-devel-4.1.1-1.el8.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Install the above build and packages on both the RDMA server and client hosts.
2.
On the server side, initiate an mpitests-openmpi benchmark test, e.g.:

timeout 3m /usr/lib64/openmpi/bin/mpirun --allow-run-as-root --map-by node \
    -mca btl_openib_warn_nonexistent_if 0 -mca btl_openib_if_include bnxt_re3:1 \
    -mca mtl '^psm2,psm,ofi' -mca btl '^openib' -mca pml ucx \
    -x UCX_NET_DEVICES=bnxt_roce.45 -hostfile /root/hfile_one_core -np 2 \
    /usr/lib64/openmpi/bin/mpitests-IMB-MPI1 PingPong
3.

Actual results:

All OpenMPI benchmarks fail with the following output:

-------------------------------------------------------------------------
No components were able to be opened in the pml framework.

This typically means that either no components of this type were
installed, or none of the installed components can be loaded.
Sometimes this means that shared libraries required by these
components are unable to be found/loaded.

  Host:      rdma-qe-24
  Framework: pml
--------------------------------------------------------------------------
[rdma-qe-24.lab.bos.redhat.com:73493] PML ucx cannot be selected
+ [21-06-14 13:32:22] mpi_return=1

Expected results:

All OpenMPI benchmarks run normally with correct stats.

Additional info:

The working RHEL-8.5 compose, RHEL-8.5.0-20210521.n.1, shows the following package info:

DISTRO=RHEL-8.5.0-20210521.n.1
+ [21-05-22 07:17:53] cat /etc/redhat-release
Red Hat Enterprise Linux release 8.5 Beta (Ootpa)
+ [21-05-22 07:17:53] uname -a
Linux rdma-qe-25.lab.bos.redhat.com 4.18.0-305.8.el8.x86_64 #1 SMP Mon May 17 14:15:59 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux
+ [21-05-22 07:17:53] cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt2)/vmlinuz-4.18.0-305.8.el8.x86_64 root=UUID=62f6b79c-3ee9-4c3e-be97-583fb4af6aa0 ro crashkernel=auto resume=UUID=d76bf578-059d-4268-b6ca-0279f76b41bf console=ttyS0,115200n81
+ [21-05-22 07:17:53] rpm -q rdma-core linux-firmware
rdma-core-35.0-1.el8.x86_64
linux-firmware-20201218-102.git05789708.el8.noarch

Installed:
mpitests-openmpi-5.7-1.el8.x86_64
openmpi-4.0.5-3.el8.x86_64    <<<============================
Here is the bad commit:

commit c36d7459b6331c4da825cad5a64326e7c1a272aa
Author: Yossi Itigin <yosefe>
Date:   Thu Feb 18 00:15:02 2021 +0200

    ucx: check supported transports and devices for setting priority

    Add "pml_ucx_tls" parameter to control the transports to include or
    exclude (with a '^' prefix). UCX will be enabled only if one of the
    included transports is present, or if a non-excluded transport is
    present (in case of exclude-list with a '^' prefix).

    Add "pml_ucx_devices" parameter to control the devices which make UCX
    component set a high priority for itself. If none of the listed
    devices is present, the priority will be set to 19 - lower than ob1
    priority.

    Signed-off-by: Yossi Itigin <yosefe>
    (cherry picked from commit 562c57ce97)
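To make the commit's transport matching concrete, here is a minimal sketch of the "pml_ucx_tls" include/exclude semantics as described in the commit message. This is an illustration only; the function name and the sample transport lists are hypothetical, and the real logic lives in Open MPI's C code:

```python
def ucx_transport_match(present_tls, pml_ucx_tls):
    """Sketch of the "pml_ucx_tls" semantics from the commit message:
    a comma-separated include list, or an exclude list with a '^' prefix.
    Returns True if the UCX PML should be enabled."""
    if pml_ucx_tls.startswith("^"):
        # Exclude list: enabled if some present transport is NOT excluded.
        excluded = set(pml_ucx_tls[1:].split(","))
        return any(tl not in excluded for tl in present_tls)
    # Include list: enabled only if one of the listed transports is present.
    included = set(pml_ucx_tls.split(","))
    return any(tl in included for tl in present_tls)
```

Under these semantics, whether UCX gets enabled on a given host depends entirely on the default value of "pml_ucx_tls" versus the transports ucx_info actually reports there.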
What is the output of "ucx_info -vdepw -u t" and "ibv_devinfo" on the machine?
[test@rdma-dev-26 ~]$ ibv_devinfo hca_id: bnxt_re0 transport: InfiniBand (0) fw_ver: 216.4.59.0 node_guid: be97:e1ff:fe70:3d80 sys_image_guid: be97:e1ff:fe70:3d80 vendor_id: 0x14e4 vendor_part_id: 5968 hw_ver: 0x2100 phys_port_cnt: 1 port: 1 state: PORT_ACTIVE (4) max_mtu: 4096 (5) active_mtu: 4096 (5) sm_lid: 0 port_lid: 0 port_lmc: 0x00 link_layer: Ethernet hca_id: bnxt_re1 transport: InfiniBand (0) fw_ver: 216.4.59.0 node_guid: be97:e1ff:fe70:3d81 sys_image_guid: be97:e1ff:fe70:3d81 vendor_id: 0x14e4 vendor_part_id: 5968 hw_ver: 0x2100 phys_port_cnt: 1 port: 1 state: PORT_DOWN (1) max_mtu: 4096 (5) active_mtu: 1024 (3) sm_lid: 0 port_lid: 0 port_lmc: 0x00 link_layer: Ethernet [test@rdma-dev-26 ~]$ [test@rdma-dev-26 ~]$ [test@rdma-dev-26 ~]$ ibv_devinfo -v hca_id: bnxt_re0 transport: InfiniBand (0) fw_ver: 216.4.59.0 node_guid: be97:e1ff:fe70:3d80 sys_image_guid: be97:e1ff:fe70:3d80 vendor_id: 0x14e4 vendor_part_id: 5968 hw_ver: 0x2100 phys_port_cnt: 1 max_mr_size: 0x8000000000 page_size_cap: 0x201000 max_qp: 65537 max_qp_wr: 65407 device_cap_flags: 0x0122dd81 RESIZE_MAX_WR CURR_QP_STATE_MOD SHUTDOWN_PORT PORT_ACTIVE_EVENT SYS_IMAGE_GUID RC_RNR_NAK_GEN N_NOTIFY_CQ MEM_WINDOW MEM_MGT_EXTENSIONS MEM_WINDOW_TYPE_2B Unknown flags: 0x8000 max_sge: 6 max_sge_rd: 6 max_cq: 131072 max_cqe: 16777215 max_mr: 262144 max_pd: 65536 max_qp_rd_atom: 126 max_ee_rd_atom: 0 max_res_rd_atom: 0 max_qp_init_rd_atom: 126 max_ee_init_rd_atom: 0 atomic_cap: ATOMIC_NONE (0) max_ee: 0 max_rdd: 0 max_mw: 262144 max_raw_ipv6_qp: 0 max_raw_ethy_qp: 65536 max_mcast_grp: 0 max_mcast_qp_attach: 0 max_total_mcast_qp_attach: 0 max_ah: 131072 max_fmr: 0 max_srq: 8192 max_srq_wr: 1048574 max_srq_sge: 6 max_pkeys: 1 local_ca_ack_delay: 16 general_odp_caps: rc_odp_caps: NO SUPPORT uc_odp_caps: NO SUPPORT ud_odp_caps: NO SUPPORT xrc_odp_caps: NO SUPPORT completion_timestamp_mask not supported core clock not supported device_cap_flags_ex: 0x122DD81 tso_caps: max_tso: 0 rss_caps: 
max_rwq_indirection_tables: 0 max_rwq_indirection_table_size: 0 rx_hash_function: 0x0 rx_hash_fields_mask: 0x0 max_wq_type_rq: 0 packet_pacing_caps: qp_rate_limit_min: 0kbps qp_rate_limit_max: 0kbps tag matching not supported num_comp_vectors: 8 port: 1 state: PORT_ACTIVE (4) max_mtu: 4096 (5) active_mtu: 4096 (5) sm_lid: 0 port_lid: 0 port_lmc: 0x00 link_layer: Ethernet max_msg_sz: 0x40000000 port_cap_flags: 0x041d0000 port_cap_flags2: 0x0000 max_vl_num: 8 (4) bad_pkey_cntr: 0x0 qkey_viol_cntr: 0x0 sm_sl: 0 pkey_tbl_len: 65535 gid_tbl_len: 256 subnet_timeout: 0 init_type_reply: 0 active_width: 4X (2) active_speed: 25.0 Gbps (32) phys_state: LINK_UP (5) GID[ 0]: fe80:0000:0000:0000:be97:e1ff:fe70:3d80, RoCE v1 GID[ 1]: fe80::be97:e1ff:fe70:3d80, RoCE v2 GID[ 2]: fe80:0000:0000:0000:be97:e1ff:fe70:3d80, RoCE v1 GID[ 3]: fe80::be97:e1ff:fe70:3d80, RoCE v2 GID[ 4]: fe80:0000:0000:0000:be97:e1ff:fe70:3d80, RoCE v1 GID[ 5]: fe80::be97:e1ff:fe70:3d80, RoCE v2 GID[ 6]: 0000:0000:0000:0000:0000:ffff:ac1f:2b7e, RoCE v1 GID[ 7]: ::ffff:172.31.43.126, RoCE v2 GID[ 8]: 0000:0000:0000:0000:0000:ffff:ac1f:2d7e, RoCE v1 GID[ 9]: ::ffff:172.31.45.126, RoCE v2 GID[ 10]: 0000:0000:0000:0000:0000:ffff:ac1f:287e, RoCE v1 GID[ 11]: ::ffff:172.31.40.126, RoCE v2 hca_id: bnxt_re1 transport: InfiniBand (0) fw_ver: 216.4.59.0 node_guid: be97:e1ff:fe70:3d81 sys_image_guid: be97:e1ff:fe70:3d81 vendor_id: 0x14e4 vendor_part_id: 5968 hw_ver: 0x2100 phys_port_cnt: 1 max_mr_size: 0x8000000000 page_size_cap: 0x201000 max_qp: 65537 max_qp_wr: 65407 device_cap_flags: 0x0122dd81 RESIZE_MAX_WR CURR_QP_STATE_MOD SHUTDOWN_PORT PORT_ACTIVE_EVENT SYS_IMAGE_GUID RC_RNR_NAK_GEN N_NOTIFY_CQ MEM_WINDOW MEM_MGT_EXTENSIONS MEM_WINDOW_TYPE_2B Unknown flags: 0x8000 max_sge: 6 max_sge_rd: 6 max_cq: 131072 max_cqe: 16777215 max_mr: 262144 max_pd: 65536 max_qp_rd_atom: 126 max_ee_rd_atom: 0 max_res_rd_atom: 0 max_qp_init_rd_atom: 126 max_ee_init_rd_atom: 0 atomic_cap: ATOMIC_NONE (0) max_ee: 0 max_rdd: 0 max_mw: 
262144 max_raw_ipv6_qp: 0 max_raw_ethy_qp: 65536 max_mcast_grp: 0 max_mcast_qp_attach: 0 max_total_mcast_qp_attach: 0 max_ah: 131072 max_fmr: 0 max_srq: 8192 max_srq_wr: 1048574 max_srq_sge: 6 max_pkeys: 1 local_ca_ack_delay: 16 general_odp_caps: rc_odp_caps: NO SUPPORT uc_odp_caps: NO SUPPORT ud_odp_caps: NO SUPPORT xrc_odp_caps: NO SUPPORT completion_timestamp_mask not supported core clock not supported device_cap_flags_ex: 0x122DD81 tso_caps: max_tso: 0 rss_caps: max_rwq_indirection_tables: 0 max_rwq_indirection_table_size: 0 rx_hash_function: 0x0 rx_hash_fields_mask: 0x0 max_wq_type_rq: 0 packet_pacing_caps: qp_rate_limit_min: 0kbps qp_rate_limit_max: 0kbps tag matching not supported num_comp_vectors: 8 port: 1 state: PORT_DOWN (1) max_mtu: 4096 (5) active_mtu: 1024 (3) sm_lid: 0 port_lid: 0 port_lmc: 0x00 link_layer: Ethernet max_msg_sz: 0x40000000 port_cap_flags: 0x041d0000 port_cap_flags2: 0x0000 max_vl_num: 8 (4) bad_pkey_cntr: 0x0 qkey_viol_cntr: 0x0 sm_sl: 0 pkey_tbl_len: 65535 gid_tbl_len: 256 subnet_timeout: 0 init_type_reply: 0 active_width: 1X (1) active_speed: 2.5 Gbps (1) phys_state: DISABLED (3) GID[ 0]: fe80:0000:0000:0000:be97:e1ff:fe70:3d81, RoCE v1 GID[ 1]: fe80::be97:e1ff:fe70:3d81, RoCE v2 [test@rdma-dev-26 ~]$ [test@rdma-dev-26 ~]$ [test@rdma-dev-26 ~]$ [test@rdma-dev-26 ~]$ ucx_info -vdepw -u t # UCT version=1.11.0 revision # configured with: --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --disable-optimizations --disable-logging --disable-debug --disable-assertions --disable-params-check --without-java --enable-cma --without-cuda --without-gdrcopy --with-verbs --without-cm --without-knem --with-rdmacm 
--without-rocm --without-xpmem --without-ugni # # Memory domain: posix # Component: posix # allocate: unlimited # remote key: 24 bytes # rkey_ptr is supported # # Transport: posix # Device: memory # System device: <unknown> # # capabilities: # bandwidth: 0.00/ppn + 12179.00 MB/sec # latency: 80 nsec # overhead: 10 nsec # put_short: <= 4294967295 # put_bcopy: unlimited # get_bcopy: unlimited # am_short: <= 100 # am_bcopy: <= 8256 # domain: cpu # atomic_add: 32, 64 bit # atomic_and: 32, 64 bit # atomic_or: 32, 64 bit # atomic_xor: 32, 64 bit # atomic_fadd: 32, 64 bit # atomic_fand: 32, 64 bit # atomic_for: 32, 64 bit # atomic_fxor: 32, 64 bit # atomic_swap: 32, 64 bit # atomic_cswap: 32, 64 bit # connection: to iface # device priority: 0 # device num paths: 1 # max eps: inf # device address: 8 bytes # iface address: 8 bytes # error handling: ep_check # # # Memory domain: sysv # Component: sysv # allocate: unlimited # remote key: 12 bytes # rkey_ptr is supported # # Transport: sysv # Device: memory # System device: <unknown> # # capabilities: # bandwidth: 0.00/ppn + 12179.00 MB/sec # latency: 80 nsec # overhead: 10 nsec # put_short: <= 4294967295 # put_bcopy: unlimited # get_bcopy: unlimited # am_short: <= 100 # am_bcopy: <= 8256 # domain: cpu # atomic_add: 32, 64 bit # atomic_and: 32, 64 bit # atomic_or: 32, 64 bit # atomic_xor: 32, 64 bit # atomic_fadd: 32, 64 bit # atomic_fand: 32, 64 bit # atomic_for: 32, 64 bit # atomic_fxor: 32, 64 bit # atomic_swap: 32, 64 bit # atomic_cswap: 32, 64 bit # connection: to iface # device priority: 0 # device num paths: 1 # max eps: inf # device address: 8 bytes # iface address: 8 bytes # error handling: ep_check # # # Memory domain: self # Component: self # register: unlimited, cost: 0 nsec # remote key: 0 bytes # # Transport: self # Device: memory0 # System device: <unknown> # # capabilities: # bandwidth: 0.00/ppn + 6911.00 MB/sec # latency: 0 nsec # overhead: 10 nsec # put_short: <= 4294967295 # put_bcopy: unlimited # get_bcopy: 
unlimited # am_short: <= 8K # am_bcopy: <= 8K # domain: cpu # atomic_add: 32, 64 bit # atomic_and: 32, 64 bit # atomic_or: 32, 64 bit # atomic_xor: 32, 64 bit # atomic_fadd: 32, 64 bit # atomic_fand: 32, 64 bit # atomic_for: 32, 64 bit # atomic_fxor: 32, 64 bit # atomic_swap: 32, 64 bit # atomic_cswap: 32, 64 bit # connection: to iface # device priority: 0 # device num paths: 1 # max eps: inf # device address: 0 bytes # iface address: 8 bytes # error handling: ep_check # # # Memory domain: tcp # Component: tcp # register: unlimited, cost: 0 nsec # remote key: 0 bytes # # Transport: tcp # Device: bnxt_roce # System device: <unknown> # # capabilities: # bandwidth: 11818.05/ppn + 0.00 MB/sec # latency: 5206 nsec # overhead: 50000 nsec # put_zcopy: <= 18446744073709551590, up to 6 iov # put_opt_zcopy_align: <= 1 # put_align_mtu: <= 0 # am_short: <= 8K # am_bcopy: <= 8K # am_zcopy: <= 64K, up to 6 iov # am_opt_zcopy_align: <= 1 # am_align_mtu: <= 0 # am header: <= 8037 # connection: to ep, to iface # device priority: 1 # device num paths: 1 # max eps: 256 # device address: 6 bytes # iface address: 2 bytes # ep address: 10 bytes # error handling: peer failure, ep_check, keepalive # # Transport: tcp # Device: bnxt_roce.45 # System device: <unknown> # # capabilities: # bandwidth: 11818.05/ppn + 0.00 MB/sec # latency: 5206 nsec # overhead: 50000 nsec # put_zcopy: <= 18446744073709551590, up to 6 iov # put_opt_zcopy_align: <= 1 # put_align_mtu: <= 0 # am_short: <= 8K # am_bcopy: <= 8K # am_zcopy: <= 64K, up to 6 iov # am_opt_zcopy_align: <= 1 # am_align_mtu: <= 0 # am header: <= 8037 # connection: to ep, to iface # device priority: 1 # device num paths: 1 # max eps: 256 # device address: 6 bytes # iface address: 2 bytes # ep address: 10 bytes # error handling: peer failure, ep_check, keepalive # # Transport: tcp # Device: bnxt_roce.43 # System device: <unknown> # # capabilities: # bandwidth: 11818.05/ppn + 0.00 MB/sec # latency: 5206 nsec # overhead: 50000 nsec # put_zcopy: 
<= 18446744073709551590, up to 6 iov # put_opt_zcopy_align: <= 1 # put_align_mtu: <= 0 # am_short: <= 8K # am_bcopy: <= 8K # am_zcopy: <= 64K, up to 6 iov # am_opt_zcopy_align: <= 1 # am_align_mtu: <= 0 # am header: <= 8037 # connection: to ep, to iface # device priority: 1 # device num paths: 1 # max eps: 256 # device address: 6 bytes # iface address: 2 bytes # ep address: 10 bytes # error handling: peer failure, ep_check, keepalive # # Transport: tcp # Device: lo # System device: <unknown> # # capabilities: # bandwidth: 11.91/ppn + 0.00 MB/sec # latency: 10960 nsec # overhead: 50000 nsec # put_zcopy: <= 18446744073709551590, up to 6 iov # put_opt_zcopy_align: <= 1 # put_align_mtu: <= 0 # am_short: <= 8K # am_bcopy: <= 8K # am_zcopy: <= 64K, up to 6 iov # am_opt_zcopy_align: <= 1 # am_align_mtu: <= 0 # am header: <= 8037 # connection: to ep, to iface # device priority: 1 # device num paths: 1 # max eps: 256 # device address: 18 bytes # iface address: 2 bytes # ep address: 10 bytes # error handling: peer failure, ep_check, keepalive # # Transport: tcp # Device: lom_1 # System device: <unknown> # # capabilities: # bandwidth: 113.16/ppn + 0.00 MB/sec # latency: 5776 nsec # overhead: 50000 nsec # put_zcopy: <= 18446744073709551590, up to 6 iov # put_opt_zcopy_align: <= 1 # put_align_mtu: <= 0 # am_short: <= 8K # am_bcopy: <= 8K # am_zcopy: <= 64K, up to 6 iov # am_opt_zcopy_align: <= 1 # am_align_mtu: <= 0 # am header: <= 8037 # connection: to ep, to iface # device priority: 0 # device num paths: 1 # max eps: 256 # device address: 6 bytes # iface address: 2 bytes # ep address: 10 bytes # error handling: peer failure, ep_check, keepalive # # # Connection manager: tcp # max_conn_priv: 2064 bytes # # Memory domain: bnxt_re0 # Component: ib # register: unlimited, cost: 180 nsec # remote key: 8 bytes # local memory handle is required for zcopy # # Transport: rc_verbs # Device: bnxt_re0:1 # System device: 0000:04:00.0 (0) # # capabilities: # bandwidth: 11664.63/ppn + 0.00 
MB/sec # latency: 800 + 1.000 * N nsec # overhead: 75 nsec # put_short: <= 96 # put_bcopy: <= 8256 # put_zcopy: <= 1G, up to 4 iov # put_opt_zcopy_align: <= 512 # put_align_mtu: <= 4K # get_bcopy: <= 8256 # get_zcopy: 65..1G, up to 4 iov # get_opt_zcopy_align: <= 512 # get_align_mtu: <= 4K # am_short: <= 95 # am_bcopy: <= 8255 # am_zcopy: <= 8255, up to 3 iov # am_opt_zcopy_align: <= 512 # am_align_mtu: <= 4K # am header: <= 127 # connection: to ep # device priority: 0 # device num paths: 1 # max eps: 256 # device address: 17 bytes # ep address: 4 bytes # error handling: peer failure, ep_check # # # Transport: ud_verbs # Device: bnxt_re0:1 # System device: 0000:04:00.0 (0) # # capabilities: # bandwidth: 11664.63/ppn + 0.00 MB/sec # latency: 830 nsec # overhead: 105 nsec # am_short: <= 88 # am_bcopy: <= 4088 # am_zcopy: <= 4088, up to 5 iov # am_opt_zcopy_align: <= 512 # am_align_mtu: <= 4K # am header: <= 3952 # connection: to ep, to iface # device priority: 0 # device num paths: 1 # max eps: inf # device address: 17 bytes # iface address: 3 bytes # ep address: 6 bytes # error handling: peer failure, ep_check # # # Memory domain: bnxt_re1 # Component: ib # register: unlimited, cost: 180 nsec # remote key: 8 bytes # local memory handle is required for zcopy # < no supported devices found > # # Connection manager: rdmacm # max_conn_priv: 54 bytes # # Memory domain: cma # Component: cma # register: unlimited, cost: 9 nsec # # Transport: cma # Device: memory # System device: <unknown> # # capabilities: # bandwidth: 0.00/ppn + 11145.00 MB/sec # latency: 80 nsec # overhead: 400 nsec # put_zcopy: unlimited, up to 16 iov # put_opt_zcopy_align: <= 1 # put_align_mtu: <= 1 # get_zcopy: unlimited, up to 16 iov # get_opt_zcopy_align: <= 1 # get_align_mtu: <= 1 # connection: to iface # device priority: 0 # device num paths: 1 # max eps: inf # device address: 8 bytes # iface address: 4 bytes # error handling: peer failure, ep_check # # # UCP context # # component 0 : posix # 
component 1 : sysv # component 2 : self # component 3 : tcp # component 4 : ib # component 5 : rdmacm # component 6 : cma # # md 0 : component 0 posix # md 1 : component 1 sysv # md 2 : component 2 self # md 3 : component 3 tcp # md 4 : component 4 bnxt_re0 # md 5 : component 6 cma # # resource 0 : md 0 dev 0 flags -- posix/memory # resource 1 : md 1 dev 0 flags -- sysv/memory # resource 2 : md 2 dev 1 flags -- self/memory0 # resource 3 : md 3 dev 2 flags -- tcp/bnxt_roce # resource 4 : md 3 dev 3 flags -- tcp/bnxt_roce.45 # resource 5 : md 3 dev 4 flags -- tcp/bnxt_roce.43 # resource 6 : md 3 dev 5 flags -- tcp/lo # resource 7 : md 3 dev 6 flags -- tcp/lom_1 # resource 8 : md 4 dev 7 flags -- rc_verbs/bnxt_re0:1 # resource 9 : md 4 dev 7 flags -- ud_verbs/bnxt_re0:1 # resource 10 : md 5 dev 0 flags -- cma/memory # # memory: 0.00MB, file descriptors: 5 # create time: 0.960 ms # # # UCP worker 'rdma-dev-26:3220246' # # address: 362 bytes # # memory: 5.89MB, file descriptors: 17 # create time: 12.588 ms #
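As an aside, the flat "resource N : md M dev D flags -- tl/dev" lines at the end of the ucx_info dump can be reduced to a transport/device inventory with a small throwaway script (a hypothetical helper, shown only to make the listing easier to compare between builds):

```python
import re

# Matches ucx_info resource lines such as:
#   "# resource 8 : md 4 dev 7 flags -- rc_verbs/bnxt_re0:1"
_RESOURCE_RE = re.compile(
    r"resource\s+\d+\s*:\s*md\s+\d+\s+dev\s+\d+\s+flags\s+\S+\s+([^\s/]+)/(\S+)"
)

def ucx_resources(ucx_info_output):
    """Return (transport, device) pairs from "ucx_info -d"-style output."""
    return _RESOURCE_RE.findall(ucx_info_output)
```

Running it over the dump above would show that the only IB-backed resources are rc_verbs/bnxt_re0:1 and ud_verbs/bnxt_re0:1.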
Note: "ucx_info -vdepw -u t" does not return/exit; it hangs.
After downgrade the ucx to rhel-8.5 in-box ucx build. I got this. [test@rdma-dev-26 ~]$ ucx_info -vdepw -u t # UCT version=1.10.1 revision 6a5856e # configured with: --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --disable-optimizations --disable-logging --disable-debug --disable-assertions --disable-params-check --without-java --enable-cma --without-cuda --without-gdrcopy --with-verbs --without-cm --without-knem --with-rdmacm --without-rocm --without-xpmem --without-ugni # # Memory domain: posix # Component: posix # allocate: unlimited # remote key: 24 bytes # rkey_ptr is supported # # Transport: posix # Device: memory # System device: <unknown> # # capabilities: # bandwidth: 0.00/ppn + 12179.00 MB/sec # latency: 80 nsec # overhead: 10 nsec # put_short: <= 4294967295 # put_bcopy: unlimited # get_bcopy: unlimited # am_short: <= 100 # am_bcopy: <= 8256 # domain: cpu # atomic_add: 32, 64 bit # atomic_and: 32, 64 bit # atomic_or: 32, 64 bit # atomic_xor: 32, 64 bit # atomic_fadd: 32, 64 bit # atomic_fand: 32, 64 bit # atomic_for: 32, 64 bit # atomic_fxor: 32, 64 bit # atomic_swap: 32, 64 bit # atomic_cswap: 32, 64 bit # connection: to iface # device priority: 0 # device num paths: 1 # max eps: inf # device address: 8 bytes # iface address: 8 bytes # error handling: none # # # Memory domain: sysv # Component: sysv # allocate: unlimited # remote key: 12 bytes # rkey_ptr is supported # # Transport: sysv # Device: memory # System device: <unknown> # # capabilities: # bandwidth: 0.00/ppn + 12179.00 MB/sec # latency: 80 nsec # overhead: 10 nsec # put_short: <= 4294967295 # put_bcopy: unlimited # get_bcopy: unlimited # am_short: <= 100 # 
am_bcopy: <= 8256 # domain: cpu # atomic_add: 32, 64 bit # atomic_and: 32, 64 bit # atomic_or: 32, 64 bit # atomic_xor: 32, 64 bit # atomic_fadd: 32, 64 bit # atomic_fand: 32, 64 bit # atomic_for: 32, 64 bit # atomic_fxor: 32, 64 bit # atomic_swap: 32, 64 bit # atomic_cswap: 32, 64 bit # connection: to iface # device priority: 0 # device num paths: 1 # max eps: inf # device address: 8 bytes # iface address: 8 bytes # error handling: none # # # Memory domain: self # Component: self # register: unlimited, cost: 0 nsec # remote key: 0 bytes # # Transport: self # Device: memory # System device: <unknown> # # capabilities: # bandwidth: 0.00/ppn + 6911.00 MB/sec # latency: 0 nsec # overhead: 10 nsec # put_short: <= 4294967295 # put_bcopy: unlimited # get_bcopy: unlimited # am_short: <= 8K # am_bcopy: <= 8K # domain: cpu # atomic_add: 32, 64 bit # atomic_and: 32, 64 bit # atomic_or: 32, 64 bit # atomic_xor: 32, 64 bit # atomic_fadd: 32, 64 bit # atomic_fand: 32, 64 bit # atomic_for: 32, 64 bit # atomic_fxor: 32, 64 bit # atomic_swap: 32, 64 bit # atomic_cswap: 32, 64 bit # connection: to iface # device priority: 0 # device num paths: 1 # max eps: inf # device address: 0 bytes # iface address: 8 bytes # error handling: none # # # Memory domain: tcp # Component: tcp # register: unlimited, cost: 0 nsec # remote key: 0 bytes # # Transport: tcp # Device: bnxt_roce # System device: <unknown> # # capabilities: # bandwidth: 11818.05/ppn + 0.00 MB/sec # latency: 5206 nsec # overhead: 50000 nsec # put_zcopy: <= 18446744073709551590, up to 6 iov # put_opt_zcopy_align: <= 1 # put_align_mtu: <= 0 # am_short: <= 8K # am_bcopy: <= 8K # am_zcopy: <= 64K, up to 6 iov # am_opt_zcopy_align: <= 1 # am_align_mtu: <= 0 # am header: <= 8037 # connection: to ep, to iface # device priority: 1 # device num paths: 1 # max eps: 256 # device address: 16 bytes # iface address: 2 bytes # ep address: 10 bytes # error handling: peer failure # # Transport: tcp # Device: bnxt_roce.45 # System device: 
<unknown> # # capabilities: # bandwidth: 11818.05/ppn + 0.00 MB/sec # latency: 5206 nsec # overhead: 50000 nsec # put_zcopy: <= 18446744073709551590, up to 6 iov # put_opt_zcopy_align: <= 1 # put_align_mtu: <= 0 # am_short: <= 8K # am_bcopy: <= 8K # am_zcopy: <= 64K, up to 6 iov # am_opt_zcopy_align: <= 1 # am_align_mtu: <= 0 # am header: <= 8037 # connection: to ep, to iface # device priority: 1 # device num paths: 1 # max eps: 256 # device address: 16 bytes # iface address: 2 bytes # ep address: 10 bytes # error handling: peer failure # # Transport: tcp # Device: bnxt_roce.43 # System device: <unknown> # # capabilities: # bandwidth: 11818.05/ppn + 0.00 MB/sec # latency: 5206 nsec # overhead: 50000 nsec # put_zcopy: <= 18446744073709551590, up to 6 iov # put_opt_zcopy_align: <= 1 # put_align_mtu: <= 0 # am_short: <= 8K # am_bcopy: <= 8K # am_zcopy: <= 64K, up to 6 iov # am_opt_zcopy_align: <= 1 # am_align_mtu: <= 0 # am header: <= 8037 # connection: to ep, to iface # device priority: 1 # device num paths: 1 # max eps: 256 # device address: 16 bytes # iface address: 2 bytes # ep address: 10 bytes # error handling: peer failure # # Transport: tcp # Device: lom_1 # System device: <unknown> # # capabilities: # bandwidth: 113.16/ppn + 0.00 MB/sec # latency: 5776 nsec # overhead: 50000 nsec # put_zcopy: <= 18446744073709551590, up to 6 iov # put_opt_zcopy_align: <= 1 # put_align_mtu: <= 0 # am_short: <= 8K # am_bcopy: <= 8K # am_zcopy: <= 64K, up to 6 iov # am_opt_zcopy_align: <= 1 # am_align_mtu: <= 0 # am header: <= 8037 # connection: to ep, to iface # device priority: 0 # device num paths: 1 # max eps: 256 # device address: 16 bytes # iface address: 2 bytes # ep address: 10 bytes # error handling: peer failure # # # Connection manager: tcp # max_conn_priv: 2032 bytes # # Memory domain: sockcm # Component: sockcm # supports client-server connection establishment via sockaddr # < no supported devices found > # # Memory domain: bnxt_re0 # Component: ib # register: 
unlimited, cost: 180 nsec # remote key: 8 bytes # local memory handle is required for zcopy # # Transport: rc_verbs # Device: bnxt_re0:1 # System device: 0000:04:00.0 (0) # # capabilities: # bandwidth: 11664.63/ppn + 0.00 MB/sec # latency: 800 + 1.000 * N nsec # overhead: 75 nsec # put_short: <= 96 # put_bcopy: <= 8256 # put_zcopy: <= 1G, up to 3 iov # put_opt_zcopy_align: <= 512 # put_align_mtu: <= 4K # get_bcopy: <= 8256 # get_zcopy: 65..1G, up to 3 iov # get_opt_zcopy_align: <= 512 # get_align_mtu: <= 4K # am_short: <= 95 # am_bcopy: <= 8255 # am_zcopy: <= 8255, up to 2 iov # am_opt_zcopy_align: <= 512 # am_align_mtu: <= 4K # am header: <= 127 # connection: to ep # device priority: 0 # device num paths: 1 # max eps: 256 # device address: 17 bytes # ep address: 16 bytes # error handling: peer failure # # # Transport: ud_verbs # Device: bnxt_re0:1 # System device: 0000:04:00.0 (0) # # capabilities: # bandwidth: 11664.63/ppn + 0.00 MB/sec # latency: 830 nsec # overhead: 105 nsec # am_short: <= 88 # am_bcopy: <= 4088 # am_zcopy: <= 4088, up to 5 iov # am_opt_zcopy_align: <= 512 # am_align_mtu: <= 4K # am header: <= 3952 # connection: to ep, to iface # device priority: 0 # device num paths: 1 # max eps: inf # device address: 17 bytes # iface address: 3 bytes # ep address: 6 bytes # error handling: peer failure # # # Memory domain: bnxt_re1 # Component: ib # register: unlimited, cost: 180 nsec # remote key: 8 bytes # local memory handle is required for zcopy # < no supported devices found > # # Memory domain: rdmacm # Component: rdmacm # supports client-server connection establishment via sockaddr # < no supported devices found > # # Connection manager: rdmacm # max_conn_priv: 54 bytes # # Memory domain: cma # Component: cma # register: unlimited, cost: 9 nsec # # Transport: cma # Device: memory # System device: <unknown> # # capabilities: # bandwidth: 0.00/ppn + 11145.00 MB/sec # latency: 80 nsec # overhead: 400 nsec # put_zcopy: unlimited, up to 16 iov # 
put_opt_zcopy_align: <= 1 # put_align_mtu: <= 1 # get_zcopy: unlimited, up to 16 iov # get_opt_zcopy_align: <= 1 # get_align_mtu: <= 1 # connection: to iface # device priority: 0 # device num paths: 1 # max eps: inf # device address: 8 bytes # iface address: 4 bytes # error handling: none # # # UCP context # # component 0 : posix # component 1 : sysv # component 2 : self # component 3 : tcp # component 4 : sockcm # component 5 : ib # component 6 : rdmacm # component 7 : cma # # md 0 : component 0 posix # md 1 : component 1 sysv # md 2 : component 2 self # md 3 : component 3 tcp # md 4 : component 4 sockcm # md 5 : component 5 bnxt_re0 # md 6 : component 6 rdmacm # md 7 : component 7 cma # # resource 0 : md 0 dev 0 flags -- posix/memory # resource 1 : md 1 dev 0 flags -- sysv/memory # resource 2 : md 2 dev 0 flags -- self/memory # resource 3 : md 3 dev 1 flags -- tcp/bnxt_roce # resource 4 : md 3 dev 2 flags -- tcp/bnxt_roce.45 # resource 5 : md 3 dev 3 flags -- tcp/bnxt_roce.43 # resource 6 : md 3 dev 4 flags -- tcp/lom_1 # resource 7 : md 4 dev 5 flags -s sockcm/sockaddr # resource 8 : md 5 dev 6 flags -- rc_verbs/bnxt_re0:1 # resource 9 : md 5 dev 6 flags -- ud_verbs/bnxt_re0:1 # resource 10 : md 6 dev 5 flags -s rdmacm/sockaddr # resource 11 : md 7 dev 0 flags -- cma/memory # # memory: 0.00MB, file descriptors: 5 # create time: 0.993 ms # # # UCP worker 'rdma-dev-26:3220578' # # address: 361 bytes # # memory: 5.89MB, file descriptors: 16 # create time: 11.880 ms # # # UCP endpoint # # peer: <no debug data> # lane[0]: 2:self/memory.0 md[2] -> md[2]/self am am_bw#0 # lane[1]: 8:rc_verbs/bnxt_re0:1.0 md[5] -> md[5]/ib rma_bw#0 wireup{ud_verbs/bnxt_re0:1} # lane[2]: 11:cma/memory.0 md[7] -> md[7]/cma rma_bw#1 # # tag_send: 0..<egr/short>..8185..<egr/bcopy>..8192..<rndv>..(inf) # tag_send_nbr: 0..<egr/short>..8185..<egr/bcopy>..262144..<rndv>..(inf) # tag_send_sync: 0..<egr/short>..8185..<egr/bcopy>..8192..<rndv>..(inf) # # rma_bw: mds [5] rndv_rkey_size 18 # 
[1624537739.168547] [rdma-dev-26:3220578:0] uct_iface.c:67 UCX WARN got active message id 0, but no handler installed [1624537739.168564] [rdma-dev-26:3220578:0] uct_iface.c:70 UCX WARN payload 64 of 2818048 bytes: [1624537739.168564] [rdma-dev-26:3220578:0] uct_iface.c:70 UCX WARN 00000000:00000000:00000000:00000000 [1624537739.168564] [rdma-dev-26:3220578:0] uct_iface.c:70 UCX WARN 00000000:00000000:00000000:00000000 [1624537739.168564] [rdma-dev-26:3220578:0] uct_iface.c:70 UCX WARN 00000000:00000000:00000000:00000000 [1624537739.168564] [rdma-dev-26:3220578:0] uct_iface.c:70 UCX WARN 00000000:00000000:00000000:00000000 [1624537739.168880] [rdma-dev-26:3220578:0] log.c:488 UCX WARN ==== backtrace (tid:3220578) ==== [1624537739.168887] [rdma-dev-26:3220578:0] log.c:491 UCX WARN 0 /lib64/libucs.so.0(ucs_log_print_backtrace+0x33) [0x14934916e823] [1624537739.168891] [rdma-dev-26:3220578:0] log.c:491 UCX WARN 1 /lib64/libuct.so.0(+0x12c82) [0x1493493a3c82] [1624537739.168895] [rdma-dev-26:3220578:0] log.c:491 UCX WARN 2 /lib64/ucx/libuct_ib.so.0(uct_ud_ep_process_rx+0x1f5) [0x149347d8f705] [1624537739.168898] [rdma-dev-26:3220578:0] log.c:491 UCX WARN 3 /lib64/ucx/libuct_ib.so.0(+0x5345c) [0x149347d9345c] [1624537739.168901] [rdma-dev-26:3220578:0] log.c:491 UCX WARN 4 /lib64/libucp.so.0(ucp_worker_progress+0x2a) [0x1493495f4bea] [1624537739.168905] [rdma-dev-26:3220578:0] log.c:491 UCX WARN 5 ucx_info(+0x584a) [0x56133ab7e84a] [1624537739.168908] [rdma-dev-26:3220578:0] log.c:491 UCX WARN 6 ucx_info(+0x4fb7) [0x56133ab7dfb7] [1624537739.168911] [rdma-dev-26:3220578:0] log.c:491 UCX WARN 7 /lib64/libc.so.6(__libc_start_main+0xf3) [0x149347fd6493] [1624537739.168915] [rdma-dev-26:3220578:0] log.c:491 UCX WARN 8 ucx_info(+0x50de) [0x56133ab7e0de] [1624537739.168918] [rdma-dev-26:3220578:0] log.c:493 UCX WARN ================================= [1624537739.185753] [rdma-dev-26:3220578:0] uct_iface.c:67 UCX WARN got active message id 0, but no handler installed 
[1624537739.185759] [rdma-dev-26:3220578:0] uct_iface.c:70 UCX WARN payload 64 of 2818048 bytes: [1624537739.185759] [rdma-dev-26:3220578:0] uct_iface.c:70 UCX WARN 00000000:00000000:00000000:00000000 [1624537739.185759] [rdma-dev-26:3220578:0] uct_iface.c:70 UCX WARN 00000000:00000000:00000000:00000000 [1624537739.185759] [rdma-dev-26:3220578:0] uct_iface.c:70 UCX WARN 00000000:00000000:00000000:00000000 [1624537739.185759] [rdma-dev-26:3220578:0] uct_iface.c:70 UCX WARN 00000000:00000000:00000000:00000000 [1624537739.185824] [rdma-dev-26:3220578:0] log.c:488 UCX WARN ==== backtrace (tid:3220578) ==== [1624537739.185829] [rdma-dev-26:3220578:0] log.c:491 UCX WARN 0 /lib64/libucs.so.0(ucs_log_print_backtrace+0x33) [0x14934916e823] [1624537739.185832] [rdma-dev-26:3220578:0] log.c:491 UCX WARN 1 /lib64/libuct.so.0(+0x12c82) [0x1493493a3c82] [1624537739.185836] [rdma-dev-26:3220578:0] log.c:491 UCX WARN 2 /lib64/ucx/libuct_ib.so.0(uct_ud_ep_process_rx+0x1f5) [0x149347d8f705] [1624537739.185839] [rdma-dev-26:3220578:0] log.c:491 UCX WARN 3 /lib64/ucx/libuct_ib.so.0(+0x5345c) [0x149347d9345c] [1624537739.185842] [rdma-dev-26:3220578:0] log.c:491 UCX WARN 4 /lib64/libucp.so.0(ucp_worker_progress+0x2a) [0x1493495f4bea] [1624537739.185846] [rdma-dev-26:3220578:0] log.c:491 UCX WARN 5 ucx_info(+0x584a) [0x56133ab7e84a] [1624537739.185849] [rdma-dev-26:3220578:0] log.c:491 UCX WARN 6 ucx_info(+0x4fb7) [0x56133ab7dfb7] [1624537739.185852] [rdma-dev-26:3220578:0] log.c:491 UCX WARN 7 /lib64/libc.so.6(__libc_start_main+0xf3) [0x149347fd6493] [1624537739.185856] [rdma-dev-26:3220578:0] log.c:491 UCX WARN 8 ucx_info(+0x50de) [0x56133ab7e0de] [1624537739.185859] [rdma-dev-26:3220578:0] log.c:493 UCX WARN =================================
UCX currently does not support Broadcom RDMA devices. Because of this, OpenMPI does not pick UCX by default and instead falls back to other point-to-point ("pml") transport plugins. However, passing "-mca pml ucx" to mpirun prevents this fallback, which leads to the "No components were able to be opened in the pml framework" error. So "-mca pml ucx" needs to be removed from the mpirun command.
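As a sketch of the workaround (the command line below is abbreviated from the one in the report, and the sed-based edit is purely illustrative): dropping the "-mca pml ucx" pair lets OpenMPI auto-select a working pml such as ob1.

```shell
# Abbreviated mpirun invocation with the forced UCX pml selection.
cmd="mpirun --allow-run-as-root -mca btl '^openib' -mca pml ucx -np 2 ./bench"

# Remove the "-mca pml ucx" pair; OpenMPI then falls back to another pml
# (e.g. ob1) instead of failing when UCX cannot be initialized.
fixed=$(printf '%s' "$cmd" | sed 's/ -mca pml ucx//')
echo "$fixed"
```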
(In reply to Yossi Itigin from comment #7) > UCX currently does not support Broadcom RDMA devices. It is a regression issue as same parameters work for v4.1.0. BTW, this failure takes place on all HCAs - MLX4/5 IB/ROCE, BNXT ROCE, CXGB4 IW, QEDR IW... > Because of this, OpenMPI does not pick up UCX by default and would try to > fallback to other point-to-point ("pml") transport plugins. > However, the parameter "-mca pml ucx" for mpirun prevents this fallback, > which leads to "No components were able to be opened in the pml framework" > error. > > So, need to remove "-mca pml ucx" from mpirun command [root@rdma-dev-26 ~]$ timeout 3m /usr/lib64/openmpi/bin/mpirun --allow-run-as-root --map-by node -mca btl_openib_warn_nonexistent_if 0 -mca btl_openib_if_include bnxt_re3:1 -mca mtl '^psm2,psm,ofi' -mca btl '^openib' --mca pml_base_verbose 100 --mca osc_ucx_verbose 100 --mca pml_ucx_verbose 100 -x UCX_NET_DEVICES=bnxt_roce.45 -hostfile /root/hfile_one_core -np 2 /usr/lib64/openmpi/bin/mpitests-IMB-MPI1 PingPong [rdma-dev-25.lab.bos.redhat.com:148092] mca: base: components_register: registering framework pml components [rdma-dev-25.lab.bos.redhat.com:148092] mca: base: components_register: found loaded component v [rdma-dev-25.lab.bos.redhat.com:148092] mca: base: components_register: component v register function successful [rdma-dev-25.lab.bos.redhat.com:148092] mca: base: components_register: found loaded component cm [rdma-dev-25.lab.bos.redhat.com:148092] mca: base: components_register: component cm register function successful [rdma-dev-25.lab.bos.redhat.com:148092] mca: base: components_register: found loaded component monitoring [rdma-dev-25.lab.bos.redhat.com:148092] mca: base: components_register: component monitoring register function successful [rdma-dev-25.lab.bos.redhat.com:148092] mca: base: components_register: found loaded component ob1 [rdma-dev-25.lab.bos.redhat.com:148092] mca: base: components_register: component ob1 register function successful 
[rdma-dev-25.lab.bos.redhat.com:148092] mca: base: components_register: found loaded component ucx [rdma-dev-25.lab.bos.redhat.com:148092] mca: base: components_register: component ucx register function successful [rdma-dev-25.lab.bos.redhat.com:148092] mca: base: components_open: opening pml components [rdma-dev-25.lab.bos.redhat.com:148092] mca: base: components_open: found loaded component v [rdma-dev-25.lab.bos.redhat.com:148092] mca: base: components_open: component v open function successful [rdma-dev-25.lab.bos.redhat.com:148092] mca: base: components_open: found loaded component cm [rdma-dev-25.lab.bos.redhat.com:148092] mca: base: close: component cm closed [rdma-dev-25.lab.bos.redhat.com:148092] mca: base: close: unloading component cm [rdma-dev-25.lab.bos.redhat.com:148092] mca: base: components_open: found loaded component monitoring [rdma-dev-25.lab.bos.redhat.com:148092] mca: base: components_open: component monitoring open function successful [rdma-dev-25.lab.bos.redhat.com:148092] mca: base: components_open: found loaded component ob1 [rdma-dev-25.lab.bos.redhat.com:148092] mca: base: components_open: component ob1 open function successful [rdma-dev-25.lab.bos.redhat.com:148092] mca: base: components_open: found loaded component ucx [rdma-dev-25.lab.bos.redhat.com:148092] pml_ucx.c:197 mca_pml_ucx_open: UCX version 1.10.1 [rdma-dev-25.lab.bos.redhat.com:148092] mca: base: components_open: component ucx open function successful [rdma-dev-25.lab.bos.redhat.com:148092] select: component v not in the include list [rdma-dev-25.lab.bos.redhat.com:148092] select: component monitoring not in the include list [rdma-dev-25.lab.bos.redhat.com:148092] select: initializing pml component ob1 [rdma-dev-25.lab.bos.redhat.com:148092] select: init returned priority 20 [rdma-dev-25.lab.bos.redhat.com:148092] select: initializing pml component ucx [rdma-dev-25.lab.bos.redhat.com:148092] common_ucx.c:304 posix/memory: did not match transport list 
[rdma-dev-25.lab.bos.redhat.com:148092] common_ucx.c:304 sysv/memory: did not match transport list [rdma-dev-25.lab.bos.redhat.com:148092] common_ucx.c:304 self/memory: did not match transport list [rdma-dev-25.lab.bos.redhat.com:148092] common_ucx.c:304 tcp/bnxt_roce.45: did not match transport list [rdma-dev-25.lab.bos.redhat.com:148092] common_ucx.c:304 cma/memory: did not match transport list [rdma-dev-25.lab.bos.redhat.com:148092] common_ucx.c:311 support level is none [rdma-dev-25.lab.bos.redhat.com:148092] select: init returned failure for component ucx [rdma-dev-25.lab.bos.redhat.com:148092] selected ob1 best priority 20 [rdma-dev-25.lab.bos.redhat.com:148092] select: component ob1 selected [rdma-dev-25.lab.bos.redhat.com:148092] mca: base: close: component v closed [rdma-dev-25.lab.bos.redhat.com:148092] mca: base: close: unloading component v [rdma-dev-25.lab.bos.redhat.com:148092] mca: base: close: component monitoring closed [rdma-dev-25.lab.bos.redhat.com:148092] mca: base: close: unloading component monitoring [rdma-dev-25.lab.bos.redhat.com:148092] pml_ucx.c:268 mca_pml_ucx_close [rdma-dev-25.lab.bos.redhat.com:148092] mca: base: close: component ucx closed [rdma-dev-25.lab.bos.redhat.com:148092] mca: base: close: unloading component ucx [rdma-dev-26.lab.bos.redhat.com:3222664] mca: base: components_register: registering framework pml components [rdma-dev-26.lab.bos.redhat.com:3222664] mca: base: components_register: found loaded component v [rdma-dev-26.lab.bos.redhat.com:3222664] mca: base: components_register: component v register function successful [rdma-dev-26.lab.bos.redhat.com:3222664] mca: base: components_register: found loaded component cm [rdma-dev-26.lab.bos.redhat.com:3222664] mca: base: components_register: component cm register function successful [rdma-dev-26.lab.bos.redhat.com:3222664] mca: base: components_register: found loaded component monitoring [rdma-dev-26.lab.bos.redhat.com:3222664] mca: base: components_register: component 
monitoring register function successful [rdma-dev-26.lab.bos.redhat.com:3222664] mca: base: components_register: found loaded component ob1 [rdma-dev-26.lab.bos.redhat.com:3222664] mca: base: components_register: component ob1 register function successful [rdma-dev-26.lab.bos.redhat.com:3222664] mca: base: components_register: found loaded component ucx [rdma-dev-26.lab.bos.redhat.com:3222664] mca: base: components_register: component ucx register function successful [rdma-dev-26.lab.bos.redhat.com:3222664] mca: base: components_open: opening pml components [rdma-dev-26.lab.bos.redhat.com:3222664] mca: base: components_open: found loaded component v [rdma-dev-26.lab.bos.redhat.com:3222664] mca: base: components_open: component v open function successful [rdma-dev-26.lab.bos.redhat.com:3222664] mca: base: components_open: found loaded component cm [rdma-dev-26.lab.bos.redhat.com:3222664] mca: base: close: component cm closed [rdma-dev-26.lab.bos.redhat.com:3222664] mca: base: close: unloading component cm [rdma-dev-26.lab.bos.redhat.com:3222664] mca: base: components_open: found loaded component monitoring [rdma-dev-26.lab.bos.redhat.com:3222664] mca: base: components_open: component monitoring open function successful [rdma-dev-26.lab.bos.redhat.com:3222664] mca: base: components_open: found loaded component ob1 [rdma-dev-26.lab.bos.redhat.com:3222664] mca: base: components_open: component ob1 open function successful [rdma-dev-26.lab.bos.redhat.com:3222664] mca: base: components_open: found loaded component ucx [rdma-dev-26.lab.bos.redhat.com:3222664] pml_ucx.c:197 mca_pml_ucx_open: UCX version 1.10.1 [rdma-dev-26.lab.bos.redhat.com:3222664] mca: base: components_open: component ucx open function successful [rdma-dev-26.lab.bos.redhat.com:3222664] select: component v not in the include list [rdma-dev-26.lab.bos.redhat.com:3222664] select: component monitoring not in the include list [rdma-dev-26.lab.bos.redhat.com:3222664] select: initializing pml component ob1 
[rdma-dev-26.lab.bos.redhat.com:3222664] select: init returned priority 20 [rdma-dev-26.lab.bos.redhat.com:3222664] select: initializing pml component ucx [rdma-dev-26.lab.bos.redhat.com:3222664] common_ucx.c:304 posix/memory: did not match transport list [rdma-dev-26.lab.bos.redhat.com:3222664] common_ucx.c:304 sysv/memory: did not match transport list [rdma-dev-26.lab.bos.redhat.com:3222664] common_ucx.c:304 self/memory: did not match transport list [rdma-dev-26.lab.bos.redhat.com:3222664] common_ucx.c:304 tcp/bnxt_roce.45: did not match transport list [rdma-dev-26.lab.bos.redhat.com:3222664] common_ucx.c:304 cma/memory: did not match transport list [rdma-dev-26.lab.bos.redhat.com:3222664] common_ucx.c:311 support level is none [rdma-dev-26.lab.bos.redhat.com:3222664] select: init returned failure for component ucx [rdma-dev-26.lab.bos.redhat.com:3222664] selected ob1 best priority 20 [rdma-dev-26.lab.bos.redhat.com:3222664] select: component ob1 selected [rdma-dev-26.lab.bos.redhat.com:3222664] mca: base: close: component v closed [rdma-dev-26.lab.bos.redhat.com:3222664] mca: base: close: unloading component v [rdma-dev-26.lab.bos.redhat.com:3222664] mca: base: close: component monitoring closed [rdma-dev-26.lab.bos.redhat.com:3222664] mca: base: close: unloading component monitoring [rdma-dev-26.lab.bos.redhat.com:3222664] pml_ucx.c:268 mca_pml_ucx_close [rdma-dev-26.lab.bos.redhat.com:3222664] mca: base: close: component ucx closed [rdma-dev-26.lab.bos.redhat.com:3222664] mca: base: close: unloading component ucx [rdma-dev-26.lab.bos.redhat.com:3222664] check:select: PML check not necessary on self [rdma-dev-25.lab.bos.redhat.com:148092] check:select: checking my pml ob1 against process [[58892,1],0] pml ob1 #---------------------------------------------------------------- # Intel(R) MPI Benchmarks 2021.2, MPI-1 part #---------------------------------------------------------------- # Date : Thu Jun 24 09:21:12 2021 # Machine : x86_64 # System : Linux # Release 
: 4.18.0-311.el8.kpq1.x86_64 # Version : #1 SMP Wed Jun 2 15:25:19 EDT 2021 # MPI Version : 3.1 # MPI Thread Environment: # Calling sequence was: # /usr/lib64/openmpi/bin/mpitests-IMB-MPI1 PingPong # Minimum message length in bytes: 0 # Maximum message length in bytes: 4194304 # # MPI_Datatype : MPI_BYTE # MPI_Datatype for reductions : MPI_FLOAT # MPI_Op : MPI_SUM # # # List of Benchmarks to run: # PingPong rdma-dev-26.lab.bos.redhat.com:rank0: Process connect/disconnect error: 8, opcode 206 rdma-dev-25.lab.bos.redhat.com:rank1: Process connect/disconnect error: 8, opcode 206 [rdma-dev-26:3222664] *** Process received signal *** [rdma-dev-26:3222664] Signal: Aborted (6) [rdma-dev-26:3222664] Signal code: (-6) [rdma-dev-26:3222664] [ 0] /lib64/libpthread.so.0(+0x12b20)[0x1550e80c6b20] [rdma-dev-26:3222664] [ 1] /lib64/libc.so.6(gsignal+0x10f)[0x1550e7d2637f] [rdma-dev-26:3222664] [ 2] /lib64/libc.so.6(abort+0x127)[0x1550e7d10db5] [rdma-dev-26:3222664] [ 3] /lib64/libfabric.so.1(+0x1c1f2)[0x1550dc7411f2] [rdma-dev-26:3222664] [ 4] /lib64/libfabric.so.1(+0x14fa9a)[0x1550dc874a9a] [rdma-dev-26:3222664] [ 5] /lib64/libfabric.so.1(+0x1779cb)[0x1550dc89c9cb] [rdma-dev-26:3222664] [ 6] /lib64/libfabric.so.1(+0x1784b5)[0x1550dc89d4b5] [rdma-dev-26:3222664] [ 7] /lib64/libfabric.so.1(+0x178ba7)[0x1550dc89dba7] [rdma-dev-26:3222664] [ 8] /lib64/libfabric.so.1(+0x1799e2)[0x1550dc89e9e2] [rdma-dev-26:3222664] [ 9] /lib64/libfabric.so.1(+0x146796)[0x1550dc86b796] [rdma-dev-26:3222664] [10] /lib64/libfabric.so.1(+0x16fc96)[0x1550dc894c96] [rdma-dev-26:3222664] [11] /lib64/libfabric.so.1(+0x17a09e)[0x1550dc89f09e] [rdma-dev-26:3222664] [12] /lib64/libfabric.so.1(+0x14e20a)[0x1550dc87320a] [rdma-dev-26:3222664] [13] /lib64/libfabric.so.1(+0x12bac5)[0x1550dc850ac5] [rdma-dev-26:3222664] [14] /usr/lib64/openmpi/lib/openmpi/mca_btl_ofi.so(+0x42da)[0x1550dcd492da] [rdma-dev-26:3222664] [15] /usr/lib64/openmpi/lib/openmpi/mca_bml_r2.so(+0x27ea)[0x1550dcf517ea] [rdma-dev-26:3222664] [16] 
/usr/lib64/openmpi/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send+0xa80)[0x1550d1609b40] [rdma-dev-26:3222664] [17] /usr/lib64/openmpi/lib/libmpi.so.40(ompi_coll_base_sendrecv_actual+0xb9)[0x1550e8cb70e9] [rdma-dev-26:3222664] [18] /usr/lib64/openmpi/lib/libmpi.so.40(ompi_coll_base_allgather_intra_two_procs+0x8d)[0x1550e8cb5ecd] [rdma-dev-26:3222664] [19] /usr/lib64/openmpi/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_allgather_intra_dec_fixed+0x4e)[0x1550cebc0f0e] [rdma-dev-26:3222664] [20] /usr/lib64/openmpi/lib/libmpi.so.40(ompi_comm_split_with_info+0xcc)[0x1550e8c37a4c] [rdma-dev-26:3222664] [21] /usr/lib64/openmpi/lib/libmpi.so.40(MPI_Comm_split+0x73)[0x1550e8c73be3] [rdma-dev-26:3222664] [22] /usr/lib64/openmpi/bin/mpitests-IMB-MPI1(+0x4b45c)[0x5609cfd4c45c] [rdma-dev-26:3222664] [23] /usr/lib64/openmpi/bin/mpitests-IMB-MPI1(+0x4ae7e)[0x5609cfd4be7e] [rdma-dev-26:3222664] [24] /usr/lib64/openmpi/bin/mpitests-IMB-MPI1(+0x5aa49)[0x5609cfd5ba49] [rdma-dev-26:3222664] [25] /usr/lib64/openmpi/bin/mpitests-IMB-MPI1(+0x15894)[0x5609cfd16894] [rdma-dev-26:3222664] [26] /lib64/libc.so.6(__libc_start_main+0xf3)[0x1550e7d12493] [rdma-dev-26:3222664] [27] /usr/lib64/openmpi/bin/mpitests-IMB-MPI1(+0x1407e)[0x5609cfd1507e] [rdma-dev-26:3222664] *** End of error message *** [rdma-dev-25:148092] *** Process received signal *** [rdma-dev-25:148092] Signal: Aborted (6) [rdma-dev-25:148092] Signal code: (-6) [rdma-dev-25:148092] [ 0] /lib64/libpthread.so.0(+0x12b20)[0x15408b5afb20] [rdma-dev-25:148092] [ 1] /lib64/libc.so.6(gsignal+0x10f)[0x15408b20f37f] [rdma-dev-25:148092] [ 2] /lib64/libc.so.6(abort+0x127)[0x15408b1f9db5] [rdma-dev-25:148092] [ 3] /lib64/libfabric.so.1(+0x1c1f2)[0x15407bc001f2] [rdma-dev-25:148092] [ 4] /lib64/libfabric.so.1(+0x14fa9a)[0x15407bd33a9a] [rdma-dev-25:148092] [ 5] /lib64/libfabric.so.1(+0x1779cb)[0x15407bd5b9cb] [rdma-dev-25:148092] [ 6] /lib64/libfabric.so.1(+0x1784b5)[0x15407bd5c4b5] [rdma-dev-25:148092] [ 7] 
/lib64/libfabric.so.1(+0x178ba7)[0x15407bd5cba7] [rdma-dev-25:148092] [ 8] /lib64/libfabric.so.1(+0x1799e2)[0x15407bd5d9e2] [rdma-dev-25:148092] [ 9] /lib64/libfabric.so.1(+0x146796)[0x15407bd2a796] [rdma-dev-25:148092] [10] /lib64/libfabric.so.1(+0x16fc96)[0x15407bd53c96] [rdma-dev-25:148092] [11] /lib64/libfabric.so.1(+0x17a09e)[0x15407bd5e09e] [rdma-dev-25:148092] [12] /lib64/libfabric.so.1(+0x14e20a)[0x15407bd3220a] [rdma-dev-25:148092] [13] /lib64/libfabric.so.1(+0x12bac5)[0x15407bd0fac5] [rdma-dev-25:148092] [14] /usr/lib64/openmpi/lib/openmpi/mca_btl_ofi.so(+0x42da)[0x1540802192da] [rdma-dev-25:148092] [15] /usr/lib64/openmpi/lib/openmpi/mca_bml_r2.so(+0x27ea)[0x1540804217ea] [rdma-dev-25:148092] [16] /usr/lib64/openmpi/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send+0xa80)[0x154074afdb40] [rdma-dev-25:148092] [17] /usr/lib64/openmpi/lib/libmpi.so.40(ompi_coll_base_sendrecv_actual+0xb9)[0x15408c1a00e9] [rdma-dev-25:148092] [18] /usr/lib64/openmpi/lib/libmpi.so.40(ompi_coll_base_allgather_intra_two_procs+0x8d)[0x15408c19eecd] [rdma-dev-25:148092] [19] /usr/lib64/openmpi/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_allgather_intra_dec_fixed+0x4e)[0x154072091f0e] [rdma-dev-25:148092] [20] /usr/lib64/openmpi/lib/libmpi.so.40(ompi_comm_split_with_info+0xcc)[0x15408c120a4c] [rdma-dev-25:148092] [21] /usr/lib64/openmpi/lib/libmpi.so.40(MPI_Comm_split+0x73)[0x15408c15cbe3] [rdma-dev-25:148092] [22] /usr/lib64/openmpi/bin/mpitests-IMB-MPI1(+0x4b45c)[0x55ee1d43c45c] [rdma-dev-25:148092] [23] /usr/lib64/openmpi/bin/mpitests-IMB-MPI1(+0x4ae7e)[0x55ee1d43be7e] [rdma-dev-25:148092] [24] /usr/lib64/openmpi/bin/mpitests-IMB-MPI1(+0x5aa49)[0x55ee1d44ba49] [rdma-dev-25:148092] [25] /usr/lib64/openmpi/bin/mpitests-IMB-MPI1(+0x15894)[0x55ee1d406894] [rdma-dev-25:148092] [26] /lib64/libc.so.6(__libc_start_main+0xf3)[0x15408b1fb493] [rdma-dev-25:148092] [27] /usr/lib64/openmpi/bin/mpitests-IMB-MPI1(+0x1407e)[0x55ee1d40507e] [rdma-dev-25:148092] *** End of error message *** 
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node rdma-dev-26 exited on signal 6 (Aborted).
--------------------------------------------------------------------------
[root@rdma-dev-26 ~]$
Hi,

> It is a regression issue as same parameters work for v4.1.0.

It was a change in behavior decided by the OpenMPI community; the full discussion is here: https://github.com/open-mpi/ompi/issues/8489
Even if it may have worked before, UCX on Broadcom RDMA was never really supported, so it had to be turned off explicitly.

> BTW, this failure takes place on all HCAs - MLX4/5 IB/ROCE, BNXT ROCE, CXGB4 IW, QEDR IW...

Can you please send the output of `ibv_devinfo`, `ucx_info -vdepw -u t`, and the failing mpirun command (with `-mca pml_ucx_verbose 100` added) on a machine with MLX4/5 IB/ROCE where it fails? This is not expected.

> [rdma-dev-25:148092] Signal: Aborted (6)

This error comes from libfabric (which takes over now that UCX is no longer force-selected), not from UCX.
[root@rdma-dev-22 ~]$ timeout --preserve-status --kill-after=5m 3m mpirun -hostfile /root/hfile_one_core -np 2 --allow-run-as-root --map-by node -mca btl_openib_warn_nonexistent_if 0 -mca btl_openib_if_include mlx5_1:1 -mca mtl '^psm2,psm,ofi' -mca btl '^openib' -mca btl_openib_allow_ib 1 -mca pml ucx -mca pml_ucx_verbose 100 --mca pml_base_verbose 100 --mca osc_ucx_verbose 100 -x UCX_NET_DEVICES=mlx5_ib0 /usr/lib64/openmpi/bin/mpitests-osu_bw [rdma-dev-21.lab.bos.redhat.com:493940] mca: base: components_register: registering framework pml components [rdma-dev-21.lab.bos.redhat.com:493940] mca: base: components_register: found loaded component ucx [rdma-dev-21.lab.bos.redhat.com:493940] mca: base: components_register: component ucx register function successful [rdma-dev-21.lab.bos.redhat.com:493940] mca: base: components_open: opening pml components [rdma-dev-21.lab.bos.redhat.com:493940] mca: base: components_open: found loaded component ucx [rdma-dev-21.lab.bos.redhat.com:493940] pml_ucx.c:197 mca_pml_ucx_open: UCX version 1.10.1 [rdma-dev-22.lab.bos.redhat.com:171069] mca: base: components_register: registering framework pml components [rdma-dev-22.lab.bos.redhat.com:171069] mca: base: components_register: found loaded component ucx [rdma-dev-22.lab.bos.redhat.com:171069] mca: base: components_register: component ucx register function successful [rdma-dev-22.lab.bos.redhat.com:171069] mca: base: components_open: opening pml components [rdma-dev-22.lab.bos.redhat.com:171069] mca: base: components_open: found loaded component ucx [rdma-dev-22.lab.bos.redhat.com:171069] pml_ucx.c:197 mca_pml_ucx_open: UCX version 1.10.1 [rdma-dev-21.lab.bos.redhat.com:493940] mca: base: components_open: component ucx open function successful [rdma-dev-22.lab.bos.redhat.com:171069] mca: base: components_open: component ucx open function successful [rdma-dev-21.lab.bos.redhat.com:493940] select: initializing pml component ucx [rdma-dev-21.lab.bos.redhat.com:493940] common_ucx.c:304 
posix/memory: did not match transport list [rdma-dev-21.lab.bos.redhat.com:493940] common_ucx.c:304 sysv/memory: did not match transport list [rdma-dev-21.lab.bos.redhat.com:493940] common_ucx.c:304 self/memory: did not match transport list [rdma-dev-21.lab.bos.redhat.com:493940] common_ucx.c:304 tcp/mlx5_ib0: did not match transport list [rdma-dev-21.lab.bos.redhat.com:493940] common_ucx.c:304 cma/memory: did not match transport list [rdma-dev-21.lab.bos.redhat.com:493940] common_ucx.c:311 support level is none [rdma-dev-21.lab.bos.redhat.com:493940] select: init returned failure for component ucx -------------------------------------------------------------------------- No components were able to be opened in the pml framework. This typically means that either no components of this type were installed, or none of the installed components can be loaded. Sometimes this means that shared libraries required by these components are unable to be found/loaded. Host: rdma-dev-21 Framework: pml -------------------------------------------------------------------------- [rdma-dev-21.lab.bos.redhat.com:493940] PML ucx cannot be selected [rdma-dev-22.lab.bos.redhat.com:171069] select: initializing pml component ucx [rdma-dev-22.lab.bos.redhat.com:171069] common_ucx.c:304 posix/memory: did not match transport list [rdma-dev-22.lab.bos.redhat.com:171069] common_ucx.c:304 sysv/memory: did not match transport list [rdma-dev-22.lab.bos.redhat.com:171069] common_ucx.c:304 self/memory: did not match transport list [rdma-dev-22.lab.bos.redhat.com:171069] common_ucx.c:304 tcp/mlx5_ib0: did not match transport list [rdma-dev-22.lab.bos.redhat.com:171069] common_ucx.c:304 cma/memory: did not match transport list [rdma-dev-22.lab.bos.redhat.com:171069] common_ucx.c:311 support level is none [rdma-dev-22.lab.bos.redhat.com:171069] select: init returned failure for component ucx [rdma-dev-22.lab.bos.redhat.com:171064] 1 more process has sent help message help-mca-base.txt / 
find-available:none found [rdma-dev-22.lab.bos.redhat.com:171064] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages [root@rdma-dev-22 ~]$ [root@rdma-dev-22 ~]$ ibv_devinfo hca_id: mlx5_0 transport: InfiniBand (0) fw_ver: 12.28.1002 node_guid: 248a:0703:0056:b834 sys_image_guid: 248a:0703:0056:b834 vendor_id: 0x02c9 vendor_part_id: 4115 hw_ver: 0x0 board_id: MT_2140110033 phys_port_cnt: 1 port: 1 state: PORT_ACTIVE (4) max_mtu: 4096 (5) active_mtu: 4096 (5) sm_lid: 0 port_lid: 0 port_lmc: 0x00 link_layer: Ethernet hca_id: mlx5_1 transport: InfiniBand (0) fw_ver: 12.28.1002 node_guid: 248a:0703:0049:d75c sys_image_guid: 248a:0703:0049:d75c vendor_id: 0x02c9 vendor_part_id: 4115 hw_ver: 0x0 board_id: MT_2190110032 phys_port_cnt: 1 port: 1 state: PORT_ACTIVE (4) max_mtu: 4096 (5) active_mtu: 4096 (5) sm_lid: 3 port_lid: 8 port_lmc: 0x00 link_layer: InfiniBand hca_id: mlx5_2 transport: InfiniBand (0) fw_ver: 12.28.1002 node_guid: 248a:0703:0049:d75d sys_image_guid: 248a:0703:0049:d75c vendor_id: 0x02c9 vendor_part_id: 4115 hw_ver: 0x0 board_id: MT_2190110032 phys_port_cnt: 1 port: 1 state: PORT_ACTIVE (4) max_mtu: 4096 (5) active_mtu: 4096 (5) sm_lid: 1 port_lid: 35 port_lmc: 0x00 link_layer: InfiniBand [root@rdma-dev-22 ~]$ [root@rdma-dev-22 ~]$ [root@rdma-dev-22 ~]$ ibv_devinfo -v hca_id: mlx5_0 transport: InfiniBand (0) fw_ver: 12.28.1002 node_guid: 248a:0703:0056:b834 sys_image_guid: 248a:0703:0056:b834 vendor_id: 0x02c9 vendor_part_id: 4115 hw_ver: 0x0 board_id: MT_2140110033 phys_port_cnt: 1 max_mr_size: 0xffffffffffffffff page_size_cap: 0xfffffffffffff000 max_qp: 262144 max_qp_wr: 32768 device_cap_flags: 0xed721c36 BAD_PKEY_CNTR BAD_QKEY_CNTR AUTO_PATH_MIG CHANGE_PHY_PORT PORT_ACTIVE_EVENT SYS_IMAGE_GUID RC_RNR_NAK_GEN MEM_WINDOW XRC MEM_MGT_EXTENSIONS MEM_WINDOW_TYPE_2B RAW_IP_CSUM MANAGED_FLOW_STEERING Unknown flags: 0xC8400000 max_sge: 30 max_sge_rd: 30 max_cq: 16777216 max_cqe: 4194303 max_mr: 16777216 max_pd: 16777216 
max_qp_rd_atom: 16 max_ee_rd_atom: 0 max_res_rd_atom: 4194304 max_qp_init_rd_atom: 16 max_ee_init_rd_atom: 0 atomic_cap: ATOMIC_HCA (1) max_ee: 0 max_rdd: 0 max_mw: 16777216 max_raw_ipv6_qp: 0 max_raw_ethy_qp: 0 max_mcast_grp: 2097152 max_mcast_qp_attach: 240 max_total_mcast_qp_attach: 503316480 max_ah: 2147483647 max_fmr: 0 max_srq: 8388608 max_srq_wr: 32767 max_srq_sge: 31 max_pkeys: 128 local_ca_ack_delay: 16 general_odp_caps: ODP_SUPPORT ODP_SUPPORT_IMPLICIT rc_odp_caps: SUPPORT_SEND SUPPORT_RECV SUPPORT_WRITE SUPPORT_READ SUPPORT_SRQ uc_odp_caps: NO SUPPORT ud_odp_caps: SUPPORT_SEND xrc_odp_caps: SUPPORT_SEND SUPPORT_WRITE SUPPORT_READ SUPPORT_SRQ completion timestamp_mask: 0x7fffffffffffffff hca_core_clock: 156250kHZ raw packet caps: C-VLAN stripping offload Scatter FCS offload IP csum offload Delay drop device_cap_flags_ex: 0x15ED721C36 RAW_SCATTER_FCS PCI_WRITE_END_PADDING Unknown flags: 0x100000000 tso_caps: max_tso: 262144 supported_qp: SUPPORT_RAW_PACKET rss_caps: max_rwq_indirection_tables: 65536 max_rwq_indirection_table_size: 2048 rx_hash_function: 0x1 rx_hash_fields_mask: 0x800000FF supported_qp: SUPPORT_RAW_PACKET max_wq_type_rq: 8388608 packet_pacing_caps: qp_rate_limit_min: 0kbps qp_rate_limit_max: 0kbps tag matching not supported cq moderation caps: max_cq_count: 65535 max_cq_period: 4095 us num_comp_vectors: 32 port: 1 state: PORT_ACTIVE (4) max_mtu: 4096 (5) active_mtu: 4096 (5) sm_lid: 0 port_lid: 0 port_lmc: 0x00 link_layer: Ethernet max_msg_sz: 0x40000000 port_cap_flags: 0x04010000 port_cap_flags2: 0x0000 max_vl_num: invalid value (0) bad_pkey_cntr: 0x0 qkey_viol_cntr: 0x0 sm_sl: 0 pkey_tbl_len: 1 gid_tbl_len: 256 subnet_timeout: 0 init_type_reply: 0 active_width: 4X (2) active_speed: 25.0 Gbps (32) phys_state: LINK_UP (5) GID[ 0]: fe80:0000:0000:0000:268a:07ff:fe56:b834, RoCE v1 GID[ 1]: fe80::268a:7ff:fe56:b834, RoCE v2 GID[ 2]: fe80:0000:0000:0000:268a:07ff:fe56:b834, RoCE v1 GID[ 3]: fe80::268a:7ff:fe56:b834, RoCE v2 GID[ 4]: 
fe80:0000:0000:0000:268a:07ff:fe56:b834, RoCE v1 GID[ 5]: fe80::268a:7ff:fe56:b834, RoCE v2 GID[ 6]: 0000:0000:0000:0000:0000:ffff:ac1f:2d7a, RoCE v1 GID[ 7]: ::ffff:172.31.45.122, RoCE v2 GID[ 8]: 0000:0000:0000:0000:0000:ffff:ac1f:2b7a, RoCE v1 GID[ 9]: ::ffff:172.31.43.122, RoCE v2 GID[ 10]: 0000:0000:0000:0000:0000:ffff:ac1f:287a, RoCE v1 GID[ 11]: ::ffff:172.31.40.122, RoCE v2 hca_id: mlx5_1 transport: InfiniBand (0) fw_ver: 12.28.1002 node_guid: 248a:0703:0049:d75c sys_image_guid: 248a:0703:0049:d75c vendor_id: 0x02c9 vendor_part_id: 4115 hw_ver: 0x0 board_id: MT_2190110032 phys_port_cnt: 1 max_mr_size: 0xffffffffffffffff page_size_cap: 0xfffffffffffff000 max_qp: 262144 max_qp_wr: 32768 device_cap_flags: 0xe97e1c36 BAD_PKEY_CNTR BAD_QKEY_CNTR AUTO_PATH_MIG CHANGE_PHY_PORT PORT_ACTIVE_EVENT SYS_IMAGE_GUID RC_RNR_NAK_GEN MEM_WINDOW UD_IP_CSUM XRC MEM_MGT_EXTENSIONS MEM_WINDOW_TYPE_2B MANAGED_FLOW_STEERING Unknown flags: 0xC8480000 max_sge: 30 max_sge_rd: 30 max_cq: 16777216 max_cqe: 4194303 max_mr: 16777216 max_pd: 16777216 max_qp_rd_atom: 16 max_ee_rd_atom: 0 max_res_rd_atom: 4194304 max_qp_init_rd_atom: 16 max_ee_init_rd_atom: 0 atomic_cap: ATOMIC_HCA (1) max_ee: 0 max_rdd: 0 max_mw: 16777216 max_raw_ipv6_qp: 0 max_raw_ethy_qp: 0 max_mcast_grp: 2097152 max_mcast_qp_attach: 240 max_total_mcast_qp_attach: 503316480 max_ah: 2147483647 max_fmr: 0 max_srq: 8388608 max_srq_wr: 32767 max_srq_sge: 31 max_pkeys: 128 local_ca_ack_delay: 16 general_odp_caps: ODP_SUPPORT ODP_SUPPORT_IMPLICIT rc_odp_caps: SUPPORT_SEND SUPPORT_RECV SUPPORT_WRITE SUPPORT_READ SUPPORT_SRQ uc_odp_caps: NO SUPPORT ud_odp_caps: SUPPORT_SEND xrc_odp_caps: SUPPORT_SEND SUPPORT_WRITE SUPPORT_READ SUPPORT_SRQ completion timestamp_mask: 0x7fffffffffffffff hca_core_clock: 156250kHZ device_cap_flags_ex: 0x11E97E1C36 PCI_WRITE_END_PADDING Unknown flags: 0x100000000 tso_caps: max_tso: 0 rss_caps: max_rwq_indirection_tables: 0 max_rwq_indirection_table_size: 0 rx_hash_function: 0x0 rx_hash_fields_mask: 
0x0 max_wq_type_rq: 0 packet_pacing_caps: qp_rate_limit_min: 0kbps qp_rate_limit_max: 0kbps tag matching not supported cq moderation caps: max_cq_count: 65535 max_cq_period: 4095 us num_comp_vectors: 32 port: 1 state: PORT_ACTIVE (4) max_mtu: 4096 (5) active_mtu: 4096 (5) sm_lid: 3 port_lid: 8 port_lmc: 0x00 link_layer: InfiniBand max_msg_sz: 0x40000000 port_cap_flags: 0x2259e848 port_cap_flags2: 0x0002 max_vl_num: 4 (3) bad_pkey_cntr: 0x0 qkey_viol_cntr: 0x0 sm_sl: 0 pkey_tbl_len: 128 gid_tbl_len: 8 subnet_timeout: 18 init_type_reply: 0 active_width: 4X (2) active_speed: 14.0 Gbps (16) phys_state: LINK_UP (5) GID[ 0]: fe80:0000:0000:0000:248a:0703:0049:d75c hca_id: mlx5_2 transport: InfiniBand (0) fw_ver: 12.28.1002 node_guid: 248a:0703:0049:d75d sys_image_guid: 248a:0703:0049:d75c vendor_id: 0x02c9 vendor_part_id: 4115 hw_ver: 0x0 board_id: MT_2190110032 phys_port_cnt: 1 max_mr_size: 0xffffffffffffffff page_size_cap: 0xfffffffffffff000 max_qp: 262144 max_qp_wr: 32768 device_cap_flags: 0xe97e1c36 BAD_PKEY_CNTR BAD_QKEY_CNTR AUTO_PATH_MIG CHANGE_PHY_PORT PORT_ACTIVE_EVENT SYS_IMAGE_GUID RC_RNR_NAK_GEN MEM_WINDOW UD_IP_CSUM XRC MEM_MGT_EXTENSIONS MEM_WINDOW_TYPE_2B MANAGED_FLOW_STEERING Unknown flags: 0xC8480000 max_sge: 30 max_sge_rd: 30 max_cq: 16777216 max_cqe: 4194303 max_mr: 16777216 max_pd: 16777216 max_qp_rd_atom: 16 max_ee_rd_atom: 0 max_res_rd_atom: 4194304 max_qp_init_rd_atom: 16 max_ee_init_rd_atom: 0 atomic_cap: ATOMIC_HCA (1) max_ee: 0 max_rdd: 0 max_mw: 16777216 max_raw_ipv6_qp: 0 max_raw_ethy_qp: 0 max_mcast_grp: 2097152 max_mcast_qp_attach: 240 max_total_mcast_qp_attach: 503316480 max_ah: 2147483647 max_fmr: 0 max_srq: 8388608 max_srq_wr: 32767 max_srq_sge: 31 max_pkeys: 128 local_ca_ack_delay: 16 general_odp_caps: ODP_SUPPORT ODP_SUPPORT_IMPLICIT rc_odp_caps: SUPPORT_SEND SUPPORT_RECV SUPPORT_WRITE SUPPORT_READ SUPPORT_SRQ uc_odp_caps: NO SUPPORT ud_odp_caps: SUPPORT_SEND xrc_odp_caps: SUPPORT_SEND SUPPORT_WRITE SUPPORT_READ SUPPORT_SRQ completion 
timestamp_mask: 0x7fffffffffffffff hca_core_clock: 156250kHZ device_cap_flags_ex: 0x11E97E1C36 PCI_WRITE_END_PADDING Unknown flags: 0x100000000 tso_caps: max_tso: 0 rss_caps: max_rwq_indirection_tables: 0 max_rwq_indirection_table_size: 0 rx_hash_function: 0x0 rx_hash_fields_mask: 0x0 max_wq_type_rq: 0 packet_pacing_caps: qp_rate_limit_min: 0kbps qp_rate_limit_max: 0kbps tag matching not supported cq moderation caps: max_cq_count: 65535 max_cq_period: 4095 us num_comp_vectors: 32 port: 1 state: PORT_ACTIVE (4) max_mtu: 4096 (5) active_mtu: 4096 (5) sm_lid: 1 port_lid: 35 port_lmc: 0x00 link_layer: InfiniBand max_msg_sz: 0x40000000 port_cap_flags: 0x2259e848 port_cap_flags2: 0x0002 max_vl_num: 4 (3) bad_pkey_cntr: 0x0 qkey_viol_cntr: 0x0 sm_sl: 0 pkey_tbl_len: 128 gid_tbl_len: 8 subnet_timeout: 18 init_type_reply: 0 active_width: 4X (2) active_speed: 14.0 Gbps (16) phys_state: LINK_UP (5) GID[ 0]: fe80:0000:0000:0001:248a:0703:0049:d75d [root@rdma-dev-22 ~]$
mlx5_ib0 is not a valid RDMA device for UCX. Please don't pass UCX_NET_DEVICES at all, or pass, for example, UCX_NET_DEVICES=mlx5_0:1. The full list of valid devices for UCX is printed by "ucx_info -d". Please see more info in https://openucx.readthedocs.io/en/master/faq.html#which-network-devices-does-ucx-use
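A hedged sketch (the helper below is hypothetical, not part of UCX): the values UCX accepts are "device:port" pairs as listed by "ucx_info -d", while kernel netdev names such as mlx5_ib0 or bnxt_roce.45 do not match that shape, so a quick format check can catch the mistake before launching mpirun.

```shell
# Hypothetical helper: accept only "device:port" style names (e.g. mlx5_1:1).
# Kernel netdev names (mlx5_ib0, bnxt_roce.45) have no ":port" suffix.
is_ucx_device() {
  case "$1" in
    *:[0-9]) return 0 ;;   # looks like the form "ucx_info -d" reports
    *)       return 1 ;;   # plain netdev name, not usable in UCX_NET_DEVICES
  esac
}

is_ucx_device "mlx5_1:1"  && echo "mlx5_1:1 looks valid"
is_ucx_device "mlx5_ib0"  || echo "mlx5_ib0 is a netdev name, not a UCX device"
```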
[root@rdma-dev-21 ~]$ timeout --preserve-status --kill-after=5m 3m mpirun -hostfile /root/hfile_one_core -np 2 --allow-run-as-root --map-by node -mca btl_openib_warn_nonexistent_if 0 -mca btl_openib_if_include mlx5_1:1 -mca mtl '^psm2,psm,ofi' -mca btl '^openib' -mca btl_openib_allow_ib 1 -mca pml ucx -x UCX_NET_DEVICES=mlx5_1:1 /usr/lib64/openmpi/bin/mpitests-osu_allreduce

# OSU MPI Allreduce Latency Test v5.7.1
# Size       Avg Latency(us)
[rdma-dev-22:172216:0:172216] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x31322d)
==== backtrace (tid: 172216) ====
 0 /lib64/libucs.so.0(ucs_handle_error+0x2a4) [0x151b38f702a4]
 1 /lib64/libucs.so.0(+0x2347c) [0x151b38f7047c]
 2 /lib64/libucs.so.0(+0x2364a) [0x151b38f7064a]
 3 /lib64/ucx/libuct_ib.so.0(+0x54890) [0x151b38d2e890]
 4 /lib64/ucx/libuct_ib.so.0(+0x56b0f) [0x151b38d30b0f]
 5 /lib64/libucp.so.0(ucp_worker_progress+0x2a) [0x151b3960fbea]
 6 /usr/lib64/openmpi/lib/openmpi/mca_pml_ucx.so(mca_pml_ucx_send+0x1bd) [0x151b3ea9a15d]
 7 /usr/lib64/openmpi/lib/libmpi.so.40(ompi_coll_base_barrier_intra_recursivedoubling+0xe9) [0x151b508e3629]
 8 /usr/lib64/openmpi/lib/libmpi.so.40(MPI_Barrier+0xb0) [0x151b5088eb70]
 9 /usr/lib64/openmpi/bin/mpitests-osu_allreduce(+0x1df9) [0x560f01ce6df9]
10 /lib64/libc.so.6(__libc_start_main+0xf3) [0x151b4f935493]
11 /usr/lib64/openmpi/bin/mpitests-osu_allreduce(+0x22ae) [0x560f01ce72ae]
=================================
[rdma-dev-22:172216] *** Process received signal ***
[rdma-dev-22:172216] Signal: Segmentation fault (11)
[rdma-dev-22:172216] Signal code: (-6)
[rdma-dev-22:172216] Failing at address: 0x2a0b8
[rdma-dev-22:172216] [ 0] /lib64/libpthread.so.0(+0x12b20)[0x151b4fce9b20]
[rdma-dev-22:172216] [ 1] /lib64/ucx/libuct_ib.so.0(+0x54890)[0x151b38d2e890]
[rdma-dev-22:172216] [ 2] /lib64/ucx/libuct_ib.so.0(+0x56b0f)[0x151b38d30b0f]
[rdma-dev-22:172216] [ 3] /lib64/libucp.so.0(ucp_worker_progress+0x2a)[0x151b3960fbea]
[rdma-dev-22:172216] [ 4] /usr/lib64/openmpi/lib/openmpi/mca_pml_ucx.so(mca_pml_ucx_send+0x1bd)[0x151b3ea9a15d]
[rdma-dev-22:172216] [ 5] /usr/lib64/openmpi/lib/libmpi.so.40(ompi_coll_base_barrier_intra_recursivedoubling+0xe9)[0x151b508e3629]
[rdma-dev-22:172216] [ 6] /usr/lib64/openmpi/lib/libmpi.so.40(MPI_Barrier+0xb0)[0x151b5088eb70]
[rdma-dev-22:172216] [ 7] /usr/lib64/openmpi/bin/mpitests-osu_allreduce(+0x1df9)[0x560f01ce6df9]
[rdma-dev-22:172216] [ 8] /lib64/libc.so.6(__libc_start_main+0xf3)[0x151b4f935493]
[rdma-dev-22:172216] [ 9] /usr/lib64/openmpi/bin/mpitests-osu_allreduce(+0x22ae)[0x560f01ce72ae]
[rdma-dev-22:172216] *** End of error message ***
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 172216 on node 172.31.0.122 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
[root@rdma-dev-21 ~]$ timeout --preserve-status --kill-after=5m 3m mpirun -hostfile /root/hfile_one_core -np 2 --allow-run-as-root --map-by node -mca btl_openib_warn_nonexistent_if 0 -mca btl_openib_if_include mlx5_1:1 -mca mtl '^psm2,psm,ofi' -mca btl '^openib' -mca btl_openib_allow_ib 1 -mca pml ucx /usr/lib64/openmpi/bin/mpitests-osu_allreduce

# OSU MPI Allreduce Latency Test v5.7.1
# Size       Avg Latency(us)
[rdma-dev-22:172405:0:172405] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace (tid: 172405) ====
 0 /lib64/libucs.so.0(ucs_handle_error+0x2a4) [0x146e0ae132a4]
 1 /lib64/libucs.so.0(+0x2347c) [0x146e0ae1347c]
 2 /lib64/libucs.so.0(+0x2364a) [0x146e0ae1364a]
 3 /lib64/libucs.so.0(__ucs_twheel_sweep+0x60) [0x146e0ae1f380]
 4 /lib64/ucx/libuct_ib.so.0(+0x56f31) [0x146e0abd3f31]
 5 /lib64/libucp.so.0(ucp_worker_progress+0x2a) [0x146e0b4b2bea]
 6 /usr/lib64/openmpi/lib/openmpi/mca_pml_ucx.so(mca_pml_ucx_send+0x1bd) [0x146e14a0115d]
 7 /usr/lib64/openmpi/lib/libmpi.so.40(ompi_coll_base_barrier_intra_recursivedoubling+0xe9) [0x146e227a8629]
 8 /usr/lib64/openmpi/lib/libmpi.so.40(MPI_Barrier+0xb0) [0x146e22753b70]
 9 /usr/lib64/openmpi/bin/mpitests-osu_allreduce(+0x1df9) [0x556b92fd1df9]
10 /lib64/libc.so.6(__libc_start_main+0xf3) [0x146e217fa493]
11 /usr/lib64/openmpi/bin/mpitests-osu_allreduce(+0x22ae) [0x556b92fd22ae]
=================================
[rdma-dev-22:172405] *** Process received signal ***
[rdma-dev-22:172405] Signal: Segmentation fault (11)
[rdma-dev-22:172405] Signal code: (-6)
[rdma-dev-22:172405] Failing at address: 0x2a175
[rdma-dev-22:172405] [ 0] /lib64/libpthread.so.0(+0x12b20)[0x146e21baeb20]
[rdma-dev-22:172405] [ 1] /lib64/libucs.so.0(__ucs_twheel_sweep+0x60)[0x146e0ae1f380]
[rdma-dev-22:172405] [ 2] /lib64/ucx/libuct_ib.so.0(+0x56f31)[0x146e0abd3f31]
[rdma-dev-22:172405] [ 3] /lib64/libucp.so.0(ucp_worker_progress+0x2a)[0x146e0b4b2bea]
[rdma-dev-22:172405] [ 4] /usr/lib64/openmpi/lib/openmpi/mca_pml_ucx.so(mca_pml_ucx_send+0x1bd)[0x146e14a0115d]
[rdma-dev-22:172405] [ 5] /usr/lib64/openmpi/lib/libmpi.so.40(ompi_coll_base_barrier_intra_recursivedoubling+0xe9)[0x146e227a8629]
[rdma-dev-22:172405] [ 6] /usr/lib64/openmpi/lib/libmpi.so.40(MPI_Barrier+0xb0)[0x146e22753b70]
[rdma-dev-22:172405] [ 7] /usr/lib64/openmpi/bin/mpitests-osu_allreduce(+0x1df9)[0x556b92fd1df9]
[rdma-dev-22:172405] [ 8] /lib64/libc.so.6(__libc_start_main+0xf3)[0x146e217fa493]
[rdma-dev-22:172405] [ 9] /usr/lib64/openmpi/bin/mpitests-osu_allreduce(+0x22ae)[0x556b92fd22ae]
[rdma-dev-22:172405] *** End of error message ***
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 172405 on node 172.31.0.122 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
[root@rdma-dev-21 ~]$
Is it possible to run this with "ulimit -c unlimited", open the core file with gdb, and show the full backtrace (with the exact source files, line numbers, etc.)?
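The request above can be scripted roughly as follows. This is a sketch, not part of the test suite: where the kernel writes core files depends on kernel.core_pattern / systemd-coredump, so the directory used in the gdb invocation is illustrative only.

```shell
# Sketch: enable core dumps, rerun the failing benchmark, then open the
# newest core with gdb. Paths below are assumptions for this setup.
ulimit -c unlimited          # allow core dumps in this shell
# ... rerun the failing mpirun command here ...

# helper: print the most recently written core.* file under a directory
newest_core() {
    ls -1t "$1"/core.* 2>/dev/null | head -n 1
}

# Then dump the full backtrace non-interactively, e.g.:
# gdb /usr/lib64/openmpi/bin/mpitests-osu_allreduce \
#     "$(newest_core /root/coredump)" -batch -ex 'bt full'
```

Installing the matching debuginfo packages (ucx, openmpi, mpitests-openmpi) first is what makes the backtrace resolve to source files and line numbers.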
[root@rdma-dev-22 coredump]$ gdb /usr/lib64/openmpi/bin/mpitests-osu_allreduce core.mpitests-osu_al.0.4f32e83b8c424e729d74139af1d8fa7f.172216.1624544834000000
GNU gdb (GDB) Red Hat Enterprise Linux 8.2-15.el8
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/lib64/openmpi/bin/mpitests-osu_allreduce...Reading symbols from /usr/lib/debug/usr/lib64/openmpi/bin/mpitests-osu_allreduce-5.7-2.el8.x86_64.debug...done.
done.
warning: Can't open file (null) during file-backed mapping note processing
warning: core file may not match specified executable file.
[New LWP 172216]
[New LWP 172217]
[New LWP 172219]
[New LWP 172218]
[New LWP 172221]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/lib64/openmpi/bin/mpitests-osu_allreduce'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  ucs_mpool_get_inline (mp=0x560f0420a910) at /usr/src/debug/ucx-1.10.1-2.el8.x86_64/src/ucs/datastruct/mpool.inl:29
29          mp->freelist = elem->next;
[Current thread is 1 (Thread 0x151b50f8d0c0 (LWP 172216))]
(gdb) bt
#0  ucs_mpool_get_inline (mp=0x560f0420a910) at /usr/src/debug/ucx-1.10.1-2.el8.x86_64/src/ucs/datastruct/mpool.inl:29
#1  uct_ud_mlx5_iface_post_recv (iface=iface@entry=0x560f0420a390) at ud/accel/ud_mlx5.c:186
#2  0x0000151b38d30b0f in uct_ud_mlx5_iface_poll_rx (is_async=0, iface=0x560f0420a390) at ud/accel/ud_mlx5.c:481
#3  uct_ud_mlx5_iface_progress (tl_iface=0x560f0420a390) at ud/accel/ud_mlx5.c:520
#4  0x0000151b3960fbea in ucs_callbackq_dispatch (cbq=<optimized out>) at /usr/src/debug/ucx-1.10.1-2.el8.x86_64/src/ucs/datastruct/callbackq.h:211
#5  uct_worker_progress (worker=<optimized out>) at /usr/src/debug/ucx-1.10.1-2.el8.x86_64/src/uct/api/uct.h:2435
#6  ucp_worker_progress (worker=0x560f03ed51e0) at core/ucp_worker.c:2405
#7  0x0000151b3ea9a15d in mca_pml_ucx_send_nbr (tag=<optimized out>, datatype=<optimized out>, count=<optimized out>, buf=0x0, ep=0x151b50dc7040) at pml_ucx.c:920
#8  mca_pml_ucx_send (buf=0x0, count=<optimized out>, datatype=<optimized out>, dst=<optimized out>, tag=<optimized out>, mode=<optimized out>, comm=0x151b50b44ba0 <ompi_mpi_comm_world>) at pml_ucx.c:941
#9  0x0000151b508e3629 in ompi_coll_base_sendrecv_zero (stag=-16, rtag=-16, comm=0x151b50b44ba0 <ompi_mpi_comm_world>, source=0, dest=0) at base/coll_base_barrier.c:60
#10 ompi_coll_base_barrier_intra_recursivedoubling (comm=0x151b50b44ba0 <ompi_mpi_comm_world>, module=<optimized out>) at base/coll_base_barrier.c:219
#11 0x0000151b5088eb70 in PMPI_Barrier (comm=0x151b50b44ba0 <ompi_mpi_comm_world>) at pbarrier.c:74
#12 PMPI_Barrier (comm=0x151b50b44ba0 <ompi_mpi_comm_world>) at pbarrier.c:40
#13 0x0000560f01ce6df9 in main (argc=<optimized out>, argv=<optimized out>) at osu_allreduce.c:103
(gdb)
Seems like there is random data corruption, potentially some issue with the RDMA drivers / DMA mapping (not directly related to UCX):
- the segfault below happens because memory is overwritten in a "random" place
- the `ucx_info -u t -e` command sometimes fails/hangs because UD packets get corrupted (real data is sent, but only 0-s are received)
- weird failures also happen with the OpenMPI OpenIB BTL implementation: the command below fails (need to run it several times):
`mpirun -hostfile /root/hfile_one_core -np 2 --allow-run-as-root --map-by node -mca btl_openib_if_include mlx5_1:1 -mca btl 'openib,self,vader' -mca btl_openib_allow_ib 1 -mca pml ^ucx /usr/lib64/openmpi/bin/mpitests-osu_bw`
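Since the OpenIB BTL failure only shows up intermittently (it needs several runs to reproduce), a small wrapper that repeats a command and counts failures helps quantify the failure rate. The `run_n_times` helper below is hypothetical, not part of mpitests:

```shell
# run_n_times N CMD...: run CMD N times, print how many runs failed.
run_n_times() {
    n="$1"; shift
    fails=0
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" >/dev/null 2>&1 || fails=$((fails + 1))
        i=$((i + 1))
    done
    echo "$fails"
}
# e.g.: run_n_times 10 mpirun -hostfile /root/hfile_one_core -np 2 ... mpitests-osu_bw
```

A non-zero failure count over, say, 10 runs would confirm the flaky behavior described above without having to eyeball each run.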
(In reply to Yossi Itigin from comment #16)
> Seems like there is random data corruption, potentially some issue with
> RDMA drivers / DMA mapping (not directly related to UCX):

There are two issues involved in the openmpi failure over mlx5:
1) A bad openmpi commit, c36d7459b6331c4da825cad5a64326e7c1a272aa. After reverting this commit, openmpi works over mlx4, qedr, and opa.
2) An mlx5-specific data corruption.

> - the segfault below happens because memory is overwritten in a "random"
> place
> - `ucx_info -u t -e` command sometimes fails/hangs because UD packets get
> corrupted (real data is sent, but only 0-s received)
> - weird failures also happen with OpenMPI OpenIB BTL implementation: the
> command below fails (need to run several times):
> `mpirun -hostfile /root/hfile_one_core -np 2 --allow-run-as-root --map-by
> node -mca btl_openib_if_include mlx5_1:1 -mca btl 'openib,self,vader' -mca
> btl_openib_allow_ib 1 -mca pml ^ucx /usr/lib64/openmpi/bin/mpitests-osu_bw`

openib is obsolete in openmpi upstream. I will revert commit c36d7459b6331c4da825cad5a64326e7c1a272aa and file another bug for the mlx5 data corruption.
Filed https://bugzilla.redhat.com/show_bug.cgi?id=1980171 for the data corruption.
A much better result with openmpi-4.1.1-2. Below is a sample of a recent test result with this version of openmpi.

$ cat /etc/redhat-release
Red Hat Enterprise Linux release 8.5 Beta (Ootpa)
$ uname -r
4.18.0-323.el8.x86_64
$ rpm -qa | grep -E "rdma|openmpi|ucx" | grep -v "kernel-kernel"
openmpi-4.1.1-2.el8.x86_64
rdma-core-35.0-1.el8.x86_64
librdmacm-35.0-1.el8.x86_64
ucx-1.10.1-2.el8.x86_64
rdma-core-devel-35.0-1.el8.x86_64
librdmacm-utils-35.0-1.el8.x86_64
mpitests-openmpi-5.7-2.el8.x86_64
openmpi-devel-4.1.1-2.el8.x86_64
$

Test results for mpi/openmpi on rdma-qe-25:
4.18.0-323.el8.x86_64, rdma-core-35.0-1.el8, bnxt_en, roce.45, & bnxt_re3

Result   | Status | Test
---------+--------+------------------------------------
PASS     | 0      | openmpi IMB-MPI1 PingPong mpirun one_core
PASS     | 0      | openmpi IMB-MPI1 PingPing mpirun one_core
PASS     | 0      | openmpi IMB-MPI1 Sendrecv mpirun one_core
PASS     | 0      | openmpi IMB-MPI1 Exchange mpirun one_core
PASS     | 0      | openmpi IMB-MPI1 Bcast mpirun one_core
PASS     | 0      | openmpi IMB-MPI1 Allgather mpirun one_core
PASS     | 0      | openmpi IMB-MPI1 Allgatherv mpirun one_core
PASS     | 0      | openmpi IMB-MPI1 Gather mpirun one_core
PASS     | 0      | openmpi IMB-MPI1 Gatherv mpirun one_core
PASS     | 0      | openmpi IMB-MPI1 Scatter mpirun one_core
PASS     | 0      | openmpi IMB-MPI1 Scatterv mpirun one_core
PASS     | 0      | openmpi IMB-MPI1 Alltoall mpirun one_core
PASS     | 0      | openmpi IMB-MPI1 Alltoallv mpirun one_core
PASS     | 0      | openmpi IMB-MPI1 Reduce mpirun one_core
PASS     | 0      | openmpi IMB-MPI1 Reduce_scatter mpirun one_core
PASS     | 0      | openmpi IMB-MPI1 Allreduce mpirun one_core
PASS     | 0      | openmpi IMB-MPI1 Barrier mpirun one_core
PASS     | 0      | openmpi IMB-IO S_Write_indv mpirun one_core
PASS     | 0      | openmpi IMB-IO S_Read_indv mpirun one_core
PASS     | 0      | openmpi IMB-IO S_Write_expl mpirun one_core
PASS     | 0      | openmpi IMB-IO S_Read_expl mpirun one_core
PASS     | 0      | openmpi IMB-IO P_Write_indv mpirun one_core
PASS     | 0      | openmpi IMB-IO P_Read_indv mpirun one_core
PASS     | 0      | openmpi IMB-IO P_Write_expl mpirun one_core
PASS     | 0      | openmpi IMB-IO P_Read_expl mpirun one_core
PASS     | 0      | openmpi IMB-IO P_Write_shared mpirun one_core
PASS     | 0      | openmpi IMB-IO P_Read_shared mpirun one_core
PASS     | 0      | openmpi IMB-IO P_Write_priv mpirun one_core
PASS     | 0      | openmpi IMB-IO P_Read_priv mpirun one_core
PASS     | 0      | openmpi IMB-IO C_Write_indv mpirun one_core
PASS     | 0      | openmpi IMB-IO C_Read_indv mpirun one_core
PASS     | 0      | openmpi IMB-IO C_Write_expl mpirun one_core
PASS     | 0      | openmpi IMB-IO C_Read_expl mpirun one_core
PASS     | 0      | openmpi IMB-IO C_Write_shared mpirun one_core
PASS     | 0      | openmpi IMB-IO C_Read_shared mpirun one_core
PASS     | 0      | openmpi IMB-EXT Window mpirun one_core
PASS     | 0      | openmpi IMB-EXT Unidir_Put mpirun one_core
PASS     | 0      | openmpi IMB-EXT Unidir_Get mpirun one_core
PASS     | 0      | openmpi IMB-EXT Bidir_Get mpirun one_core
PASS     | 0      | openmpi IMB-EXT Bidir_Put mpirun one_core
PASS     | 0      | openmpi IMB-EXT Accumulate mpirun one_core
PASS     | 0      | openmpi IMB-NBC Ibcast mpirun one_core
PASS     | 0      | openmpi IMB-NBC Iallgather mpirun one_core
PASS     | 0      | openmpi IMB-NBC Iallgatherv mpirun one_core
PASS     | 0      | openmpi IMB-NBC Igather mpirun one_core
PASS     | 0      | openmpi IMB-NBC Igatherv mpirun one_core
PASS     | 0      | openmpi IMB-NBC Iscatter mpirun one_core
PASS     | 0      | openmpi IMB-NBC Iscatterv mpirun one_core
PASS     | 0      | openmpi IMB-NBC Ialltoall mpirun one_core
PASS     | 0      | openmpi IMB-NBC Ialltoallv mpirun one_core
PASS     | 0      | openmpi IMB-NBC Ireduce mpirun one_core
PASS     | 0      | openmpi IMB-NBC Ireduce_scatter mpirun one_core
PASS     | 0      | openmpi IMB-NBC Iallreduce mpirun one_core
PASS     | 0      | openmpi IMB-NBC Ibarrier mpirun one_core
PASS     | 0      | openmpi IMB-RMA Unidir_put mpirun one_core
PASS     | 0      | openmpi IMB-RMA Unidir_get mpirun one_core
PASS     | 0      | openmpi IMB-RMA Bidir_put mpirun one_core
PASS     | 0      | openmpi IMB-RMA Bidir_get mpirun one_core
PASS     | 0      | openmpi IMB-RMA One_put_all mpirun one_core
PASS     | 0      | openmpi IMB-RMA One_get_all mpirun one_core
PASS     | 0      | openmpi IMB-RMA All_put_all mpirun one_core
PASS     | 0      | openmpi IMB-RMA All_get_all mpirun one_core
PASS     | 0      | openmpi IMB-RMA Put_local mpirun one_core
PASS     | 0      | openmpi IMB-RMA Put_all_local mpirun one_core
PASS     | 0      | openmpi IMB-RMA Exchange_put mpirun one_core
PASS     | 0      | openmpi IMB-RMA Exchange_get mpirun one_core
PASS     | 0      | openmpi IMB-RMA Accumulate mpirun one_core
PASS     | 0      | openmpi IMB-RMA Get_accumulate mpirun one_core
PASS     | 0      | openmpi IMB-RMA Fetch_and_op mpirun one_core
PASS     | 0      | openmpi IMB-RMA Compare_and_swap mpirun one_core
PASS     | 0      | openmpi IMB-RMA Get_local mpirun one_core
PASS     | 0      | openmpi IMB-RMA Get_all_local mpirun one_core
PASS     | 0      | openmpi OSU acc_latency mpirun one_core
PASS     | 0      | openmpi OSU allgather mpirun one_core
PASS     | 0      | openmpi OSU allgatherv mpirun one_core
PASS     | 0      | openmpi OSU allreduce mpirun one_core
PASS     | 0      | openmpi OSU alltoall mpirun one_core
PASS     | 0      | openmpi OSU alltoallv mpirun one_core
PASS     | 0      | openmpi OSU barrier mpirun one_core
PASS     | 0      | openmpi OSU bcast mpirun one_core
PASS     | 0      | openmpi OSU bibw mpirun one_core
PASS     | 0      | openmpi OSU bw mpirun one_core
PASS     | 0      | openmpi OSU cas_latency mpirun one_core
PASS     | 0      | openmpi OSU fop_latency mpirun one_core
PASS     | 0      | openmpi OSU gather mpirun one_core
PASS     | 0      | openmpi OSU gatherv mpirun one_core
PASS     | 0      | openmpi OSU get_acc_latency mpirun one_core
PASS     | 0      | openmpi OSU get_bw mpirun one_core
PASS     | 0      | openmpi OSU get_latency mpirun one_core
PASS     | 0      | openmpi OSU hello mpirun one_core
PASS     | 0      | openmpi OSU iallgather mpirun one_core
PASS     | 0      | openmpi OSU iallgatherv mpirun one_core
PASS     | 0      | openmpi OSU iallreduce mpirun one_core
PASS     | 0      | openmpi OSU ialltoall mpirun one_core
PASS     | 0      | openmpi OSU ialltoallv mpirun one_core
PASS     | 0      | openmpi OSU ialltoallw mpirun one_core
PASS     | 0      | openmpi OSU ibarrier mpirun one_core
PASS     | 0      | openmpi OSU ibcast mpirun one_core
PASS     | 0      | openmpi OSU igather mpirun one_core
PASS     | 0      | openmpi OSU igatherv mpirun one_core
PASS     | 0      | openmpi OSU init mpirun one_core
PASS     | 0      | openmpi OSU ireduce mpirun one_core
PASS     | 0      | openmpi OSU iscatter mpirun one_core
PASS     | 0      | openmpi OSU iscatterv mpirun one_core
PASS     | 0      | openmpi OSU latency mpirun one_core
PASS     | 0      | openmpi OSU latency_mp mpirun one_core
PASS     | 0      | openmpi OSU mbw_mr mpirun one_core
PASS     | 0      | openmpi OSU multi_lat mpirun one_core
PASS     | 0      | openmpi OSU put_bibw mpirun one_core
PASS     | 0      | openmpi OSU put_bw mpirun one_core
PASS     | 0      | openmpi OSU put_latency mpirun one_core
PASS     | 0      | openmpi OSU reduce mpirun one_core
PASS     | 0      | openmpi OSU reduce_scatter mpirun one_core
PASS     | 0      | openmpi OSU scatter mpirun one_core
PASS     | 0      | openmpi OSU scatterv mpirun one_core
PASS     | 0      | NON-ROOT IMB-MPI1 PingPong
The verification has been conducted as follows:

1. Build & packages
DISTRO=RHEL-8.5.0-20210721.n.0
+ [21-07-22 09:45:51] cat /etc/redhat-release
Red Hat Enterprise Linux release 8.5 Beta (Ootpa)
+ [21-07-22 09:45:51] uname -a
Linux rdma-virt-00.lab.bos.redhat.com 4.18.0-323.el8.x86_64 #1 SMP Wed Jul 14 12:52:14 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux
+ [21-07-22 09:45:51] cat /proc/cmdline
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-323.el8.x86_64 root=UUID=5caaadea-6f56-4bcd-a27c-8f062f6a5665 ro intel_idle.max_cstate=0 processor.max_cstate=0 intel_iommu=on iommu=on console=tty0 rd_NO_PLYMOUTH crashkernel=auto resume=UUID=723711e4-2262-4b49-bb3b-8c9cffa5d35e console=ttyS1,115200n81
+ [21-07-22 09:45:51] rpm -q rdma-core linux-firmware
rdma-core-35.0-1.el8.x86_64
linux-firmware-20201218-102.git05789708.el8.noarch

openmpi packages installed:
mpitests-openmpi-5.7-2.el8.x86_64
openmpi-4.1.1-2.el8.x86_64
openmpi-devel-4.1.1-2.el8.x86_64

2. HCAs tested with OPENMPI: MLX4 IB0, MLX4 ROCE, MLX5 IB0, MLX5 ROCE, BNXT ROCE, QEDR IW, HFI OPA0

3. Results
All of the openmpi benchmarks passed on the MLX4 IB0, MLX4 ROCE, BNXT ROCE, and QEDR IW devices. However, some of the benchmarks failed on the MLX5 IB/ROCE device; new bugzillas will be filed for those issues.

So this bug is declared verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (RDMA stack bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:4412