Description of problem:

Some of the OpenMPI benchmarks time out with RC1 when run on CXGB4 devices. The failed benchmarks are the following:

FAIL | 1 | openmpi IMB-IO P_Write_indv mpirun one_core
FAIL | 1 | openmpi IMB-IO P_Write_expl mpirun one_core
FAIL | 1 | openmpi IMB-IO P_Write_shared mpirun one_core
FAIL | 1 | openmpi IMB-IO P_Write_priv mpirun one_core
FAIL | 1 | openmpi IMB-IO C_Write_indv mpirun one_core
FAIL | 1 | openmpi IMB-IO C_Write_expl mpirun one_core
FAIL | 1 | openmpi IMB-IO C_Write_shared mpirun one_core
FAIL | 1 | openmpi OSU get_acc_latency mpirun one_core
FAIL | 1 | openmpi OSU mbw_mr mpirun one_core

This issue is consistently reproducible on the following host combinations:

a. rdma-qe-12 (cxgb4 T5 iw 40) / rdma-perf-06 (cxgb4 T6 iw 100)
   beaker job: https://beaker.engineering.redhat.com/jobs/7293293
b. rdma-dev-13 (cxgb4 T6 iw 100) / rdma-perf-06 (cxgb4 T6 iw 100)
   beaker job: https://beaker.engineering.redhat.com/jobs/7292260

Version-Release number of selected component (if applicable):

Clients: rdma-perf-06
Servers: rdma-qe-12

DISTRO=RHEL-9.2.0-20221129.2

+ [22-11-30 18:38:52] cat /etc/redhat-release
Red Hat Enterprise Linux release 9.2 Beta (Plow)
+ [22-11-30 18:38:52] uname -a
Linux rdma-perf-06.rdma.lab.eng.rdu2.redhat.com 5.14.0-202.el9.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Nov 28 08:49:47 EST 2022 x86_64 x86_64 x86_64 GNU/Linux
+ [22-11-30 18:38:52] cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt2)/vmlinuz-5.14.0-202.el9.x86_64 root=UUID=60790874-ea0a-4a35-8447-d83f2475913b ro crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M resume=UUID=08d83c36-2fab-45c6-a375-8bb16849b90a console=ttyS0,115200n81
+ [22-11-30 18:38:52] rpm -q rdma-core linux-firmware
rdma-core-41.0-3.el9.x86_64
linux-firmware-20221012-128.el9.noarch
+ [22-11-30 18:38:52] tail /sys/class/infiniband/cxgb4_0/fw_ver /sys/class/infiniband/hfi1_0/fw_ver /sys/class/infiniband/mlx5_0/fw_ver /sys/class/infiniband/mlx5_1/fw_ver /sys/class/infiniband/qedr0/fw_ver /sys/class/infiniband/qedr1/fw_ver
==> /sys/class/infiniband/cxgb4_0/fw_ver <==
1.27.0.0
==> /sys/class/infiniband/hfi1_0/fw_ver <==
1.27.0
==> /sys/class/infiniband/mlx5_0/fw_ver <==
20.99.5392
==> /sys/class/infiniband/mlx5_1/fw_ver <==
20.99.5392
==> /sys/class/infiniband/qedr0/fw_ver <==
8.59.1.0
==> /sys/class/infiniband/qedr1/fw_ver <==
8.59.1.0
+ [22-11-30 18:38:52] lspci
+ [22-11-30 18:38:52] grep -i -e ethernet -e infiniband -e omni -e ConnectX
19:00.0 Ethernet controller: QLogic Corp. FastLinQ QL41000 Series 10/25/40/50GbE Controller (rev 02)
19:00.1 Ethernet controller: QLogic Corp. FastLinQ QL41000 Series 10/25/40/50GbE Controller (rev 02)
19:00.2 Ethernet controller: QLogic Corp. FastLinQ QL41000 Series 10/25/40/50GbE Controller (rev 02)
19:00.3 Ethernet controller: QLogic Corp. FastLinQ QL41000 Series 10/25/40/50GbE Controller (rev 02)
5e:00.0 Ethernet controller: Chelsio Communications Inc T62100-LP-CR Unified Wire Ethernet Controller
5e:00.1 Ethernet controller: Chelsio Communications Inc T62100-LP-CR Unified Wire Ethernet Controller
5e:00.2 Ethernet controller: Chelsio Communications Inc T62100-LP-CR Unified Wire Ethernet Controller
5e:00.3 Ethernet controller: Chelsio Communications Inc T62100-LP-CR Unified Wire Ethernet Controller
5e:00.4 Ethernet controller: Chelsio Communications Inc T62100-LP-CR Unified Wire Ethernet Controller
af:00.0 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
af:00.1 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
d8:00.0 Fabric controller: Intel Corporation Omni-Path HFI Silicon 100 Series [discrete] (rev 11)

How reproducible:
100% in the above combinations of RDMA hosts

Steps to Reproduce:
1. Please refer to the beaker job outputs on the client hosts mentioned above.

Actual results:

Expected results:

Additional info:

However, with the following CXGB4 host combinations, ALL OpenMPI benchmarks PASSED:

a. mpi suite over rdma-iw-cxgb pool [ RHEL-9.2.0-20221129.2: rdma-perf-06/07 - mpich2,openmpi ]
   beaker job: https://beaker.engineering.redhat.com/jobs/7291986
b. mpi suite over rdma-iw-cxgb pool [ RHEL-9.2.0-20221129.2: rdma-dev-13/rdma-qe-12 - mpich2,openmpi ] - J:7293324

mpi/openmpi test results on rdma-dev-13/rdma-qe-12 & Beaker job J:7293324:
5.14.0-202.el9.x86_64, rdma-core-41.0-3.el9, cxgb4, iw, T520-CR & cxgb4_0

  Result | Status | Test
---------+--------+------------------------------------
Checking for failures and known issues:
  no test failures

beaker job: https://beaker.engineering.redhat.com/jobs/7293324
I do not see how this is related to the opensc package. Was the component selected by mistake, and should this be reported against openmpi instead? I will reassign. If I am wrong, please move it to the right component.
*** Bug 2149874 has been marked as a duplicate of this bug. ***
*** Bug 2149878 has been marked as a duplicate of this bug. ***