Description of problem:
The following OpenMPI benchmarks fail consistently on the QEDE IW (iWARP) device:

FAIL | 1 | openmpi OSU acc_latency mpirun one_core
FAIL | 1 | openmpi OSU get_acc_latency mpirun one_core

Version-Release number of selected component (if applicable):
DISTRO=RHEL-9.0.0-20220313.2
+ [22-03-14 13:15:47] cat /etc/redhat-release
Red Hat Enterprise Linux release 9.0 Beta (Plow)
+ [22-03-14 13:15:47] uname -a
Linux rdma-dev-03.rdma.lab.eng.rdu2.redhat.com 5.14.0-70.1.1.el9.x86_64 #1 SMP PREEMPT Tue Mar 8 22:22:02 EST 2022 x86_64 x86_64 x86_64 GNU/Linux
+ [22-03-14 13:15:47] cat /proc/cmdline
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.14.0-70.1.1.el9.x86_64 root=UUID=2076c1cf-ae89-4a0a-be94-8b47702b363e ro console=tty0 rd_NO_PLYMOUTH intel_iommu=on iommu=on crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M resume=UUID=d37d3c68-e4f7-4218-84fd-c3feacdef6fa console=ttyS1,115200
+ [22-03-14 13:15:47] rpm -q rdma-core linux-firmware
rdma-core-37.2-1.el9.x86_64
linux-firmware-20220209-125.el9.noarch
+ [22-03-14 13:15:47] tail /sys/class/infiniband/qedr0/fw_ver /sys/class/infiniband/qedr1/fw_ver
==> /sys/class/infiniband/qedr0/fw_ver <==
8.42.2.0

==> /sys/class/infiniband/qedr1/fw_ver <==
8.42.2.0
+ [22-03-14 13:15:47] lspci
+ [22-03-14 13:15:47] grep -i -e ethernet -e infiniband -e omni -e ConnectX
02:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
02:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
08:00.0 Ethernet controller: QLogic Corp. FastLinQ QL45000 Series 25GbE Controller (rev 10)
08:00.1 Ethernet controller: QLogic Corp. FastLinQ QL45000 Series 25GbE Controller (rev 10)

How reproducible:
100%

Steps to Reproduce:
1. Bring up the RDMA hosts mentioned above with the RHEL 9.0 build.
2. Set up the RDMA hosts for the OpenMPI benchmark tests.
3. Run the failing benchmark test commands on the client as follows (a sketch of the hostfile is shown after the commands):

a) timeout --preserve-status --kill-after=5m 3m mpirun -hostfile /root/hfile_one_core -np 2 --allow-run-as-root --map-by node -mca btl_openib_warn_nonexistent_if 0 -mca btl_openib_if_include qedr1:1 -mca mtl '^psm2,psm,ofi' -mca btl '^openib' --mca mtl_base_verbose 100 --mca btl_openib_verbose 100 -mca pml ucx -mca osc ucx -x UCX_NET_DEVICES=qede_iw --mca osc_ucx_verbose 100 --mca pml_ucx_verbose 100 /usr/lib64/openmpi/bin/mpitests-osu_acc_latency

b) timeout --preserve-status --kill-after=5m 3m mpirun -hostfile /root/hfile_one_core -np 2 --allow-run-as-root --map-by node -mca btl_openib_warn_nonexistent_if 0 -mca btl_openib_if_include qedr1:1 -mca mtl '^psm2,psm,ofi' -mca btl '^openib' --mca mtl_base_verbose 100 --mca btl_openib_verbose 100 -mca pml ucx -mca osc ucx -x UCX_NET_DEVICES=qede_iw --mca osc_ucx_verbose 100 --mca pml_ucx_verbose 100 /usr/lib64/openmpi/bin/mpitests-osu_get_acc_latency
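The contents of /root/hfile_one_core are not captured in this report; for a one-core run across the two hosts seen in the logs, an OpenMPI hostfile would typically look like the hypothetical example below (host names taken from the logs, slot counts assumed).

# Hypothetical /root/hfile_one_core -- actual file not shown in this report
rdma-dev-02.rdma.lab.eng.rdu2.redhat.com slots=1
rdma-dev-03.rdma.lab.eng.rdu2.redhat.com slots=1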
Actual results:

a) + [22-03-14 13:54:10] timeout --preserve-status --kill-after=5m 3m mpirun -hostfile /root/hfile_one_core -np 2 --allow-run-as-root --map-by node -mca btl_openib_warn_nonexistent_if 0 -mca btl_openib_if_include qedr1:1 -mca mtl '^psm2,psm,ofi' -mca btl '^openib' --mca mtl_base_verbose 100 --mca btl_openib_verbose 100 -mca pml ucx -mca osc ucx -x UCX_NET_DEVICES=qede_iw --mca osc_ucx_verbose 100 --mca pml_ucx_verbose 100 /usr/lib64/openmpi/bin/mpitests-osu_acc_latency
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 95
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 95
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 95
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 95
[rdma-dev-03.rdma.lab.eng.rdu2.redhat.com:57556] pml_ucx.c:197 mca_pml_ucx_open: UCX version 1.11.2
[rdma-dev-03.rdma.lab.eng.rdu2.redhat.com:57556] pml_ucx.c:289 mca_pml_ucx_init
[rdma-dev-03.rdma.lab.eng.rdu2.redhat.com:57556] pml_ucx.c:114 Pack remote worker address, size 38
[rdma-dev-03.rdma.lab.eng.rdu2.redhat.com:57556] pml_ucx.c:114 Pack local worker address, size 141
[rdma-dev-03.rdma.lab.eng.rdu2.redhat.com:57556] pml_ucx.c:351 created ucp context 0x55ce8e5bbd80, worker 0x55ce8e644cb0
[rdma-dev-02.rdma.lab.eng.rdu2.redhat.com:53906] pml_ucx.c:197 mca_pml_ucx_open: UCX version 1.11.2
[rdma-dev-02.rdma.lab.eng.rdu2.redhat.com:53906] pml_ucx.c:289 mca_pml_ucx_init
[rdma-dev-02.rdma.lab.eng.rdu2.redhat.com:53906] pml_ucx.c:114 Pack remote worker address, size 38
[rdma-dev-02.rdma.lab.eng.rdu2.redhat.com:53906] pml_ucx.c:114 Pack local worker address, size 141
[rdma-dev-02.rdma.lab.eng.rdu2.redhat.com:53906] pml_ucx.c:351 created ucp context 0x55d9370c6c30, worker 0x55d93714fb60
[rdma-dev-03.rdma.lab.eng.rdu2.redhat.com:57556] pml_ucx.c:182 Got proc 0 address, size 141
[rdma-dev-03.rdma.lab.eng.rdu2.redhat.com:57556] pml_ucx.c:411 connecting to proc. 0
[rdma-dev-02.rdma.lab.eng.rdu2.redhat.com:53906] pml_ucx.c:182 Got proc 1 address, size 141
[rdma-dev-02.rdma.lab.eng.rdu2.redhat.com:53906] pml_ucx.c:411 connecting to proc. 1
# OSU MPI_Accumulate latency Test v5.8
# Window creation: MPI_Win_allocate
# Synchronization: MPI_Win_flush
# Size          Latency (us)
[rdma-dev-02.rdma.lab.eng.rdu2.redhat.com:53906] pml_ucx.c:182 Got proc 0 address, size 38
[rdma-dev-02.rdma.lab.eng.rdu2.redhat.com:53906] pml_ucx.c:411 connecting to proc. 0
[rdma-dev-03.rdma.lab.eng.rdu2.redhat.com:57556] pml_ucx.c:182 Got proc 1 address, size 38
[rdma-dev-03.rdma.lab.eng.rdu2.redhat.com:57556] pml_ucx.c:411 connecting to proc. 1
1                       3825.37
2                       3825.72
4                       3825.25
8                       3825.56
+ [22-03-14 13:57:14] __MPI_check_result 1 mpitests-openmpi OSU /usr/lib64/openmpi/bin/mpitests-osu_acc_latency mpirun /root/hfile_one_core

b) + [22-03-14 14:00:32] timeout --preserve-status --kill-after=5m 3m mpirun -hostfile /root/hfile_one_core -np 2 --allow-run-as-root --map-by node -mca btl_openib_warn_nonexistent_if 0 -mca btl_openib_if_include qedr1:1 -mca mtl '^psm2,psm,ofi' -mca btl '^openib' --mca mtl_base_verbose 100 --mca btl_openib_verbose 100 -mca pml ucx -mca osc ucx -x UCX_NET_DEVICES=qede_iw --mca osc_ucx_verbose 100 --mca pml_ucx_verbose 100 /usr/lib64/openmpi/bin/mpitests-osu_get_acc_latency
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 95
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 95
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 22
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 95
[create_qp:2752]create qp: failed on ibv_cmd_create_qp with 95
[rdma-dev-03.rdma.lab.eng.rdu2.redhat.com:58268] pml_ucx.c:197 mca_pml_ucx_open: UCX version 1.11.2
[rdma-dev-02.rdma.lab.eng.rdu2.redhat.com:54393] pml_ucx.c:197 mca_pml_ucx_open: UCX version 1.11.2
[rdma-dev-03.rdma.lab.eng.rdu2.redhat.com:58268] pml_ucx.c:289 mca_pml_ucx_init
[rdma-dev-03.rdma.lab.eng.rdu2.redhat.com:58268] pml_ucx.c:114 Pack remote worker address, size 38
[rdma-dev-03.rdma.lab.eng.rdu2.redhat.com:58268] pml_ucx.c:114 Pack local worker address, size 141
[rdma-dev-03.rdma.lab.eng.rdu2.redhat.com:58268] pml_ucx.c:351 created ucp context 0x562449807d90, worker 0x562449890cc0
[rdma-dev-02.rdma.lab.eng.rdu2.redhat.com:54393] pml_ucx.c:289 mca_pml_ucx_init
[rdma-dev-02.rdma.lab.eng.rdu2.redhat.com:54393] pml_ucx.c:114 Pack remote worker address, size 38
[rdma-dev-02.rdma.lab.eng.rdu2.redhat.com:54393] pml_ucx.c:114 Pack local worker address, size 141
[rdma-dev-02.rdma.lab.eng.rdu2.redhat.com:54393] pml_ucx.c:351 created ucp context 0x5557b6d21c30, worker 0x5557b6daab60
[rdma-dev-03.rdma.lab.eng.rdu2.redhat.com:58268] pml_ucx.c:182 Got proc 0 address, size 141
[rdma-dev-03.rdma.lab.eng.rdu2.redhat.com:58268] pml_ucx.c:411 connecting to proc. 0
[rdma-dev-02.rdma.lab.eng.rdu2.redhat.com:54393] pml_ucx.c:182 Got proc 1 address, size 141
[rdma-dev-02.rdma.lab.eng.rdu2.redhat.com:54393] pml_ucx.c:411 connecting to proc. 1
# OSU MPI_Get_accumulate latency Test v5.8
# Window creation: MPI_Win_create
# Synchronization: MPI_Win_lock/unlock
# Size          Latency (us)
[rdma-dev-02.rdma.lab.eng.rdu2.redhat.com:54393] pml_ucx.c:182 Got proc 0 address, size 38
[rdma-dev-02.rdma.lab.eng.rdu2.redhat.com:54393] pml_ucx.c:411 connecting to proc. 0
[rdma-dev-03.rdma.lab.eng.rdu2.redhat.com:58268] pml_ucx.c:182 Got proc 1 address, size 38
[rdma-dev-03.rdma.lab.eng.rdu2.redhat.com:58268] pml_ucx.c:411 connecting to proc. 1
1                       4933.29
2                       4934.98
4                       4939.33
mpirun: Forwarding signal 18 to job
+ [22-03-14 14:03:36] __MPI_check_result 1 mpitests-openmpi OSU /usr/lib64/openmpi/bin/mpitests-osu_get_acc_latency mpirun /root/hfile_one_core

Expected results:
Both benchmarks should complete normally and report the full set of benchmark statistics, instead of stalling after the first few message sizes and being killed by the timeout.

Additional info:
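The errno values reported by the create_qp failures are 22 (EINVAL) and 95 (EOPNOTSUPP), i.e. the qedr provider appears to reject the QP attributes requested by UCX. To check whether RC QP creation works at the verbs/rdma_cm layer on this device independently of OpenMPI/UCX, something like the following could be tried (a suggested sketch only; the device name is taken from the output above and the server IP address is a placeholder for the qedr1/iWARP interface):

# Report the device limits and capabilities advertised by qedr1 (max_qp, atomic support, etc.)
ibv_devinfo -v -d qedr1

# Exercise RC QP creation and data transfer over rdma_cm (iWARP requires rdma_cm connection setup)
rping -s -a <qedr1-server-ip> -C 10 -v     # on the server (rdma-dev-02)
rping -c -a <qedr1-server-ip> -C 10 -v     # on the client (rdma-dev-03)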