Summary: | [RHEL-8.5][RDMA] update OSU Micro-Benchmarks 5.7.1 | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | Honggang LI <honli> |
Component: | mpitests | Assignee: | Honggang LI <honli> |
Status: | CLOSED ERRATA | QA Contact: | Brian Chae <bchae> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 8.5 | CC: | bchae, linville, rdma-dev-team |
Target Milestone: | beta | Keywords: | Triaged |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | mpitests-5.7-2.el8 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2021-11-09 19:41:50 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Bug Depends On: | 1971771 | ||
Bug Blocks: |
Description
Honggang LI
2021-05-13 01:19:59 UTC
All OPENMPI and MVAPICH2 benchmarks were tested as part of the RHEL8.5 CTC#1 test cycle.

1. Build and packages

DISTRO=RHEL-8.5.0-20210609.n.3

+ [21-06-14 06:53:38] cat /etc/redhat-release
Red Hat Enterprise Linux release 8.5 Beta (Ootpa)
+ [21-06-14 06:53:38] uname -a
Linux rdma-dev-20.lab.bos.redhat.com 4.18.0-310.el8.x86_64 #1 SMP Thu May 27 14:56:02 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux
+ [21-06-14 06:53:38] cat /proc/cmdline
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-310.el8.x86_64 root=UUID=6754d3e0-4dac-4c9a-9f19-2c0d10402740 ro intel_idle.max_cstate=0 processor.max_cstate=0 intel_iommu=on iommu=on console=tty0 rd_NO_PLYMOUTH crashkernel=auto resume=UUID=d2356100-f7ee-485c-9db4-ddd3ac4bfc43 console=ttyS1,115200n81
+ [21-06-14 06:53:38] rpm -q rdma-core linux-firmware
rdma-core-35.0-1.el8.x86_64
linux-firmware-20201218-102.git05789708.el8.noarch

MVAPICH2:
=========
Installed:
  mpitests-mvapich2-5.7-2.el8.x86_64
  mvapich2-2.3.6-1.el8.x86_64

OPENMPI
=======
Installed:
  mpitests-openmpi-5.7-2.el8.x86_64
  openmpi-4.1.1-1.el8.x86_64
  openmpi-devel-4.1.1-1.el8.x86_64

2. Tested HCAs

MLX4 IB, MLX4 ROCE, MLX5 IB, MLX5 ROCE, BNXT ROCE, CXGB4 IW, HFI OPA

3. Result

With MVAPICH2, the OSU benchmarks ran and reported the following version:

# OSU MPI_Accumulate latency Test v5.7.1

However, with OPENMPI, all benchmarks failed, including the OSU benchmarks, with the following error messages:

[1623669085.731067] [rdma-virt-03:71896:0] ucp_context.c:1533 UCX WARN UCP version is incompatible, required: 1.10, actual: 1.9 (release 0 /lib64/libucp.so.0)
--------------------------------------------------------------------------
No components were able to be opened in the pml framework.

This typically means that either no components of this type were
installed, or none of the installed components can be loaded.
Sometimes this means that shared libraries required by these
components are unable to be found/loaded.

  Host:      rdma-virt-02
  Framework: pml
--------------------------------------------------------------------------
[rdma-virt-02.lab.bos.redhat.com:71157] PML ucx cannot be selected

Refer to the following Bugzilla:

https://bugzilla.redhat.com/show_bug.cgi?id=1971771

Even though the OSU micro-benchmarks ran with MVAPICH2, all of them failed with OPENMPI, with messages that seem to suggest a problem with the OPENMPI package, mpitests-5.7-2.el8.
Setting this bugzilla back to Assigned state.

(In reply to Brian Chae from comment #4)
> https://bugzilla.redhat.com/show_bug.cgi?id=1971771
>
> Even though the OSU micro-benchmarks ran with MVAPICH2, all of them failed
> with OPENMPI, with messages that seem to suggest a problem with the OPENMPI
> package, mpitests-5.7-2.el8.
> Setting this bugzilla back to Assigned state.

I will revert the bad commit for openmpi. And then please re-run mpitests.

https://bugzilla.redhat.com/show_bug.cgi?id=1971771#c17

Partner https://bugzilla.redhat.com/show_bug.cgi?id=1980171

(In reply to Honggang LI from comment #6)
> (In reply to Brian Chae from comment #4)
> > https://bugzilla.redhat.com/show_bug.cgi?id=1971771
> >
> > Even though the OSU micro-benchmarks ran with MVAPICH2, all of them failed
> > with OPENMPI, with messages that seem to suggest a problem with the OPENMPI
> > package, mpitests-5.7-2.el8.
> > Setting this bugzilla back to Assigned state.
>
> I will revert the bad commit for openmpi. And then please re-run mpitests.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1971771#c17

I have reverted the bad commit for openmpi.

> Partner https://bugzilla.redhat.com/show_bug.cgi?id=1980171

Alaa provided a scratch build for this bug. I confirmed it works for us.

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=38162372

As all blocker issues of this bug have been addressed, I am moving this bug to 'ON_QA' status again.

Sanity tests on MLX4 IB and MLX4 ROCE showed that the openmpi tests were successful.
Test results for sanity on rdma-virt-01:
4.18.0-323.el8.x86_64, rdma-core-35.0-1.el8, mlx4, ib0, & mlx4_0
    Result | Status | Test
  ---------+--------+------------------------------------
      PASS |      0 | load module mlx4_ib
      PASS |      0 | load module mlx4_en
      PASS |      0 | load module mlx4_core
      PASS |      0 | enable opensm
      PASS |      0 | restart opensm
      PASS |      0 | osmtest -f c -g 0xe41d2d03001d6791
      PASS |      0 | ibstatus reported expected HCA rate
      PASS |      0 | pkey mlx4_ib0.8080 create/delete
      PASS |      0 | /usr/sbin/ibstat
      PASS |      0 | /usr/sbin/ibstatus
      PASS |      0 | systemctl start srp_daemon.service
      PASS |      0 | /usr/sbin/ibsrpdm -vc
      PASS |      0 | systemctl stop srp_daemon
      PASS |      0 | ping self - 172.31.0.201
      PASS |      0 | ping6 self - fe80::e61d:2d03:1d:6791%mlx4_ib0
      PASS |      0 | /usr/share/pmix/test/pmix_test
      PASS |      0 | ping server - 172.31.0.200
      PASS |      0 | ping6 server - fe80::e61d:2d03:1d:6701%mlx4_ib0
      PASS |      0 | openmpi mpitests-IMB-MPI1 PingPong
      PASS |      0 | openmpi mpitests-IMB-IO S_Read_indv
      PASS |      0 | openmpi mpitests-IMB-EXT Window
      PASS |      0 | openmpi mpitests-osu_get_bw
      PASS |      0 | ip multicast addr
      PASS |      0 | rping
      PASS |      0 | rcopy
      PASS |      0 | ib_read_bw
      PASS |      0 | ib_send_bw
      PASS |      0 | ib_write_bw
      PASS |      0 | iser login
      PASS |      0 | mount /dev/sdb /iser
      PASS |      0 | iser write 1K
      PASS |      0 | iser write 1M
      PASS |      0 | iser write 1G
      PASS |      0 | nfsordma mount - XFS_EXT
      PASS |      0 | nfsordma - wrote [5KB, 5MB, 5GB in 1KB, 1MB, 1GB bs]
      PASS |      0 | nfsordma umount - XFS_EXT
      PASS |      0 | nfsordma mount - RAMDISK
      PASS |      0 | nfsordma - wrote [5KB, 5MB, 5GB in 1KB, 1MB, 1GB bs]
      PASS |      0 | nfsordma umount - RAMDISK

Checking for failures and known issues:
  no test failures

Installed:
  mpitests-openmpi-5.7-2.el8.x86_64
  openmpi-4.1.1-2.el8.x86_64
  openmpi-devel-4.1.1-2.el8.x86_64

The verification was conducted as follows:

1. Build & packages

DISTRO=RHEL-8.5.0-20210721.n.0

+ [21-07-22 09:45:51] cat /etc/redhat-release
Red Hat Enterprise Linux release 8.5 Beta (Ootpa)
+ [21-07-22 09:45:51] uname -a
Linux rdma-virt-00.lab.bos.redhat.com 4.18.0-323.el8.x86_64 #1 SMP Wed Jul 14 12:52:14 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux
+ [21-07-22 09:45:51] cat /proc/cmdline
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-323.el8.x86_64 root=UUID=5caaadea-6f56-4bcd-a27c-8f062f6a5665 ro intel_idle.max_cstate=0 processor.max_cstate=0 intel_iommu=on iommu=on console=tty0 rd_NO_PLYMOUTH crashkernel=auto resume=UUID=723711e4-2262-4b49-bb3b-8c9cffa5d35e console=ttyS1,115200n81
+ [21-07-22 09:45:51] rpm -q rdma-core linux-firmware
rdma-core-35.0-1.el8.x86_64
linux-firmware-20201218-102.git05789708.el8.noarch

openmpi package:
Installed:
  mpitests-openmpi-5.7-2.el8.x86_64
  openmpi-4.1.1-2.el8.x86_64
  openmpi-devel-4.1.1-2.el8.x86_64

mvapich2 package:
Installed:
  mpitests-mvapich2-5.7-2.el8.x86_64
  mvapich2-2.3.6-1.el8.x86_64

2. HCAs tested

OPENMPI  : MLX4 IB0, MLX4 ROCE, MLX5 IB0, MLX5 ROCE, BNXT ROCE, QEDR IW, HFI OPA0
mvapich2 : MLX4 IB0, MLX5 IB0

3. Results

a. All openmpi benchmarks passed on the MLX4 IB0, MLX4 ROCE, BNXT ROCE, and QEDR IW devices.
   Some of the benchmarks failed on the MLX5 IB/ROCE devices; new bugzillas will be filed for these issues.

b. All mvapich2 benchmarks passed on MLX4 IB0; however, MLX4 IB on rdma-perf-00/01 had failures [ these are known issues from RHEL8.4 ].
   All mvapich2 benchmarks on MLX5 IB0 failed; these are also known issues, with bz filed on RHEL8.4.

So, this bug will be declared as verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (RDMA stack bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4412
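
Note on the UCX warning quoted in the first test report: it states that UCP 1.10 was required while libucp 1.9 was installed. The following is a minimal Python sketch for spotting that kind of mismatch in a captured log line. The function name `ucp_mismatch` and the regular expression are hypothetical and are derived only from the single warning line quoted in this report; they are not part of UCX, Open MPI, or mpitests, and other UCX releases may word the warning differently.

```python
import re

# Pattern assumed from the one warning line quoted in this bug report.
_UCP_WARN = re.compile(
    r"UCP version is incompatible, required: (\d+)\.(\d+), actual: (\d+)\.(\d+)"
)


def ucp_mismatch(log_line):
    """Return ((req_major, req_minor), (act_major, act_minor)) when the line
    carries the UCX incompatibility warning, otherwise None."""
    match = _UCP_WARN.search(log_line)
    if match is None:
        return None
    required = (int(match.group(1)), int(match.group(2)))
    actual = (int(match.group(3)), int(match.group(4)))
    return required, actual


# The warning line as quoted in this report.
line = (
    "[1623669085.731067] [rdma-virt-03:71896:0] ucp_context.c:1533 UCX WARN "
    "UCP version is incompatible, required: 1.10, actual: 1.9 "
    "(release 0 /lib64/libucp.so.0)"
)

result = ucp_mismatch(line)
if result is not None:
    required, actual = result
    print(f"required UCP {required[0]}.{required[1]}, "
          f"found {actual[0]}.{actual[1]}")
```

The versions are kept as integer tuples on purpose: compared as strings or floats, 1.10 would sort below 1.9, whereas (1, 10) correctly compares greater than (1, 9).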