Created attachment 1718240
Client log for openmpi tests which failed due to environment-modules

Description of problem:

This is the exact same issue described in bz 1860674 for RHEL8.3. Please refer to that bugzilla for detailed info.

Version-Release number of selected component (if applicable):

DISTRO=Fedora-32

+ [20-10-01 14:29:49] cat /etc/redhat-release
Fedora release 32 (Thirty Two)

+ [20-10-01 14:29:49] uname -a
Linux rdma-qe-25.lab.bos.redhat.com 5.8.12-200.fc32.x86_64 #1 SMP Mon Sep 28 12:17:31 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

+ [20-10-01 14:29:49] cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt2)/vmlinuz-5.8.12-200.fc32.x86_64 root=/dev/mapper/fedora_rdma--qe--25-root ro resume=/dev/mapper/fedora_rdma--qe--25-swap rd.lvm.lv=fedora_rdma-qe-25/root rd.lvm.lv=fedora_rdma-qe-25/swap console=ttyS0,115200n81

+ [20-10-01 14:29:49] rpm -q rdma-core linux-firmware
rdma-core-31.0-1.fc32.x86_64
linux-firmware-20200918-112.fc32.noarch

+ [20-10-01 14:29:49] tail /sys/class/infiniband/roceo1/fw_ver /sys/class/infiniband/roceo2/fw_ver /sys/class/infiniband/rocep94s0f0/fw_ver /sys/class/infiniband/rocep94s0f1/fw_ver
==> /sys/class/infiniband/roceo1/fw_ver <==
20.8.30.0

==> /sys/class/infiniband/roceo2/fw_ver <==
20.8.30.0

==> /sys/class/infiniband/rocep94s0f0/fw_ver <==
216.0.51.0

==> /sys/class/infiniband/rocep94s0f1/fw_ver <==
216.0.51.0

+ [20-10-01 14:29:49] lspci
+ [20-10-01 14:29:49] grep -i -e ethernet -e infiniband -e omni -e ConnectX
01:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
01:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
1a:00.0 Ethernet controller: Broadcom Inc. and subsidiaries BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller (rev 01)
1a:00.1 Ethernet controller: Broadcom Inc. and subsidiaries BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller (rev 01)
5e:00.0 Ethernet controller: Broadcom Inc. and subsidiaries BCM57414 NetXtreme-E 10Gb/25Gb RDMA Ethernet Controller (rev 01)
5e:00.1 Ethernet controller: Broadcom Inc. and subsidiaries BCM57414 NetXtreme-E 10Gb/

+ [20-10-01 14:29:51] rpm -q environment-modules
package environment-modules is not installed

+ [20-10-01 14:29:51] yum install -y environment-modules
Last metadata expiration check: 0:13:13 ago on Thu 01 Oct 2020 02:16:39 PM EDT.
Dependencies resolved.
================================================================================
 Package               Arch     Version        Repository                  Size
================================================================================
Installing:
 environment-modules   x86_64   4.4.1-2.fc32   beaker-Fedora-Everything   347 k

Transaction Summary
================================================================================
Install  1 Package

Total download size: 347 k
Installed size: 1.6 M
Downloading Packages:
environment-modules-4.4.1-2.fc32.x86_64.rpm      46 MB/s | 347 kB     00:00
--------------------------------------------------------------------------------
Total                                            34 MB/s | 347 kB     00:00
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing        :                                                        1/1
  Installing       : environment-modules-4.4.1-2.fc32.x86_64                1/1
  Running scriptlet: environment-modules-4.4.1-2.fc32.x86_64                1/1
  Verifying        : environment-modules-4.4.1-2.fc32.x86_64                1/1

Installed:
  environment-modules-4.4.1-2.fc32.x86_64

How reproducible:

100%

Steps to Reproduce:

1. Make sure the path to "mpirun" exists:

+ [20-10-01 14:30:13] which mpirun
/usr/lib64/openmpi/bin/mpirun

2. Have RDMA/MPI server and client hosts up and running with the above software packages.

3. On the client host, run an OpenMPI benchmark using "mpirun", as follows:

timeout --preserve-status --kill-after=5m 3m mpirun -hostfile /root/hfile_one_core -np 2 --allow-run-as-root --map-by node -mca btl_openib_warn_nonexistent_if 0 -mca btl_openib_if_include rocep94s0f1:1 -mca mtl '^psm2,psm,ofi' -mca btl '^openib' -mca pml ucx -x UCX_NET_DEVICES=bnxt_roce.45 mpitests-IMB-MPI1 PingPong -time 1.5

Actual results:

+ [20-10-01 14:30:13] timeout --preserve-status --kill-after=5m 3m mpirun -hostfile /root/hfile_one_core -np 2 --allow-run-as-root --map-by node -mca btl_openib_warn_nonexistent_if 0 -mca btl_openib_if_include rocep94s0f1:1 -mca mtl '^psm2,psm,ofi' -mca btl '^openib' -mca pml ucx -x UCX_NET_DEVICES=bnxt_roce.45 mpitests-IMB-MPI1 PingPong -time 1.5
Loading /usr/share/modulefiles/mpi/openmpi-x86_64
  ERROR: /usr/share/modulefiles/mpi/openmpi-x86_64 cannot be loaded due to a conflict.
    HINT: Might try "module unload mpi" first.
bash: orted: command not found
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:

* not finding the required libraries and/or binaries on
  one or more nodes. Please check your PATH and LD_LIBRARY_PATH
  settings, or configure OMPI with --enable-orterun-prefix-by-default

* lack of authority to execute on one or more specified nodes.
  Please verify your allocation and authorities.

* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
  Please check with your sys admin to determine the correct location to use.

* compilation of the orted with dynamic libraries when static are required
  (e.g., on Cray). Please check your configure cmd line and consider using
  one of the contrib/platform definitions for your system type.

* an inability to create a connection back to mpirun due to a
  lack of common network interfaces and/or no route found between
  them. Please check network connectivity (including firewalls
  and network routing requirements).
--------------------------------------------------------------------------
+ [20-10-01 14:30:13] __MPI_check_result 127 mpitests-openmpi IMB-MPI1 PingPong mpirun /root/hfile_one_core

Expected results:

Similar to the following output:

+ [20-07-26 08:47:16] which mpirun
/usr/lib64/openmpi/bin/mpirun
+ [20-07-26 08:47:16] '[' 0 -ne 0 ']'
++ [20-07-26 08:47:16] cat imb_mpi.txt
+ [20-07-26 08:47:16] for app in $(cat imb_mpi.txt)
+ [20-07-26 08:47:16] timeout --preserve-status --kill-after=5m 3m mpirun -hostfile /root/hfile_one_core -np 2 --allow-run-as-root --map-by node -mca btl_openib_warn_nonexistent_if 0 -mca btl_openib_if_include mlx5_2:1 -mca mtl '^psm2,psm,ofi' -mca btl openib,self -mca btl_openib_allow_ib 1 -x UCX_NET_DEVICES=mlx5_ib0 mpitests-IMB-MPI1 PingPong -time 1.5
#------------------------------------------------------------
#    Intel(R) MPI Benchmarks 2019 Update 6, MPI-1 part
#------------------------------------------------------------
# Date                  : Sun Jul 26 08:47:16 2020
# Machine               : x86_64
# System                : Linux
# Release               : 4.18.0-221.el8.x86_64
# Version               : #1 SMP Thu Jun 25 20:58:19 UTC 2020
# MPI Version           : 3.1
# MPI Thread Environment:

# Calling sequence was:

# mpitests-IMB-MPI1 PingPong -time 1.5

# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   :   MPI_BYTE
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM
#
#

# List of Benchmarks to run:

# PingPong

#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         9.66         0.00
            1         1000         9.77         0.10
            2         1000         9.93         0.20
            4         1000         9.74         0.41
            8         1000         9.45         0.85
           16         1000         9.59         1.67
           32         1000         9.78         3.27
           64         1000         9.97         6.42
          128         1000        10.06        12.72
          256         1000        10.13        25.27
          512         1000        10.38        49.34
         1024         1000        10.21       100.32
         2048         1000        18.77       109.12
         4096         1000        19.41       211.07
         8192         1000        21.46       381.79
        16384         1000        30.78       532.30
        32768         1000        44.40       737.99
        65536          640        96.45       679.45
       131072          320       120.11      1091.27
       262144          160       206.78      1267.72
       524288           80       284.22      1844.65
      1048576           40       450.47      2327.75
      2097152           20       701.05      2991.46
      4194304           10      1293.59      3242.36


# All processes entering MPI_Finalize

+ [20-07-26 08:47:21] __MPI_check_result 0 mpitests-openmpi IMB-MPI1 PingPong mpirun /root/hfile_one_core

Additional info:
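For triage, the two messages that appear together in the failing run ("cannot be loaded due to a conflict" and "orted: command not found") can be picked out of a client log with a small shell helper. This is only a sketch: the function name `classify_mpirun_log` and its verdict strings are hypothetical and are not part of the test suite that produced the log. It reports which messages are present and makes no claim about how they are related.

```shell
#!/usr/bin/env bash
# Hypothetical triage helper (an assumption, not part of the RDMA test suite).
# Prints which of the two failure messages from the failing run
# are present in the given mpirun log file.
classify_mpirun_log() {
    local log="$1"
    local verdict="ok"

    # Message 1: environment-modules refuses to load the openmpi modulefile.
    if grep -q 'cannot be loaded due to a conflict' "$log"; then
        verdict="module-conflict"
    fi

    # Message 2: the shell that should start the ORTE daemon cannot find orted.
    if grep -q 'orted: command not found' "$log"; then
        if [ "$verdict" = "module-conflict" ]; then
            verdict="module-conflict+orted-missing"
        else
            verdict="orted-missing"
        fi
    fi

    printf '%s\n' "$verdict"
}
```

Run against the attached client log (saved locally under any file name), `classify_mpirun_log client.log` would print a single verdict line that a wrapper script can branch on.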
This package has changed maintainer in Fedora. Reassigning to the new maintainer of this component.
FEDORA-2020-38c6692ab1 has been submitted as an update to Fedora 32. https://bodhi.fedoraproject.org/updates/FEDORA-2020-38c6692ab1
FEDORA-2020-38c6692ab1 has been pushed to the Fedora 32 testing repository.
In a short time you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-38c6692ab1`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-38c6692ab1

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2020-38c6692ab1 has been pushed to the Fedora 32 stable repository.
If the problem still persists, please make a note of it in this bug report.