Bug 1884358 - environment-modules-4.4.1-2.fc32.x86_64 causes all openmpi benchmarks failures due to module conflict
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: environment-modules
Version: 32
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Xavier Delaruelle
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-10-01 19:26 UTC by Brian Chae
Modified: 2020-11-24 01:22 UTC (History)
CC List: 0 users

Fixed In Version: environment-modules-4.4.1-3.fc32
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-24 01:22:30 UTC
Type: Bug
Embargoed:


Attachments
Client log for openmpi tests which failed due to environment-modules (278.48 KB, text/plain)
2020-10-01 19:26 UTC, Brian Chae

Description Brian Chae 2020-10-01 19:26:36 UTC
Created attachment 1718240
Client log for openmpi tests which failed due to environment-modules

Description of problem:

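This is the same issue as bz 1860674, which was reported against RHEL 8.3; please refer to that bug for details. In short, with environment-modules-4.4.1-2.fc32 installed, the openmpi benchmarks fail because the mpi/openmpi-x86_64 modulefile cannot be loaded due to a conflict, and "orted" is then not found.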

Version-Release number of selected component (if applicable):


DISTRO=Fedora-32
+ [20-10-01 14:29:49] cat /etc/redhat-release
Fedora release 32 (Thirty Two)
+ [20-10-01 14:29:49] uname -a
Linux rdma-qe-25.lab.bos.redhat.com 5.8.12-200.fc32.x86_64 #1 SMP Mon Sep 28 12:17:31 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
+ [20-10-01 14:29:49] cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt2)/vmlinuz-5.8.12-200.fc32.x86_64 root=/dev/mapper/fedora_rdma--qe--25-root ro resume=/dev/mapper/fedora_rdma--qe--25-swap rd.lvm.lv=fedora_rdma-qe-25/root rd.lvm.lv=fedora_rdma-qe-25/swap console=ttyS0,115200n81
+ [20-10-01 14:29:49] rpm -q rdma-core linux-firmware
rdma-core-31.0-1.fc32.x86_64
linux-firmware-20200918-112.fc32.noarch
+ [20-10-01 14:29:49] tail /sys/class/infiniband/roceo1/fw_ver /sys/class/infiniband/roceo2/fw_ver /sys/class/infiniband/rocep94s0f0/fw_ver /sys/class/infiniband/rocep94s0f1/fw_ver
==> /sys/class/infiniband/roceo1/fw_ver <==
20.8.30.0

==> /sys/class/infiniband/roceo2/fw_ver <==
20.8.30.0

==> /sys/class/infiniband/rocep94s0f0/fw_ver <==
216.0.51.0

==> /sys/class/infiniband/rocep94s0f1/fw_ver <==
216.0.51.0
+ [20-10-01 14:29:49] lspci
+ [20-10-01 14:29:49] grep -i -e ethernet -e infiniband -e omni -e ConnectX
01:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
01:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
1a:00.0 Ethernet controller: Broadcom Inc. and subsidiaries BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller (rev 01)
1a:00.1 Ethernet controller: Broadcom Inc. and subsidiaries BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller (rev 01)
5e:00.0 Ethernet controller: Broadcom Inc. and subsidiaries BCM57414 NetXtreme-E 10Gb/25Gb RDMA Ethernet Controller (rev 01)
5e:00.1 Ethernet controller: Broadcom Inc. and subsidiaries BCM57414 NetXtreme-E 10Gb/




+ [20-10-01 14:29:51] rpm -q environment-modules
package environment-modules is not installed
+ [20-10-01 14:29:51] yum install -y environment-modules
Last metadata expiration check: 0:13:13 ago on Thu 01 Oct 2020 02:16:39 PM EDT.
Dependencies resolved.
================================================================================
 Package               Arch     Version        Repository                  Size
================================================================================
Installing:
 environment-modules   x86_64   4.4.1-2.fc32   beaker-Fedora-Everything   347 k

Transaction Summary
================================================================================
Install  1 Package

Total download size: 347 k
Installed size: 1.6 M
Downloading Packages:
environment-modules-4.4.1-2.fc32.x86_64.rpm      46 MB/s | 347 kB     00:00    
--------------------------------------------------------------------------------
Total                                            34 MB/s | 347 kB     00:00     
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing        :                                                        1/1 
  Installing       : environment-modules-4.4.1-2.fc32.x86_64                1/1 
  Running scriptlet: environment-modules-4.4.1-2.fc32.x86_64                1/1 
  Verifying        : environment-modules-4.4.1-2.fc32.x86_64                1/1 

Installed:
  environment-modules-4.4.1-2.fc32.x86_64                        



How reproducible:

100%


Steps to Reproduce:
1. Make sure "mpirun" is in the PATH:

+ [20-10-01 14:30:13] which mpirun
/usr/lib64/openmpi/bin/mpirun

2. Have RDMA/MPI server and client hosts up and running with the above software packages

3. On the client host, run an OpenMPI benchmark using "mpirun", as follows:

timeout --preserve-status --kill-after=5m 3m mpirun -hostfile /root/hfile_one_core -np 2 --allow-run-as-root --map-by node -mca btl_openib_warn_nonexistent_if 0 -mca btl_openib_if_include rocep94s0f1:1 -mca mtl '^psm2,psm,ofi' -mca btl '^openib' -mca pml ucx -x UCX_NET_DEVICES=bnxt_roce.45 mpitests-IMB-MPI1 PingPong -time 1.5



Actual results:

+ [20-10-01 14:30:13] timeout --preserve-status --kill-after=5m 3m mpirun -hostfile /root/hfile_one_core -np 2 --allow-run-as-root --map-by node -mca btl_openib_warn_nonexistent_if 0 -mca btl_openib_if_include rocep94s0f1:1 -mca mtl '^psm2,psm,ofi' -mca btl '^openib' -mca pml ucx -x UCX_NET_DEVICES=bnxt_roce.45 mpitests-IMB-MPI1 PingPong -time 1.5
Loading /usr/share/modulefiles/mpi/openmpi-x86_64
  ERROR: /usr/share/modulefiles/mpi/openmpi-x86_64 cannot be loaded due to a
    conflict.
    HINT: Might try "module unload mpi" first.
bash: orted: command not found
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:

* not finding the required libraries and/or binaries on
  one or more nodes. Please check your PATH and LD_LIBRARY_PATH
  settings, or configure OMPI with --enable-orterun-prefix-by-default

* lack of authority to execute on one or more specified nodes.
  Please verify your allocation and authorities.

* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
  Please check with your sys admin to determine the correct location to use.

*  compilation of the orted with dynamic libraries when static are required
  (e.g., on Cray). Please check your configure cmd line and consider using
  one of the contrib/platform definitions for your system type.

* an inability to create a connection back to mpirun due to a
  lack of common network interfaces and/or no route found between
  them. Please check network connectivity (including firewalls
  and network routing requirements).
--------------------------------------------------------------------------
+ [20-10-01 14:30:13] __MPI_check_result 127 mpitests-openmpi IMB-MPI1 PingPong mpirun /root/hfile_one_core
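
For triaging client logs like the one above, the two telltale lines (the modulefile conflict and the missing "orted") can be picked out mechanically. The following is only a minimal sketch: the function and constant names are hypothetical and not part of the test suite that produced this log, and the patterns are copied from the failure output quoted in this report.

```python
import re

# Patterns copied from the failure output quoted in this report.
# All names below are hypothetical helpers for illustration only.
CONFLICT_RE = re.compile(r"ERROR: (\S+) cannot be loaded due to a\s+conflict")
ORTED_MISSING = "orted: command not found"


def diagnose_mpirun_output(output):
    """Report whether the output shows the modulefile conflict and a missing orted."""
    match = CONFLICT_RE.search(output)
    return {
        # Path of the modulefile that failed to load, or None if no conflict was seen.
        "conflicting_modulefile": match.group(1) if match else None,
        # True when the shell could not find the orted daemon.
        "orted_missing": ORTED_MISSING in output,
    }


sample = """Loading /usr/share/modulefiles/mpi/openmpi-x86_64
  ERROR: /usr/share/modulefiles/mpi/openmpi-x86_64 cannot be loaded due to a
    conflict.
    HINT: Might try "module unload mpi" first.
bash: orted: command not found
"""

result = diagnose_mpirun_output(sample)
print(result)
```

When both fields are set, the run most likely hit this bug rather than one of the generic causes listed in the ORTE message.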


Expected results:

Similar to the following output:

+ [20-07-26 08:47:16] which mpirun
/usr/lib64/openmpi/bin/mpirun
+ [20-07-26 08:47:16] '[' 0 -ne 0 ']'
++ [20-07-26 08:47:16] cat imb_mpi.txt
+ [20-07-26 08:47:16] for app in $(cat imb_mpi.txt)
+ [20-07-26 08:47:16] timeout --preserve-status --kill-after=5m 3m mpirun -hostfile /root/hfile_one_core -np 2 --allow-run-as-root --map-by node -mca btl_openib_warn_nonexistent_if 0 -mca btl_openib_if_include mlx5_2:1 -mca mtl '^psm2,psm,ofi' -mca btl openib,self -mca btl_openib_allow_ib 1 -x UCX_NET_DEVICES=mlx5_ib0 mpitests-IMB-MPI1 PingPong -time 1.5
#------------------------------------------------------------
#    Intel(R) MPI Benchmarks 2019 Update 6, MPI-1 part
#------------------------------------------------------------
# Date                  : Sun Jul 26 08:47:16 2020
# Machine               : x86_64
# System                : Linux
# Release               : 4.18.0-221.el8.x86_64
# Version               : #1 SMP Thu Jun 25 20:58:19 UTC 2020
# MPI Version           : 3.1
# MPI Thread Environment:


# Calling sequence was:

# mpitests-IMB-MPI1 PingPong -time 1.5

# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   :   MPI_BYTE
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM
#
#

# List of Benchmarks to run:

# PingPong

#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         9.66         0.00
            1         1000         9.77         0.10
            2         1000         9.93         0.20
            4         1000         9.74         0.41
            8         1000         9.45         0.85
           16         1000         9.59         1.67
           32         1000         9.78         3.27
           64         1000         9.97         6.42
          128         1000        10.06        12.72
          256         1000        10.13        25.27
          512         1000        10.38        49.34
         1024         1000        10.21       100.32
         2048         1000        18.77       109.12
         4096         1000        19.41       211.07
         8192         1000        21.46       381.79
        16384         1000        30.78       532.30
        32768         1000        44.40       737.99
        65536          640        96.45       679.45
       131072          320       120.11      1091.27
       262144          160       206.78      1267.72
       524288           80       284.22      1844.65
      1048576           40       450.47      2327.75
      2097152           20       701.05      2991.46
      4194304           10      1293.59      3242.36


# All processes entering MPI_Finalize

+ [20-07-26 08:47:21] __MPI_check_result 0 mpitests-openmpi IMB-MPI1 PingPong mpirun /root/hfile_one_core
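
To confirm that a run produced a results table like the one above, rather than only an exit code of 0, the data rows can be parsed out of the benchmark output. This is a rough sketch that assumes the four-column PingPong layout shown here (#bytes, #repetitions, t[usec], Mbytes/sec); the function name is hypothetical and the sample rows are copied from the expected output in this report.

```python
def parse_pingpong_rows(text):
    """Return (bytes, repetitions, usec, mbytes_per_sec) tuples from PingPong output."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # not a data row in the four-column layout
        try:
            rows.append(
                (int(fields[0]), int(fields[1]), float(fields[2]), float(fields[3]))
            )
        except ValueError:
            continue  # header or comment line that happens to have four fields
    return rows


sample = """#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         9.66         0.00
         1024         1000        10.21       100.32
      4194304           10      1293.59      3242.36
"""

rows = parse_pingpong_rows(sample)
peak = max(rows, key=lambda row: row[3])  # row with the highest throughput
print(len(rows), peak)
```

An empty result from a run that otherwise exited cleanly would suggest the benchmark never started.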


Additional info:

Comment 1 Fedora Admin user for bugzilla script actions 2020-10-29 14:56:44 UTC
This package has changed maintainer in Fedora.
Reassigning to the new maintainer of this component.

Comment 2 Fedora Admin user for bugzilla script actions 2020-11-02 14:55:10 UTC
This package has changed maintainer in Fedora.
Reassigning to the new maintainer of this component.

Comment 3 Fedora Update System 2020-11-15 18:32:49 UTC
FEDORA-2020-38c6692ab1 has been submitted as an update to Fedora 32. https://bodhi.fedoraproject.org/updates/FEDORA-2020-38c6692ab1

Comment 4 Fedora Update System 2020-11-16 01:50:04 UTC
FEDORA-2020-38c6692ab1 has been pushed to the Fedora 32 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-38c6692ab1`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-38c6692ab1

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 5 Fedora Update System 2020-11-24 01:22:30 UTC
FEDORA-2020-38c6692ab1 has been pushed to the Fedora 32 stable repository.
If the problem still persists, please make note of it in this bug report.

