Bug 1860674 - [RHEL8.3] All OPENMPI benchmarks fail after upgrading to "environment-modules-4.5.1-1.el8.x86_64"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: environment-modules
Version: 8.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 8.0
Assignee: Lukáš Nykrýn
QA Contact: Frantisek Sumsal
URL:
Whiteboard:
Depends On:
Blocks: 1842946
 
Reported: 2020-07-26 13:29 UTC by Brian Chae
Modified: 2020-11-04 02:13 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-04 02:13:47 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
client log for openmpi showing all benchmark failures due to environment-modules-5.4.1 (622.71 KB, text/plain)
2020-07-26 13:29 UTC, Brian Chae
environment-modules fix (7.54 KB, patch)
2020-07-29 13:53 UTC, Xavier Delaruelle


Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHBA-2020:4593  0        None      None    None     2020-11-04 02:13:52 UTC

Description Brian Chae 2020-07-26 13:29:15 UTC
Created attachment 1702441 [details]
client log for openmpi showing all benchmark failures due to environment-modules-5.4.1

Description of problem:

All OPENMPI benchmarks fail after upgrading the "environment-modules" package from "environment-modules-4.1.4-4.el8.x86_64" to "environment-modules-4.5.1-1.el8.x86_64", when "mpirun" is used without its full path as the benchmark command.


Workarounds:

1. Use the full path to the "mpirun" command, "/usr/lib64/openmpi/bin/mpirun", while "environment-modules-4.5.1-1.el8.x86_64" is installed.

2. Or install the older package, "environment-modules-4.1.4-4.el8.x86_64".
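
The first workaround can be sketched as a small shell helper. This is only an illustration: "resolve_mpirun" and "MPI_BIN_DIR" are hypothetical names, and the default directory is simply the path that "which mpirun" prints in this report.

```shell
# Hypothetical helper: prefer an absolute mpirun path over the bare name,
# so the launch does not rely on the PATH prepared by "module load".
# Default directory is the one shown by "which mpirun" in this report.
MPI_BIN_DIR="${MPI_BIN_DIR:-/usr/lib64/openmpi/bin}"

resolve_mpirun() {
    if [ -x "$MPI_BIN_DIR/mpirun" ]; then
        # Absolute path found: this is workaround 1.
        printf '%s\n' "$MPI_BIN_DIR/mpirun"
    else
        # Fall back to the bare name, which is the failing case in this bug.
        printf '%s\n' "mpirun"
    fi
}
```

It could be used as "$(resolve_mpirun)" in place of the bare "mpirun" in the benchmark command. The Open MPI mpirun man page describes launching via an absolute pathname as equivalent to passing --prefix, which would be consistent with this workaround letting the remote side find "orted".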


Version-Release number of selected component (if applicable):


DISTRO=RHEL-8.3.0-20200701.2
+ [20-07-25 07:05:45] cat /etc/redhat-release
Red Hat Enterprise Linux release 8.3 Beta (Ootpa)
+ [20-07-25 07:05:45] uname -a
Linux rdma-virt-03.lab.bos.redhat.com 4.18.0-221.el8.x86_64 #1 SMP Thu Jun 25 20:58:19 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
+ [20-07-25 07:05:45] cat /proc/cmdline
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-221.el8.x86_64 root=/dev/mapper/rhel_rdma--virt--03-root ro intel_idle.max_cstate=0 processor.max_cstate=0 intel_iommu=on iommu=on console=tty0 rd_NO_PLYMOUTH crashkernel=auto resume=/dev/mapper/rhel_rdma--virt--03-swap rd.lvm.lv=rhel_rdma-virt-03/root rd.lvm.lv=rhel_rdma-virt-03/swap console=ttyS1,115200n81
+ [20-07-25 07:05:45] rpm -q rdma-core linux-firmware
rdma-core-29.0-3.el8.x86_64
linux-firmware-20200619-99.git3890db36.el8.noarch
+ [20-07-25 07:05:45] tail /sys/class/infiniband/mlx5_0/fw_ver /sys/class/infiniband/mlx5_1/fw_ver /sys/class/infiniband/mlx5_bond_0/fw_ver
==> /sys/class/infiniband/mlx5_0/fw_ver <==
12.25.1020

==> /sys/class/infiniband/mlx5_1/fw_ver <==
12.25.1020

==> /sys/class/infiniband/mlx5_bond_0/fw_ver <==
14.27.1016
+ [20-07-25 07:05:45] lspci
+ [20-07-25 07:05:45] grep -i -e ethernet -e infiniband -e omni -e ConnectX
02:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
02:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
03:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
03:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
04:00.0 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
04:00.1 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
05:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
05:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]




+ [20-07-25 07:05:50] dnf install -y --setopt=strict=0 --nogpgcheck openmpi mpitests-openmpi environment-modules
Updating Subscription Management repositories.
Unable to read consumer identity

This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.

Last metadata expiration check: 0:00:19 ago on Sat 25 Jul 2020 07:05:31 AM EDT.
Package environment-modules-4.1.4-4.el8.x86_64 is already installed.
Dependencies resolved.
================================================================================
 Package                 Arch       Version          Repository            Size
================================================================================
Installing:
 mpitests-openmpi        x86_64     5.6.2-1.el8      beaker-AppStream     943 k
 openmpi                 x86_64     4.0.3-1.el8      beaker-AppStream     2.8 M
Upgrading:
 environment-modules     x86_64     4.5.1-1.el8      brew                 419 k

Transaction Summary
================================================================================
Install  2 Packages
Upgrade  1 Package

Total download size: 4.1 M
Downloading Packages:
(1/3): mpitests-openmpi-5.6.2-1.el8.x86_64.rpm   40 MB/s | 943 kB     00:00    
(2/3): environment-modules-4.5.1-1.el8.x86_64.r  12 MB/s | 419 kB     00:00    
(3/3): openmpi-4.0.3-1.el8.x86_64.rpm            40 MB/s | 2.8 MB     00:00    
--------------------------------------------------------------------------------
Total                                            59 MB/s | 4.1 MB     00:00     
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing        :                                                        1/1 
  Upgrading        : environment-modules-4.5.1-1.el8.x86_64                 1/4 
  Running scriptlet: environment-modules-4.5.1-1.el8.x86_64                 1/4 
  Installing       : openmpi-4.0.3-1.el8.x86_64                             2/4 
  Installing       : mpitests-openmpi-5.6.2-1.el8.x86_64                    3/4 
  Cleanup          : environment-modules-4.1.4-4.el8.x86_64                 4/4 
  Running scriptlet: environment-modules-4.1.4-4.el8.x86_64                 4/4 
  Verifying        : mpitests-openmpi-5.6.2-1.el8.x86_64                    1/4 
  Verifying        : openmpi-4.0.3-1.el8.x86_64                             2/4 
  Verifying        : environment-modules-4.5.1-1.el8.x86_64                 3/4 
  Verifying        : environment-modules-4.1.4-4.el8.x86_64                 4/4 
Installed products updated.

Upgraded:
  environment-modules-4.5.1-1.el8.x86_64         <<<==============                               

Installed:
  mpitests-openmpi-5.6.2-1.el8.x86_64         openmpi-4.0.3-1.el8.x86_64        





How reproducible:

100%

Steps to Reproduce:
1. upgrade the package "environment-modules" from "environment-modules-4.1.4-4.el8.x86_64" to "environment-modules-4.5.1-1.el8.x86_64"

2. make sure the path to "mpirun" exists

+ [20-07-25 07:06:32] which mpirun
/usr/lib64/openmpi/bin/mpirun


3. Run an OPENMPI benchmark using "mpirun", as follows:

timeout --preserve-status --kill-after=5m 3m mpirun -hostfile /root/hfile_one_core -np 2 --allow-run-as-root --map-by node -mca btl_openib_warn_nonexistent_if 0 -mca btl_openib_if_include mlx5_0:1 -mca mtl '^psm2,psm,ofi' -mca btl openib,self -mca btl_openib_allow_ib 1 -x UCX_NET_DEVICES=mlx5_ib0 /usr/lib64/openmpi/bin/mpitests-osu_acc_latency
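
Before running the steps above, it can help to confirm which build is installed. The sketch below is an assumption-laden illustration: "modules_version" and "is_reported_build" are hypothetical helper names, and the input is expected to look like the package strings quoted in this report (as printed by "rpm -q environment-modules").

```shell
# Hypothetical helper: extract the upstream version from a string of the form
# "environment-modules-<version>-<release>.<arch>".
modules_version() {
    local nvra="$1"
    local rest="${nvra#environment-modules-}"   # e.g. "4.5.1-1.el8.x86_64"
    printf '%s\n' "${rest%%-*}"                  # e.g. "4.5.1"
}

# Succeeds only for the upstream version this bug was filed against (4.5.1).
is_reported_build() {
    [ "$(modules_version "$1")" = "4.5.1" ]
}
```

For example, is_reported_build "$(rpm -q environment-modules)" would succeed on the system described here after the upgrade, and fail with the 4.1.4 package installed.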


Actual results:


+ [20-07-25 07:06:05] which mpirun
/usr/lib64/openmpi/bin/mpirun
+ [20-07-25 07:06:05] '[' 0 -ne 0 ']'
++ [20-07-25 07:06:05] cat imb_mpi.txt
+ [20-07-25 07:06:05] for app in $(cat imb_mpi.txt)
+ [20-07-25 07:06:05] timeout --preserve-status --kill-after=5m 3m mpirun -hostfile /root/hfile_one_core -np 2 --allow-run-as-root --map-by node -mca btl_openib_warn_nonexistent_if 0 -mca btl_openib_if_include mlx5_0:1 -mca mtl '^psm2,psm,ofi' -mca btl openib,self -mca btl_openib_allow_ib 1 -x UCX_NET_DEVICES=mlx5_ib0 mpitests-IMB-MPI1 PingPong -time 1.5
Loading /etc/modulefiles/mpi/openmpi-x86_64
  ERROR: /etc/modulefiles/mpi/openmpi-x86_64 cannot be loaded due to a conflict.
    HINT: Might try "module unload mpi" first.
bash: orted: command not found
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:

* not finding the required libraries and/or binaries on
  one or more nodes. Please check your PATH and LD_LIBRARY_PATH
  settings, or configure OMPI with --enable-orterun-prefix-by-default

* lack of authority to execute on one or more specified nodes.
  Please verify your allocation and authorities.

* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
  Please check with your sys admin to determine the correct location to use.

*  compilation of the orted with dynamic libraries when static are required
  (e.g., on Cray). Please check your configure cmd line and consider using
  one of the contrib/platform definitions for your system type.

* an inability to create a connection back to mpirun due to a
  lack of common network interfaces and/or no route found between
  them. Please check network connectivity (including firewalls
  and network routing requirements).
--------------------------------------------------------------------------
+ [20-07-25 07:06:05] __MPI_check_result 127 mpitests-openmpi IMB-MPI1 PingPong mpirun /root/hfile_one_core
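
The log above contains two distinct failure signatures: the module conflict error and the "orted: command not found" message. A status of 127 is the conventional shell status for a command that was not found, which is consistent with the latter, although the report does not state where the 127 originates. A minimal sketch for spotting these signatures in a captured log follows; "classify_mpi_log" is a hypothetical helper, not part of the test harness.

```shell
# Hypothetical helper: classify a captured benchmark log by the two failure
# signatures seen in this report. The conflict is checked first because it
# appears before the orted failure in the log.
classify_mpi_log() {
    local log_file="$1"
    if grep -q 'cannot be loaded due to a conflict' "$log_file"; then
        echo "module-conflict"
    elif grep -q 'orted: command not found' "$log_file"; then
        echo "orted-missing"
    else
        echo "no-known-signature"
    fi
}
```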




Expected results:

+ [20-07-26 08:47:16] which mpirun
/usr/lib64/openmpi/bin/mpirun
+ [20-07-26 08:47:16] '[' 0 -ne 0 ']'
++ [20-07-26 08:47:16] cat imb_mpi.txt
+ [20-07-26 08:47:16] for app in $(cat imb_mpi.txt)
+ [20-07-26 08:47:16] timeout --preserve-status --kill-after=5m 3m mpirun -hostfile /root/hfile_one_core -np 2 --allow-run-as-root --map-by node -mca btl_openib_warn_nonexistent_if 0 -mca btl_openib_if_include mlx5_2:1 -mca mtl '^psm2,psm,ofi' -mca btl openib,self -mca btl_openib_allow_ib 1 -x UCX_NET_DEVICES=mlx5_ib0 mpitests-IMB-MPI1 PingPong -time 1.5
#------------------------------------------------------------
#    Intel(R) MPI Benchmarks 2019 Update 6, MPI-1 part
#------------------------------------------------------------
# Date                  : Sun Jul 26 08:47:16 2020
# Machine               : x86_64
# System                : Linux
# Release               : 4.18.0-221.el8.x86_64
# Version               : #1 SMP Thu Jun 25 20:58:19 UTC 2020
# MPI Version           : 3.1
# MPI Thread Environment:


# Calling sequence was:

# mpitests-IMB-MPI1 PingPong -time 1.5

# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   :   MPI_BYTE
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM
#
#

# List of Benchmarks to run:

# PingPong

#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         9.66         0.00
            1         1000         9.77         0.10
            2         1000         9.93         0.20
            4         1000         9.74         0.41
            8         1000         9.45         0.85
           16         1000         9.59         1.67
           32         1000         9.78         3.27
           64         1000         9.97         6.42
          128         1000        10.06        12.72
          256         1000        10.13        25.27
          512         1000        10.38        49.34
         1024         1000        10.21       100.32
         2048         1000        18.77       109.12
         4096         1000        19.41       211.07
         8192         1000        21.46       381.79
        16384         1000        30.78       532.30
        32768         1000        44.40       737.99
        65536          640        96.45       679.45
       131072          320       120.11      1091.27
       262144          160       206.78      1267.72
       524288           80       284.22      1844.65
      1048576           40       450.47      2327.75
      2097152           20       701.05      2991.46
      4194304           10      1293.59      3242.36


# All processes entering MPI_Finalize

+ [20-07-26 08:47:21] __MPI_check_result 0 mpitests-openmpi IMB-MPI1 PingPong mpirun /root/hfile_one_core


Additional info:

This behavior started as soon as the RHEL 8.3 system was upgraded to "environment-modules-4.5.1-1.el8.x86_64". With "environment-modules-4.1.4-4.el8.x86_64" installed, the issue does not occur.

Comment 1 Honggang LI 2020-07-27 02:38:55 UTC
Likely a regression of https://bugzilla.redhat.com/show_bug.cgi?id=1642837 .

@Jan, could you please have a look?

Comment 2 Xavier Delaruelle 2020-07-27 14:00:13 UTC
Newer versions of environment-modules (>=4.2) now ensure consistency of the loaded environment.

The following lines of logs:

Loading /etc/modulefiles/mpi/openmpi-x86_64
  ERROR: /etc/modulefiles/mpi/openmpi-x86_64 cannot be loaded due to a conflict.
    HINT: Might try "module unload mpi" first.

These lines seem to indicate that an "mpi" module is already loaded prior to the attempt to load "/etc/modulefiles/mpi/openmpi-x86_64". As those modulefiles declare a conflict with any other "mpi" module, an error is raised. No error was raised on older versions of environment-modules (<4.2), as conflict detection was incomplete.

I would suggest to look at user environment right before test is launched and add a "module unload mpi" (or a "module purge") right before the "module load /etc/modulefiles/mpi/openmpi-x86_64" command.
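
To inspect the environment as suggested in this comment, a toy check along the following lines could be run right before the load. It is only a rough model of the idea of a conflict, not the actual logic in modulecmd.tcl: "has_loaded_conflict" is a hypothetical helper that tests whether any entry of a colon-separated module list is named "mpi" or sits under "mpi/".

```shell
# Toy model only: succeeds if any entry in a colon-separated module list
# equals "<name>" or starts with "<name>/". The real conflict resolution in
# environment-modules is more involved.
has_loaded_conflict() {
    local conflict_name="$1" loaded_list="$2" entry
    local IFS=':'
    for entry in $loaded_list; do
        case "$entry" in
            "$conflict_name"|"$conflict_name"/*) return 0 ;;
        esac
    done
    return 1
}
```

With environment-modules initialized, the list of loaded modules is exported in the LOADEDMODULES variable, so has_loaded_conflict mpi "$LOADEDMODULES" gives a quick indication of whether a "module unload mpi" or "module purge" is needed before the load.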

Comment 3 Honggang LI 2020-07-28 03:20:21 UTC
(In reply to Xavier Delaruelle from comment #2)

> I would suggest to look at user environment right before test is launched
> and add a "module unload mpi" (or a "module purge") right before the "module
> load /etc/modulefiles/mpi/openmpi-x86_64" command.

Yes, 'module purge' was executed before running 'module load /etc/modulefiles/mpi/openmpi-x86_64'.
Please see the attachment for details.

https://bugzilla.redhat.com/attachment.cgi?id=1702441

Comment 4 Xavier Delaruelle 2020-07-29 13:53:18 UTC
Created attachment 1702816 [details]
environment-modules fix

Thanks for the clarification.

This is clearly a bug on the environment-modules side.

I have just made a fix for it (see the attached patch). It can be applied right away to the SRPM if you want to quickly build a fixed version of the environment-modules package. I will release v4.5.2 upstream in the next few days, which will include this fix.

Comment 5 Honggang LI 2020-07-29 15:04:53 UTC
(In reply to Xavier Delaruelle from comment #4)
> Created attachment 1702816 [details]
> environment-modules fix

Confirmed this patch works for me. Thank you!

Comment 6 Afom T. Michael 2020-08-17 15:26:27 UTC
(In reply to Honggang LI from comment #5)
> (In reply to Xavier Delaruelle from comment #4)
> > Created attachment 1702816 [details]
> > environment-modules fix
> 
> Confirmed this patch works for me. Thank you!

I also tested with the stated patch to /usr/share/Modules/libexec/modulecmd.tcl and openmpi tests passed.

Comment 14 errata-xmlrpc 2020-11-04 02:13:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (environment-modules bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4593

