Bug 1744780 - ucx-enabled openmpi test with mpirun fails causing Segmentation fault
Summary: ucx-enabled openmpi test with mpirun fails causing Segmentation fault
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: ucx
Version: 8.1
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 8.2
Assignee: Jonathan Toppins
QA Contact: Afom T. Michael
URL:
Whiteboard:
Depends On:
Blocks: 1708794
 
Reported: 2019-08-22 21:09 UTC by Afom T. Michael
Modified: 2022-05-06 03:02 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-04-28 15:34:54 UTC
Type: Bug
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHELPLAN-26251 0 None None None 2022-05-06 03:02:07 UTC
Red Hat Product Errata RHBA-2020:1590 0 None None None 2020-04-28 15:36:53 UTC

Description Afom T. Michael 2019-08-22 21:09:55 UTC
Description of problem:
Running RHEL-8.1.0 Snapshot-2, the openmpi sanity test fails with "Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x153918978768)". So far, I have seen this on hosts with hfi1 and bnxt_re HCAs.


Version-Release number of selected component (if applicable):
DISTRO=RHEL-8.1.0-20190820.3
4.18.0-135.el8.x86_64
mpitests-openmpi-5.4.2-4.el8.x86_64
openmpi-4.0.1-3.el8.x86_64
rdma-core-22.3-1.el8.x86_64
$ lspci | grep -i -e ethernet -e infiniband -e omni
02:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
02:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
03:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
03:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
04:00.0 Ethernet controller: Broadcom Inc. and subsidiaries BCM57454 NetXtreme-E 10Gb/25Gb/40Gb/50Gb/100Gb Ethernet (rev 01)
$ ibstat
CA 'bnxt_re0'
	CA type: Broadcom NetXtreme-C/E RoCE Driver HCA
	Number of ports: 1
	Firmware version: 212.0.106.0
	Hardware version: 0x14e4
	Node GUID: 0x020af7fffeeacd90
	System image GUID: 0x020af7fffeeacd90
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 100
		Base lid: 0
		LMC: 0
		SM lid: 0
		Capability mask: 0x001d0000
		Port GUID: 0x020af7fffeeacd90
		Link layer: Ethernet
$ ibstatus
Infiniband device 'bnxt_re0' port 1 status:
	default gid:	 fe80:0000:0000:0000:020a:f7ff:feea:cd90
	base lid:	 0x0
	sm lid:		 0x0
	state:		 4: ACTIVE
	phys state:	 5: LinkUp
	rate:		 100 Gb/sec (4X EDR)
	link_layer:	 Ethernet

$ ip addr show 
[...snip...]
3: bnxt_roce: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group default qlen 1000
    link/ether 00:0a:f7:ea:cd:90 brd ff:ff:ff:ff:ff:ff
    inet 172.31.40.126/24 brd 172.31.40.255 scope global dynamic noprefixroute bnxt_roce
       valid_lft 3209sec preferred_lft 3209sec
    inet6 fe80::20a:f7ff:feea:cd90/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
[...snip...]
$


How reproducible:
Always

Steps to Reproduce:
1. On a host with a bnxt_re or hfi1 HCA, execute /usr/lib64/openmpi/bin/mpirun with the arguments shown below, or just run our sanity test script.

Actual results:
timeout 3m /usr/lib64/openmpi/bin/mpirun --allow-run-as-root --map-by node -mca btl_openib_warn_nonexistent_if 0 -mca btl_openib_if_include bnxt_re0:1 -mca mtl '^psm2,psm,ofi' -mca btl '^openib,usnic' -hostfile /root/hfile_one_core -np 2 /usr/lib64/openmpi/bin/mpitests-IMB-MPI1 PingPong
[rdma-dev-25:24279:0:24279] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x153918978768)
==== backtrace ====
    0  /lib64/libucs.so.0(+0x18bb0) [0x15391830bbb0]
    1  /lib64/libucs.so.0(+0x18d8a) [0x15391830bd8a]
    2  /lib64/libuct.so.0(+0x1655b) [0x15391354f55b]
    3  /lib64/ld-linux-x86-64.so.2(+0xfd0a) [0x15392558cd0a]
    4  /lib64/ld-linux-x86-64.so.2(+0xfe0a) [0x15392558ce0a]
    5  /lib64/ld-linux-x86-64.so.2(+0x13def) [0x153925590def]
    6  /lib64/libc.so.6(_dl_catch_exception+0x77) [0x153924da7ab7]
    7  /lib64/ld-linux-x86-64.so.2(+0x1365e) [0x15392559065e]
    8  /lib64/libdl.so.2(+0x11ba) [0x1539245011ba]
    9  /lib64/libc.so.6(_dl_catch_exception+0x77) [0x153924da7ab7]
   10  /lib64/libc.so.6(_dl_catch_error+0x33) [0x153924da7b53]
   11  /lib64/libdl.so.2(+0x1939) [0x153924501939]
   12  /lib64/libdl.so.2(dlopen+0x4a) [0x15392450125a]
   13  /usr/lib64/openmpi/lib/libopen-pal.so.40(+0x6df05) [0x153924771f05]
   14  /usr/lib64/openmpi/lib/libopen-pal.so.40(mca_base_component_repository_open+0x206) [0x15392474fb16]
   15  /usr/lib64/openmpi/lib/libopen-pal.so.40(mca_base_component_find+0x35a) [0x15392474ea5a]
   16  /usr/lib64/openmpi/lib/libopen-pal.so.40(mca_base_framework_components_register+0x2e) [0x15392475a3ce]
   17  /usr/lib64/openmpi/lib/libopen-pal.so.40(mca_base_framework_register+0x252) [0x15392475a8b2]
   18  /usr/lib64/openmpi/lib/libopen-pal.so.40(mca_base_framework_open+0x15) [0x15392475a915]
   19  /usr/lib64/openmpi/lib/libmpi.so.40(ompi_mpi_init+0x674) [0x1539252a3494]
   20  /usr/lib64/openmpi/lib/libmpi.so.40(MPI_Init+0x72) [0x1539252d36b2]
   21  /usr/lib64/openmpi/bin/mpitests-IMB-MPI1(+0x2a66) [0x559af99eba66]
   22  /lib64/libc.so.6(__libc_start_main+0xf3) [0x153924c92873]
   23  /usr/lib64/openmpi/bin/mpitests-IMB-MPI1(+0x318e) [0x559af99ec18e]
===================
[rdma-dev-26:24261:0:24261] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x7feace7a2768)
==== backtrace ====
    0  /lib64/libucs.so.0(+0x18bb0) [0x7feace135bb0]
    1  /lib64/libucs.so.0(+0x18d8a) [0x7feace135d8a]
    2  /lib64/libuct.so.0(+0x1655b) [0x7feacd46555b]
    3  /lib64/ld-linux-x86-64.so.2(+0xfd0a) [0x7feadbf56d0a]
    4  /lib64/ld-linux-x86-64.so.2(+0xfe0a) [0x7feadbf56e0a]
    5  /lib64/ld-linux-x86-64.so.2(+0x13def) [0x7feadbf5adef]
    6  /lib64/libc.so.6(_dl_catch_exception+0x77) [0x7feadb771ab7]
    7  /lib64/ld-linux-x86-64.so.2(+0x1365e) [0x7feadbf5a65e]
    8  /lib64/libdl.so.2(+0x11ba) [0x7feadaecb1ba]
    9  /lib64/libc.so.6(_dl_catch_exception+0x77) [0x7feadb771ab7]
   10  /lib64/libc.so.6(_dl_catch_error+0x33) [0x7feadb771b53]
   11  /lib64/libdl.so.2(+0x1939) [0x7feadaecb939]
   12  /lib64/libdl.so.2(dlopen+0x4a) [0x7feadaecb25a]
   13  /usr/lib64/openmpi/lib/libopen-pal.so.40(+0x6df05) [0x7feadb13bf05]
   14  /usr/lib64/openmpi/lib/libopen-pal.so.40(mca_base_component_repository_open+0x206) [0x7feadb119b16]
   15  /usr/lib64/openmpi/lib/libopen-pal.so.40(mca_base_component_find+0x35a) [0x7feadb118a5a]
   16  /usr/lib64/openmpi/lib/libopen-pal.so.40(mca_base_framework_components_register+0x2e) [0x7feadb1243ce]
   17  /usr/lib64/openmpi/lib/libopen-pal.so.40(mca_base_framework_register+0x252) [0x7feadb1248b2]
   18  /usr/lib64/openmpi/lib/libopen-pal.so.40(mca_base_framework_open+0x15) [0x7feadb124915]
   19  /usr/lib64/openmpi/lib/libmpi.so.40(ompi_mpi_init+0x674) [0x7feadbc6d494]
   20  /usr/lib64/openmpi/lib/libmpi.so.40(MPI_Init+0x72) [0x7feadbc9d6b2]
   21  /usr/lib64/openmpi/bin/mpitests-IMB-MPI1(+0x2a66) [0x55da14413a66]
   22  /lib64/libc.so.6(__libc_start_main+0xf3) [0x7feadb65c873]
   23  /usr/lib64/openmpi/bin/mpitests-IMB-MPI1(+0x318e) [0x55da1441418e]
===================
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 24279 on node 172.31.45.125 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
+ [19-08-20 16:48:58] RQA_check_result -r 139 -t 'openmpi mpitests-IMB-MPI1 PingPong'


Expected results:
Command to complete successfully & test to pass.


Additional info:
Similar tests on hosts with cxgb4, mlx4 RoCE, mlx5 IB, and mlx5 RoCE passed. Tests with the "mpitests-IMB-IO S_Read_indv", "mpitests-IMB-EXT Window", and "mpitests-osu_get_bw" arguments failed in a similar way. I'll try to reproduce this on RHEL-8.0, since there was a dependency failure issue with previous RHEL-8.1.0 builds.

Comment 1 Afom T. Michael 2019-08-22 22:48:08 UTC
On RHEL-8.0.0, the same tests pass on hosts with both hfi1 and bnxt_re.

openmpi & mpitests-openmpi versions:

                   RHEL-8.0.0    RHEL-8.1.0
-----------------------------    ----------
openmpi            3.1.2-5       4.0.1-3
mpitests-openmpi   5.4.2-4       5.4.2-4

Comment 2 zguo 2019-08-23 10:14:19 UTC
 
I am testing openmpi for [Bug 1731749] ("The libfabric update to 1.8.0 in 8.1 breaks mpirun with containers") to verify that no regression was caused by the libfabric update. The test passed with openmpi-4.0.1-2.el8.x86_64 but failed with openmpi-4.0.1-3.el8.x86_64, both with libfabric-1.8.0-2.el8.x86_64, so this looks like an openmpi regression, not a libfabric one.
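
For reference, a minimal sketch of how this side-by-side check can be repeated on a test host (assuming root access and that the -2 build is still reachable by dnf or as a downloaded RPM; the repository setup is not covered in this bug):

  rpm -q openmpi libfabric           # confirm which builds are installed
  dnf downgrade openmpi-4.0.1-2.el8  # swap only openmpi, keep libfabric-1.8.0-2.el8
  # ... re-run the failing mpirun PingPong command from the Description ...
  dnf upgrade openmpi                # restore the -3 build afterwards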

Comment 3 Jarod Wilson 2019-08-26 02:13:53 UTC
(In reply to zguo from comment #2)
>  
> I am testing openmpi for [Bug 1731749] The libfabric update to 1.8.0 in 8.1
> breaks mpirun with containers, to verify no regression was caused by
> libfabric update. Test passed with openmpi-4.0.1-2.el8.x86_64 while failed
> with openmpi-4.0.1-3.el8.x86_64, both with libfabric-1.8.0-2.el8.x86_64, so
> it would be an openmpi regression, not libfabric.

The only difference between 4.0.1-2 and -3 is the enablement of UCX support. Afom, can you add 4.0.1-2 to the matrix in comment #1?

Comment 6 Jarod Wilson 2019-08-27 16:27:49 UTC
Afom and I have collaborated to test out some things, and with an update from ucx 1.4.0 to 1.5.2, the segmentation fault goes away, so this bug is getting reassigned to the ucx component. Not sure if a full update to 1.5.2 is warranted, but it seems we need at least some fixes backported.

Comment 8 Jonathan Toppins 2019-08-27 16:42:10 UTC
(In reply to Jarod Wilson from comment #6)
> Afom and I have collaborated to test out some things, and with an update
> from ucx 1.4.0 to 1.5.2, the segmentation fault goes away, so this bug is
> getting reassigned to the ucx component. Not sure if a full update to 1.5.2
> is warranted, but it seems we need at least some fixes backported.

When the branch for 8.2 becomes available, I will post the v1.5.2 package. A backport is not feasible, at least for solving bz1717018 in RHEL-8, because even picking up that fix would require backporting a 20-30 commit feature series as well, and I am not doing that as a patch stack.

Comment 9 Jonathan Toppins 2019-08-27 16:46:10 UTC
FYI, this is not a regression: v1.4.0 was the only version of UCX ever released in RHEL, so there is nothing to regress to.

Comment 10 Jonathan Toppins 2019-08-27 16:50:16 UTC
FYI until qa_ack is provided I won't be able to commit the update.

Comment 11 Jarod Wilson 2019-08-27 18:30:02 UTC
(In reply to Jonathan Toppins from comment #9)
> FYI this is not a regression as the v1.4.0 version of UCX was the only
> version ever released in RHEL so there is nothing to regress to.

Ah. This was originally filed against OpenMPI, where it can be considered a regression, since prior, non-UCX-enabled openmpi builds didn't crash like this. But as Afom noted: "By default, for Open MPI 4.0 and later, infiniband ports on a device are not used by default. The intent is to use UCX for these devices. You can override this policy by setting the btl_openib_allow_ib MCA parameter to true." So it looks like we can get the prior behavior with that added parameter, and release-note this for 8.1 if a fix isn't feasible.
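
For reference, a hedged sketch of what that workaround would look like applied to the failing sanity-test command from this bug (assumption: openib must also be dropped from the btl exclude list for the parameter to matter; note that comment 12 below reports the segfault still occurring in at least one environment with this parameter set):

  timeout 3m /usr/lib64/openmpi/bin/mpirun --allow-run-as-root --map-by node \
      --mca btl_openib_allow_ib true \
      -mca btl_openib_warn_nonexistent_if 0 \
      -mca btl_openib_if_include bnxt_re0:1 \
      -mca mtl '^psm2,psm,ofi' -mca btl '^usnic' \
      -hostfile /root/hfile_one_core -np 2 \
      /usr/lib64/openmpi/bin/mpitests-IMB-MPI1 PingPong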

Comment 12 Adrian Reber 2019-09-03 16:10:29 UTC
I also hit this on a test VM running the ring example from Open MPI.

Using 'mpirun --mca btl_openib_allow_ib true --allow-run-as-root -np 4 /tmp/ring' I still get a segfault. The VM has just one simple ethernet device.

Is there any way I can run my test with openmpi-4.0.1-3.el8.x86_64? For now I just downgraded to openmpi-4.0.1-2.el8.x86_64, taken directly from brew.

Comment 13 Jonathan Toppins 2019-09-04 05:32:57 UTC
(In reply to Adrian Reber from comment #12)
> I also hit this on a test VM running the ring example from Open MPI.
> 
> Using 'mpirun --mca btl_openib_allow_ib true --allow-run-as-root -np 4
> /tmp/ring' I still get a segfault. The VM has just one simple ethernet
> device.
> 
> Any way I can run my test with openmpi-4.0.1-3.el8.x86_64. For now I just
> downgraded to openmpi-4.0.1-2.el8.x86_64 taken directly from brew.

Yeah the way to fix it is to drop support for UCX in openmpi until 8.2.

Comment 15 Yossi Itigin 2019-10-15 15:40:28 UTC
Which version of UCX is used? This issue and https://bugzilla.redhat.com/show_bug.cgi?id=1717018 are fixed in v1.5.2 and above.
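
(A quick way to check, assuming the ucx package and its tools are installed; ucx_info may live in a separate subpackage:)

  rpm -q ucx     # packaged version
  ucx_info -v    # version reported by the library itself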

Comment 17 Afom T. Michael 2020-01-24 19:07:21 UTC
Moving to verified since the test on 4.18.0-167.el8.x86_64 with the packages shown below passes. The test was performed on the same hosts where the issue was initially seen.

[root@rdma-dev-26 ~]$ cat /etc/redhat-release 
Red Hat Enterprise Linux release 8.2 Beta (Ootpa)
[root@rdma-dev-26 ~]$ uname -r
4.18.0-167.el8.x86_64
[root@rdma-dev-26 ~]$ rpm -qa | egrep 'rdma|openmpi|ucx|verbs'
libibverbs-26.0-7.el8.x86_64
mpitests-openmpi-5.4.2-4.el8.x86_64
rdma-core-devel-26.0-7.el8.x86_64
libibverbs-utils-26.0-7.el8.x86_64
librdmacm-26.0-7.el8.x86_64
ucx-1.6.1-1.el8.x86_64
openmpi-4.0.2-2.el8.x86_64
librdmacm-utils-26.0-7.el8.x86_64
rdma-core-26.0-7.el8.x86_64
[root@rdma-dev-26 ~]$ ibstatus 
Infiniband device 'bnxt_re0' port 1 status:
	default gid:	 fe80:0000:0000:0000:020a:f7ff:feea:cd90
	base lid:	 0x0
	sm lid:		 0x0
	state:		 4: ACTIVE
	phys state:	 5: LinkUp
	rate:		 100 Gb/sec (4X EDR)
	link_layer:	 Ethernet

[root@rdma-dev-26 ~]$ timeout 3m /usr/lib64/openmpi/bin/mpirun --allow-run-as-root --map-by node -mca btl_openib_warn_nonexistent_if 0 -mca btl_openib_if_include bnxt_re0:1 -mca mtl '^psm2,psm,ofi' -mca btl '^openib,usnic' -hostfile /root/hfile_one_core -np 2 /usr/lib64/openmpi/bin/mpitests-IMB-MPI1 PingPong
#------------------------------------------------------------
#    Intel (R) MPI Benchmarks 2018 Update 1, MPI-1 part    
#------------------------------------------------------------
# Date                  : Fri Jan 24 13:57:28 2020
# Machine               : x86_64
# System                : Linux
# Release               : 4.18.0-167.el8.x86_64
# Version               : #1 SMP Sun Dec 15 01:24:23 UTC 2019
# MPI Version           : 3.1
# MPI Thread Environment: 


# Calling sequence was: 

# /usr/lib64/openmpi/bin/mpitests-IMB-MPI1 PingPong

# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   :   MPI_BYTE 
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM  
#
#

# List of Benchmarks to run:

# PingPong

#---------------------------------------------------
# Benchmarking PingPong 
# #processes = 2 
#---------------------------------------------------
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         9.43         0.00
            1         1000         9.45         0.11
            2         1000         9.40         0.21
            4         1000         9.43         0.42
            8         1000         9.43         0.85
           16         1000         9.42         1.70
           32         1000         9.46         3.38
           64         1000         9.49         6.75
          128         1000         9.60        13.33
          256         1000        10.10        25.35
          512         1000        10.01        51.16
         1024         1000        10.23       100.10
         2048         1000        10.82       189.23
         4096         1000        12.13       337.81
         8192         1000        21.25       385.55
        16384         1000        37.76       433.87
        32768         1000        44.80       731.39
        65536          640        55.83      1173.82
       131072          320        83.72      1565.56
       262144          160       169.54      1546.17
       524288           80       289.37      1811.85
      1048576           40       526.45      1991.79
      2097152           20      1000.64      2095.82
      4194304           10      1986.67      2111.23


# All processes entering MPI_Finalize

[root@rdma-dev-26 ~]$ echo $?
0
[root@rdma-dev-26 ~]$
[root@rdma-dev-26 ~]$ timeout 3m /usr/lib64/openmpi/bin/mpirun --allow-run-as-root --map-by node -mca btl_openib_warn_nonexistent_if 0 -mca btl_openib_if_include bnxt_re0:1 -mca mtl '^psm2,psm,ofi' -mca btl '^openib,usnic' -hostfile /root/hfile_one_core -np 2 /usr/lib64/openmpi/bin/mpitests-IMB-IO S_Read_indv
#------------------------------------------------------------
#    Intel (R) MPI Benchmarks 2018 Update 1, MPI-IO part   
#------------------------------------------------------------
# Date                  : Fri Jan 24 13:59:14 2020
# Machine               : x86_64
# System                : Linux
# Release               : 4.18.0-167.el8.x86_64
# Version               : #1 SMP Sun Dec 15 01:24:23 UTC 2019
# MPI Version           : 3.1
# MPI Thread Environment: 


# Calling sequence was: 

# /usr/lib64/openmpi/bin/mpitests-IMB-IO S_Read_indv

# Minimum io portion in bytes:   0
# Maximum io portion in bytes:   4194304
#
#
#

# List of Benchmarks to run:

# S_Read_Indv

#---------------------------------------------------
# Benchmarking S_Read_Indv 
# #processes = 1 
# ( 1 additional process waiting in MPI_Barrier)
#---------------------------------------------------
#
#    MODE: AGGREGATE 
#
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         0.01         0.00
            1         1000         1.22         0.82
            2         1000         1.22         1.63
            4         1000         1.22         3.28
            8         1000         1.23         6.53
           16         1000         1.22        13.08
           32         1000         1.23        26.06
           64         1000         1.24        51.42
          128         1000         1.24       103.28
          256         1000         1.24       206.42
          512         1000         1.25       409.69
         1024         1000         1.28       800.05
         2048         1000         1.36      1508.93
         4096         1000         1.50      2736.56
         8192         1000         1.84      4447.77
        16384         1000         2.79      5871.00
        32768         1000         4.87      6724.44
        65536          640         8.62      7603.50
       131072          320        16.62      7887.74
       262144          160        33.77      7763.28
       524288           80        67.59      7757.43
      1048576           40       130.44      8038.47
      2097152           20       274.23      7647.34
      4194304           10       567.47      7391.17


# All processes entering MPI_Finalize

[root@rdma-dev-26 ~]$ echo $?
0
[root@rdma-dev-26 ~]$ timeout 3m /usr/lib64/openmpi/bin/mpirun --allow-run-as-root --map-by node -mca btl_openib_warn_nonexistent_if 0 -mca btl_openib_if_include bnxt_re0:1 -mca mtl '^psm2,psm,ofi' -mca btl '^openib,usnic' -hostfile /root/hfile_one_core -np 2 /usr/lib64/openmpi/bin/mpitests-IMB-EXT Window
#------------------------------------------------------------
#    Intel (R) MPI Benchmarks 2018 Update 1, MPI-2 part    
#------------------------------------------------------------
# Date                  : Fri Jan 24 13:59:22 2020
# Machine               : x86_64
# System                : Linux
# Release               : 4.18.0-167.el8.x86_64
# Version               : #1 SMP Sun Dec 15 01:24:23 UTC 2019
# MPI Version           : 3.1
# MPI Thread Environment: 


# Calling sequence was: 

# /usr/lib64/openmpi/bin/mpitests-IMB-EXT Window

# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   :   MPI_BYTE 
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM  
#
#

# List of Benchmarks to run:

# Window

#----------------------------------------------------------------
# Benchmarking Window 
# #processes = 2 
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0          100       262.09       262.10       262.09
            4          100       260.52       260.53       260.52
            8          100       263.02       263.18       263.10
           16          100       263.31       263.41       263.36
           32          100       263.64       263.94       263.79
           64          100       262.22       262.52       262.37
          128          100       263.29       263.33       263.31
          256          100       261.69       261.70       261.69
          512          100       261.79       262.01       261.90
         1024          100       263.12       263.14       263.13
         2048          100       262.97       262.98       262.98
         4096          100       261.92       262.05       261.98
         8192          100       261.05       261.15       261.10
        16384          100       262.22       262.23       262.22
        32768          100       261.73       261.84       261.78
        65536          100       261.61       261.73       261.67
       131072          100       261.57       261.59       261.58
       262144          100       263.26       263.29       263.28
       524288           80       263.14       263.16       263.15
      1048576           40       260.18       260.45       260.31
      2097152           20       260.28       260.29       260.29
      4194304           10       261.04       262.29       261.67


# All processes entering MPI_Finalize

[root@rdma-dev-26 ~]$ echo $?
0
[root@rdma-dev-26 ~]$ timeout 3m /usr/lib64/openmpi/bin/mpirun --allow-run-as-root --map-by node -mca btl_openib_warn_nonexistent_if 0 -mca btl_openib_if_include bnxt_re0:1 -mca mtl '^psm2,psm,ofi' -mca btl '^openib,usnic' -hostfile /root/hfile_one_core -np 2 /usr/lib64/openmpi/bin/mpitests-osu_get_bw
# OSU MPI_Get Bandwidth Test v5.4.1
# Window creation: MPI_Win_allocate
# Synchronization: MPI_Win_flush
# Size      Bandwidth (MB/s)
1                       0.23
2                       0.94
4                       1.87
8                       3.77
16                      7.57
32                     15.03
64                     29.83
128                    58.66
256                   116.64
512                   227.98
1024                  437.24
2048                  776.50
4096                 1145.15
8192                 1415.28
16384                1829.16
32768                2147.24
65536                2250.84
131072               1830.33
262144               1786.59
524288               1782.16
1048576              1784.25
2097152              1784.00
4194304              1782.62
[root@rdma-dev-26 ~]$ echo $?
0
[root@rdma-dev-26 ~]$

Test results for sanity on rdma-dev-26:
4.18.0-167.el8.x86_64, bnxt, roce, & bnxt_re0
    Result | Status | Test
  ---------+--------+------------------------------------
      PASS |      0 | load module bnxt_re
      PASS |      0 | load module bnxt_en
      PASS |      0 | ping 172.31.40.126
      PASS |      0 | ping6 bnxt_roce/fe80::20a:f7ff:feea:cd90
      PASS |      0 | ibstatus reported expected HCA rate
      PASS |      0 | vlan bnxt_roce.81 create/delete
      PASS |      0 | /usr/sbin/ibstat
      PASS |      0 | /usr/sbin/ibstatus
      PASS |      0 | systemctl start srp_daemon.service
      SKIP |    777 | ibsrpdm
      PASS |      0 | systemctl stop srp_daemon
      PASS |      0 | client pings server
      PASS |      0 | openmpi mpitests-IMB-MPI1 PingPong
      PASS |      0 | openmpi mpitests-IMB-IO S_Read_indv
      PASS |      0 | openmpi mpitests-IMB-EXT Window
      PASS |      0 | openmpi mpitests-osu_get_bw
      PASS |      0 | ip multicast addr
      PASS |      0 | rping
      PASS |      0 | rcopy
      PASS |      0 | ib_read_bw
      PASS |      0 | ib_send_bw
      PASS |      0 | ib_write_bw
      PASS |      0 | iser login
      PASS |      0 | mount /dev/sdb /iser
      PASS |      0 | iser write 1K
      PASS |      0 | iser write 1M
      PASS |      0 | iser write 1G
      PASS |      0 | nfsordma mount
      PASS |      0 | nfsordma write 1K
      PASS |      0 | nfsordma write 1M
      PASS |      0 | nfsordma write 1G

Test results for mpi/openmpi on rdma-dev-26:
4.18.0-167.el8.x86_64, bnxt, roce, & bnxt_re0
    Result | Status | Test
  ---------+--------+------------------------------------
      PASS |      0 | openmpi IMB-MPI1 PingPong mpirun one_core
      PASS |      0 | openmpi IMB-MPI1 PingPing mpirun one_core
      PASS |      0 | openmpi IMB-MPI1 Sendrecv mpirun one_core
      PASS |      0 | openmpi IMB-MPI1 Exchange mpirun one_core
      PASS |      0 | openmpi IMB-MPI1 Bcast mpirun one_core
      PASS |      0 | openmpi IMB-MPI1 Allgather mpirun one_core
      PASS |      0 | openmpi IMB-MPI1 Allgatherv mpirun one_core
      PASS |      0 | openmpi IMB-MPI1 Gather mpirun one_core
      PASS |      0 | openmpi IMB-MPI1 Gatherv mpirun one_core
      PASS |      0 | openmpi IMB-MPI1 Scatter mpirun one_core
      PASS |      0 | openmpi IMB-MPI1 Scatterv mpirun one_core
      PASS |      0 | openmpi IMB-MPI1 Alltoall mpirun one_core
      PASS |      0 | openmpi IMB-MPI1 Alltoallv mpirun one_core
      PASS |      0 | openmpi IMB-MPI1 Reduce mpirun one_core
      PASS |      0 | openmpi IMB-MPI1 Reduce_scatter mpirun one_core
      PASS |      0 | openmpi IMB-MPI1 Allreduce mpirun one_core
      PASS |      0 | openmpi IMB-MPI1 Barrier mpirun one_core
      PASS |      0 | openmpi IMB-IO S_Write_indv mpirun one_core
      PASS |      0 | openmpi IMB-IO S_Read_indv mpirun one_core
      PASS |      0 | openmpi IMB-IO S_Write_expl mpirun one_core
      PASS |      0 | openmpi IMB-IO S_Read_expl mpirun one_core
      PASS |      0 | openmpi IMB-IO P_Write_indv mpirun one_core
      PASS |      0 | openmpi IMB-IO P_Read_indv mpirun one_core
      PASS |      0 | openmpi IMB-IO P_Write_expl mpirun one_core
      PASS |      0 | openmpi IMB-IO P_Read_expl mpirun one_core
      PASS |      0 | openmpi IMB-IO P_Write_shared mpirun one_core
      PASS |      0 | openmpi IMB-IO P_Read_shared mpirun one_core
      PASS |      0 | openmpi IMB-IO P_Write_priv mpirun one_core
      PASS |      0 | openmpi IMB-IO P_Read_priv mpirun one_core
      PASS |      0 | openmpi IMB-IO C_Write_indv mpirun one_core
      PASS |      0 | openmpi IMB-IO C_Read_indv mpirun one_core
      PASS |      0 | openmpi IMB-IO C_Write_expl mpirun one_core
      PASS |      0 | openmpi IMB-IO C_Read_expl mpirun one_core
      PASS |      0 | openmpi IMB-IO C_Write_shared mpirun one_core
      PASS |      0 | openmpi IMB-IO C_Read_shared mpirun one_core
      PASS |      0 | openmpi IMB-EXT Window mpirun one_core
      PASS |      0 | openmpi IMB-EXT Unidir_Put mpirun one_core
      PASS |      0 | openmpi IMB-EXT Unidir_Get mpirun one_core
      PASS |      0 | openmpi IMB-EXT Bidir_Get mpirun one_core
      PASS |      0 | openmpi IMB-EXT Bidir_Put mpirun one_core
      PASS |      0 | openmpi IMB-EXT Accumulate mpirun one_core
      PASS |      0 | openmpi IMB-NBC Ibcast mpirun one_core
      PASS |      0 | openmpi IMB-NBC Iallgather mpirun one_core
      PASS |      0 | openmpi IMB-NBC Iallgatherv mpirun one_core
      PASS |      0 | openmpi IMB-NBC Igather mpirun one_core
      PASS |      0 | openmpi IMB-NBC Igatherv mpirun one_core
      PASS |      0 | openmpi IMB-NBC Iscatter mpirun one_core
      PASS |      0 | openmpi IMB-NBC Iscatterv mpirun one_core
      PASS |      0 | openmpi IMB-NBC Ialltoall mpirun one_core
      PASS |      0 | openmpi IMB-NBC Ialltoallv mpirun one_core
      PASS |      0 | openmpi IMB-NBC Ireduce mpirun one_core
      PASS |      0 | openmpi IMB-NBC Ireduce_scatter mpirun one_core
      PASS |      0 | openmpi IMB-NBC Iallreduce mpirun one_core
      PASS |      0 | openmpi IMB-NBC Ibarrier mpirun one_core
      PASS |      0 | openmpi IMB-RMA Unidir_put mpirun one_core
      PASS |      0 | openmpi IMB-RMA Unidir_get mpirun one_core
      PASS |      0 | openmpi IMB-RMA Bidir_put mpirun one_core
      PASS |      0 | openmpi IMB-RMA Bidir_get mpirun one_core
      PASS |      0 | openmpi IMB-RMA One_put_all mpirun one_core
      PASS |      0 | openmpi IMB-RMA One_get_all mpirun one_core
      PASS |      0 | openmpi IMB-RMA All_put_all mpirun one_core
      PASS |      0 | openmpi IMB-RMA All_get_all mpirun one_core
      PASS |      0 | openmpi IMB-RMA Put_local mpirun one_core
      PASS |      0 | openmpi IMB-RMA Put_all_local mpirun one_core
      PASS |      0 | openmpi IMB-RMA Exchange_put mpirun one_core
      PASS |      0 | openmpi IMB-RMA Exchange_get mpirun one_core
      PASS |      0 | openmpi IMB-RMA Accumulate mpirun one_core
      PASS |      0 | openmpi IMB-RMA Get_accumulate mpirun one_core
      PASS |      0 | openmpi IMB-RMA Fetch_and_op mpirun one_core
      PASS |      0 | openmpi IMB-RMA Compare_and_swap mpirun one_core
      PASS |      0 | openmpi IMB-RMA Get_local mpirun one_core
      PASS |      0 | openmpi IMB-RMA Get_all_local mpirun one_core
      PASS |      0 | openmpi OSU acc_latency mpirun one_core
      PASS |      0 | openmpi OSU allgather mpirun one_core
      PASS |      0 | openmpi OSU allgatherv mpirun one_core
      PASS |      0 | openmpi OSU allreduce mpirun one_core
      PASS |      0 | openmpi OSU alltoall mpirun one_core
      PASS |      0 | openmpi OSU alltoallv mpirun one_core
      PASS |      0 | openmpi OSU barrier mpirun one_core
      PASS |      0 | openmpi OSU bcast mpirun one_core
      PASS |      0 | openmpi OSU bibw mpirun one_core
      PASS |      0 | openmpi OSU bw mpirun one_core
      PASS |      0 | openmpi OSU cas_latency mpirun one_core
      PASS |      0 | openmpi OSU fop_latency mpirun one_core
      PASS |      0 | openmpi OSU gather mpirun one_core
      PASS |      0 | openmpi OSU gatherv mpirun one_core
      PASS |      0 | openmpi OSU get_acc_latency mpirun one_core
      PASS |      0 | openmpi OSU get_bw mpirun one_core
      PASS |      0 | openmpi OSU get_latency mpirun one_core
      PASS |      0 | openmpi OSU hello mpirun one_core
      PASS |      0 | openmpi OSU iallgather mpirun one_core
      PASS |      0 | openmpi OSU iallgatherv mpirun one_core
      PASS |      0 | openmpi OSU ialltoall mpirun one_core
      PASS |      0 | openmpi OSU ialltoallv mpirun one_core
      PASS |      0 | openmpi OSU ialltoallw mpirun one_core
      PASS |      0 | openmpi OSU ibarrier mpirun one_core
      PASS |      0 | openmpi OSU ibcast mpirun one_core
      PASS |      0 | openmpi OSU igather mpirun one_core
      PASS |      0 | openmpi OSU igatherv mpirun one_core
      PASS |      0 | openmpi OSU init mpirun one_core
      PASS |      0 | openmpi OSU iscatter mpirun one_core
      PASS |      0 | openmpi OSU iscatterv mpirun one_core
      PASS |      0 | openmpi OSU latency mpirun one_core
      PASS |      0 | openmpi OSU mbw_mr mpirun one_core
      PASS |      0 | openmpi OSU multi_lat mpirun one_core
      PASS |      0 | openmpi OSU put_bibw mpirun one_core
      PASS |      0 | openmpi OSU put_bw mpirun one_core
      PASS |      0 | openmpi OSU put_latency mpirun one_core
      PASS |      0 | openmpi OSU reduce mpirun one_core
      PASS |      0 | openmpi OSU reduce_scatter mpirun one_core
      PASS |      0 | openmpi OSU scatter mpirun one_core
      PASS |      0 | openmpi OSU scatterv mpirun one_core
      PASS |      0 | NON-ROOT IMB-MPI1 PingPong

Checking for failures and known issues:
  no test failures

Comment 19 errata-xmlrpc 2020-04-28 15:34:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1590

