Bug 1814296 - [RHEL8.2] all mvapich2 tests failed with "Floating point exception" - esnap3 runs
Summary: [RHEL8.2] all mvapich2 tests failed with "Floating point exception" - esnap3 ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: mvapich2
Version: 8.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 8.3
Assignee: Honggang LI
QA Contact: Brian Chae
URL:
Whiteboard:
Depends On:
Blocks: 1802014
 
Reported: 2020-03-17 15:11 UTC by Brian Chae
Modified: 2020-11-04 01:38 UTC (History)
CC List: 3 users

Fixed In Version: mvapich2-2.3.3-1.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-04 01:37:28 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
The RDMA tier2 test result log for "mvapich2" test suite with all failures (473.11 KB, text/plain) - 2020-03-17 15:14 UTC, Brian Chae
Test log with older rhel82 image with successful "mvapich2" tests (564.07 KB, text/plain) - 2020-03-17 15:19 UTC, Brian Chae
Test log for all "mpirun" test cases passed, while all "mpirun_rsh" test cases failed (827.94 KB, text/plain) - 2020-03-23 20:16 UTC, Brian Chae


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2020:4456 0 None None None 2020-11-04 01:37:48 UTC

Description Brian Chae 2020-03-17 15:11:07 UTC
Description of problem:
All of the mvapich2 test cases failed with the following error messages:

+ [20-03-11 01:12:46] for app in $(cat imb_mpi.txt)
+ [20-03-11 01:12:46] timeout --preserve-status --kill-after=5m 3m mpirun -hostfile /root/hfile_one_core -np 2 mpitests-IMB-MPI1 PingPing -time 1.5
[rdma-qe-07.lab.bos.redhat.com:mpi_rank_1][error_sighandler] Caught error: Floating point exception (signal 8)
[rdma-qe-06.lab.bos.redhat.com:mpi_rank_0][error_sighandler] Caught error: Floating point exception (signal 8)

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 47541 RUNNING AT 172.31.0.7
=   EXIT CODE: 136
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
[proxy:0:0.bos.redhat.com] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:911): assert (!closed) failed
[proxy:0:0.bos.redhat.com] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:0.bos.redhat.com] main (pm/pmiserv/pmip.c:202): demux engine error waiting for event
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Floating point exception (signal 8)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions



Version-Release number of selected component (if applicable):


Image: Using RHEL-8.2.0-20200310.0 BaseOS x86_64

+ [20-03-11 01:12:35] tr -d ' '
DISTRO=RHEL-8.2.0-20200310.0
DISTRO=RHEL-8.2.0-20200310.0
+ [20-03-11 01:12:35] cat /etc/redhat-release
Red Hat Enterprise Linux release 8.2 Beta (Ootpa)
+ [20-03-11 01:12:35] uname -a
Linux rdma-qe-07.lab.bos.redhat.com 4.18.0-187.el8.x86_64 #1 SMP Sat Mar 7 03:42:33 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
+ [20-03-11 01:12:35] cat /proc/cmdline
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-187.el8.x86_64 root=UUID=49e11e23-084d-4cca-bde2-06ba08806a11 ro intel_idle.max_cstate=0 intremap=no_x2apic_optout processor.max_cstate=0 console=tty0 rd_NO_PLYMOUTH crashkernel=auto resume=UUID=a134f625-ec6b-4538-b7f2-549c3cf7beb0 console=ttyS1,115200n81
+ [20-03-11 01:12:35] rpm -q rdma-core linux-firmware
rdma-core-26.0-8.el8.x86_64
linux-firmware-20191202-97.gite8a0f4c9.el8.noarch

--------------------

Package environment-modules-4.1.4-4.el8.x86_64 is already installed.
Dependencies resolved.
================================================================================
 Package               Arch       Version            Repository            Size
================================================================================
Installing:
 mpitests-mvapich2     x86_64     5.4.2-4.el8        beaker-AppStream     342 k
 mvapich2              x86_64     2.3.2-2.el8        beaker-AppStream     3.1 M

Transaction Summary
================================================================================
Install  2 Packages

Total download size: 3.5 M
Installed size: 15 M
Downloading Packages:
(1/2): mpitests-mvapich2-5.4.2-4.el8.x86_64.rpm  36 MB/s | 342 kB     00:00    
(2/2): mvapich2-2.3.2-2.el8.x86_64.rpm           69 MB/s | 3.1 MB     00:00    
--------------------------------------------------------------------------------
Total                                            75 MB/s | 3.5 MB     00:00     
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing        :                                                        1/1 
  Installing       : mvapich2-2.3.2-2.el8.x86_64                            1/2 
  Installing       : mpitests-mvapich2-5.4.2-4.el8.x86_64                   2/2 
  Running scriptlet: mpitests-mvapich2-5.4.2-4.el8.x86_64                   2/2 
  Verifying        : mpitests-mvapich2-5.4.2-4.el8.x86_64                   1/2 
  Verifying        : mvapich2-2.3.2-2.el8.x86_64                            2/2 
Installed products updated.

Installed:
  mpitests-mvapich2-5.4.2-4.el8.x86_64        mvapich2-2.3.2-2.el8.x86_64       

Complete!

How reproducible:

All the time


Steps to Reproduce:
1. This was from the tier2 RDMA test suite for "mvapich2" for eSNAP3 runs.
2. Refer to the attached test log.

Actual results:

Test results for mpi/mvapich2 on rdma-qe-07:
4.18.0-187.el8.x86_64, mlx5, ib0, & mlx5_0
    Result | Status | Test
  ---------+--------+------------------------------------
      FAIL |    136 | mvapich2 IMB-MPI1 PingPong mpirun one_core
      FAIL |    136 | mvapich2 IMB-MPI1 PingPing mpirun one_core
      FAIL |    136 | mvapich2 IMB-MPI1 Sendrecv mpirun one_core
      FAIL |    136 | mvapich2 IMB-MPI1 Exchange mpirun one_core
      FAIL |    136 | mvapich2 IMB-MPI1 Bcast mpirun one_core
      FAIL |    136 | mvapich2 IMB-MPI1 Allgather mpirun one_core
      FAIL |    136 | mvapich2 IMB-MPI1 Allgatherv mpirun one_core
      FAIL |    136 | mvapich2 IMB-MPI1 Gather mpirun one_core
      FAIL |    136 | mvapich2 IMB-MPI1 Gatherv mpirun one_core
      FAIL |    136 | mvapich2 IMB-MPI1 Scatter mpirun one_core
      FAIL |    136 | mvapich2 IMB-MPI1 Scatterv mpirun one_core
      FAIL |    136 | mvapich2 IMB-MPI1 Alltoall mpirun one_core
      FAIL |    136 | mvapich2 IMB-MPI1 Alltoallv mpirun one_core
      FAIL |    136 | mvapich2 IMB-MPI1 Reduce mpirun one_core
      FAIL |    136 | mvapich2 IMB-MPI1 Reduce_scatter mpirun one_core
      FAIL |    136 | mvapich2 IMB-MPI1 Allreduce mpirun one_core
      FAIL |    136 | mvapich2 IMB-MPI1 Barrier mpirun one_core
      FAIL |    136 | mvapich2 IMB-IO S_Write_indv mpirun one_core
      FAIL |    136 | mvapich2 IMB-IO S_Read_indv mpirun one_core
      FAIL |    136 | mvapich2 IMB-IO S_Write_expl mpirun one_core
      FAIL |    136 | mvapich2 IMB-IO S_Read_expl mpirun one_core
      FAIL |    136 | mvapich2 IMB-IO P_Write_indv mpirun one_core
      FAIL |    136 | mvapich2 IMB-IO P_Read_indv mpirun one_core
      FAIL |    136 | mvapich2 IMB-IO P_Write_expl mpirun one_core
      FAIL |    136 | mvapich2 IMB-IO P_Read_expl mpirun one_core
      FAIL |    136 | mvapich2 IMB-IO P_Write_shared mpirun one_core
      FAIL |    136 | mvapich2 IMB-IO P_Read_shared mpirun one_core
      FAIL |    136 | mvapich2 IMB-IO P_Write_priv mpirun one_core
      FAIL |    136 | mvapich2 IMB-IO P_Read_priv mpirun one_core
      FAIL |    136 | mvapich2 IMB-IO C_Write_indv mpirun one_core
      FAIL |    136 | mvapich2 IMB-IO C_Read_indv mpirun one_core
      FAIL |    136 | mvapich2 IMB-IO C_Write_expl mpirun one_core
      FAIL |    136 | mvapich2 IMB-IO C_Read_expl mpirun one_core
      FAIL |    136 | mvapich2 IMB-IO C_Write_shared mpirun one_core
      FAIL |    136 | mvapich2 IMB-IO C_Read_shared mpirun one_core
      FAIL |    136 | mvapich2 IMB-EXT Window mpirun one_core
      FAIL |    136 | mvapich2 IMB-EXT Unidir_Put mpirun one_core
      FAIL |    136 | mvapich2 IMB-EXT Unidir_Get mpirun one_core
      FAIL |    136 | mvapich2 IMB-EXT Bidir_Get mpirun one_core
      FAIL |    136 | mvapich2 IMB-EXT Bidir_Put mpirun one_core
      FAIL |    136 | mvapich2 IMB-EXT Accumulate mpirun one_core
      FAIL |    136 | mvapich2 IMB-NBC Ibcast mpirun one_core
      FAIL |    255 | mvapich2 IMB-NBC Iallgather mpirun one_core
      FAIL |    136 | mvapich2 IMB-NBC Iallgatherv mpirun one_core
      FAIL |    136 | mvapich2 IMB-NBC Igather mpirun one_core
      FAIL |    136 | mvapich2 IMB-NBC Igatherv mpirun one_core
      FAIL |    136 | mvapich2 IMB-NBC Iscatter mpirun one_core
      FAIL |    136 | mvapich2 IMB-NBC Iscatterv mpirun one_core
      FAIL |    136 | mvapich2 IMB-NBC Ialltoall mpirun one_core
      FAIL |    136 | mvapich2 IMB-NBC Ialltoallv mpirun one_core
      FAIL |    136 | mvapich2 IMB-NBC Ireduce mpirun one_core
      FAIL |    136 | mvapich2 IMB-NBC Ireduce_scatter mpirun one_core
      FAIL |    136 | mvapich2 IMB-NBC Iallreduce mpirun one_core
      FAIL |    136 | mvapich2 IMB-NBC Ibarrier mpirun one_core
      FAIL |    136 | mvapich2 IMB-RMA Unidir_put mpirun one_core
      FAIL |    136 | mvapich2 IMB-RMA Unidir_get mpirun one_core
      FAIL |    136 | mvapich2 IMB-RMA Bidir_put mpirun one_core
      FAIL |    136 | mvapich2 IMB-RMA Bidir_get mpirun one_core
      FAIL |    136 | mvapich2 IMB-RMA One_put_all mpirun one_core
      FAIL |    136 | mvapich2 IMB-RMA One_get_all mpirun one_core
      FAIL |    136 | mvapich2 IMB-RMA All_put_all mpirun one_core
      FAIL |    136 | mvapich2 IMB-RMA All_get_all mpirun one_core
      FAIL |    136 | mvapich2 IMB-RMA Put_local mpirun one_core
      FAIL |    136 | mvapich2 IMB-RMA Put_all_local mpirun one_core
      FAIL |    136 | mvapich2 IMB-RMA Exchange_put mpirun one_core
      FAIL |    136 | mvapich2 IMB-RMA Exchange_get mpirun one_core
      FAIL |    136 | mvapich2 IMB-RMA Accumulate mpirun one_core
      FAIL |    136 | mvapich2 IMB-RMA Get_accumulate mpirun one_core
      FAIL |    136 | mvapich2 IMB-RMA Fetch_and_op mpirun one_core
      FAIL |    136 | mvapich2 IMB-RMA Compare_and_swap mpirun one_core
      FAIL |    136 | mvapich2 IMB-RMA Get_local mpirun one_core
      FAIL |    136 | mvapich2 IMB-RMA Get_all_local mpirun one_core
      FAIL |    999 | mpirun_rsh known issue
      FAIL |    136 | mvapich2 OSU acc_latency mpirun one_core
      FAIL |    136 | mvapich2 OSU allgather mpirun one_core
      FAIL |    136 | mvapich2 OSU allgatherv mpirun one_core
      FAIL |    136 | mvapich2 OSU allreduce mpirun one_core
      FAIL |    136 | mvapich2 OSU alltoall mpirun one_core
      FAIL |    136 | mvapich2 OSU alltoallv mpirun one_core
      FAIL |    136 | mvapich2 OSU barrier mpirun one_core
      FAIL |    136 | mvapich2 OSU bcast mpirun one_core
      FAIL |    136 | mvapich2 OSU bibw mpirun one_core
      FAIL |    136 | mvapich2 OSU bw mpirun one_core
      FAIL |    136 | mvapich2 OSU cas_latency mpirun one_core
      FAIL |    136 | mvapich2 OSU fop_latency mpirun one_core
      FAIL |    136 | mvapich2 OSU gather mpirun one_core
      FAIL |    136 | mvapich2 OSU gatherv mpirun one_core
      FAIL |    136 | mvapich2 OSU get_acc_latency mpirun one_core
      FAIL |    136 | mvapich2 OSU get_bw mpirun one_core
      FAIL |    136 | mvapich2 OSU get_latency mpirun one_core
      FAIL |    255 | mvapich2 OSU hello mpirun one_core
      FAIL |    136 | mvapich2 OSU iallgather mpirun one_core
      FAIL |    136 | mvapich2 OSU iallgatherv mpirun one_core
      FAIL |    136 | mvapich2 OSU ialltoall mpirun one_core
      FAIL |    136 | mvapich2 OSU ialltoallv mpirun one_core
      FAIL |    136 | mvapich2 OSU ialltoallw mpirun one_core
      FAIL |    136 | mvapich2 OSU ibarrier mpirun one_core
      FAIL |    136 | mvapich2 OSU ibcast mpirun one_core
      FAIL |    136 | mvapich2 OSU igather mpirun one_core
      FAIL |    136 | mvapich2 OSU igatherv mpirun one_core
      FAIL |    136 | mvapich2 OSU init mpirun one_core
      FAIL |    136 | mvapich2 OSU iscatter mpirun one_core
      FAIL |    136 | mvapich2 OSU iscatterv mpirun one_core
      FAIL |    136 | mvapich2 OSU latency mpirun one_core
      FAIL |    136 | mvapich2 OSU mbw_mr mpirun one_core
      FAIL |    136 | mvapich2 OSU multi_lat mpirun one_core
      FAIL |    136 | mvapich2 OSU put_bibw mpirun one_core
      FAIL |    136 | mvapich2 OSU put_bw mpirun one_core
      FAIL |    136 | mvapich2 OSU put_latency mpirun one_core
      FAIL |    136 | mvapich2 OSU reduce mpirun one_core
      FAIL |    136 | mvapich2 OSU reduce_scatter mpirun one_core
      FAIL |    136 | mvapich2 OSU scatter mpirun one_core
      FAIL |    136 | mvapich2 OSU scatterv mpirun one_core
      FAIL |    999 | mpirun_rsh known issue


Expected results:

Test results for mpi/mvapich2 on rdma-qe-07:
4.18.0-151.el8.x86_64, mlx5, ib0, & mlx5_0
    Result | Status | Test
  ---------+--------+------------------------------------
      PASS |      0 | mvapich2 IMB-MPI1 PingPong mpirun one_core
      PASS |      0 | mvapich2 IMB-MPI1 PingPing mpirun one_core
      PASS |      0 | mvapich2 IMB-MPI1 Sendrecv mpirun one_core
      PASS |      0 | mvapich2 IMB-MPI1 Exchange mpirun one_core
      PASS |      0 | mvapich2 IMB-MPI1 Bcast mpirun one_core
      PASS |      0 | mvapich2 IMB-MPI1 Allgather mpirun one_core
      PASS |      0 | mvapich2 IMB-MPI1 Allgatherv mpirun one_core
      PASS |      0 | mvapich2 IMB-MPI1 Gather mpirun one_core
      PASS |      0 | mvapich2 IMB-MPI1 Gatherv mpirun one_core
      PASS |      0 | mvapich2 IMB-MPI1 Scatter mpirun one_core
      PASS |      0 | mvapich2 IMB-MPI1 Scatterv mpirun one_core
      PASS |      0 | mvapich2 IMB-MPI1 Alltoall mpirun one_core
      PASS |      0 | mvapich2 IMB-MPI1 Alltoallv mpirun one_core
      PASS |      0 | mvapich2 IMB-MPI1 Reduce mpirun one_core
      PASS |      0 | mvapich2 IMB-MPI1 Reduce_scatter mpirun one_core
      PASS |      0 | mvapich2 IMB-MPI1 Allreduce mpirun one_core
      PASS |      0 | mvapich2 IMB-MPI1 Barrier mpirun one_core
      PASS |      0 | mvapich2 IMB-IO S_Write_indv mpirun one_core
      PASS |      0 | mvapich2 IMB-IO S_Read_indv mpirun one_core
      PASS |      0 | mvapich2 IMB-IO S_Write_expl mpirun one_core
      PASS |      0 | mvapich2 IMB-IO S_Read_expl mpirun one_core
      PASS |      0 | mvapich2 IMB-IO P_Write_indv mpirun one_core
      PASS |      0 | mvapich2 IMB-IO P_Read_indv mpirun one_core
      PASS |      0 | mvapich2 IMB-IO P_Write_expl mpirun one_core
      PASS |      0 | mvapich2 IMB-IO P_Read_expl mpirun one_core
      PASS |      0 | mvapich2 IMB-IO P_Write_shared mpirun one_core
      PASS |      0 | mvapich2 IMB-IO P_Read_shared mpirun one_core
      PASS |      0 | mvapich2 IMB-IO P_Write_priv mpirun one_core
      PASS |      0 | mvapich2 IMB-IO P_Read_priv mpirun one_core
      PASS |      0 | mvapich2 IMB-IO C_Write_indv mpirun one_core
      PASS |      0 | mvapich2 IMB-IO C_Read_indv mpirun one_core
      PASS |      0 | mvapich2 IMB-IO C_Write_expl mpirun one_core
      PASS |      0 | mvapich2 IMB-IO C_Read_expl mpirun one_core
      PASS |      0 | mvapich2 IMB-IO C_Write_shared mpirun one_core
      PASS |      0 | mvapich2 IMB-IO C_Read_shared mpirun one_core
      PASS |      0 | mvapich2 IMB-EXT Window mpirun one_core
      PASS |      0 | mvapich2 IMB-EXT Unidir_Put mpirun one_core
      PASS |      0 | mvapich2 IMB-EXT Unidir_Get mpirun one_core
      PASS |      0 | mvapich2 IMB-EXT Bidir_Get mpirun one_core
      PASS |      0 | mvapich2 IMB-EXT Bidir_Put mpirun one_core
      PASS |      0 | mvapich2 IMB-EXT Accumulate mpirun one_core
      PASS |      0 | mvapich2 IMB-NBC Ibcast mpirun one_core
      PASS |      0 | mvapich2 IMB-NBC Iallgather mpirun one_core
      PASS |      0 | mvapich2 IMB-NBC Iallgatherv mpirun one_core
      PASS |      0 | mvapich2 IMB-NBC Igather mpirun one_core
      PASS |      0 | mvapich2 IMB-NBC Igatherv mpirun one_core
      PASS |      0 | mvapich2 IMB-NBC Iscatter mpirun one_core
      PASS |      0 | mvapich2 IMB-NBC Iscatterv mpirun one_core
      PASS |      0 | mvapich2 IMB-NBC Ialltoall mpirun one_core
      PASS |      0 | mvapich2 IMB-NBC Ialltoallv mpirun one_core
      PASS |      0 | mvapich2 IMB-NBC Ireduce mpirun one_core
      PASS |      0 | mvapich2 IMB-NBC Ireduce_scatter mpirun one_core
      PASS |      0 | mvapich2 IMB-NBC Iallreduce mpirun one_core
      PASS |      0 | mvapich2 IMB-NBC Ibarrier mpirun one_core
      PASS |      0 | mvapich2 IMB-RMA Unidir_put mpirun one_core
      PASS |      0 | mvapich2 IMB-RMA Unidir_get mpirun one_core
      PASS |      0 | mvapich2 IMB-RMA Bidir_put mpirun one_core
      PASS |      0 | mvapich2 IMB-RMA Bidir_get mpirun one_core
      PASS |      0 | mvapich2 IMB-RMA One_put_all mpirun one_core
      PASS |      0 | mvapich2 IMB-RMA One_get_all mpirun one_core
      PASS |      0 | mvapich2 IMB-RMA All_put_all mpirun one_core
      PASS |      0 | mvapich2 IMB-RMA All_get_all mpirun one_core
      PASS |      0 | mvapich2 IMB-RMA Put_local mpirun one_core
      PASS |      0 | mvapich2 IMB-RMA Put_all_local mpirun one_core
      PASS |      0 | mvapich2 IMB-RMA Exchange_put mpirun one_core
      PASS |      0 | mvapich2 IMB-RMA Exchange_get mpirun one_core
      PASS |      0 | mvapich2 IMB-RMA Accumulate mpirun one_core
      PASS |      0 | mvapich2 IMB-RMA Get_accumulate mpirun one_core
      PASS |      0 | mvapich2 IMB-RMA Fetch_and_op mpirun one_core
      PASS |      0 | mvapich2 IMB-RMA Compare_and_swap mpirun one_core
      PASS |      0 | mvapich2 IMB-RMA Get_local mpirun one_core
      PASS |      0 | mvapich2 IMB-RMA Get_all_local mpirun one_core
      FAIL |    999 | mpirun_rsh known issue
      PASS |      0 | mvapich2 OSU acc_latency mpirun one_core
      PASS |      0 | mvapich2 OSU allgather mpirun one_core
      PASS |      0 | mvapich2 OSU allgatherv mpirun one_core
      PASS |      0 | mvapich2 OSU allreduce mpirun one_core
      PASS |      0 | mvapich2 OSU alltoall mpirun one_core
      PASS |      0 | mvapich2 OSU alltoallv mpirun one_core
      PASS |      0 | mvapich2 OSU barrier mpirun one_core
      PASS |      0 | mvapich2 OSU bcast mpirun one_core
      PASS |      0 | mvapich2 OSU bibw mpirun one_core
      PASS |      0 | mvapich2 OSU bw mpirun one_core
      PASS |      0 | mvapich2 OSU cas_latency mpirun one_core
      PASS |      0 | mvapich2 OSU fop_latency mpirun one_core
      PASS |      0 | mvapich2 OSU gather mpirun one_core
      PASS |      0 | mvapich2 OSU gatherv mpirun one_core
      PASS |      0 | mvapich2 OSU get_acc_latency mpirun one_core
      PASS |      0 | mvapich2 OSU get_bw mpirun one_core
      PASS |      0 | mvapich2 OSU get_latency mpirun one_core
      PASS |      0 | mvapich2 OSU hello mpirun one_core
      PASS |      0 | mvapich2 OSU iallgather mpirun one_core
      PASS |      0 | mvapich2 OSU iallgatherv mpirun one_core
      PASS |      0 | mvapich2 OSU ialltoall mpirun one_core
      PASS |      0 | mvapich2 OSU ialltoallv mpirun one_core
      PASS |      0 | mvapich2 OSU ialltoallw mpirun one_core
      PASS |      0 | mvapich2 OSU ibarrier mpirun one_core
      PASS |      0 | mvapich2 OSU ibcast mpirun one_core
      PASS |      0 | mvapich2 OSU igather mpirun one_core
      PASS |      0 | mvapich2 OSU igatherv mpirun one_core
      PASS |      0 | mvapich2 OSU init mpirun one_core
      PASS |      0 | mvapich2 OSU iscatter mpirun one_core
      PASS |      0 | mvapich2 OSU iscatterv mpirun one_core
      PASS |      0 | mvapich2 OSU latency mpirun one_core
      PASS |      0 | mvapich2 OSU mbw_mr mpirun one_core
      PASS |      0 | mvapich2 OSU multi_lat mpirun one_core
      PASS |      0 | mvapich2 OSU put_bibw mpirun one_core
      PASS |      0 | mvapich2 OSU put_bw mpirun one_core
      PASS |      0 | mvapich2 OSU put_latency mpirun one_core
      PASS |      0 | mvapich2 OSU reduce mpirun one_core
      PASS |      0 | mvapich2 OSU reduce_scatter mpirun one_core
      PASS |      0 | mvapich2 OSU scatter mpirun one_core
      PASS |      0 | mvapich2 OSU scatterv mpirun one_core
      FAIL |    999 | mpirun_rsh known issue

Additional info:

When the same test suite was run with the older image

Using RHEL-8.2.0-20191120.0 BaseOS x86_64

all of the above "mvapich2" test cases passed, as shown above.
Refer to the attached log.

Comment 1 Brian Chae 2020-03-17 15:14:31 UTC
Created attachment 1670819 [details]
The RDMA tier2 test result log for "mvapich2" test suite with all failures

This log contains all of the test cases with failures and the error messages.

Comment 2 Brian Chae 2020-03-17 15:19:00 UTC
Created attachment 1670820 [details]
Test log with older rhel82 image with successful "mvapich2" tests

This log shows all of the "mvapich2" tests being successful with older RHEL8.2 image (Using RHEL-8.2.0-20191120.0 BaseOS x86_64).

Comment 3 Honggang LI 2020-03-18 09:35:30 UTC
There is something wrong with hwloc CPU affinity. Setting the parameter 'MV2_ENABLE_AFFINITY' to 0 is the workaround.


[root@rdma-qe-07 ~]$ timeout --preserve-status --kill-after=5m 3m mpirun  -genv MV2_DEBUG_SHOW_BACKTRACE 1 -genv MV2_ENABLE_AFFINITY 0 -hostfile /root/hfile_one_core -np 2  mpitests-osu_bw
# OSU MPI Bandwidth Test v5.4.1
# Size      Bandwidth (MB/s)
1                       2.85
2                       5.75
4                      11.45
8                      22.80
16                     45.14
32                     88.60
64                    201.45
128                   389.61
256                   827.45
512                  1419.97
1024                 2022.50
2048                 2431.36
4096                 2464.72
8192                 3322.39
16384                6039.20
32768                6253.89
65536                6384.04
131072               6455.55
262144               6493.53
524288               6511.77
1048576              6520.92
2097152              6525.39
4194304              6527.89



[root@rdma-qe-07 ~]$ 
[root@rdma-qe-07 ~]$ timeout --preserve-status --kill-after=5m 3m mpirun  -genv MV2_DEBUG_SHOW_BACKTRACE 1 -genv MV2_ENABLE_AFFINITY 1 -hostfile /root/hfile_one_core -np 2  mpitests-osu_bw
[rdma-qe-07.lab.bos.redhat.com:mpi_rank_1][error_sighandler] Caught error: Floating point exception (signal 8)
[rdma-qe-07.lab.bos.redhat.com:mpi_rank_1][print_backtrace]   0: /usr/lib64/mvapich2/lib/libmpi.so.12(print_backtrace+0x38) [0x150f5889edd8]
[rdma-qe-07.lab.bos.redhat.com:mpi_rank_1][print_backtrace]   1: /usr/lib64/mvapich2/lib/libmpi.so.12(error_sighandler+0x77) [0x150f5889ef37]
[rdma-qe-07.lab.bos.redhat.com:mpi_rank_1][print_backtrace]   2: /lib64/libpthread.so.0(+0x12dd0) [0x150f58e39dd0]
[rdma-qe-07.lab.bos.redhat.com:mpi_rank_1][print_backtrace]   3: /usr/lib64/mvapich2/lib/libmpi.so.12(MPIDI_CH3I_set_affinity+0x34c) [0x150f588f724c]
[rdma-qe-07.lab.bos.redhat.com:mpi_rank_1][print_backtrace]   4: /usr/lib64/mvapich2/lib/libmpi.so.12(MPID_Init+0x2c5) [0x150f58837465]
[rdma-qe-07.lab.bos.redhat.com:mpi_rank_1][print_backtrace]   5: /usr/lib64/mvapich2/lib/libmpi.so.12(MPIR_Init_thread+0x329) [0x150f58797b79]
[rdma-qe-07.lab.bos.redhat.com:mpi_rank_1][print_backtrace]   6: /usr/lib64/mvapich2/lib/libmpi.so.12(MPI_Init+0xaa) [0x150f5879754a]
[rdma-qe-07.lab.bos.redhat.com:mpi_rank_1][print_backtrace]   7: mpitests-osu_bw(+0x19e6) [0x5643c62fe9e6]
[rdma-qe-07.lab.bos.redhat.com:mpi_rank_1][print_backtrace]   8: /lib64/libc.so.6(__libc_start_main+0xf3) [0x150f576c36a3]
[rdma-qe-07.lab.bos.redhat.com:mpi_rank_1][print_backtrace]   9: mpitests-osu_bw(+0x201e) [0x5643c62ff01e]
[rdma-qe-06.lab.bos.redhat.com:mpi_rank_0][error_sighandler] Caught error: Floating point exception (signal 8)
[rdma-qe-06.lab.bos.redhat.com:mpi_rank_0][print_backtrace]   0: /usr/lib64/mvapich2/lib/libmpi.so.12(print_backtrace+0x38) [0x14ced9eeddd8]
[rdma-qe-06.lab.bos.redhat.com:mpi_rank_0][print_backtrace]   1: /usr/lib64/mvapich2/lib/libmpi.so.12(error_sighandler+0x77) [0x14ced9eedf37]
[rdma-qe-06.lab.bos.redhat.com:mpi_rank_0][print_backtrace]   2: /lib64/libpthread.so.0(+0x12dd0) [0x14ceda488dd0]
[rdma-qe-06.lab.bos.redhat.com:mpi_rank_0][print_backtrace]   3: /usr/lib64/mvapich2/lib/libmpi.so.12(MPIDI_CH3I_set_affinity+0x34c) [0x14ced9f4624c]
[rdma-qe-06.lab.bos.redhat.com:mpi_rank_0][print_backtrace]   4: /usr/lib64/mvapich2/lib/libmpi.so.12(MPID_Init+0x2c5) [0x14ced9e86465]
[rdma-qe-06.lab.bos.redhat.com:mpi_rank_0][print_backtrace]   5: /usr/lib64/mvapich2/lib/libmpi.so.12(MPIR_Init_thread+0x329) [0x14ced9de6b79]
[rdma-qe-06.lab.bos.redhat.com:mpi_rank_0][print_backtrace]   6: /usr/lib64/mvapich2/lib/libmpi.so.12(MPI_Init+0xaa) [0x14ced9de654a]
[rdma-qe-06.lab.bos.redhat.com:mpi_rank_0][print_backtrace]   7: mpitests-osu_bw(+0x19e6) [0x56490a9c29e6]
[rdma-qe-06.lab.bos.redhat.com:mpi_rank_0][print_backtrace]   8: /lib64/libc.so.6(__libc_start_main+0xf3) [0x14ced8d126a3]
[rdma-qe-06.lab.bos.redhat.com:mpi_rank_0][print_backtrace]   9: mpitests-osu_bw(+0x201e) [0x56490a9c301e]

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 33296 RUNNING AT 172.31.0.7
=   EXIT CODE: 136
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
[proxy:0:0.bos.redhat.com] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:911): assert (!closed) failed
[proxy:0:0.bos.redhat.com] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:0.bos.redhat.com] main (pm/pmiserv/pmip.c:202): demux engine error waiting for event
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Floating point exception (signal 8)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
[root@rdma-qe-07 ~]$
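
A minimal sketch of how the same workaround could be applied once for a whole session instead of editing every command line, assuming MV2_* parameters are also honored when exported in the launching shell's environment (the invocation below is illustrative, not taken from the test harness):

# Assumed alternative: export the parameter once so subsequent mpirun
# launches inherit it, instead of repeating -genv on every command.
export MV2_ENABLE_AFFINITY=0
timeout --preserve-status --kill-after=5m 3m mpirun -hostfile /root/hfile_one_core -np 2 mpitests-osu_bw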

Comment 4 Honggang LI 2020-03-18 09:37:52 UTC
Hi, Brian
 Is this a Mellanox mlx5 specific issue? Can we reproduce it with another HCA, for example mlx4, opa ... ?

Comment 5 Brian Chae 2020-03-23 14:48:27 UTC
Honggang, Afom ran the "mvapich2" tests on other MLX5 hosts with the same image, and they worked.
Per Afom,

================================================

After seeing your email, I ran the test
manually on rdma-perf-02/03 (mlx5_ib0), and it passed as shown below. Maybe
check that the correct module is loaded by running the command below (this should
have been done automatically from /root/.bashrc but it doesn't hurt to check). If
the failure continues, rhts-reboot sometimes helps. If it is consistent on
rdma-qe-06/07, it might be "mlx5 MT27600 CIB ib0/ib1 56" specific since
rdma-perf-02/03 have "mlx5 MT27800". Just thinking out loud :-)

   [root@rdma-perf-03 mvapich2]$ module list
   Currently Loaded Modulefiles:
    1) /etc/modulefiles/mpi/mvapich2-x86_64 
   [root@rdma-perf-03 mvapich2]$

Run results:
   Test results for mpi/mvapich2 on rdma-perf-03:
   4.18.0-187.el8.x86_64, mlx5, ib0, & mlx5_0
       Result | Status | Test

================================================

I will try to test on MLX4 hosts with "mvapich2" and will let you know.

Comment 6 Brian Chae 2020-03-23 15:18:24 UTC
Honggang, here is one with the same result for "mvapich2" on MLX4.


Test results for mpi/mvapich2 on rdma-dev-01:
4.18.0-187.el8.x86_64, mlx4, ib0, & mlx4_0
    Result | Status | Test
  ---------+--------+------------------------------------
      FAIL |    255 | mvapich2 IMB-MPI1 PingPong mpirun one_core
      FAIL |    136 | mvapich2 IMB-MPI1 PingPing mpirun one_core
      FAIL |    255 | mvapich2 IMB-MPI1 Sendrecv mpirun one_core
      FAIL |    255 | mvapich2 IMB-MPI1 Exchange mpirun one_core
      FAIL |    136 | mvapich2 IMB-MPI1 Bcast mpirun one_core
      FAIL |    136 | mvapich2 IMB-MPI1 Allgather mpirun one_core
      FAIL |    136 | mvapich2 IMB-MPI1 Allgatherv mpirun one_core
      FAIL |    255 | mvapich2 IMB-MPI1 Gather mpirun one_core
      FAIL |    136 | mvapich2 IMB-MPI1 Gatherv mpirun one_core
      FAIL |    255 | mvapich2 IMB-MPI1 Scatter mpirun one_core
      FAIL |    255 | mvapich2 IMB-MPI1 Scatterv mpirun one_core
      FAIL |    255 | mvapich2 IMB-MPI1 Alltoall mpirun one_core
      FAIL |    136 | mvapich2 IMB-MPI1 Alltoallv mpirun one_core
      FAIL |    255 | mvapich2 IMB-MPI1 Reduce mpirun one_core
      FAIL |    255 | mvapich2 IMB-MPI1 Reduce_scatter mpirun one_core
      FAIL |    255 | mvapich2 IMB-MPI1 Allreduce mpirun one_core
      FAIL |    136 | mvapich2 IMB-MPI1 Barrier mpirun one_core
      FAIL |    136 | mvapich2 IMB-IO S_Write_indv mpirun one_core
      FAIL |    255 | mvapich2 IMB-IO S_Read_indv mpirun one_core
      FAIL |    136 | mvapich2 IMB-IO S_Write_expl mpirun one_core
      FAIL |    255 | mvapich2 IMB-IO S_Read_expl mpirun one_core
      FAIL |    255 | mvapich2 IMB-IO P_Write_indv mpirun one_core
      FAIL |    136 | mvapich2 IMB-IO P_Read_indv mpirun one_core
      FAIL |    255 | mvapich2 IMB-IO P_Write_expl mpirun one_core
      FAIL |    136 | mvapich2 IMB-IO P_Read_expl mpirun one_core
      FAIL |    136 | mvapich2 IMB-IO P_Write_shared mpirun one_core
      FAIL |    136 | mvapich2 IMB-IO P_Read_shared mpirun one_core
      FAIL |    255 | mvapich2 IMB-IO P_Write_priv mpirun one_core
      FAIL |    255 | mvapich2 IMB-IO P_Read_priv mpirun one_core
      FAIL |    255 | mvapich2 IMB-IO C_Write_indv mpirun one_core
      FAIL |    255 | mvapich2 IMB-IO C_Read_indv mpirun one_core
      FAIL |    255 | mvapich2 IMB-IO C_Write_expl mpirun one_core
      FAIL |    255 | mvapich2 IMB-IO C_Read_expl mpirun one_core
      FAIL |    136 | mvapich2 IMB-IO C_Write_shared mpirun one_core
      FAIL |    255 | mvapich2 IMB-IO C_Read_shared mpirun one_core
      FAIL |    255 | mvapich2 IMB-EXT Window mpirun one_core
      FAIL |    255 | mvapich2 IMB-EXT Unidir_Put mpirun one_core
      FAIL |    136 | mvapich2 IMB-EXT Unidir_Get mpirun one_core
      FAIL |    136 | mvapich2 IMB-EXT Bidir_Get mpirun one_core
      FAIL |    255 | mvapich2 IMB-EXT Bidir_Put mpirun one_core
      FAIL |    255 | mvapich2 IMB-EXT Accumulate mpirun one_core
      FAIL |    136 | mvapich2 IMB-NBC Ibcast mpirun one_core
      FAIL |    255 | mvapich2 IMB-NBC Iallgather mpirun one_core
      FAIL |    255 | mvapich2 IMB-NBC Iallgatherv mpirun one_core
      FAIL |    255 | mvapich2 IMB-NBC Igather mpirun one_core
      FAIL |    255 | mvapich2 IMB-NBC Igatherv mpirun one_core
      FAIL |    255 | mvapich2 IMB-NBC Iscatter mpirun one_core
      FAIL |    255 | mvapich2 IMB-NBC Iscatterv mpirun one_core
      FAIL |    255 | mvapich2 IMB-NBC Ialltoall mpirun one_core
      FAIL |    255 | mvapich2 IMB-NBC Ialltoallv mpirun one_core
      FAIL |    136 | mvapich2 IMB-NBC Ireduce mpirun one_core
      FAIL |    255 | mvapich2 IMB-NBC Ireduce_scatter mpirun one_core
      FAIL |    136 | mvapich2 IMB-NBC Iallreduce mpirun one_core
      FAIL |    136 | mvapich2 IMB-NBC Ibarrier mpirun one_core
      FAIL |    255 | mvapich2 IMB-RMA Unidir_put mpirun one_core
      FAIL |    255 | mvapich2 IMB-RMA Unidir_get mpirun one_core
      FAIL |    136 | mvapich2 IMB-RMA Bidir_put mpirun one_core
      FAIL |    255 | mvapich2 IMB-RMA Bidir_get mpirun one_core
      FAIL |    136 | mvapich2 IMB-RMA One_put_all mpirun one_core
      FAIL |    255 | mvapich2 IMB-RMA One_get_all mpirun one_core
      FAIL |    255 | mvapich2 IMB-RMA All_put_all mpirun one_core
      FAIL |    136 | mvapich2 IMB-RMA All_get_all mpirun one_core
      FAIL |    255 | mvapich2 IMB-RMA Put_local mpirun one_core
      FAIL |    255 | mvapich2 IMB-RMA Put_all_local mpirun one_core
      FAIL |    255 | mvapich2 IMB-RMA Exchange_put mpirun one_core
      FAIL |    136 | mvapich2 IMB-RMA Exchange_get mpirun one_core
      FAIL |    255 | mvapich2 IMB-RMA Accumulate mpirun one_core
      FAIL |    136 | mvapich2 IMB-RMA Get_accumulate mpirun one_core
      FAIL |    255 | mvapich2 IMB-RMA Fetch_and_op mpirun one_core
      FAIL |    255 | mvapich2 IMB-RMA Compare_and_swap mpirun one_core
      FAIL |    255 | mvapich2 IMB-RMA Get_local mpirun one_core
      FAIL |    255 | mvapich2 IMB-RMA Get_all_local mpirun one_core
      FAIL |      1 | mvapich2 IMB-MPI1 PingPong mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-MPI1 PingPing mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-MPI1 Sendrecv mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-MPI1 Exchange mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-MPI1 Bcast mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-MPI1 Allgather mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-MPI1 Allgatherv mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-MPI1 Gather mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-MPI1 Gatherv mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-MPI1 Scatter mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-MPI1 Scatterv mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-MPI1 Alltoall mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-MPI1 Alltoallv mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-MPI1 Reduce mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-MPI1 Reduce_scatter mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-MPI1 Allreduce mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-MPI1 Barrier mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-IO S_Write_indv mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-IO S_Read_indv mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-IO S_Write_expl mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-IO S_Read_expl mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-IO P_Write_indv mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-IO P_Read_indv mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-IO P_Write_expl mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-IO P_Read_expl mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-IO P_Write_shared mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-IO P_Read_shared mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-IO P_Write_priv mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-IO P_Read_priv mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-IO C_Write_indv mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-IO C_Read_indv mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-IO C_Write_expl mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-IO C_Read_expl mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-IO C_Write_shared mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-IO C_Read_shared mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-EXT Window mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-EXT Unidir_Put mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-EXT Unidir_Get mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-EXT Bidir_Get mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-EXT Bidir_Put mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-EXT Accumulate mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-NBC Ibcast mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-NBC Iallgather mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-NBC Iallgatherv mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-NBC Igather mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-NBC Igatherv mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-NBC Iscatter mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-NBC Iscatterv mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-NBC Ialltoall mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-NBC Ialltoallv mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-NBC Ireduce mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-NBC Ireduce_scatter mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-NBC Iallreduce mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-NBC Ibarrier mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-RMA Unidir_put mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-RMA Unidir_get mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-RMA Bidir_put mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-RMA Bidir_get mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-RMA One_put_all mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-RMA One_get_all mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-RMA All_put_all mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-RMA All_get_all mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-RMA Put_local mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-RMA Put_all_local mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-RMA Exchange_put mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-RMA Exchange_get mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-RMA Accumulate mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-RMA Get_accumulate mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-RMA Fetch_and_op mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-RMA Compare_and_swap mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-RMA Get_local mpirun_rsh one_core
      FAIL |      1 | mvapich2 IMB-RMA Get_all_local mpirun_rsh one_core
      FAIL |    136 | mvapich2 OSU acc_latency mpirun one_core
      FAIL |    136 | mvapich2 OSU allgather mpirun one_core
      FAIL |    136 | mvapich2 OSU allgatherv mpirun one_core
      FAIL |    136 | mvapich2 OSU allreduce mpirun one_core
      FAIL |    255 | mvapich2 OSU alltoall mpirun one_core
      FAIL |    255 | mvapich2 OSU alltoallv mpirun one_core
      FAIL |    136 | mvapich2 OSU barrier mpirun one_core
      FAIL |    136 | mvapich2 OSU bcast mpirun one_core
      FAIL |    255 | mvapich2 OSU bibw mpirun one_core
      FAIL |    255 | mvapich2 OSU bw mpirun one_core
      FAIL |    136 | mvapich2 OSU cas_latency mpirun one_core
      FAIL |    255 | mvapich2 OSU fop_latency mpirun one_core
      FAIL |    255 | mvapich2 OSU gather mpirun one_core
      FAIL |    136 | mvapich2 OSU gatherv mpirun one_core
      FAIL |    136 | mvapich2 OSU get_acc_latency mpirun one_core
      FAIL |    136 | mvapich2 OSU get_bw mpirun one_core
      FAIL |    255 | mvapich2 OSU get_latency mpirun one_core
      FAIL |    136 | mvapich2 OSU hello mpirun one_core
      FAIL |    136 | mvapich2 OSU iallgather mpirun one_core
      FAIL |    255 | mvapich2 OSU iallgatherv mpirun one_core
      FAIL |    255 | mvapich2 OSU ialltoall mpirun one_core
      FAIL |    255 | mvapich2 OSU ialltoallv mpirun one_core
      FAIL |    255 | mvapich2 OSU ialltoallw mpirun one_core
      FAIL |    255 | mvapich2 OSU ibarrier mpirun one_core
      FAIL |    136 | mvapich2 OSU ibcast mpirun one_core
      FAIL |    255 | mvapich2 OSU igather mpirun one_core
      FAIL |    136 | mvapich2 OSU igatherv mpirun one_core
      FAIL |    136 | mvapich2 OSU init mpirun one_core
      FAIL |    255 | mvapich2 OSU iscatter mpirun one_core
      FAIL |    255 | mvapich2 OSU iscatterv mpirun one_core
      FAIL |    255 | mvapich2 OSU latency mpirun one_core
      FAIL |    136 | mvapich2 OSU mbw_mr mpirun one_core
      FAIL |    136 | mvapich2 OSU multi_lat mpirun one_core
      FAIL |    136 | mvapich2 OSU put_bibw mpirun one_core
      FAIL |    136 | mvapich2 OSU put_bw mpirun one_core
      FAIL |    136 | mvapich2 OSU put_latency mpirun one_core
      FAIL |    136 | mvapich2 OSU reduce mpirun one_core
      FAIL |    255 | mvapich2 OSU reduce_scatter mpirun one_core
      FAIL |    255 | mvapich2 OSU scatter mpirun one_core
      FAIL |    136 | mvapich2 OSU scatterv mpirun one_core
      FAIL |      1 | mvapich2 OSU acc_latency mpirun_rsh one_core
      FAIL |      1 | mvapich2 OSU allgather mpirun_rsh one_core
      FAIL |      1 | mvapich2 OSU allgatherv mpirun_rsh one_core
      FAIL |      1 | mvapich2 OSU allreduce mpirun_rsh one_core
      FAIL |      1 | mvapich2 OSU alltoall mpirun_rsh one_core
      FAIL |      1 | mvapich2 OSU alltoallv mpirun_rsh one_core
      FAIL |      1 | mvapich2 OSU barrier mpirun_rsh one_core
      FAIL |      1 | mvapich2 OSU bcast mpirun_rsh one_core
      FAIL |      1 | mvapich2 OSU bibw mpirun_rsh one_core
      FAIL |      1 | mvapich2 OSU bw mpirun_rsh one_core
      FAIL |      1 | mvapich2 OSU cas_latency mpirun_rsh one_core
      FAIL |      1 | mvapich2 OSU fop_latency mpirun_rsh one_core
      FAIL |      1 | mvapich2 OSU gather mpirun_rsh one_core
      FAIL |      1 | mvapich2 OSU gatherv mpirun_rsh one_core
      FAIL |      1 | mvapich2 OSU get_acc_latency mpirun_rsh one_core
      FAIL |      1 | mvapich2 OSU get_bw mpirun_rsh one_core
      FAIL |      1 | mvapich2 OSU get_latency mpirun_rsh one_core
      FAIL |      1 | mvapich2 OSU hello mpirun_rsh one_core
      FAIL |      1 | mvapich2 OSU iallgather mpirun_rsh one_core
      FAIL |      1 | mvapich2 OSU iallgatherv mpirun_rsh one_core
      FAIL |      1 | mvapich2 OSU ialltoall mpirun_rsh one_core
      FAIL |      1 | mvapich2 OSU ialltoallv mpirun_rsh one_core
      FAIL |      1 | mvapich2 OSU ialltoallw mpirun_rsh one_core
      FAIL |      1 | mvapich2 OSU ibarrier mpirun_rsh one_core
      FAIL |      1 | mvapich2 OSU ibcast mpirun_rsh one_core
      FAIL |      1 | mvapich2 OSU igather mpirun_rsh one_core
      FAIL |      1 | mvapich2 OSU igatherv mpirun_rsh one_core
      FAIL |      1 | mvapich2 OSU init mpirun_rsh one_core
      FAIL |      1 | mvapich2 OSU iscatter mpirun_rsh one_core
      FAIL |      1 | mvapich2 OSU iscatterv mpirun_rsh one_core
      FAIL |      1 | mvapich2 OSU latency mpirun_rsh one_core
      FAIL |      1 | mvapich2 OSU mbw_mr mpirun_rsh one_core
      FAIL |      1 | mvapich2 OSU multi_lat mpirun_rsh one_core
      FAIL |      1 | mvapich2 OSU put_bibw mpirun_rsh one_core
      FAIL |      1 | mvapich2 OSU put_bw mpirun_rsh one_core
      FAIL |      1 | mvapich2 OSU put_latency mpirun_rsh one_core
      FAIL |      1 | mvapich2 OSU reduce mpirun_rsh one_core
      FAIL |      1 | mvapich2 OSU reduce_scatter mpirun_rsh one_core
      FAIL |      1 | mvapich2 OSU scatter mpirun_rsh one_core
      FAIL |      1 | mvapich2 OSU scatterv mpirun_rsh one_core


The test was run on the rdma-dev-00 (server) and rdma-dev-01 hosts. [ beaker job: J:4144312 ]

Comment 7 Brian Chae 2020-03-23 19:10:36 UTC
Honggang, for the test cases run with "mpirun", which accepts the parameter "-genv MV2_ENABLE_AFFINITY 0", yes, the tests went through successfully.

+ [20-03-23 13:58:52] for app in $(cat imb_rma.txt)
+ [20-03-23 13:58:52] timeout --preserve-status --kill-after=5m 3m mpirun -hostfile /root/hfile_one_core -genv MV2_DEBUG_SHOW_BACKTRACE 1 -genv MV2_ENABLE_AFFINITY 0 -np 2 mpitests-IMB-RMA Get_local -time 1.5
#------------------------------------------------------------
#    Intel (R) MPI Benchmarks 2018 Update 1, MPI-RMA part
#------------------------------------------------------------
# Date                  : Mon Mar 23 13:58:52 2020
# Machine               : x86_64
# System                : Linux
# Release               : 4.18.0-187.el8.x86_64
# Version               : #1 SMP Sat Mar 7 03:42:33 UTC 2020
# MPI Version           : 3.1
# MPI Thread Environment:


# Calling sequence was:

# mpitests-IMB-RMA Get_local -time 1.5

# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   :   MPI_BYTE
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM
#
#

# List of Benchmarks to run:

# Get_local

#---------------------------------------------------
# Benchmarking Get_local
# #processes = 2
#---------------------------------------------------
#
#    MODE: NON-AGGREGATE
#
       #bytes #repetitions      t[usec]   Mbytes/sec
            0          100         0.15         0.00
            1          100         0.30         3.38
            2          100         0.29         6.82
            4          100         0.30        13.53
            8          100         0.29        27.50
           16          100         0.29        54.56
           32          100         0.30       108.24
           64          100         0.42       153.39
          128          100         0.29       440.06
          256          100         0.29       880.12
          512          100         0.32      1614.65
         1024          100         0.30      3408.70
         2048          100         0.30      6927.37
         4096          100         0.30     13743.90
         8192          100         0.33     24719.24
        16384          100         0.47     35060.96
        32768          100         0.34     96788.00
        65536          100         0.35    189570.97
       131072          100         0.34    381774.87
       262144          100         0.35    758283.88
       524288           80         0.36   1441982.46
      1048576           40         0.42   2513169.43
      2097152           20         0.50   4188615.72
      4194304           10         0.67   6282923.59
#---------------------------------------------------
# Benchmarking Get_local
# #processes = 2
#---------------------------------------------------
#
#    MODE: AGGREGATE
#
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         0.04         0.00
            1         1000         0.30         3.31
            2         1000         0.29         6.80
            4         1000         0.29        13.60
            8         1000         0.29        27.21
           16         1000         0.31        52.10
           32         1000         0.33        97.97
           64         1000         0.34       190.11
          128         1000         0.35       368.48
          256         1000         0.35       739.49
          512         1000         0.34      1492.34
         1024         1000         0.36      2870.97
         2048         1000         0.39      5231.39
         4096         1000         0.56      7344.96
         8192         1000         1.06      7750.90
        16384         1000         2.07      7898.79
        32768         1000         4.12      7947.66
        65536          640         7.15      9166.89
       131072          320         7.80     16812.10
       262144          160         0.28    945816.45
       524288           80         0.33   1584881.63
      1048576           40         0.38   2792410.48
      2097152           20         0.43   4886718.35
      4194304           10         0.55   7648776.54


++++++++++++++++++++++++++++++++++

However, for the test cases run with "mpirun_rsh", the same parameter could not be supplied, as shown below.
---
- TEST RESULT FOR mvapich2
-   Test:   mvapich2 IMB-MPI1 PingPong mpirun_rsh one_core
-   Result: FAIL
-   Return: 1
---
+ [20-03-23 13:58:53] for app in $(cat imb_mpi.txt)
+ [20-03-23 13:58:53] timeout --preserve-status --kill-after=5m 3m mpirun_rsh -np 2 -genv MV2_DEBUG_SHOW_BACKTRACE 1 -genv MV2_ENABLE_AFFINITY 0 -hostfile /root/hfile_one_core mpitests-IMB-MPI1 PingPing -time 1.5

mpirun_rsh: unrecognized option '-genv' <<<==========================================

usage: mpirun_rsh [-v] [-sg group] [-rsh|-ssh] [-debug] -[tv] [-xterm] [-show] [-legacy] [-export|-export-all] -np N (-hostfile hfile | h1 h2 ... hN) a.out args | -config configfile (-hostfile hfile | h1 h2 ... hN)]
Where:
        sg         => execute the processes as different group ID
        rsh        => to use rsh for connecting
        ssh        => to use ssh for connecting
        debug      => run each process under the control of gdb
        tv         => run each process under the control of totalview
        xterm      => run remote processes under xterm
        show       => show command for remote execution but don't run it
        legacy     => use old startup method (1 ssh/process)
        export     => automatically export environment to remote processes
        export-all => automatically export environment to remote processes even if already set remotely
        np         => specify the number of processes
        h1 h2...   => names of hosts where processes should run
or      hostfile   => name of file containing hosts, one per line
        a.out      => name of MPI binary
        args       => arguments for MPI binary
        config     => name of file containing the exe information: each line has the form -n numProc : exe args

Comment 8 Brian Chae 2020-03-23 20:10:41 UTC
Additional info:

"mvapich2" test suite on rdma-virt-00 / 01 pairs resulted in all of the test case with "mpirun" command passed.
However, all test cases with "mpirun_rsh" command resulted in failures as the following:

+ [20-03-19 06:45:54] timeout --preserve-status --kill-after=5m 3m mpirun_rsh -np 2 -hostfile /root/hfile_one_core mpitests-IMB-MPI1 Exchange -time 1.5
[rdma-virt-01.lab.bos.redhat.com:mpispawn_1][report_error] connect() failed: Connection refused (111)
[rdma-virt-00.lab.bos.redhat.com:mpispawn_0][read_size] Unexpected End-Of-File on file descriptor 6. MPI process died?
[rdma-virt-00.lab.bos.redhat.com:mpispawn_0][read_size] Unexpected End-Of-File on file descriptor 6. MPI process died?
[rdma-virt-00.lab.bos.redhat.com:mpispawn_0][handle_mt_peer] Error while reading PMI socket. MPI process died?
[rdma-virt-00.lab.bos.redhat.com:mpispawn_0][report_error] connect() failed: Connection refused (111)
[rdma-virt-01.lab.bos.redhat.com:mpirun_rsh][signal_processor] Caught signal 15, killing job
[rdma-virt-01.lab.bos.redhat.com:mpirun_rsh][signal_processor] Caught signal 15, killing job

This is yet another failure behavior on MLX4.

Comment 9 Brian Chae 2020-03-23 20:16:36 UTC
Created attachment 1672775 [details]
test log for all "mpirun" test cases passed; while all "mpirun_rsh" test cases failed

test log for all "mpirun" test cases passed; while all "mpirun_rsh" test cases failed. 

Ran on rdma-virt-00 / 01 host pair - MLX4 IB0

Comment 10 Honggang LI 2020-03-31 08:09:43 UTC
 /opt/mvapich2-2.3.3/bin/mpivars
[rdma-qe-06.lab.bos.redhat.com:mpi_rank_0][error_sighandler] Caught error: Floating point exception (signal 8)
Floating point exception (core dumped)

2728 static int mv2_generate_implicit_cpu_mapping (int local_procs, int num_app_threads) {
2729
2730     hwloc_obj_t obj;
2731
2732     int i, j, k, l, curr, count, chunk, size, scanned, step, node_offset, node_base_pu;
2733     int topodepth, num_physical_cores_per_socket ATTRIBUTE((unused)), num_pu_per_socket;
2734     int num_numanodes, num_pu_per_numanode;
2735     char mapping [s_cpu_mapping_line_max];
2736
2737     i = j = k = l = curr = count = chunk = size = scanned = step = node_offset = node_base_pu = 0;
2738     count = mv2_pivot_core_id;
2739
2740     /* call optimized topology load */
2741     smpi_load_hwloc_topology ();
2742
2743     num_sockets = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_SOCKET);
2744     num_numanodes = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_NUMANODE);
2745
2746     num_physical_cores = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_CORE);
2747     num_pu = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_PU);
2748
2749     num_physical_cores_per_socket = num_physical_cores / num_sockets;
2750     num_pu_per_socket = num_pu / num_sockets;
2751     num_pu_per_numanode = num_pu / num_numanodes;

In line 2744, hwloc_get_nbobjs_by_type returns zero. In line 2751, this leads to a division
by zero, which raises the floating point exception.
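
A quick sketch for checking on an affected host how many NUMA-node objects are actually reported, since a count of zero is exactly what makes the division at line 2751 trap; the hwloc command below assumes the lstopo utility is installed and is only illustrative:

# NUMA nodes as the kernel exposes them in sysfs.
ls -d /sys/devices/system/node/node[0-9]* 2>/dev/null | wc -l

# NUMANode objects as hwloc sees them; if this prints 0, then
# num_pu / num_numanodes (line 2751) divides by zero and the rank
# dies with SIGFPE, i.e. exit code 136 = 128 + 8.
lstopo-no-graphics --only NUMANode | wc -l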

Because no mvapich2 upstream git repo is available to us, I can't find which commit introduced this issue.
I checked the upstream release tarballs mvapich2-2.3.tar.gz and mvapich2-2.3.2.tar.gz .

The older mvapich2-2.3.tar.gz doesn't have such code. After we replaced mvapich2-2.3-5.el8 with mvapich2-2.3.2-2.el8,
this regression was exposed.

Comment 15 Brian Chae 2020-09-03 11:16:36 UTC
The issue stated in this bugzilla is no longer observed on the latest RHEL8.3 build for eSNAP #3.
The MVAPICH2 benchmarks were run on the same RDMA hosts on which this issue was originally observed - rdma-qe-06 and rdma-qe-07.

DISTRO=RHEL-8.3.0-20200825.0

+ [20-09-03 05:58:16] echo 'Clients: rdma-qe-07'
Clients: rdma-qe-07
+ [20-09-03 05:58:16] echo 'Servers: rdma-qe-06'
Servers: rdma-qe-06


Installed:
  mpitests-mvapich2-5.6.2-1.el8.x86_64        mvapich2-2.3.3-1.el8.x86_64       



Test results for mpi/mvapich2 on rdma-qe-07:
4.18.0-234.el8.x86_64, rdma-core-29.0-3.el8, mlx5, ib0, & mlx5_0
    Result | Status | Test
  ---------+--------+------------------------------------
      PASS |      0 | mvapich2 IMB-MPI1 PingPong mpirun one_core
      PASS |      0 | mvapich2 IMB-MPI1 PingPing mpirun one_core
      PASS |      0 | mvapich2 IMB-MPI1 Sendrecv mpirun one_core
      PASS |      0 | mvapich2 IMB-MPI1 Exchange mpirun one_core
      PASS |      0 | mvapich2 IMB-MPI1 Bcast mpirun one_core
      PASS |      0 | mvapich2 IMB-MPI1 Allgather mpirun one_core
      PASS |      0 | mvapich2 IMB-MPI1 Allgatherv mpirun one_core
      PASS |      0 | mvapich2 IMB-MPI1 Gather mpirun one_core
      PASS |      0 | mvapich2 IMB-MPI1 Gatherv mpirun one_core
      PASS |      0 | mvapich2 IMB-MPI1 Scatter mpirun one_core
      PASS |      0 | mvapich2 IMB-MPI1 Scatterv mpirun one_core
      PASS |      0 | mvapich2 IMB-MPI1 Alltoall mpirun one_core
      PASS |      0 | mvapich2 IMB-MPI1 Alltoallv mpirun one_core
      PASS |      0 | mvapich2 IMB-MPI1 Reduce mpirun one_core
      PASS |      0 | mvapich2 IMB-MPI1 Reduce_scatter mpirun one_core
      PASS |      0 | mvapich2 IMB-MPI1 Allreduce mpirun one_core
      PASS |      0 | mvapich2 IMB-MPI1 Barrier mpirun one_core
      PASS |      0 | mvapich2 IMB-IO S_Write_indv mpirun one_core
      PASS |      0 | mvapich2 IMB-IO S_Read_indv mpirun one_core
      PASS |      0 | mvapich2 IMB-IO S_Write_expl mpirun one_core
      PASS |      0 | mvapich2 IMB-IO S_Read_expl mpirun one_core
      PASS |      0 | mvapich2 IMB-IO P_Write_indv mpirun one_core
      PASS |      0 | mvapich2 IMB-IO P_Read_indv mpirun one_core
      PASS |      0 | mvapich2 IMB-IO P_Write_expl mpirun one_core
      PASS |      0 | mvapich2 IMB-IO P_Read_expl mpirun one_core
      PASS |      0 | mvapich2 IMB-IO P_Write_shared mpirun one_core
      PASS |      0 | mvapich2 IMB-IO P_Read_shared mpirun one_core
      PASS |      0 | mvapich2 IMB-IO P_Write_priv mpirun one_core
      PASS |      0 | mvapich2 IMB-IO P_Read_priv mpirun one_core
      PASS |      0 | mvapich2 IMB-IO C_Write_indv mpirun one_core
      PASS |      0 | mvapich2 IMB-IO C_Read_indv mpirun one_core
      PASS |      0 | mvapich2 IMB-IO C_Write_expl mpirun one_core
      PASS |      0 | mvapich2 IMB-IO C_Read_expl mpirun one_core
      PASS |      0 | mvapich2 IMB-IO C_Write_shared mpirun one_core
      PASS |      0 | mvapich2 IMB-IO C_Read_shared mpirun one_core
      PASS |      0 | mvapich2 IMB-EXT Window mpirun one_core
      PASS |      0 | mvapich2 IMB-EXT Unidir_Put mpirun one_core
      PASS |      0 | mvapich2 IMB-EXT Unidir_Get mpirun one_core
      PASS |      0 | mvapich2 IMB-EXT Bidir_Get mpirun one_core
      PASS |      0 | mvapich2 IMB-EXT Bidir_Put mpirun one_core
      PASS |      0 | mvapich2 IMB-EXT Accumulate mpirun one_core
      PASS |      0 | mvapich2 IMB-NBC Ibcast mpirun one_core
      PASS |      0 | mvapich2 IMB-NBC Iallgather mpirun one_core
      PASS |      0 | mvapich2 IMB-NBC Iallgatherv mpirun one_core
      PASS |      0 | mvapich2 IMB-NBC Igather mpirun one_core
      PASS |      0 | mvapich2 IMB-NBC Igatherv mpirun one_core
      PASS |      0 | mvapich2 IMB-NBC Iscatter mpirun one_core
      PASS |      0 | mvapich2 IMB-NBC Iscatterv mpirun one_core
      PASS |      0 | mvapich2 IMB-NBC Ialltoall mpirun one_core
      PASS |      0 | mvapich2 IMB-NBC Ialltoallv mpirun one_core
      PASS |      0 | mvapich2 IMB-NBC Ireduce mpirun one_core
      PASS |      0 | mvapich2 IMB-NBC Ireduce_scatter mpirun one_core
      PASS |      0 | mvapich2 IMB-NBC Iallreduce mpirun one_core
      PASS |      0 | mvapich2 IMB-NBC Ibarrier mpirun one_core
      PASS |      0 | mvapich2 IMB-RMA Unidir_put mpirun one_core
      PASS |      0 | mvapich2 IMB-RMA Unidir_get mpirun one_core
      PASS |      0 | mvapich2 IMB-RMA Bidir_put mpirun one_core
      PASS |      0 | mvapich2 IMB-RMA Bidir_get mpirun one_core
      PASS |      0 | mvapich2 IMB-RMA One_put_all mpirun one_core
      PASS |      0 | mvapich2 IMB-RMA One_get_all mpirun one_core
      PASS |      0 | mvapich2 IMB-RMA All_put_all mpirun one_core
      PASS |      0 | mvapich2 IMB-RMA All_get_all mpirun one_core
      PASS |      0 | mvapich2 IMB-RMA Put_local mpirun one_core
      PASS |      0 | mvapich2 IMB-RMA Put_all_local mpirun one_core
      PASS |      0 | mvapich2 IMB-RMA Exchange_put mpirun one_core
      PASS |      0 | mvapich2 IMB-RMA Exchange_get mpirun one_core
      PASS |      0 | mvapich2 IMB-RMA Accumulate mpirun one_core
      PASS |      0 | mvapich2 IMB-RMA Get_accumulate mpirun one_core
      PASS |      0 | mvapich2 IMB-RMA Fetch_and_op mpirun one_core
      PASS |      0 | mvapich2 IMB-RMA Compare_and_swap mpirun one_core
      PASS |      0 | mvapich2 IMB-RMA Get_local mpirun one_core
      PASS |      0 | mvapich2 IMB-RMA Get_all_local mpirun one_core
      PASS |      0 | mvapich2 IMB-MPI1 PingPong mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-MPI1 PingPing mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-MPI1 Sendrecv mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-MPI1 Exchange mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-MPI1 Bcast mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-MPI1 Allgather mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-MPI1 Allgatherv mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-MPI1 Gather mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-MPI1 Gatherv mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-MPI1 Scatter mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-MPI1 Scatterv mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-MPI1 Alltoall mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-MPI1 Alltoallv mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-MPI1 Reduce mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-MPI1 Reduce_scatter mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-MPI1 Allreduce mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-MPI1 Barrier mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-IO S_Write_indv mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-IO S_Read_indv mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-IO S_Write_expl mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-IO S_Read_expl mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-IO P_Write_indv mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-IO P_Read_indv mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-IO P_Write_expl mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-IO P_Read_expl mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-IO P_Write_shared mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-IO P_Read_shared mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-IO P_Write_priv mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-IO P_Read_priv mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-IO C_Write_indv mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-IO C_Read_indv mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-IO C_Write_expl mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-IO C_Read_expl mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-IO C_Write_shared mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-IO C_Read_shared mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-EXT Window mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-EXT Unidir_Put mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-EXT Unidir_Get mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-EXT Bidir_Get mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-EXT Bidir_Put mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-EXT Accumulate mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-NBC Ibcast mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-NBC Iallgather mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-NBC Iallgatherv mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-NBC Igather mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-NBC Igatherv mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-NBC Iscatter mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-NBC Iscatterv mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-NBC Ialltoall mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-NBC Ialltoallv mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-NBC Ireduce mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-NBC Ireduce_scatter mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-NBC Iallreduce mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-NBC Ibarrier mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-RMA Unidir_put mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-RMA Unidir_get mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-RMA Bidir_put mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-RMA Bidir_get mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-RMA One_put_all mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-RMA One_get_all mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-RMA All_put_all mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-RMA All_get_all mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-RMA Put_local mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-RMA Put_all_local mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-RMA Exchange_put mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-RMA Exchange_get mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-RMA Accumulate mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-RMA Get_accumulate mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-RMA Fetch_and_op mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-RMA Compare_and_swap mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-RMA Get_local mpirun_rsh one_core
      PASS |      0 | mvapich2 IMB-RMA Get_all_local mpirun_rsh one_core
      PASS |      0 | mvapich2 OSU acc_latency mpirun one_core
      PASS |      0 | mvapich2 OSU allgather mpirun one_core
      PASS |      0 | mvapich2 OSU allgatherv mpirun one_core
      PASS |      0 | mvapich2 OSU allreduce mpirun one_core
      PASS |      0 | mvapich2 OSU alltoall mpirun one_core
      PASS |      0 | mvapich2 OSU alltoallv mpirun one_core
      PASS |      0 | mvapich2 OSU barrier mpirun one_core
      PASS |      0 | mvapich2 OSU bcast mpirun one_core
      PASS |      0 | mvapich2 OSU bibw mpirun one_core
      PASS |      0 | mvapich2 OSU bw mpirun one_core
      PASS |      0 | mvapich2 OSU cas_latency mpirun one_core
      PASS |      0 | mvapich2 OSU fop_latency mpirun one_core
      PASS |      0 | mvapich2 OSU gather mpirun one_core
      PASS |      0 | mvapich2 OSU gatherv mpirun one_core
      PASS |      0 | mvapich2 OSU get_acc_latency mpirun one_core
      PASS |      0 | mvapich2 OSU get_bw mpirun one_core
      PASS |      0 | mvapich2 OSU get_latency mpirun one_core
      PASS |      0 | mvapich2 OSU hello mpirun one_core
      PASS |      0 | mvapich2 OSU iallgather mpirun one_core
      PASS |      0 | mvapich2 OSU iallgatherv mpirun one_core
      PASS |      0 | mvapich2 OSU iallreduce mpirun one_core
      PASS |      0 | mvapich2 OSU ialltoall mpirun one_core
      PASS |      0 | mvapich2 OSU ialltoallv mpirun one_core
      PASS |      0 | mvapich2 OSU ialltoallw mpirun one_core
      PASS |      0 | mvapich2 OSU ibarrier mpirun one_core
      PASS |      0 | mvapich2 OSU ibcast mpirun one_core
      PASS |      0 | mvapich2 OSU igather mpirun one_core
      PASS |      0 | mvapich2 OSU igatherv mpirun one_core
      PASS |      0 | mvapich2 OSU init mpirun one_core
      PASS |      0 | mvapich2 OSU ireduce mpirun one_core
      PASS |      0 | mvapich2 OSU iscatter mpirun one_core
      PASS |      0 | mvapich2 OSU iscatterv mpirun one_core
      PASS |      0 | mvapich2 OSU latency mpirun one_core
      PASS |      0 | mvapich2 OSU mbw_mr mpirun one_core
      PASS |      0 | mvapich2 OSU multi_lat mpirun one_core
      PASS |      0 | mvapich2 OSU put_bibw mpirun one_core
      PASS |      0 | mvapich2 OSU put_bw mpirun one_core
      PASS |      0 | mvapich2 OSU put_latency mpirun one_core
      PASS |      0 | mvapich2 OSU reduce mpirun one_core
      PASS |      0 | mvapich2 OSU reduce_scatter mpirun one_core
      PASS |      0 | mvapich2 OSU scatter mpirun one_core
      PASS |      0 | mvapich2 OSU scatterv mpirun one_core
      PASS |      0 | mvapich2 OSU acc_latency mpirun_rsh one_core
      PASS |      0 | mvapich2 OSU allgather mpirun_rsh one_core
      PASS |      0 | mvapich2 OSU allgatherv mpirun_rsh one_core
      PASS |      0 | mvapich2 OSU allreduce mpirun_rsh one_core
      PASS |      0 | mvapich2 OSU alltoall mpirun_rsh one_core
      PASS |      0 | mvapich2 OSU alltoallv mpirun_rsh one_core
      PASS |      0 | mvapich2 OSU barrier mpirun_rsh one_core
      PASS |      0 | mvapich2 OSU bcast mpirun_rsh one_core
      PASS |      0 | mvapich2 OSU bibw mpirun_rsh one_core
      PASS |      0 | mvapich2 OSU bw mpirun_rsh one_core
      PASS |      0 | mvapich2 OSU cas_latency mpirun_rsh one_core
      PASS |      0 | mvapich2 OSU fop_latency mpirun_rsh one_core
      PASS |      0 | mvapich2 OSU gather mpirun_rsh one_core
      PASS |      0 | mvapich2 OSU gatherv mpirun_rsh one_core
      PASS |      0 | mvapich2 OSU get_acc_latency mpirun_rsh one_core
      PASS |      0 | mvapich2 OSU get_bw mpirun_rsh one_core
      PASS |      0 | mvapich2 OSU get_latency mpirun_rsh one_core
      PASS |      0 | mvapich2 OSU hello mpirun_rsh one_core
      PASS |      0 | mvapich2 OSU iallgather mpirun_rsh one_core
      PASS |      0 | mvapich2 OSU iallgatherv mpirun_rsh one_core
      PASS |      0 | mvapich2 OSU iallreduce mpirun_rsh one_core
      PASS |      0 | mvapich2 OSU ialltoall mpirun_rsh one_core
      PASS |      0 | mvapich2 OSU ialltoallv mpirun_rsh one_core
      PASS |      0 | mvapich2 OSU ialltoallw mpirun_rsh one_core
      PASS |      0 | mvapich2 OSU ibarrier mpirun_rsh one_core
      PASS |      0 | mvapich2 OSU ibcast mpirun_rsh one_core
      PASS |      0 | mvapich2 OSU igather mpirun_rsh one_core
      PASS |      0 | mvapich2 OSU igatherv mpirun_rsh one_core
      PASS |      0 | mvapich2 OSU init mpirun_rsh one_core
      PASS |      0 | mvapich2 OSU ireduce mpirun_rsh one_core
      PASS |      0 | mvapich2 OSU iscatter mpirun_rsh one_core
      PASS |      0 | mvapich2 OSU iscatterv mpirun_rsh one_core
      PASS |      0 | mvapich2 OSU latency mpirun_rsh one_core
      PASS |      0 | mvapich2 OSU mbw_mr mpirun_rsh one_core
      PASS |      0 | mvapich2 OSU multi_lat mpirun_rsh one_core
      PASS |      0 | mvapich2 OSU put_bibw mpirun_rsh one_core
      PASS |      0 | mvapich2 OSU put_bw mpirun_rsh one_core
      PASS |      0 | mvapich2 OSU put_latency mpirun_rsh one_core
      PASS |      0 | mvapich2 OSU reduce mpirun_rsh one_core
      PASS |      0 | mvapich2 OSU reduce_scatter mpirun_rsh one_core
      PASS |      0 | mvapich2 OSU scatter mpirun_rsh one_core
      PASS |      0 | mvapich2 OSU scatterv mpirun_rsh one_core

Checking for failures and known issues:
  no test failures

all looks good.

Comment 18 errata-xmlrpc 2020-11-04 01:37:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (rdma-core bug fix and enhancement update),
and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:4456
