Bug 1878829
| Summary: | [RHEL8.3] all MVAPICH2 benchmarks fail with RC 1 when run with "mpirun_rsh" on certain RDMA HCAs | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Brian Chae <bchae> |
| Component: | mvapich2 | Assignee: | Honggang LI <honli> |
| Status: | CLOSED UPSTREAM | QA Contact: | Infiniband QE <infiniband-qe> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 8.3 | CC: | hwkernel-mgr, rdma-dev-team |
| Target Milestone: | rc | | |
| Target Release: | 8.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-10-10 03:05:44 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | client test log for mvapich2 where all mpirun_rsh benchmarks failed (attachment 1714821) | | |
(In reply to Brian Chae from comment #0)

> *** This issue has been observed only on the following RDMA lab hosts.
>
> - rdma-virt-02/03
>   PowerEdge R430 - mlx5 MT27700 CX-4 ib0/ib1

I can't get access to virt-02/03 at the moment.

> - rdma-dev-21/22
>   PowerEdge R630 - mlx5 MT27700 CX-4 ib0/ib1
> - rdma-virt-00/01
>   mlx4 MT27520 CX-3Pro ib0/ib1

I checked those machines. They have both RoCE and IB HCAs. To run mvapich2 over RoCE, two environment variables need to be set. This is a known issue.

[root@rdma-dev-21 ~]$ grep -i distro /etc/motd
DISTRO=RHEL-8.4.0-20200914.n.0
Job Whiteboard: Reserve Workflow provision of distro RHEL-8.4.0-20200914.n.0 on a specific system for 86400 seconds
[root@rdma-dev-21 ~]$
[root@rdma-dev-21 ~]$ rpm -q mvapich2
mvapich2-2.3.3-1.el8.x86_64
[root@rdma-dev-21 ~]$
[root@rdma-dev-21 ~]$ mpirun_rsh -np 2 -hostfile /root/hfile_one_core MV2_IBA_HCA=mlx5_0 MV2_USE_RoCE=1 mpitests-IMB-MPI1 Allgatherv

#------------------------------------------------------------
#    Intel(R) MPI Benchmarks 2019 Update 6, MPI-1 part
#------------------------------------------------------------
# Date                  : Tue Sep 15 06:49:54 2020
# Machine               : x86_64
# System                : Linux
# Release               : 4.18.0-235.el8.x86_64
# Version               : #1 SMP Thu Sep 3 10:48:30 EDT 2020
# MPI Version           : 3.1
# MPI Thread Environment:

# Calling sequence was:
# mpitests-IMB-MPI1 Allgatherv

# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   :   MPI_BYTE
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM
#
#

# List of Benchmarks to run:
# Allgatherv

#----------------------------------------------------------------
# Benchmarking Allgatherv
# #processes = 2
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.05         0.06         0.06
            1         1000         1.23         2.25         1.74
            2         1000         1.46         2.03         1.74
            4         1000         0.67         2.80         1.74
            8         1000         0.67         2.80         1.74
           16         1000         0.68         2.81         1.74
           32         1000         1.70         1.80         1.75
           64         1000         1.36         2.18         1.77
          128         1000         1.79         2.45         2.12
          256         1000         1.18         3.26         2.22
          512         1000         1.38         3.31         2.34
         1024         1000         1.27         3.36         2.32
         2048         1000         2.21         2.84         2.52
         4096         1000         2.45         4.34         3.39
         8192         1000         3.70         5.82         4.76
        16384         1000         9.18         9.51         9.35
        32768         1000        12.49        12.55        12.52
        65536          640        19.07        19.58        19.32
       131072          320        36.16        36.47        36.32
       262144          160        66.81        67.42        67.11
       524288           80       125.33       127.12       126.23
      1048576           40       467.54       468.40       467.97
      2097152           20       953.67       954.22       953.95
      4194304           10      2076.20      2077.48      2076.84

# All processes entering MPI_Finalize

[1] Failed to dealloc pd (Device or resource busy)
[0] Failed to dealloc pd (Device or resource busy)
[0] 16 at [0x000055a3a45fe5a0], src/mpid/ch3/src/mpid_rma.c[182]
[0] 56 at [0x000055a3a4ce1ec0], src/mpi/coll/ch3_shmem_coll.c[4040]
[0] 8 at [0x000055a3a4ce0720], src/mpi/comm/create_2level_comm.c[1058]
[0] 24 at [0x000055a3a4ce3ed0], src/mpi/group/grouputil.c[74]
[0] 8 at [0x000055a3a4c87fc0], src/mpi/comm/create_2level_comm.c[1016]
[0] 24 at [0x000055a3a4cdf760], src/mpi/group/grouputil.c[74]
[0] 8 at [0x000055a3a4ce1980], src/mpi/comm/create_2level_comm.c[743]
[0] 8 at [0x000055a3a4ce33c0], src/util/procmap/local_proc.c[93]
[0] 8 at [0x000055a3a4cdece0], src/util/procmap/local_proc.c[92]
[0] 16 at [0x000055a3a4c40620], src/mpi/group/grouputil.c[74]
[0] 8 at [0x000055a3a4ce2400], src/util/procmap/local_proc.c[93]
[0] 8 at [0x000055a3a4ce2940], src/util/procmap/local_proc.c[92]
[0] 1024 at [0x000055a3a45f9920], src/mpi/coll/ch3_shmem_coll.c[4783]
[0] 8 at [0x000055a3a45fe040], src/mpi/coll/ch3_shmem_coll.c[4779]
[0] 312 at [0x000055a3a45fe3c0], src/mpi/coll/ch3_shmem_coll.c[4732]
[0] 208 at [0x000055a3a45fe250], src/mpi/coll/ch3_shmem_coll.c[4682]
[0] 8 at [0x000055a3a45fdf90], src/mpi/comm/create_2level_comm.c[1607]
[0] 8 at [0x000055a3a4601cd0], src/mpi/comm/create_2level_comm.c[1599]
[0] 8 at [0x000055a3a45fe1a0], src/util/procmap/local_proc.c[93]
[0] 8 at [0x000055a3a45fe0f0], src/util/procmap/local_proc.c[92]
[0] 16 at [0x000055a3a4601ab0], src/mpi/group/grouputil.c[74]
[0] 24 at [0x000055a3a4601c10], src/mpi/group/grouputil.c[74]
[0] 8 at [0x000055a3a4601b60], src/mpi/comm/create_2level_comm.c[1502]
[0] 8 at [0x000055a3a4601a00], src/mpi/comm/create_2level_comm.c[1478]
[0] 24 at [0x000055a3a4c87ce0], src/mpi/group/grouputil.c[74]
[0] 8 at [0x000055a3a45fb430], src/mpid/ch3/src/mpid_rma.c[182]
[0] 8 at [0x000055a3a4601f70], src/mpid/ch3/src/mpid_rma.c[182]
[0] 8 at [0x000055a3a4601ec0], src/mpid/ch3/src/mpid_rma.c[182]
[0] 8 at [0x000055a3a4601e10], src/mpid/ch3/src/mpid_rma.c[182]
[0] 8 at [0x000055a3a45fc590], src/mpid/ch3/src/mpid_rma.c[182]
[0] 8 at [0x000055a3a45f60a0], src/mpid/ch3/src/mpid_rma.c[182]
[0] 504 at [0x000055a3a4778cd0], src/mpi/comm/commutil.c[328]
[0] 32 at [0x000055a3a45d90b0], src/mpid/ch3/src/mpid_vc.c[110]
[1] 56 at [0x000056210d2d85b0], src/mpi/coll/ch3_shmem_coll.c[4023]
[1] 8 at [0x000056210d2d9570], src/mpi/comm/create_2level_comm.c[1058]
[1] 24 at [0x000056210cc19f70], src/mpi/group/grouputil.c[74]
[1] 8 at [0x000056210d2dd7a0], src/mpi/comm/create_2level_comm.c[1016]
[1] 24 at [0x000056210d2db790], src/mpi/group/grouputil.c[74]
[1] 8 at [0x000056210d2d9ab0], src/mpi/comm/create_2level_comm.c[743]
[1] 8 at [0x000056210d2dc750], src/util/procmap/local_proc.c[93]
[1] 8 at [0x000056210d2dcc90], src/util/procmap/local_proc.c[92]
[1] 24 at [0x000056210d2d9030], src/mpid/ch3/src/mpid_vc.c[110]
[1] 16 at [0x000056210cbf0450], src/mpi/group/grouputil.c[74]
[1] 8 at [0x000056210d2dbcd0], src/util/procmap/local_proc.c[93]
[1] 8 at [0x000056210d2dc210], src/util/procmap/local_proc.c[92]
[1] 1024 at [0x000056210cf367e0], src/mpi/coll/ch3_shmem_coll.c[4783]
[1] 8 at [0x000056210cbf3e80], src/mpi/coll/ch3_shmem_coll.c[4779]
[1] 312 at [0x000056210cbf3ca0], src/mpi/coll/ch3_shmem_coll.c[4732]
[1] 208 at [0x000056210cbf3b30], src/mpi/coll/ch3_shmem_coll.c[4682]
[1] 8 at [0x000056210cbfbb70], src/mpi/comm/create_2level_comm.c[1607]
[1] 8 at [0x000056210cbf3920], src/mpi/comm/create_2level_comm.c[1599]
[1] 8 at [0x000056210cbf3a80], src/util/procmap/local_proc.c[93]
[1] 8 at [0x000056210cbf39d0], src/util/procmap/local_proc.c[92]
[1] 24 at [0x000056210cbfbcf0], src/mpid/ch3/src/mpid_vc.c[110]
[1] 16 at [0x000056210d281fa0], src/mpi/group/grouputil.c[74]
[1] 24 at [0x000056210cbfbab0], src/mpi/group/grouputil.c[74]
[1] 8 at [0x000056210cbfba00], src/mpi/comm/create_2level_comm.c[1502]
[1] 8 at [0x000056210d281ef0], src/mpi/comm/create_2level_comm.c[1478]
[1] 24 at [0x000056210cbf84a0], src/mpi/group/grouputil.c[74]
[1] 8 at [0x000056210cbf5430], src/mpid/ch3/src/mpid_rma.c[182]
[1] 8 at [0x000056210cbfbf70], src/mpid/ch3/src/mpid_rma.c[182]
[1] 8 at [0x000056210cbfbec0], src/mpid/ch3/src/mpid_rma.c[182]
[1] 8 at [0x000056210cbfbe10], src/mpid/ch3/src/mpid_rma.c[182]
[1] 8 at [0x000056210cbf6590], src/mpid/ch3/src/mpid_rma.c[182]
[1] 8 at [0x000056210cbf00a0], src/mpid/ch3/src/mpid_rma.c[182]
[1] 504 at [0x000056210cd72cd0], src/mpi/comm/commutil.c[328]
[1] 32 at [0x000056210cbd30b0], src/mpid/ch3/src/mpid_vc.c[110]

To run mvapich2 on a system with multiple HCAs or a RoCE HCA, the MV2_IBA_HCA and MV2_USE_RoCE parameters are needed.
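A minimal sketch of how the workaround above could be applied to the failing invocation from comment #0, assuming the same /root/hfile_one_core hostfile and the timeout wrapper used by the test harness. mpirun_rsh forwards VAR=value pairs placed before the program name to the launched ranks, as in the command shown above; mlx5_0 as the RoCE port follows that command, while using mlx5_1 for an explicit IB run is only an assumption based on the lspci/fw_ver output in comment #0.

# Sketch: run the benchmark over the RoCE HCA with the two MVAPICH2
# parameters named in this comment (mlx5_0 as shown above).
timeout --preserve-status --kill-after=5m 3m \
    mpirun_rsh -np 2 -hostfile /root/hfile_one_core \
    MV2_IBA_HCA=mlx5_0 MV2_USE_RoCE=1 \
    mpitests-IMB-MPI1 Allgatherv -time 1.5

# Sketch: on a multi-HCA host, MV2_IBA_HCA alone can pin the job to a
# specific IB device (mlx5_1 here is an assumption, not taken from the log).
timeout --preserve-status --kill-after=5m 3m \
    mpirun_rsh -np 2 -hostfile /root/hfile_one_core \
    MV2_IBA_HCA=mlx5_1 \
    mpitests-IMB-MPI1 Allgatherv -time 1.5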
Created attachment 1714821 [details]
client test log for mvapich2 where all mpirun_rsh benchmarks failed

Description of problem:

All MVAPICH2 benchmarks fail with RC 1 when run with "mpirun_rsh". The error messages for each benchmark are shown below:

+ [20-09-04 18:26:59] timeout --preserve-status --kill-after=5m 3m mpirun_rsh -np 2 -hostfile /root/hfile_one_core mpitests-IMB-MPI1 Allgatherv -time 1.5
[src/mpid/ch3/channels/mrail/src/gen2/rdma_iba_priv.c:1672] Could not modify qpto RTR
[src/mpid/ch3/channels/mrail/src/gen2/rdma_iba_priv.c:1672] Could not modify qpto RTR

#------------------------------------------------------------
#    Intel(R) MPI Benchmarks 2019 Update 6, MPI-1 part
#------------------------------------------------------------
# Date                  : Fri Sep 4 18:27:00 2020
# Machine               : x86_64
# System                : Linux
# Release               : 4.18.0-234.el8.x86_64
# Version               : #1 SMP Thu Aug 20 10:25:32 EDT 2020
# MPI Version           : 3.1
# MPI Thread Environment:

# Calling sequence was:
# mpitests-IMB-MPI1 Allgatherv -time 1.5

# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   :   MPI_BYTE
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM
#
#

# List of Benchmarks to run:
# Allgatherv

[rdma-dev-22.lab.bos.redhat.com:mpispawn_1][report_error] connect() failed: Connection refused (111)
[rdma-dev-21.lab.bos.redhat.com:mpispawn_0][report_error] connect() failed: Connection refused (111)
[rdma-dev-22.lab.bos.redhat.com:mpirun_rsh][signal_processor] Caught signal 15, killing job

*** This issue has been observed only on the following RDMA lab hosts.

- rdma-virt-02/03
  PowerEdge R430 - mlx5 MT27700 CX-4 ib0/ib1
- rdma-dev-21/22
  PowerEdge R630 - mlx5 MT27700 CX-4 ib0/ib1
- rdma-virt-00/01
  mlx4 MT27520 CX-3Pro ib0/ib1

*** No such issues were observed, and all MVAPICH2 benchmarks ran successfully, on the following RDMA hosts.

- rdma-qe-06/rdma-qe-07
  mlx5 MT27600 CIB ib0/ib1
- rdma-dev-10/11
  mlx4 MT27500 CX-3 ib0/ib1
- rdma-dev-00/01
  mlx4 MT27500 CX-3 ib0/ib1
- rdma-perf-00/01
  mlx4 MT27500 CX-3 ib0

Version-Release number of selected component (if applicable):

DISTRO=RHEL-8.3.0-20200825.0

+ [20-09-04 14:32:38] cat /etc/redhat-release
Red Hat Enterprise Linux release 8.3 Beta (Ootpa)
+ [20-09-04 14:32:38] uname -a
Linux rdma-dev-22.lab.bos.redhat.com 4.18.0-234.el8.x86_64 #1 SMP Thu Aug 20 10:25:32 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux
+ [20-09-04 14:32:38] cat /proc/cmdline
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-234.el8.x86_64 root=/dev/mapper/rhel_rdma--dev--22-root ro intel_idle.max_cstate=0 processor.max_cstate=0 intel_iommu=on iommu=on console=tty0 rd_NO_PLYMOUTH crashkernel=auto resume=/dev/mapper/rhel_rdma--dev--22-swap rd.lvm.lv=rhel_rdma-dev-22/root rd.lvm.lv=rhel_rdma-dev-22/swap console=ttyS1,115200n81
+ [20-09-04 14:32:38] rpm -q rdma-core linux-firmware
rdma-core-29.0-3.el8.x86_64
linux-firmware-20200619-99.git3890db36.el8.noarch
+ [20-09-04 14:32:38] tail /sys/class/infiniband/mlx5_0/fw_ver /sys/class/infiniband/mlx5_1/fw_ver /sys/class/infiniband/mlx5_2/fw_ver
==> /sys/class/infiniband/mlx5_0/fw_ver <==
12.16.1020

==> /sys/class/infiniband/mlx5_1/fw_ver <==
12.23.1020

==> /sys/class/infiniband/mlx5_2/fw_ver <==
12.23.1020
+ [20-09-04 14:32:38] lspci
+ [20-09-04 14:32:38] grep -i -e ethernet -e infiniband -e omni -e ConnectX
01:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
01:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
02:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
02:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
04:00.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
82:00.0 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
82:00.1 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]

Installed:
mpitests-mvapich2-5.6.2-1.el8.x86_64
mvapich2-2.3.3-1.el8.x86_64

How reproducible:

100% on the HCAs/RDMA hosts specified above

Steps to Reproduce:
1. On the client host, run:
   timeout --preserve-status --kill-after=5m 3m mpirun_rsh -np 2 -hostfile /root/hfile_one_core mpitests-IMB-MPI1 Sendrecv -time 1.5

Actual results:

[src/mpid/ch3/channels/mrail/src/gen2/rdma_iba_priv.c:1672] Could not modify qpto RTR
[src/mpid/ch3/channels/mrail/src/gen2/rdma_iba_priv.c:1672] Could not modify qpto RTR

#------------------------------------------------------------
#    Intel(R) MPI Benchmarks 2019 Update 6, MPI-1 part
#------------------------------------------------------------
# Date                  : Fri Sep 4 18:14:56 2020
# Machine               : x86_64
# System                : Linux
# Release               : 4.18.0-234.el8.x86_64
# Version               : #1 SMP Thu Aug 20 10:25:32 EDT 2020
# MPI Version           : 3.1
# MPI Thread Environment:

# Calling sequence was:
# mpitests-IMB-MPI1 Sendrecv -time 1.5

# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   :   MPI_BYTE
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM
#
#

# List of Benchmarks to run:
# Sendrecv

[rdma-dev-22.lab.bos.redhat.com:mpispawn_1][report_error] connect() failed: Connection refused (111)
[rdma-dev-21.lab.bos.redhat.com:mpispawn_0][read_size] Unexpected End-Of-File on file descriptor 6. MPI process died?
[rdma-dev-21.lab.bos.redhat.com:mpispawn_0][read_size] Unexpected End-Of-File on file descriptor 6. MPI process died?
[rdma-dev-21.lab.bos.redhat.com:mpispawn_0][handle_mt_peer] Error while reading PMI socket. MPI process died?
[rdma-dev-21.lab.bos.redhat.com:mpispawn_0][report_error] connect() failed: Connection refused (111)
[rdma-dev-22.lab.bos.redhat.com:mpirun_rsh][signal_processor] Caught signal 15, killing job
[rdma-dev-22.lab.bos.redhat.com:mpirun_rsh][signal_processor] Caught signal 15, killing job

Expected results:

+ [20-09-05 11:36:03] timeout --preserve-status --kill-after=5m 3m mpirun_rsh -np 2 -hostfile /root/hfile_one_core mpitests-IMB-MPI1 Sendrecv -time 1.5

#------------------------------------------------------------
#    Intel(R) MPI Benchmarks 2019 Update 6, MPI-1 part
#------------------------------------------------------------
# Date                  : Sat Sep 5 11:36:04 2020
# Machine               : x86_64
# System                : Linux
# Release               : 4.18.0-234.el8.x86_64
# Version               : #1 SMP Thu Aug 20 10:25:32 EDT 2020
# MPI Version           : 3.1
# MPI Thread Environment:

# Calling sequence was:
# mpitests-IMB-MPI1 Sendrecv -time 1.5

# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   :   MPI_BYTE
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM
#
#

# List of Benchmarks to run:
# Sendrecv

#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 2
#-----------------------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
            0         1000         1.79         1.79         1.79         0.00
            1         1000         1.98         1.98         1.98         1.01
            2         1000         1.90         1.90         1.90         2.11
            4         1000         1.90         1.90         1.90         4.21
            8         1000         1.92         1.92         1.92         8.34
           16         1000         1.91         1.91         1.91        16.76
           32         1000         1.96         1.96         1.96        32.62
           64         1000         2.02         2.02         2.02        63.50
          128         1000         2.11         2.11         2.11       121.34
          256         1000         2.92         2.92         2.92       175.32
          512         1000         3.09         3.09         3.09       331.33
         1024         1000         3.48         3.48         3.48       588.27
         2048         1000         4.38         4.39         4.39       933.99
         4096         1000         6.31         6.31         6.31      1297.48
         8192         1000         9.94         9.95         9.94      1647.44
        16384         1000        11.63        11.63        11.63      2818.16
        32768         1000        16.19        16.19        16.19      4048.87
        65536          640        19.85        19.85        19.85      6604.17
       131072          320        31.45        31.45        31.45      8334.96
       262144          160        52.86        52.87        52.87      9916.96
       524288           80        97.16        97.16        97.16     10791.76
      1048576           40       187.76       187.77       187.76     11168.93
      2097152           20       388.82       388.86       388.84     10786.13
      4194304           10       808.84       809.53       809.18     10362.36

# All processes entering MPI_Finalize

[1] Failed to dealloc pd (Device or resource busy)
[0] Failed to dealloc pd (Device or resource busy)
[0] 16 at [0x000055b56e58fce0], src/mpid/ch3/src/mpid_rma.c[182]
[0] 8 at [0x000055b56e5fe9c0], src/mpi/comm/create_2level_comm.c[1058]
[0] 24 at [0x000055b56e5fad60], src/mpi/group/grouputil.c[74]
[0] 8 at [0x000055b56e5fef00], src/mpi/comm/create_2level_comm.c[1016]
[0] 56 at [0x000055b56e5fc260], src/mpi/coll/ch3_shmem_coll.c[4040]
[0] 24 at [0x000055b56e5fb7e0], src/mpi/group/grouputil.c[74]
[0] 8 at [0x000055b56e5fcf80], src/mpi/comm/create_2level_comm.c[743]
[0] 8 at [0x000055b56e5fa2e0], src/util/procmap/local_proc.c[93]
[0] 8 at [0x000055b56e5fa820], src/util/procmap/local_proc.c[92]
[0] 16 at [0x000055b56e5eae90], src/mpi/group/grouputil.c[74]
[0] 8 at [0x000055b56e5fdf40], src/util/procmap/local_proc.c[93]
[0] 8 at [0x000055b56e5fe480], src/util/procmap/local_proc.c[92]
[0] 1024 at [0x000055b56e050f50], src/mpi/coll/ch3_shmem_coll.c[4783]
[0] 8 at [0x000055b56e24ede0], src/mpi/coll/ch3_shmem_coll.c[4779]
[0] 312 at [0x000055b56e050d70], src/mpi/coll/ch3_shmem_coll.c[4732]
[0] 208 at [0x000055b56e050c00], src/mpi/coll/ch3_shmem_coll.c[4682]
[0] 8 at [0x000055b56e24ed30], src/mpi/comm/create_2level_comm.c[1607]
[0] 8 at [0x000055b56e24ec80], src/mpi/comm/create_2level_comm.c[1599]
[0] 8 at [0x000055b56e24ef40], src/util/procmap/local_proc.c[93]
[0] 8 at [0x000055b56e24ee90], src/util/procmap/local_proc.c[92]
[0] 16 at [0x000055b56e58fe40], src/mpi/group/grouputil.c[74]
[0] 24 at [0x000055b56e58ffa0], src/mpi/group/grouputil.c[74]
[0] 8 at [0x000055b56e58fef0], src/mpi/comm/create_2level_comm.c[1502]
[0] 8 at [0x000055b56e58fd90], src/mpi/comm/create_2level_comm.c[1478]
[0] 24 at [0x000055b56e0499e0], src/mpi/group/grouputil.c[74]
[0] 8 at [0x000055b56e24e980], src/mpid/ch3/src/mpid_rma.c[182]
[0] 8 at [0x000055b56e24e8d0], src/mpid/ch3/src/mpid_rma.c[182]
[0] 8 at [0x000055b56e24e820], src/mpid/ch3/src/mpid_rma.c[182]
[0] 8 at [0x000055b56e02d6c0], src/mpid/ch3/src/mpid_rma.c[182]
[0] 8 at [0x000055b56e02e200], src/mpid/ch3/src/mpid_rma.c[182]
[0] 8 at [0x000055b56e02e020], src/mpid/ch3/src/mpid_rma.c[182]
[0] 504 at [0x000055b56e051920], src/mpi/comm/commutil.c[328]
[0] 32 at [0x000055b56e02e9e0], src/mpid/ch3/src/mpid_vc.c[110]
[1] 8 at [0x000055cc306d2990], src/mpi/comm/create_2level_comm.c[1058]
[1] 24 at [0x000055cc306d2ed0], src/mpi/group/grouputil.c[74]
[1] 8 at [0x000055cc306d3e90], src/mpi/comm/create_2level_comm.c[1016]
[1] 56 at [0x000055cc306d43d0], src/mpi/coll/ch3_shmem_coll.c[4023]
[1] 24 at [0x000055cc306d5630], src/mpi/group/grouputil.c[74]
[1] 8 at [0x000055cc30128300], src/mpi/comm/create_2level_comm.c[743]
[1] 8 at [0x000055cc306d3410], src/util/procmap/local_proc.c[93]
[1] 8 at [0x000055cc306d3950], src/util/procmap/local_proc.c[92]
[1] 24 at [0x000055cc306d4910], src/mpid/ch3/src/mpid_vc.c[110]
[1] 16 at [0x000055cc306d7b80], src/mpi/group/grouputil.c[74]
[1] 8 at [0x000055cc306d6b30], src/util/procmap/local_proc.c[93]
[1] 8 at [0x000055cc306d7070], src/util/procmap/local_proc.c[92]
[1] 1024 at [0x000055cc30129c00], src/mpi/coll/ch3_shmem_coll.c[4783]
[1] 8 at [0x000055cc30122630], src/mpi/coll/ch3_shmem_coll.c[4779]
[1] 312 at [0x000055cc30122450], src/mpi/coll/ch3_shmem_coll.c[4732]
[1] 208 at [0x000055cc301222e0], src/mpi/coll/ch3_shmem_coll.c[4682]
[1] 8 at [0x000055cc30327d40], src/mpi/comm/create_2level_comm.c[1607]
[1] 8 at [0x000055cc30327f80], src/mpi/comm/create_2level_comm.c[1599]
[1] 8 at [0x000055cc30122230], src/util/procmap/local_proc.c[93]
[1] 8 at [0x000055cc30122180], src/util/procmap/local_proc.c[92]
[1] 24 at [0x000055cc30327ec0], src/mpid/ch3/src/mpid_vc.c[110]
[1] 16 at [0x000055cc30668ef0], src/mpi/group/grouputil.c[74]
[1] 24 at [0x000055cc30327c80], src/mpi/group/grouputil.c[74]
[1] 8 at [0x000055cc30668fa0], src/mpi/comm/create_2level_comm.c[1502]
[1] 8 at [0x000055cc30668e40], src/mpi/comm/create_2level_comm.c[1478]
[1] 24 at [0x000055cc3011d790], src/mpi/group/grouputil.c[74]
[1] 8 at [0x000055cc30327980], src/mpid/ch3/src/mpid_rma.c[182]
[1] 8 at [0x000055cc303278d0], src/mpid/ch3/src/mpid_rma.c[182]
[1] 8 at [0x000055cc30327820], src/mpid/ch3/src/mpid_rma.c[182]
[1] 8 at [0x000055cc301066c0], src/mpid/ch3/src/mpid_rma.c[182]
[1] 8 at [0x000055cc30107200], src/mpid/ch3/src/mpid_rma.c[182]
[1] 8 at [0x000055cc30107020], src/mpid/ch3/src/mpid_rma.c[182]
[1] 504 at [0x000055cc3012a920], src/mpi/comm/commutil.c[328]
[1] 32 at [0x000055cc301079e0], src/mpid/ch3/src/mpid_vc.c[110]

Additional info:
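A minimal sketch of the reproduction setup, assuming /root/hfile_one_core is a standard mpirun_rsh hostfile with one hostname per line; the actual file contents are not part of this report, and the two hostnames below are simply the nodes seen in the failure logs above.

# Hypothetical contents of /root/hfile_one_core (one host per line; with
# -np 2 this places one rank on each node). The real file is not included
# in this report.
#   rdma-dev-21.lab.bos.redhat.com
#   rdma-dev-22.lab.bos.redhat.com

# Launch the failing benchmark exactly as in "Steps to Reproduce" and
# capture the return code; the failure shows up as a non-zero RC (RC 1).
timeout --preserve-status --kill-after=5m 3m \
    mpirun_rsh -np 2 -hostfile /root/hfile_one_core \
    mpitests-IMB-MPI1 Sendrecv -time 1.5
echo "RC=$?"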