Bug 2095307
| Summary: | [RHEL9.1] all mvapich2 benchmarks fail with "create qp: failed on ibv_cmd_create_qp with 22" error on QEDR IW / ROCE device | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Brian Chae <bchae> |
| Component: | mvapich2 | Assignee: | Kamal Heib <kheib> |
| Status: | ASSIGNED --- | QA Contact: | Infiniband QE <infiniband-qe> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 9.1 | CC: | hwkernel-mgr, kheib, rdma-dev-team |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Brian Chae
2022-06-09 13:56:29 UTC
With the RHEL-9.3.0 build, the same kind of return code was observed, but with a different failure cause, as shown:

+ [23-07-25 15:34:49] timeout --preserve-status --kill-after=5m 3m mpirun -hostfile /root/hfile_one_core -np 2 mpitests-IMB-MPI1 PingPong -time 1.5
[rdma-dev-02.rdma.lab.eng.rdu2.redhat.com:mpi_rank_0][rdma_param_handle_heterogeneity] All nodes involved in the job were detected to be homogeneous in terms of processors and interconnects. Setting MV2_HOMOGENEOUS_CLUSTER=1 can improve job startup performance on such systems. The following link has more details on enhancing job startup performance. http://mvapich.cse.ohio-state.edu/performance/job-startup/.
[rdma-dev-02.rdma.lab.eng.rdu2.redhat.com:mpi_rank_0][rdma_param_handle_heterogeneity] To suppress this warning, please set MV2_SUPPRESS_JOB_STARTUP_PERFORMANCE_WARNING to 1
Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(493)....:
MPID_Init(419)...........: channel initialization failed
MPIDI_CH3_Init(601)......:
MPIDI_CH3I_RDMA_init(446):
rdma_iba_hca_init(1775)..: Failed to retrieve gid on rank 1
[cli_1]: aborting job:
Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(493)....:
MPID_Init(419)...........: channel initialization failed
MPIDI_CH3_Init(601)......:
MPIDI_CH3I_RDMA_init(446):
rdma_iba_hca_init(1775)..: Failed to retrieve gid on rank 1
+ [23-07-25 15:34:49] __MPI_check_result 143 mpitests-mvapich2 IMB-MPI1 PingPong mpirun /root/hfile_one_core
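
For readers triaging the return code, the value 143 passed to __MPI_check_result can be decoded with the usual shell convention that a status above 128 means 128 plus a signal number. The sketch below is only an illustration of that convention: the helper name describe_exit_status is hypothetical and not part of the test harness, and the log itself does not say which process, if any, received a signal (a program may also return such a value explicitly).

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status (hypothetical helper, not from the harness)."""
    if status == 0:
        return "success"
    if status > 128:
        # Shell convention: 128 + N usually means "terminated by signal N".
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = f"unknown signal {signum}"
        return f"terminated by {name}"
    return f"exited with error code {status}"


# The status reported in the log above.
print(describe_exit_status(143))
```

Under this convention 143 corresponds to signal 15 (SIGTERM). Because the command was run with timeout --preserve-status, the status seen by the harness is the one reported by mpirun rather than timeout's own 124.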