Bug 279861

Summary:       OpenMPI issues when both nodes are QLogic PCIe cards
Product:       Red Hat Enterprise Linux 5
Component:     openmpi
Version:       5.1
Hardware:      All
OS:            Linux
Status:        CLOSED NOTABUG
Severity:      high
Priority:      high
Reporter:      Gurhan Ozen <gozen>
Assignee:      Doug Ledford <dledford>
CC:            jburke
Doc Type:      Bug Fix
Last Closed:   2007-09-21 17:53:48 UTC

Description Gurhan Ozen 2007-09-05 23:54:29 UTC
Description of problem:
When running the mpitests-IMB_MPI1 test suite from the mpitests package across
2 nodes, both fitted with InfiniPath_QLE7140 HCAs, the program crashes with the
following backtrace:
[dell-pe1950-03.rhts.boston.redhat.com][0,1,0][btl_openib_endpoint.c:213:mca_btl_openib_endpoint_post_send]
error posting send request errno says Invalid argument

[0,1,0][btl_openib_component.c:1332:btl_openib_component_progress] from
dell-pe1950-03.rhts.boston.redhat.com to: dell-pe1950-02.rhts.boston.redhat.com
error polling LP CQ with status RETRY EXCEEDED ERROR status number 12 for wr_id
173782656 opcode 1
--------------------------------------------------------------------------
The InfiniBand retry count between two MPI processes has been
exceeded.  "Retry count" is defined in the InfiniBand spec 1.2
(section 12.7.38):

    The total number of times that the sender wishes the receiver to
    retry timeout, packet sequence, etc. errors before posting a
    completion error.

This error typically means that there is something awry within the
InfiniBand fabric itself.  You should note the hosts on which this
error has occurred; it has been observed that rebooting or removing a
particular host from the job can sometimes resolve this issue.  

Two MCA parameters can be used to control Open MPI's behavior with
respect to the retry count:

* btl_openib_ib_retry_count - The number of times the sender will
  attempt to retry (defaulted to 7, the maximum value).

* btl_openib_ib_timeout - The local ACK timeout parameter (defaulted
  to 10).  The actual timeout value used is calculated as:

     4.096 microseconds * (2^btl_openib_ib_timeout)

  See the InfiniBand spec 1.2 (section 12.7.34) for more details.
--------------------------------------------------------------------------
mpirun noticed that job rank 1 with PID 4844 on node
dell-pe1950-02.rhts.boston.redhat.com exited on signal 15 (Terminated). 
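
For reference, the two MCA parameters named in the message can be set on the
mpirun command line. The invocation below is an illustrative sketch only; the
host names and process count are placeholders, not values taken from this
report:

# mpirun -np 2 --host node1,node2 \
      --mca btl_openib_ib_retry_count 7 \
      --mca btl_openib_ib_timeout 14 \
      mpitests-IMB_MPI1

Plugging the default btl_openib_ib_timeout of 10 into the formula quoted above
gives 4.096 us * 2^10, roughly 4.2 ms per ACK timeout; raising it to 14 gives
roughly 67 ms, so each send waits longer before a retry is charged against the
retry count.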


Version-Release number of selected component (if applicable):
# rpm -qa | egrep "openib|openmpi|mpitests"
openmpi-devel-1.2.3-4.el5
openib-srptools-0.0.6-5.el5
openib-mstflint-1.2-5.el5
openmpi-1.2.3-4.el5
mpitests-debuginfo-2.0-2
openib-1.2-5.el5
openib-diags-1.2.7-5.el5
openib-perftest-1.2-5.el5
openib-debuginfo-1.2-5.el5
openmpi-libs-1.2.3-4.el5
mpitests-2.0-2
openib-tvflash-0.9.2-5.el5
openmpi-debuginfo-1.2.3-4.el5
# modinfo ib_ipath
filename:       /lib/modules/2.6.18-43.el5/kernel/drivers/infiniband/hw/ipath/ib_ipath.ko
description:    QLogic InfiniPath driver
author:         QLogic <support>
license:        GPL
srcversion:     60096FEC902AEF4EEFCAD65
alias:          pci:v00001FC1d00000010sv*sd*bc*sc*i*
alias:          pci:v00001FC1d0000000Dsv*sd*bc*sc*i*
depends:        ib_core
vermagic:       2.6.18-43.el5 SMP mod_unload gcc-4.1
parm:           qp_table_size:QP table size (uint)
parm:           lkey_table_size:LKEY table size in bits (2^n, 1 <= n <= 23) (uint)
parm:           max_pds:Maximum number of protection domains to support (uint)
parm:           max_ahs:Maximum number of address handles to support (uint)
parm:           max_cqes:Maximum number of completion queue entries to support (uint)
parm:           max_cqs:Maximum number of completion queues to support (uint)
parm:           max_qp_wrs:Maximum number of QP WRs to support (uint)
parm:           max_qps:Maximum number of QPs to support (uint)
parm:           max_sges:Maximum number of SGEs to support (uint)
parm:           max_mcast_grps:Maximum number of multicast groups to support (uint)
parm:           max_mcast_qp_attached:Maximum number of attached QPs to support (uint)
parm:           max_srqs:Maximum number of SRQs to support (uint)
parm:           max_srq_sges:Maximum number of SRQ SGEs to support (uint)
parm:           max_srq_wrs:Maximum number of SRQ WRs support (uint)
parm:           disable_sma:uint
parm:           ib_ipath_disable_sma:Disable the SMA
parm:           cfgports:Set max number of ports to use (ushort)
parm:           kpiobufs:Set number of PIO buffers for driver
parm:           debug:mask for debug prints (uint)
module_sig:     883f35046cb61a956bc506ef7b1fb11243e309e23f9e3a61fda5c94159875195a459bc1f34c40a09ee073e5d729c3b0816f6bf372e92c1b46efefe



How reproducible:
Every time

Steps to Reproduce:
1. Have 2 nodes with QLogic PCIe cards (I used InfiniPath_QLE7140).
2. Build and install mpitests.
3. Run mpitests-IMB_MPI1 (a sketch of a typical invocation follows below).
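
A minimal sketch of step 3, assuming the benchmark binary is on the PATH of
both nodes and that a hostfile lists the two machines (the file name and host
names here are hypothetical):

# cat hosts
node1
node2
# mpirun -np 2 --hostfile hosts mpitests-IMB_MPI1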
  
Actual results:
The benchmark aborts with the RETRY EXCEEDED ERROR output shown above.

Expected results:
The test suite runs to completion on both nodes.

Additional info:

Comment 1 Doug Ledford 2007-09-21 17:53:48 UTC
It turns out that this problem, specifically the segfault, wasn't caused by
having two ipath cards. Rather, one of the ipath cards was running at a scant
2% of the overall speed of the IB fabric, which caused excessive retries that
eventually closed the connection. The mpitest program wasn't built to handle
connections going away unexpectedly, so it segfaulted. As such, I'm closing
this as NOTABUG. The problem went away when the test network was updated to
get rid of the slow connection that was disrupting the IB fabric.
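
A degraded link like the one described above can be spotted by comparing the
active rate each node reports, for example with ibstatus from the diagnostic
tooling listed in this report. The output below is abridged and illustrative
(the device name and exact fields may vary); a healthy 4X SDR link on a
QLE7140 reports 10 Gb/sec:

# ibstatus
Infiniband device 'ipath0' port 1 status:
        state:   4: ACTIVE
        rate:    10 Gb/sec (4X)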