Bug 276111

Summary: [RHEL5 RT][OPENIB] IPoIB crashes ib_mthca module on MT25204 HCAs
Product: Red Hat Enterprise MRG
Reporter: Gurhan Ozen <gozen>
Component: realtime-kernel
Assignee: Doug Ledford <dledford>
Severity: low
Priority: medium
Version: 1.0
CC: jburke
Hardware: All
OS: All
Doc Type: Bug Fix
Last Closed: 2007-10-08 14:35:30 UTC

Description Gurhan Ozen 2007-09-04 13:24:56 UTC
Description of problem:
  While testing IPoIB with the iperf program, one of the HCAs became unusable
and ib_mthca crashed, although the crash produced very little log output.

  After the iperf run, ibstat hung and then gave this error:

ibstat: ibpanic: [4235] main: stat of IB device 'mthca0' failed: (Device or resource busy)

  When I tried to reload ib_mthca to remedy the problem, rmmod ib_mthca hung as
well and gave this error:

  kernel: ib_mthca 0000:0c:00.0: HW2SW_MPT failed (-16)
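
The status in the kernel message can be decoded to check whether it agrees with the "Device or resource busy" text that ibstat printed. The following is a minimal sketch, assuming the value in parentheses follows the usual kernel convention of a negated errno; the `decode_status` helper is hypothetical and not part of any InfiniBand tool.

```python
import errno
import os
import re

# Kernel message copied from this report.
LOG_LINE = "kernel: ib_mthca 0000:0c:00.0: HW2SW_MPT failed (-16)"


def decode_status(line):
    """Return (errno name, message) for a '(-N)' status in a log line, or None."""
    match = re.search(r"\((-\d+)\)", line)
    if match is None:
        return None
    # Assumption: the driver reports a negated errno value.
    code = -int(match.group(1))
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


print(decode_status(LOG_LINE))
```

On Linux, errno 16 is EBUSY, so under that assumption both the ibstat failure and the rmmod failure report the same "busy" condition.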

Version-Release number of selected component (if applicable):
[root@dell-pe1950-02 ~]# uname -a
Linux dell-pe1950-02.rhts.boston.redhat.com 2.6.21-37.el5rt #1 SMP PREEMPT RT Thu Aug 30 16:05:41 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
[root@dell-pe1950-02 ~]# modinfo ib_mthca
version:        0.08
license:        Dual BSD/GPL
description:    Mellanox InfiniBand HCA low-level driver
author:         Roland Dreier
srcversion:     DBCBEAE0F96BE105E037FE4
alias:          pci:v00001867d00005E8Csv*sd*bc*sc*i*
alias:          pci:v000015B3d00005E8Csv*sd*bc*sc*i*
alias:          pci:v00001867d00006274sv*sd*bc*sc*i*
alias:          pci:v000015B3d00006274sv*sd*bc*sc*i*
alias:          pci:v00001867d00006282sv*sd*bc*sc*i*
alias:          pci:v000015B3d00006282sv*sd*bc*sc*i*
alias:          pci:v00001867d00006278sv*sd*bc*sc*i*
alias:          pci:v000015B3d00006278sv*sd*bc*sc*i*
alias:          pci:v00001867d00005A44sv*sd*bc*sc*i*
alias:          pci:v000015B3d00005A44sv*sd*bc*sc*i*
depends:        ib_mad,ib_core
vermagic:       2.6.21-37.el5rt SMP preempt mod_unload 
parm:           catas_reset_disable:disable reset on catastrophic event if nonzero (int)
parm:           qos_support:Enable QoS support if > 0 (int)
parm:           fw_cmd_doorbell:post FW commands through doorbell page if nonzero (and supported by FW) (int)
parm:           debug_level:Enable debug tracing if > 0 (int)
parm:           msi_x:attempt to use MSI-X if nonzero (int)
parm:           msi:attempt to use MSI if nonzero (int)
parm:           tune_pci:increase PCI burst from the default set by BIOS if nonzero (int)
parm:           num_qp:maximum number of QPs per HCA (int)
parm:           rdb_per_qp:number of RDB buffers per QP (int)
parm:           num_cq:maximum number of CQs per HCA (int)
parm:           num_mcg:maximum number of multicast groups per HCA (int)
parm:           num_mpt:maximum number of memory protection table entries per HCA (int)
parm:           num_mtt:maximum number of memory translation table segments per HCA (int)
parm:           num_udav:maximum number of UD address vectors per HCA (int)
parm:           fmr_reserved_mtts:number of memory translation table segments reserved for FMR (int)
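
The alias lines in the modinfo output encode the PCI vendor and device IDs that ib_mthca binds to. The sketch below parses them and prints each device ID in decimal. The decimal values appear to line up with the Mellanox part numbers (0x6274 is 25204, 0x6278 is 25208), which is an observation from the arithmetic rather than something stated in this report. The `pci_ids` helper is hypothetical.

```python
import re

# Alias lines copied from the modinfo output in this report.
MODINFO_ALIASES = """\
alias:          pci:v00001867d00005E8Csv*sd*bc*sc*i*
alias:          pci:v000015B3d00005E8Csv*sd*bc*sc*i*
alias:          pci:v00001867d00006274sv*sd*bc*sc*i*
alias:          pci:v000015B3d00006274sv*sd*bc*sc*i*
alias:          pci:v00001867d00006282sv*sd*bc*sc*i*
alias:          pci:v000015B3d00006282sv*sd*bc*sc*i*
alias:          pci:v00001867d00006278sv*sd*bc*sc*i*
alias:          pci:v000015B3d00006278sv*sd*bc*sc*i*
alias:          pci:v00001867d00005A44sv*sd*bc*sc*i*
alias:          pci:v000015B3d00005A44sv*sd*bc*sc*i*
"""

# A PCI modalias has the form pci:v<8 hex digits>d<8 hex digits>...
ALIAS_RE = re.compile(r"pci:v([0-9A-Fa-f]{8})d([0-9A-Fa-f]{8})")


def pci_ids(text):
    """Return a list of (vendor_id, device_id) integers found in alias lines."""
    return [(int(m.group(1), 16), int(m.group(2), 16))
            for m in ALIAS_RE.finditer(text)]


for vendor, device in pci_ids(MODINFO_ALIASES):
    print(f"vendor 0x{vendor:04x} device 0x{device:04x} (decimal {device})")
```

If that reading is right, both the MT25204 and MT25208 cards mentioned in this report are claimed by the same driver build.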

How reproducible:

Steps to Reproduce:
1. Run a network performance program (iperf in this case) over the ib0 interface.
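
The reproduction step could be scripted roughly as follows. This is only a sketch under stated assumptions: the report does not give the iperf options used, so the peer address `192.0.2.10`, the 60-second duration, and the per-command timeout are all placeholders, and an `iperf -s` server is assumed to be listening on the peer's IPoIB address. By default the script only prints the commands.

```python
import subprocess


def repro_commands(peer_ipoib_addr, seconds=60):
    """Build the command lines: an iperf client run, then an ibstat check."""
    return [
        # Assumption: plain TCP iperf client run against the peer's ib0 address.
        ["iperf", "-c", peer_ipoib_addr, "-t", str(seconds)],
        # ibstat on the HCA named in the error message from this report.
        ["ibstat", "mthca0"],
    ]


def run(commands, dry_run=True, timeout=300):
    """Print the commands, or execute them with a timeout in case one hangs."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
            continue
        try:
            subprocess.run(cmd, check=False, timeout=timeout)
        except subprocess.TimeoutExpired:
            print("timed out:", " ".join(cmd))


# Placeholder peer address; replace with the real IPoIB address of the peer.
run(repro_commands("192.0.2.10"))
```

The timeout is there because ibstat hung on the affected machine, so an unattended run would otherwise block indefinitely.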

Additional info:
  So far this has only happened with MT25204 cards; MT25208 cards do not seem to
have this issue. I have yet to test with QLogic cards and will post those
results as well.

Comment 1 Gurhan Ozen 2007-10-08 14:35:30 UTC
Marking this as a duplicate of BZ #251934. If there are separate release notes
for the RT kernel, then the release note for #251934 should be included for RT as well.

*** This bug has been marked as a duplicate of 251934 ***