Bug 276111 - [RHEL5 RT][OPENIB] IPoIB crashes ib_mthca module on MT25204 HCAs
Summary: [RHEL5 RT][OPENIB] IPoIB crashes ib_mthca module on MT25204 HCAs
Keywords:
Status: CLOSED DUPLICATE of bug 251934
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: realtime-kernel
Version: 1.0
Hardware: All
OS: All
medium
low
Target Milestone: ---
: ---
Assignee: Doug Ledford
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2007-09-04 13:24 UTC by Gurhan Ozen
Modified: 2008-02-27 19:56 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-10-08 14:35:30 UTC
Target Upstream Version:
Embargoed:



Description Gurhan Ozen 2007-09-04 13:24:56 UTC
Description of problem:
  While testing IPoIB with iperf, one of the HCAs became unusable and ib_mthca
crashed, although it logged very little about the failure.

  After the iperf run, ibstat hung and then reported this error:

ibstat: ibpanic: [4235] main: stat of IB device 'mthca0' failed: (Device or resource busy)
 
  When I tried to reload ib_mthca to recover, rmmod ib_mthca hung as well and
logged this error:

  kernel: ib_mthca 0000:0c:00.0: HW2SW_MPT failed (-16)
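
  The value in parentheses looks like a negated Linux errno. As a minimal
sketch (the helper name decode_failure is hypothetical, and the assumption that
the driver reports a standard errno here is not confirmed by this report), the
line can be decoded like this. On Linux, errno 16 is EBUSY, whose message
matches the "Device or resource busy" text that ibstat printed:

```python
import errno
import os
import re

# Kernel log line quoted from this report.
LOG_LINE = "kernel: ib_mthca 0000:0c:00.0: HW2SW_MPT failed (-16)"


def decode_failure(line):
    """Return (command, errno_name, message) for a '<CMD> failed (-N)' line.

    Hypothetical helper: assumes the number in parentheses is a negated
    errno value. Returns None if the line does not match that shape.
    """
    match = re.search(r"(\w+) failed \((-?\d+)\)", line)
    if match is None:
        return None
    command = match.group(1)
    code = abs(int(match.group(2)))
    name = errno.errorcode.get(code, "UNKNOWN")
    return command, name, os.strerror(code)


if __name__ == "__main__":
    print(decode_failure(LOG_LINE))
```

  If that reading is right, both symptoms point at the same condition: the HCA
is reported as busy both when ibstat queries it and when the driver unloads.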

 
Version-Release number of selected component (if applicable):
[root@dell-pe1950-02 ~]# uname -a
Linux dell-pe1950-02.rhts.boston.redhat.com 2.6.21-37.el5rt #1 SMP PREEMPT RT Thu Aug 30 16:05:41 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
[root@dell-pe1950-02 ~]# modinfo ib_mthca
filename:       /lib/modules/2.6.21-37.el5rt/kernel/drivers/infiniband/hw/mthca/ib_mthca.ko
version:        0.08
license:        Dual BSD/GPL
description:    Mellanox InfiniBand HCA low-level driver
author:         Roland Dreier
srcversion:     DBCBEAE0F96BE105E037FE4
alias:          pci:v00001867d00005E8Csv*sd*bc*sc*i*
alias:          pci:v000015B3d00005E8Csv*sd*bc*sc*i*
alias:          pci:v00001867d00006274sv*sd*bc*sc*i*
alias:          pci:v000015B3d00006274sv*sd*bc*sc*i*
alias:          pci:v00001867d00006282sv*sd*bc*sc*i*
alias:          pci:v000015B3d00006282sv*sd*bc*sc*i*
alias:          pci:v00001867d00006278sv*sd*bc*sc*i*
alias:          pci:v000015B3d00006278sv*sd*bc*sc*i*
alias:          pci:v00001867d00005A44sv*sd*bc*sc*i*
alias:          pci:v000015B3d00005A44sv*sd*bc*sc*i*
depends:        ib_mad,ib_core
vermagic:       2.6.21-37.el5rt SMP preempt mod_unload 
parm:           catas_reset_disable:disable reset on catastrophic event if nonzero (int)
parm:           qos_support:Enable QoS support if > 0 (int)
parm:           fw_cmd_doorbell:post FW commands through doorbell page if nonzero (and supported by FW) (int)
parm:           debug_level:Enable debug tracing if > 0 (int)
parm:           msi_x:attempt to use MSI-X if nonzero (int)
parm:           msi:attempt to use MSI if nonzero (int)
parm:           tune_pci:increase PCI burst from the default set by BIOS if nonzero (int)
parm:           num_qp:maximum number of QPs per HCA (int)
parm:           rdb_per_qp:number of RDB buffers per QP (int)
parm:           num_cq:maximum number of CQs per HCA (int)
parm:           num_mcg:maximum number of multicast groups per HCA (int)
parm:           num_mpt:maximum number of memory protection table entries per HCA (int)
parm:           num_mtt:maximum number of memory translation table segments per HCA (int)
parm:           num_udav:maximum number of UD address vectors per HCA (int)
parm:           fmr_reserved_mtts:number of memory translation table segments reserved for FMR (int)

How reproducible:
 Very.

Steps to Reproduce:
1. Run a network performance tool such as iperf over the ib0 interface.
  

Additional info:
  So far this has only happened with MT25204 cards. MT25208 cards don't seem to
have this issue. I have yet to test with QLogic cards and will post those
results as well.

Comment 1 Gurhan Ozen 2007-10-08 14:35:30 UTC
Marking this as a duplicate of BZ #251934. If there are separate release notes
for the RT kernel, then the release note for #251934 should be included for RT as well.

*** This bug has been marked as a duplicate of 251934 ***

