Description of problem:
While testing IPoIB with the iperf program, one of the HCAs became unusable and ib_mthca crashed, though without producing any verbose output. After the iperf run, ibstat hung and gave this error:

ibstat: ibpanic: [4235] main: stat of IB device 'mthca0' failed: (Device or resource busy)

I tried to fix the problem by reloading ib_mthca, but rmmod ib_mthca also hung and gave this error:

kernel: ib_mthca 0000:0c:00.0: HW2SW_MPT failed (-16)

Version-Release number of selected component (if applicable):
[root@dell-pe1950-02 ~]# uname -a
Linux dell-pe1950-02.rhts.boston.redhat.com 2.6.21-37.el5rt #1 SMP PREEMPT RT Thu Aug 30 16:05:41 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
[root@dell-pe1950-02 ~]# modinfo ib_mthca
filename: /lib/modules/2.6.21-37.el5rt/kernel/drivers/infiniband/hw/mthca/ib_mthca.ko
version: 0.08
license: Dual BSD/GPL
description: Mellanox InfiniBand HCA low-level driver
author: Roland Dreier
srcversion: DBCBEAE0F96BE105E037FE4
alias: pci:v00001867d00005E8Csv*sd*bc*sc*i*
alias: pci:v000015B3d00005E8Csv*sd*bc*sc*i*
alias: pci:v00001867d00006274sv*sd*bc*sc*i*
alias: pci:v000015B3d00006274sv*sd*bc*sc*i*
alias: pci:v00001867d00006282sv*sd*bc*sc*i*
alias: pci:v000015B3d00006282sv*sd*bc*sc*i*
alias: pci:v00001867d00006278sv*sd*bc*sc*i*
alias: pci:v000015B3d00006278sv*sd*bc*sc*i*
alias: pci:v00001867d00005A44sv*sd*bc*sc*i*
alias: pci:v000015B3d00005A44sv*sd*bc*sc*i*
depends: ib_mad,ib_core
vermagic: 2.6.21-37.el5rt SMP preempt mod_unload
parm: catas_reset_disable:disable reset on catastrophic event if nonzero (int)
parm: qos_support:Enable QoS support if > 0 (int)
parm: fw_cmd_doorbell:post FW commands through doorbell page if nonzero (and supported by FW) (int)
parm: debug_level:Enable debug tracing if > 0 (int)
parm: msi_x:attempt to use MSI-X if nonzero (int)
parm: msi:attempt to use MSI if nonzero (int)
parm: tune_pci:increase PCI burst from the default set by BIOS if nonzero (int)
parm: num_qp:maximum number of QPs per HCA (int)
parm: rdb_per_qp:number of RDB buffers per QP (int)
parm: num_cq:maximum number of CQs per HCA (int)
parm: num_mcg:maximum number of multicast groups per HCA (int)
parm: num_mpt:maximum number of memory protection table entries per HCA (int)
parm: num_mtt:maximum number of memory translation table segments per HCA (int)
parm: num_udav:maximum number of UD address vectors per HCA (int)
parm: fmr_reserved_mtts:number of memory translation table segments reserved for FMR (int)

How reproducible:
Very.

Steps to Reproduce:
1. Run a network performance program (such as iperf) against the ib0 interface.

Additional info:
So far this has only happened with MT25204 cards; MT25208 cards do not seem to have this issue. I have yet to test with QLogic cards and will post those results as well.
Marking this as a duplicate of BZ #251934. If there are separate release notes for the RT kernel, the release note from #251934 should be included for RT as well.

*** This bug has been marked as a duplicate of 251934 ***