Description of problem:
When booting this kernel, the kernel fails to recognize the Mellanox 10Gb RDMA device. As a result, MRG-M cannot run with the RDMA driver loaded.

Version-Release number of selected component (if applicable):
2.6.33.9-rt31.74.el6rt.x86_64

How reproducible:
Consistent with this kernel. Every time.

Steps to Reproduce:
1. Boot RT kernel

Actual results:
[root@perf44 ~]# ibv_devinfo
No IB devices found

Expected results:
From booting the RHEL62 189 kernel:
[root@perf42 ~]# ibv_devinfo
hca_id: mlx4_0
        transport:              InfiniBand (0)
        fw_ver:                 2.7.700
        node_guid:              0002:c903:0007:99ac
        sys_image_guid:         0002:c903:0007:99af
        vendor_id:              0x02c9
        vendor_part_id:         26448
        hw_ver:                 0xB0
        board_id:               MT_0DB0120010
        phys_port_cnt:          2
                port:   1
                        state:          PORT_ACTIVE (4)
                        max_mtu:        2048 (4)
                        active_mtu:     2048 (4)
                        sm_lid:         0
                        port_lid:       0
                        port_lmc:       0x00
                        link_layer:     Ethernet

                port:   2
                        state:          PORT_ACTIVE (4)
                        max_mtu:        2048 (4)
                        active_mtu:     2048 (4)
                        sm_lid:         0
                        port_lid:       0
                        port_lmc:       0x00
                        link_layer:     Ethernet

Additional info:
From /var/log/messages
Aug 18 09:51:52 perf44 kernel: Modules linked in: ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ib_sa mlx4_ib ib_mad ib_core autofs4 nfs lockd fscache nfs_acl auth_rpcgss sunrpc ipv6 dm_mirror dm_region_hash dm_log mlx4_en mlx4_core sg microcode serio_raw iTCO_wdt iTCO_vendor_support hpilo hpwdt bnx2 i7core_edac edac_core power_meter shpchp ext4 mbcache jbd2 sd_mod crc_t10dif hpsa radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mod [last unloaded: ib_core]
Aug 18 09:51:52 perf44 kernel: [<ffffffffa02cf8ca>] ? mlx4_en_process_rx_cq+0x3ca/0x830 [mlx4_en]
Aug 18 09:51:52 perf44 kernel: [<ffffffffa02cfd6f>] ? mlx4_en_poll_rx_cq+0x3f/0x80 [mlx4_en]
Aug 18 09:51:52 perf44 kernel: [<ffffffffa02b43b2>] ? mlx4_cq_completion+0x42/0x80 [mlx4_core]
Aug 18 09:51:59 perf44 kernel: mlx4_en: eth0: Link Down
Aug 18 09:51:59 perf44 kernel: mlx4_en: eth1: Link Down
Aug 18 09:56:41 perf44 kernel: mlx4_core: Mellanox ConnectX core driver v0.01 (May 1, 2007)
Aug 18 09:56:41 perf44 kernel: mlx4_core: Initializing 0000:08:00.0
Aug 18 09:56:41 perf44 kernel: mlx4_core 0000:08:00.0: PCI INT A -> GSI 32 (level, low) -> IRQ 32
Aug 18 09:56:41 perf44 kernel: mlx4_en: Mellanox ConnectX HCA Ethernet driver v1.4.1.1 (June 2009)
Aug 18 09:56:41 perf44 kernel: mlx4_en 0000:08:00.0: Using 8 tx rings for port:1
Aug 18 09:56:41 perf44 kernel: mlx4_en 0000:08:00.0: Defaulting to 16 rx rings for port:1
Aug 18 09:56:41 perf44 kernel: mlx4_en 0000:08:00.0: Using 8 tx rings for port:2
Aug 18 09:56:41 perf44 kernel: mlx4_en 0000:08:00.0: Defaulting to 16 rx rings for port:2
Aug 18 09:56:41 perf44 kernel: mlx4_en 0000:08:00.0: Activating port:1
Aug 18 09:56:41 perf44 kernel: mlx4_en: 0000:08:00.0: Port 1: Using 8 TX rings
Aug 18 09:56:41 perf44 kernel: mlx4_en: 0000:08:00.0: Port 1: Using 16 RX rings
Aug 18 09:56:41 perf44 kernel: mlx4_en 0000:08:00.0: Activating port:2
Aug 18 09:56:41 perf44 kernel: mlx4_en: 0000:08:00.0: Port 2: Using 8 TX rings
Aug 18 09:56:41 perf44 kernel: mlx4_en: 0000:08:00.0: Port 2: Using 16 RX rings
Aug 18 09:56:41 perf44 kernel: mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (April 4, 2008)

We have an updated PCI device table for the mlx4 driver in the RHEL 6.2 kernel, and the 2.6.33 RT kernel is quite old compared to our current RHEL 6.2 InfiniBand driver stack, which is now at the upstream kernel 3.0 level. I would suggest updating the RT kernel to a later mlx4 driver (which may or may not require additional InfiniBand stack updates).
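
To make the PCI device table point concrete, here is a minimal Python sketch of how a device's modalias string is matched against the glob-style aliases a driver exports. It is an illustration under assumptions, not the kernel's or modprobe's actual code: the helper names `pci_modalias` and `driver_claims` are hypothetical, the subsystem IDs are placeholders, and both alias lists are made up for the example rather than copied from either kernel's mlx4 table.

```python
from fnmatch import fnmatchcase


def pci_modalias(vendor, device, subvendor, subdevice,
                 base_class=0x02, sub_class=0x00, prog_if=0x00):
    """Build a modalias string in the layout used for PCI devices."""
    return "pci:v%08Xd%08Xsv%08Xsd%08Xbc%02Xsc%02Xi%02X" % (
        vendor, device, subvendor, subdevice, base_class, sub_class, prog_if)


def driver_claims(modalias, aliases):
    """Return True if any alias glob from a driver's table matches the device."""
    return any(fnmatchcase(modalias, pattern) for pattern in aliases)


# vendor_part_id 26448 from the ibv_devinfo output is 0x6750 in hex.
# 0x15B3 is the Mellanox PCI vendor ID; the subsystem IDs are placeholders.
device = pci_modalias(0x15B3, 26448, 0x15B3, 0x0000)

# Hypothetical alias lists, for illustration only. They are NOT the
# contents of the 2.6.33 RT or RHEL 6.2 mlx4 device tables.
table_without_id = ["pci:v000015B3d00006340sv*sd*bc*sc*i*"]
table_with_id = table_without_id + ["pci:v000015B3d00006750sv*sd*bc*sc*i*"]

print(device)
print(driver_claims(device, table_without_id))
print(driver_claims(device, table_with_id))
```

On a real system the alias list would come from the `alias:` lines printed by `modinfo mlx4_core`, and the device string from the `modalias` file under `/sys/bus/pci/devices/0000:08:00.0/`. Comparing the two on each kernel would show whether the device table is actually what differs.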
This issue has not been updated in a while and is against an older, unsupported kernel. This BZ is being closed WONTFIX. If you believe this is still an issue on our most recent MRG-2.5 3.10 Realtime kernel, please file a new issue for further investigation.