Bug 731734 - Mellanox 10Gb ConnectX InfiniBand fails to be recognized as a RDMA device
Summary: Mellanox 10Gb ConnectX InfiniBand fails to be recognized as a RDMA device
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: realtime-kernel
Version: Development
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Red Hat Real Time Maintenance
QA Contact: David Sommerseth
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-08-18 14:03 UTC by Tom Tracy
Modified: 2016-05-22 23:33 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-09-25 19:34:47 UTC


Description Tom Tracy 2011-08-18 14:03:59 UTC
Description of problem:

After booting the RT kernel, the kernel fails to recognize the Mellanox 10Gb RDMA device. As a result, MRG-M cannot run with the RDMA driver loaded.

Version-Release number of selected component (if applicable):
2.6.33.9-rt31.74.el6rt.x86_64

How reproducible: Every time with this kernel.

Steps to Reproduce:
1. Boot the RT kernel.
2. Run ibv_devinfo.
  
Actual results:
[root@perf44 ~]# ibv_devinfo
No IB devices found
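A quick way to see whether the kernel itself has registered any RDMA device, independently of the userspace tools, is to look under /sys/class/infiniband, where the kernel lists registered IB/RDMA devices (for example mlx4_0). The sketch below is a minimal check under that assumption; it does not reproduce how libibverbs enumerates devices, and the function name is hypothetical.

```python
from pathlib import Path


def list_rdma_devices(sysfs_root="/sys"):
    """Return the names of RDMA devices registered with the kernel (e.g. 'mlx4_0')."""
    ib_class = Path(sysfs_root) / "class" / "infiniband"
    if not ib_class.is_dir():
        # No class directory: ib_core is not loaded or nothing was ever registered.
        return []
    return sorted(entry.name for entry in ib_class.iterdir())


if __name__ == "__main__":
    devices = list_rdma_devices()
    if devices:
        print("RDMA devices:", ", ".join(devices))
    else:
        print("No RDMA devices registered under /sys/class/infiniband")
```

An empty result here together with "No IB devices found" would suggest that the problem is in the kernel driver rather than in the userspace verbs libraries.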

Expected results:

Output after booting the RHEL 6.2 189 kernel:

[root@perf42 ~]# ibv_devinfo
hca_id:	mlx4_0
	transport:			InfiniBand (0)
	fw_ver:				2.7.700
	node_guid:			0002:c903:0007:99ac
	sys_image_guid:			0002:c903:0007:99af
	vendor_id:			0x02c9
	vendor_part_id:			26448
	hw_ver:				0xB0
	board_id:			MT_0DB0120010
	phys_port_cnt:			2
		port:	1
			state:			PORT_ACTIVE (4)
			max_mtu:		2048 (4)
			active_mtu:		2048 (4)
			sm_lid:			0
			port_lid:		0
			port_lmc:		0x00
			link_layer:		Ethernet

		port:	2
			state:			PORT_ACTIVE (4)
			max_mtu:		2048 (4)
			active_mtu:		2048 (4)
			sm_lid:			0
			port_lid:		0
			port_lmc:		0x00
			link_layer:		Ethernet
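To compare the failing and working hosts programmatically, the ibv_devinfo text can be reduced to each HCA and its ports. The sketch below is a hypothetical helper that keeps only the port state and link_layer fields; it assumes the "key: value" layout shown above and ignores all other fields. On the working host, both ports of mlx4_0 report link_layer Ethernet.

```python
def parse_ibv_devinfo(text):
    """Map each hca_id to a list of per-port dicts holding 'port', 'state' and 'link_layer'."""
    devices = {}
    ports = None  # port list of the current HCA
    port = None   # dict of the current port
    for raw in text.splitlines():
        line = raw.strip()
        if ":" not in line:
            # Skips blank lines and messages such as "No IB devices found".
            continue
        # Split on the first colon only, so values like GUIDs keep their colons.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "hca_id":
            ports = devices.setdefault(value, [])
            port = None
        elif key == "port" and ports is not None:
            port = {"port": int(value)}
            ports.append(port)
        elif port is not None and key in ("state", "link_layer"):
            port[key] = value
    return devices
```

Applied to the output above, this yields two entries for mlx4_0, both with state "PORT_ACTIVE (4)" and link_layer "Ethernet"; applied to the failing host's output, it yields an empty mapping.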


Additional info:
From /var/log/messages

Aug 18 09:51:52 perf44 kernel: Modules linked in: ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ib_sa mlx4_ib ib_mad ib_core autofs4 nfs lockd fscache nfs_acl auth_rpcgss sunrpc ipv6 dm_mirror dm_region_hash dm_log mlx4_en mlx4_core sg microcode serio_raw iTCO_wdt iTCO_vendor_support hpilo hpwdt bnx2 i7core_edac edac_core power_meter shpchp ext4 mbcache jbd2 sd_mod crc_t10dif hpsa radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mod [last unloaded: ib_core]
Aug 18 09:51:52 perf44 kernel: [<ffffffffa02cf8ca>] ? mlx4_en_process_rx_cq+0x3ca/0x830 [mlx4_en]
Aug 18 09:51:52 perf44 kernel: [<ffffffffa02cfd6f>] ? mlx4_en_poll_rx_cq+0x3f/0x80 [mlx4_en]
Aug 18 09:51:52 perf44 kernel: [<ffffffffa02b43b2>] ? mlx4_cq_completion+0x42/0x80 [mlx4_core]
Aug 18 09:51:59 perf44 kernel: mlx4_en: eth0: Link Down
Aug 18 09:51:59 perf44 kernel: mlx4_en: eth1: Link Down
Aug 18 09:56:41 perf44 kernel: mlx4_core: Mellanox ConnectX core driver v0.01 (May 1, 2007)
Aug 18 09:56:41 perf44 kernel: mlx4_core: Initializing 0000:08:00.0
Aug 18 09:56:41 perf44 kernel: mlx4_core 0000:08:00.0: PCI INT A -> GSI 32 (level, low) -> IRQ 32
Aug 18 09:56:41 perf44 kernel: mlx4_en: Mellanox ConnectX HCA Ethernet driver v1.4.1.1 (June 2009)
Aug 18 09:56:41 perf44 kernel: mlx4_en 0000:08:00.0: Using 8 tx rings for port:1
Aug 18 09:56:41 perf44 kernel: mlx4_en 0000:08:00.0: Defaulting to 16 rx rings for port:1
Aug 18 09:56:41 perf44 kernel: mlx4_en 0000:08:00.0: Using 8 tx rings for port:2
Aug 18 09:56:41 perf44 kernel: mlx4_en 0000:08:00.0: Defaulting to 16 rx rings for port:2
Aug 18 09:56:41 perf44 kernel: mlx4_en 0000:08:00.0: Activating port:1
Aug 18 09:56:41 perf44 kernel: mlx4_en: 0000:08:00.0: Port 1: Using 8 TX rings
Aug 18 09:56:41 perf44 kernel: mlx4_en: 0000:08:00.0: Port 1: Using 16 RX rings
Aug 18 09:56:41 perf44 kernel: mlx4_en 0000:08:00.0: Activating port:2
Aug 18 09:56:41 perf44 kernel: mlx4_en: 0000:08:00.0: Port 2: Using 8 TX rings
Aug 18 09:56:41 perf44 kernel: mlx4_en: 0000:08:00.0: Port 2: Using 16 RX rings
Aug 18 09:56:41 perf44 kernel: mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (April 4, 2008)
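The log shows mlx4_core, mlx4_en and mlx4_ib all loading without an explicit error. A small check of /proc/modules helps separate "a module is missing" from "the modules are loaded but no device was registered". The sketch below assumes the required set is the mlx4 and IB core modules from the "Modules linked in" line above, plus ib_uverbs, which userspace verbs tools such as ibv_devinfo rely on; the list and function names are illustrative.

```python
from pathlib import Path

# Assumption: taken from the "Modules linked in" line of the log; adjust as needed.
REQUIRED_MODULES = ("mlx4_core", "mlx4_en", "mlx4_ib", "ib_core", "ib_uverbs")


def loaded_modules(proc_modules_text):
    """Return the set of module names in /proc/modules-formatted text."""
    # Each line starts with the module name, followed by size, refcount, users, state, address.
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def missing_modules(proc_modules_text, required=REQUIRED_MODULES):
    """List the required modules that are not loaded, in the order given."""
    loaded = loaded_modules(proc_modules_text)
    return [name for name in required if name not in loaded]


if __name__ == "__main__":
    proc = Path("/proc/modules")
    if proc.exists():
        missing = missing_modules(proc.read_text())
        print("missing modules:", ", ".join(missing) if missing else "none")
```

If nothing is reported missing while ibv_devinfo still finds no devices, that would point at mlx4_ib not registering a device for this adapter rather than at module loading.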

Comment 1 Doug Ledford 2011-08-31 19:31:22 UTC
We have an updated PCI device table for the mlx4 driver in the RHEL 6.2 kernel, and a 2.6.33 RT kernel is quite old compared to our current RHEL 6.2 InfiniBand driver stack, which is now at the upstream kernel 3.0 level. I would suggest updating the RT kernel to a later mlx4 driver (which may or may not require additional InfiniBand stack updates).
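To see whether a given adapter's PCI ID appears in a driver's ID table, the vendor and device IDs can be read from sysfs and compared against the table. The sketch below assumes the PCI address 0000:08:00.0 from the log and a placeholder table containing only the ID seen in this report (vendor_id 0x02c9, with vendor_part_id 26448 assumed to equal the PCI device ID 0x6750). It is not the real mlx4 device table; that lives in the driver source.

```python
from pathlib import Path

# Placeholder table (hypothetical): only the ID observed in this report, NOT the mlx4 table.
EXAMPLE_ID_TABLE = {(0x02C9, 0x6750)}


def pci_ids(address, sysfs_root="/sys"):
    """Read (vendor, device) for a PCI address such as '0000:08:00.0' from sysfs."""
    dev = Path(sysfs_root) / "bus" / "pci" / "devices" / address
    # The sysfs files hold hex strings such as "0x02c9\n"; base 16 accepts the 0x prefix.
    vendor = int((dev / "vendor").read_text().strip(), 16)
    device = int((dev / "device").read_text().strip(), 16)
    return vendor, device


def is_listed(address, id_table=EXAMPLE_ID_TABLE, sysfs_root="/sys"):
    """Return True if the device's (vendor, device) pair appears in id_table."""
    return pci_ids(address, sysfs_root) in id_table
```

For example, is_listed("0000:08:00.0") on the affected host would return True only if its sysfs IDs match the placeholder entry; substituting the IDs from a newer driver's table is the actual comparison of interest.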

Comment 2 Beth Uptagrafft 2014-09-25 19:34:47 UTC
This issue has not been updated in a while and is against an older, unsupported kernel. This BZ is being closed WONTFIX.  If you believe this is still an issue on our most recent MRG-2.5 3.10 Realtime kernel, please file a new issue for further investigation.

