Bug 1422778
| Summary: | [mlx5] Failed to create device for nic_driver mlx5_core | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | Ma Yuying <yuma> | ||||||
| Component: | realtime-kernel | Assignee: | Daniel Bristot de Oliveira <daolivei> | ||||||
| Status: | CLOSED ERRATA | QA Contact: | Jiri Kastner <jkastner> | ||||||
| Severity: | high | Docs Contact: | |||||||
| Priority: | unspecified | ||||||||
| Version: | 2.5 | CC: | bhu, daolivei, jsvarova, lgoncalv, williams, yuma | ||||||
| Target Milestone: | --- | ||||||||
| Target Release: | --- | ||||||||
| Hardware: | x86_64 | ||||||||
| OS: | Linux | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | 3.10.0-693.15.1 | Doc Type: | Bug Fix | ||||||
| Doc Text: |
The mlx5 driver has a number of configuration options, including selective support for network protocols such as InfiniBand and Ethernet. Due to a regression in the configuration of the MRG-RT kernel, the Ethernet mode of the driver was turned off. The regression has been resolved by enabling the mlx5 Ethernet mode, so the Ethernet protocol works again.
|
Story Points: | --- | ||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2018-01-25 12:45:10 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
So from what I can see, the mlx5 modules are being loaded, but device creation is not happening. Do you see any failure messages in the boot log?
Created attachment 1320457 [details]
boot log with kernel 3.10.0-693.2.1.rt56.585.el6rt.x86_64
Hi Beth,
My apologies for the late reply. I missed this needinfo request earlier.
I have tried the new kernel and, unfortunately, still hit the
same issue.
I have also attached the boot log; please see attachment 1320457 [details]. There do not seem to be any failure messages. Please help check, thanks.
[root@hp-dl388g8-19 ~]# uname -a
Linux hp-dl388g8-19.rhts.eng.pek2.redhat.com 3.10.0-693.2.1.rt56.585.el6rt.x86_64 #1 SMP PREEMPT RT Tue Aug 15 14:37:49 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
[root@hp-dl388g8-19 ~]# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth6: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether 00:07:43:14:8d:50 brd ff:ff:ff:ff:ff:ff
3: eth7: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether 00:07:43:14:8d:58 brd ff:ff:ff:ff:ff:ff
4: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether 8c:7c:ff:2e:14:00 brd ff:ff:ff:ff:ff:ff
5: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether 8c:7c:ff:2e:14:01 brd ff:ff:ff:ff:ff:ff
6: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
link/ether 2c:44:fd:7f:9f:ac brd ff:ff:ff:ff:ff:ff
7: eth3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether 2c:44:fd:7f:9f:ad brd ff:ff:ff:ff:ff:ff
8: eth4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether 2c:44:fd:7f:9f:ae brd ff:ff:ff:ff:ff:ff
9: eth5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether 2c:44:fd:7f:9f:af brd ff:ff:ff:ff:ff:ff
[root@hp-dl388g8-19 ~]# grep mlx5 loginfo_693.log
mlx5_core 0000:21:00.1: Shutdown was called
mlx5_core 0000:21:00.0: Shutdown was called
mlx5_core 0000:21:00.0: firmware version: 14.18.1000
mlx5_core 0000:21:00.0: Port module event: module 0, Cable plugged
mlx5_core 0000:21:00.1: firmware version: 14.18.1000
mlx5_core 0000:21:00.1: Port module event: module 1, Cable plugged
mlx5_ib: Mellanox Connect-IB Infiniband driver v2.2-1 (Feb 2014)
[root@hp-dl388g8-19 ~]# modinfo mlx5_core
filename: /lib/modules/3.10.0-693.2.1.rt56.585.el6rt.x86_64/kernel/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko
version: 3.0-1
license: Dual BSD/GPL
description: Mellanox Connect-IB, ConnectX-4 core driver
author: Eli Cohen <eli>
rhelversion: 7.4
srcversion: 0C8A83E32073E3E0DBB4223
alias: pci:v000015B3d0000101Asv*sd*bc*sc*i*
alias: pci:v000015B3d00001019sv*sd*bc*sc*i*
alias: pci:v000015B3d00001018sv*sd*bc*sc*i*
alias: pci:v000015B3d00001017sv*sd*bc*sc*i*
alias: pci:v000015B3d00001016sv*sd*bc*sc*i*
alias: pci:v000015B3d00001015sv*sd*bc*sc*i*
alias: pci:v000015B3d00001014sv*sd*bc*sc*i*
alias: pci:v000015B3d00001013sv*sd*bc*sc*i*
alias: pci:v000015B3d00001012sv*sd*bc*sc*i*
alias: pci:v000015B3d00001011sv*sd*bc*sc*i*
depends:
intree: Y
vermagic: 3.10.0-693.2.1.rt56.585.el6rt.x86_64 SMP preempt mod_unload
parm: debug_mask:debug mask: 1 = dump cmd data, 2 = dump cmd exec time, 3 = both. Default=0 (uint)
parm: prof_sel:profile selector. Valid range 0 - 2 (uint)
[root@hp-dl388g8-19 ~]# test(){ for i in `seq 1 7`; do ethtool -i eth$i | grep driver & done; }
[root@hp-dl388g8-19 ~]# test
[root@hp-dl388g8-19 ~]#
driver: bna
driver: tg3
driver: tg3
driver: tg3
driver: tg3
driver: cxgb4
driver: cxgb4
Hi Yuying,
Thank you for the additional information. We were discussing this yesterday in our engineering call. Can you please tell me what rt-firmware package you have installed? Our latest is rt-firmware-2.4-1.el6rt I believe.
Thanks for the help!
Beth
(In reply to Beth Uptagrafft from comment #5)
> Hi Yuying,
> Thank you for the additional information. We were discussing this yesterday
> in our engineering call. Can you please tell me what rt-firmware package you
> have installed? Our latest is rt-firmware-2.4-1.el6rt I believe.
>
> Thanks for the help!
> Beth
Hi Beth,
I checked the testing log and found that the rt-firmware is rt-firmware-2.4-1.el6rt.x86_64. Thanks.
Some log info:
Installing : rt-firmware-2.4-1.el6rt.x86_64
Verifying : rt-firmware-2.4-1.el6rt.x86_64
Thanks,
Yuying.
Created attachment 1360021 [details]
sosreport: RHEL-RT-7 on hp-dl388g8-19.rhts.eng.pek2.redhat.com
SOS report containing all the info of the host with RHEL-7-RT installed.
It shows all NICs.
Hello there!
Good news from the Pizza Planet! I got the NIC to work as expected:
-------------- %< --------------------
[root@hp-dl388g8-19 ~]# ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth6: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether 00:07:43:14:8d:50 brd ff:ff:ff:ff:ff:ff
3: eth7: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether 00:07:43:14:8d:58 brd ff:ff:ff:ff:ff:ff
4: eth8: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether e4:1d:2d:c0:85:a2 brd ff:ff:ff:ff:ff:ff
5: eth9: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether e4:1d:2d:c0:85:a3 brd ff:ff:ff:ff:ff:ff
6: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether 8c:7c:ff:2e:14:00 brd ff:ff:ff:ff:ff:ff
7: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether 8c:7c:ff:2e:14:01 brd ff:ff:ff:ff:ff:ff
8: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
link/ether 2c:44:fd:7f:9f:ac brd ff:ff:ff:ff:ff:ff
9: eth3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether 2c:44:fd:7f:9f:ad brd ff:ff:ff:ff:ff:ff
10: eth4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether 2c:44:fd:7f:9f:ae brd ff:ff:ff:ff:ff:ff
11: eth5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether 2c:44:fd:7f:9f:af brd ff:ff:ff:ff:ff:ff
[root@hp-dl388g8-19 ~]# for i in `seq 1 9`; do ethtool -i eth$i | grep driver ; done
driver: bna
driver: tg3
driver: tg3
driver: tg3
driver: tg3
driver: cxgb4
driver: cxgb4
driver: mlx5_core
driver: mlx5_core
-------------- >% --------------
It turns out that the problem was a kernel misconfiguration.
I synced the MLX config of MRG-RT with that of RHEL-RT, and then things started to work.
These are the config changes required to make it work:
--------------- %< --------------
--- /boot/config-3.10.0-693.5.2.rt56.592.el6rt.x86_64 2017-10-13 18:50:07.000000000 -0400
+++ .config 2017-11-28 19:15:51.506616306 -0500
@@ -1,6 +1,6 @@
#
# Automatically generated file; DO NOT EDIT.
-# Linux/x86_64 3.10.0-693.5.2.rt56.592.el6rt.x86_64 Kernel Configuration
+# Linux/x86 3.10.0 Kernel Configuration
#
CONFIG_64BIT=y
CONFIG_X86_64=y
@@ -1245,7 +1245,7 @@
# CONFIG_NETLINK_MMAP is not set
# CONFIG_NETLINK_DIAG is not set
CONFIG_NET_MPLS_GSO=m
-# CONFIG_NET_SWITCHDEV is not set
+CONFIG_NET_SWITCHDEV=y
CONFIG_RPS=y
CONFIG_RFS_ACCEL=y
CONFIG_XPS=y
@@ -1344,8 +1344,8 @@
# CONFIG_NFC is not set
# CONFIG_LWTUNNEL is not set
CONFIG_DST_CACHE=y
-# CONFIG_NET_DEVLINK is not set
-CONFIG_MAY_USE_DEVLINK=y
+CONFIG_NET_DEVLINK=m
+CONFIG_MAY_USE_DEVLINK=m
CONFIG_HAVE_BPF_JIT=y
#
@@ -2096,8 +2096,18 @@
CONFIG_MLX4_CORE=m
CONFIG_MLX4_DEBUG=y
CONFIG_MLX5_CORE=m
-# CONFIG_MLX5_CORE_EN is not set
-# CONFIG_MLXSW_CORE is not set
+CONFIG_MLX5_CORE_EN=y
+CONFIG_MLX5_CORE_EN_DCB=y
+CONFIG_MLXSW_CORE=m
+CONFIG_MLXSW_CORE_HWMON=y
+CONFIG_MLXSW_CORE_THERMAL=y
+CONFIG_MLXSW_PCI=m
+CONFIG_MLXSW_I2C=m
+CONFIG_MLXSW_SWITCHIB=m
+CONFIG_MLXSW_SWITCHX2=m
+CONFIG_MLXSW_SPECTRUM=m
+CONFIG_MLXSW_SPECTRUM_DCB=y
+CONFIG_MLXSW_MINIMAL=m
# CONFIG_NET_VENDOR_MICREL is not set
CONFIG_NET_VENDOR_MYRI=y
CONFIG_MYRI10GE=m
@@ -4818,6 +4828,7 @@
# CONFIG_RBTREE_TEST is not set
# CONFIG_INTERVAL_TREE_TEST is not set
# CONFIG_TEST_RHASHTABLE is not set
+# CONFIG_TEST_PARMAN is not set
CONFIG_PROVIDE_OHCI1394_DMA_INIT=y
CONFIG_FIREWIRE_OHCI_REMOTE_DMA=y
# CONFIG_BUILD_DOCSRC is not set
@@ -5145,5 +5156,6 @@
CONFIG_SG_POOL=y
CONFIG_ARCH_HAS_PMEM_API=y
CONFIG_ARCH_HAS_MMIO_FLUSH=y
+CONFIG_PARMAN=m
# CONFIG_RH_KABI_SIZE_ALIGN_CHECKS is not set
CONFIG_RH_MRG_RT=y
------------ >% --------------
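Of the options in the diff above, CONFIG_MLX5_CORE_EN is the one the Doc Text names as the regression. The following is a minimal sketch of how that setting could be checked in a kernel config file. The function name `mlx5_en_state` is hypothetical, and the sketch assumes the config file uses the usual `CONFIG_X=y` / `# CONFIG_X is not set` lines seen in the diff.

```shell
# Report how a kernel config file sets the mlx5 Ethernet option.
# mlx5_en_state is a hypothetical helper name, not part of any package.
# Argument: path to a kernel config file (for example one under /boot).
mlx5_en_state() {
    cfg=$1
    if grep -q '^CONFIG_MLX5_CORE_EN=y$' "$cfg"; then
        # Ethernet mode is built into mlx5_core
        echo enabled
    elif grep -q '^# CONFIG_MLX5_CORE_EN is not set$' "$cfg"; then
        # mlx5_core loads, but creates no Ethernet netdevs
        echo disabled
    else
        # Option absent from this config file
        echo unknown
    fi
}
```

For example, `mlx5_en_state /boot/config-3.10.0-693.5.2.rt56.592.el6rt.x86_64` would be expected to print `disabled`, since the diff shows that file containing `# CONFIG_MLX5_CORE_EN is not set`.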
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:0181 |
Description of problem:
Installed 6.9_MRG, then failed to create device for nic_driver mlx5_core
Version-Release number of selected component (if applicable):
3.10.0-514.rt56.210.el6rt.x86_64
How reproducible:
3/3
Steps to Reproduce:
1. Install 6.9 MRG
2. lsmod | grep mlx5 --checked the mlx5_core has been installed
3. ip link show $nic
ethtool -i $nic --found that no device for mlx5_core
Actual results:
failed
Expected results:
succeed to create the device
Additional info:
####details info with MRG 514.rt56.210.el6rt:
[root@cisco-c220m3-01 ~]# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether c0:67:af:98:03:5d brd ff:ff:ff:ff:ff:ff
3: eth5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether c0:67:af:98:03:5e brd ff:ff:ff:ff:ff:ff
4: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
link/ether f8:72:ea:a4:01:78 brd ff:ff:ff:ff:ff:ff
5: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether f8:72:ea:a4:01:79 brd ff:ff:ff:ff:ff:ff
[root@cisco-c220m3-01 ~]# uname -a
Linux cisco-c220m3-01.rhts.eng.pek2.redhat.com 3.10.0-514.rt56.210.el6rt.x86_64 #1 SMP PREEMPT RT Tue Dec 13 22:46:02 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
[root@cisco-c220m3-01 ~]# lsmod | grep mlx5
mlx5_ib 159074 0
ib_core 207935 11 mlx4_ib,ib_ipoib,rdma_ucm,ib_ucm,ib_uverbs,ib_umad,rdma_cm,ib_cm,iw_cm,usnic_verbs,mlx5_ib
mlx5_core 175590 1 mlx5_ib
[root@cisco-c220m3-01 ~]# ethtool -i eth4
driver: enic
version: 2.3.0.20
firmware-version: 2.1(2aS3)
bus-info: 0000:0b:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no
[root@cisco-c220m3-01 ~]# ethtool -i eth1
driver: igb
version: 5.3.0-k
firmware-version: 1.63, 0x80000aa4, 0.309.17
bus-info: 0000:04:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
[root@cisco-c220m3-01 core]# cat /proc/net/dev
Inter-| Receive | Transmit
face |bytes packets errs drop fifo frame compressed multicast|bytes packets errs drop fifo colls carrier compressed
eth0: 8991112 35568 0 0 0 0 0 3065 2798939 7138 0 0 0 0 0 0
eth1: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
eth4: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
eth5: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
lo: 1740694 10186 0 0 0 0 0 0 1740694 10186 0 0 0 0 0 0
[root@cisco-c220m3-01 core]# modinfo mlx5_core
filename: /lib/modules/3.10.0-514.rt56.210.el6rt.x86_64/kernel/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko
version: 3.0-1
license: Dual BSD/GPL
description: Mellanox Connect-IB, ConnectX-4 core driver
author: Eli Cohen <eli>
rhelversion: 7.3
srcversion: 0D21B16CF9CD92A5142D03B
alias: pci:v000015B3d00001018sv*sd*bc*sc*i*
alias: pci:v000015B3d00001017sv*sd*bc*sc*i*
alias: pci:v000015B3d00001016sv*sd*bc*sc*i*
alias: pci:v000015B3d00001015sv*sd*bc*sc*i*
alias: pci:v000015B3d00001014sv*sd*bc*sc*i*
alias: pci:v000015B3d00001013sv*sd*bc*sc*i*
alias: pci:v000015B3d00001012sv*sd*bc*sc*i*
alias: pci:v000015B3d00001011sv*sd*bc*sc*i*
depends:
intree: Y
vermagic: 3.10.0-514.rt56.210.el6rt.x86_64 SMP preempt mod_unload
parm: debug_mask:debug mask: 1 = dump cmd data, 2 = dump cmd exec time, 3 = both. Default=0 (int)
parm: prof_sel:profile selector. Valid range 0 - 2 (int)
####checked that it works fine with RHEL7-rt, details:
[root@cisco-c220m3-01 ~]# uname -a
Linux cisco-c220m3-01.rhts.eng.pek2.redhat.com 3.10.0-514.rt56.420.el7.x86_64 #1 SMP PREEMPT RT Wed Oct 19 15:51:13 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
[root@cisco-c220m3-01 ~]# lsmod | grep mlx5
mlx5_ib 157087 0
ib_core 210859 15 rdma_cm,ib_cm,iw_cm,rpcrdma,mlx5_ib,ib_srp,ib_ucm,usnic_verbs,ib_iser,ib_srpt,ib_umad,ib_uverbs,rdma_ucm,ib_ipoib,ib_isert
mlx5_core 279942 1 mlx5_ib
ptp 19267 2 igb,mlx5_core
[root@cisco-c220m3-01 ~]# ethtool -i enp130s0f0
driver: mlx5_core
version: 3.0-1 (January 2015)
firmware-version: 14.17.2020
expansion-rom-version:
bus-info: 0000:82:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no
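The check in step 3 of the reproduction steps (looking for a network device bound to mlx5_core) can also be sketched without calling ethtool once per interface. This is a minimal sketch, not part of the original report: the function name `list_nic_drivers` is hypothetical, and it assumes sysfs exposes each physical interface's driver as the `device/driver` symlink under `/sys/class/net`.

```shell
# Print "interface driver" pairs by reading the driver symlink in sysfs.
# list_nic_drivers is a hypothetical helper name.
# Optional argument: directory to scan (defaults to /sys/class/net).
list_nic_drivers() {
    root=${1:-/sys/class/net}
    for dev in "$root"/*; do
        # Virtual interfaces such as lo have no device/driver link; skip them
        [ -L "$dev/device/driver" ] || continue
        drv=$(basename "$(readlink "$dev/device/driver")")
        printf '%s %s\n' "$(basename "$dev")" "$drv"
    done
}
```

On an affected kernel, `list_nic_drivers | grep mlx5_core` would be expected to print nothing even though `lsmod` shows mlx5_core loaded. On a fixed kernel it should list the Mellanox interfaces.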