Bug 1422778
Summary: | [mlx5] Failed to create device for nic_driver mlx5_core
---|---
Product: | Red Hat Enterprise MRG
Reporter: | Ma Yuying <yuma>
Component: | realtime-kernel
Assignee: | Daniel Bristot de Oliveira <daolivei>
Status: | CLOSED ERRATA
QA Contact: | Jiri Kastner <jkastner>
Severity: | high
Priority: | unspecified
Version: | 2.5
CC: | bhu, daolivei, jsvarova, lgoncalv, williams, yuma
Hardware: | x86_64
OS: | Linux
Fixed In Version: | 3.10.0-693.15.1
Doc Type: | Bug Fix
Doc Text: |
The mlx5 driver has a number of configuration options, including selective support for network protocols such as InfiniBand and Ethernet. Due to a regression in the configuration of the MRG-RT kernel, the Ethernet mode of the driver was turned off. The regression has been resolved by enabling the mlx5 Ethernet mode, so the Ethernet protocol works again.
|
Last Closed: | 2018-01-25 12:45:10 UTC
Type: | Bug
Description
Ma Yuying
2017-02-16 08:21:04 UTC
So from what I can see, the mlx5 modules are being loaded, but device creation is not happening. Do you see any failure messages in the boot log?
Created attachment 1320457
boot log with kernel 3.10.0-693.2.1.rt56.585.el6rt.x86_64
Hi Beth,
My apologies for the late reply. I missed this needinfo request earlier.
I have tried the new kernel and unfortunately still hit the same issue.
I have also attached the boot log (see attachment 1320457). It does not appear to contain any failure messages. Please help check, thanks.
[root@hp-dl388g8-19 ~]# uname -a
Linux hp-dl388g8-19.rhts.eng.pek2.redhat.com 3.10.0-693.2.1.rt56.585.el6rt.x86_64 #1 SMP PREEMPT RT Tue Aug 15 14:37:49 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
[root@hp-dl388g8-19 ~]# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth6: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether 00:07:43:14:8d:50 brd ff:ff:ff:ff:ff:ff
3: eth7: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether 00:07:43:14:8d:58 brd ff:ff:ff:ff:ff:ff
4: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether 8c:7c:ff:2e:14:00 brd ff:ff:ff:ff:ff:ff
5: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether 8c:7c:ff:2e:14:01 brd ff:ff:ff:ff:ff:ff
6: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
link/ether 2c:44:fd:7f:9f:ac brd ff:ff:ff:ff:ff:ff
7: eth3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether 2c:44:fd:7f:9f:ad brd ff:ff:ff:ff:ff:ff
8: eth4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether 2c:44:fd:7f:9f:ae brd ff:ff:ff:ff:ff:ff
9: eth5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether 2c:44:fd:7f:9f:af brd ff:ff:ff:ff:ff:ff
[root@hp-dl388g8-19 ~]# grep mlx5 loginfo_693.log
mlx5_core 0000:21:00.1: Shutdown was called
mlx5_core 0000:21:00.0: Shutdown was called
mlx5_core 0000:21:00.0: firmware version: 14.18.1000
mlx5_core 0000:21:00.0: Port module event: module 0, Cable plugged
mlx5_core 0000:21:00.1: firmware version: 14.18.1000
mlx5_core 0000:21:00.1: Port module event: module 1, Cable plugged
mlx5_ib: Mellanox Connect-IB Infiniband driver v2.2-1 (Feb 2014)
[root@hp-dl388g8-19 ~]# modinfo mlx5_core
filename: /lib/modules/3.10.0-693.2.1.rt56.585.el6rt.x86_64/kernel/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko
version: 3.0-1
license: Dual BSD/GPL
description: Mellanox Connect-IB, ConnectX-4 core driver
author: Eli Cohen <eli>
rhelversion: 7.4
srcversion: 0C8A83E32073E3E0DBB4223
alias: pci:v000015B3d0000101Asv*sd*bc*sc*i*
alias: pci:v000015B3d00001019sv*sd*bc*sc*i*
alias: pci:v000015B3d00001018sv*sd*bc*sc*i*
alias: pci:v000015B3d00001017sv*sd*bc*sc*i*
alias: pci:v000015B3d00001016sv*sd*bc*sc*i*
alias: pci:v000015B3d00001015sv*sd*bc*sc*i*
alias: pci:v000015B3d00001014sv*sd*bc*sc*i*
alias: pci:v000015B3d00001013sv*sd*bc*sc*i*
alias: pci:v000015B3d00001012sv*sd*bc*sc*i*
alias: pci:v000015B3d00001011sv*sd*bc*sc*i*
depends:
intree: Y
vermagic: 3.10.0-693.2.1.rt56.585.el6rt.x86_64 SMP preempt mod_unload
parm: debug_mask:debug mask: 1 = dump cmd data, 2 = dump cmd exec time, 3 = both. Default=0 (uint)
parm: prof_sel:profile selector. Valid range 0 - 2 (uint)
[root@hp-dl388g8-19 ~]# test(){ for i in `seq 1 7`; do ethtool -i eth$i | grep driver & done; }
[root@hp-dl388g8-19 ~]# test
[root@hp-dl388g8-19 ~]#
driver: bna
driver: tg3
driver: tg3
driver: tg3
driver: tg3
driver: cxgb4
driver: cxgb4
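The `test` function above backgrounds each `ethtool` call with `&`, which is why the prompt is printed before the driver lines. As a minimal sketch of an alternative, the driver bound to each interface can also be read from the `device/driver` symlink that sysfs exposes under `/sys/class/net/<interface>/`. The helper name `list_nic_drivers` and its optional directory argument are illustrative assumptions, not something used in this report.

```shell
# list_nic_drivers [NET_DIR]
# Hypothetical helper: print "<interface>: <driver>" for every interface
# that has a bound driver. NET_DIR defaults to /sys/class/net.
list_nic_drivers() {
    lnd_dir="${1:-/sys/class/net}"
    for lnd_dev in "$lnd_dir"/*; do
        # Virtual interfaces such as lo have no device/driver link; skip them.
        [ -e "$lnd_dev/device/driver" ] || continue
        # The symlink target's last path component is the driver name.
        lnd_drv=$(basename "$(readlink -f "$lnd_dev/device/driver")")
        printf '%s: %s\n' "$(basename "$lnd_dev")" "$lnd_drv"
    done
}

# Example: list_nic_drivers | grep mlx5_core
```

If no line mentions `mlx5_core`, no network device was created for the Mellanox ports, which matches the symptom reported here.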
Hi Yuying,
Thank you for the additional information. We were discussing this yesterday in our engineering call. Can you please tell me what rt-firmware package you have installed? Our latest is rt-firmware-2.4-1.el6rt I believe.
Thanks for the help!
Beth

(In reply to Beth Uptagrafft from comment #5)
> Hi Yuying,
> Thank you for the additional information. We were discussing this yesterday
> in our engineering call. Can you please tell me what rt-firmware package you
> have installed? Our latest is rt-firmware-2.4-1.el6rt I believe.
>
> Thanks for the help!
> Beth

Hi Beth,
I checked the testing log and found that the installed rt-firmware is rt-firmware-2.4-1.el6rt.x86_64.
Some log info:
Installing : rt-firmware-2.4-1.el6rt.x86_64
Verifying : rt-firmware-2.4-1.el6rt.x86_64
Thanks,
Yuying.

Created attachment 1360021
sosreport: RHEL-RT-7 on hp-dl388g8-19.rhts.eng.pek2.redhat.com
SOS report containing all the info of the host with RHEL-7-RT installed.
It shows all NICs.
Hello there! Good news from the Pizza Planet! I got the NIC working as expected:
-------------- %< --------------------
[root@hp-dl388g8-19 ~]# ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth6: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether 00:07:43:14:8d:50 brd ff:ff:ff:ff:ff:ff
3: eth7: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether 00:07:43:14:8d:58 brd ff:ff:ff:ff:ff:ff
4: eth8: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether e4:1d:2d:c0:85:a2 brd ff:ff:ff:ff:ff:ff
5: eth9: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether e4:1d:2d:c0:85:a3 brd ff:ff:ff:ff:ff:ff
6: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether 8c:7c:ff:2e:14:00 brd ff:ff:ff:ff:ff:ff
7: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether 8c:7c:ff:2e:14:01 brd ff:ff:ff:ff:ff:ff
8: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
link/ether 2c:44:fd:7f:9f:ac brd ff:ff:ff:ff:ff:ff
9: eth3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether 2c:44:fd:7f:9f:ad brd ff:ff:ff:ff:ff:ff
10: eth4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether 2c:44:fd:7f:9f:ae brd ff:ff:ff:ff:ff:ff
11: eth5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether 2c:44:fd:7f:9f:af brd ff:ff:ff:ff:ff:ff
[root@hp-dl388g8-19 ~]# for i in `seq 1 9`; do ethtool -i eth$i | grep driver ; done
driver: bna
driver: tg3
driver: tg3
driver: tg3
driver: tg3
driver: cxgb4
driver: cxgb4
driver: mlx5_core
driver: mlx5_core
-------------- >% --------------
It turns out that the problem was a kernel misconfiguration. I synced the MLX config of MRG-RT with that of RHEL-RT, and then things started to work.
These are the config changes required to make it work:
--------------- %< --------------
--- /boot/config-3.10.0-693.5.2.rt56.592.el6rt.x86_64 2017-10-13 18:50:07.000000000 -0400
+++ .config 2017-11-28 19:15:51.506616306 -0500
@@ -1,6 +1,6 @@
 #
 # Automatically generated file; DO NOT EDIT.
-# Linux/x86_64 3.10.0-693.5.2.rt56.592.el6rt.x86_64 Kernel Configuration
+# Linux/x86 3.10.0 Kernel Configuration
 #
 CONFIG_64BIT=y
 CONFIG_X86_64=y
@@ -1245,7 +1245,7 @@
 # CONFIG_NETLINK_MMAP is not set
 # CONFIG_NETLINK_DIAG is not set
 CONFIG_NET_MPLS_GSO=m
-# CONFIG_NET_SWITCHDEV is not set
+CONFIG_NET_SWITCHDEV=y
 CONFIG_RPS=y
 CONFIG_RFS_ACCEL=y
 CONFIG_XPS=y
@@ -1344,8 +1344,8 @@
 # CONFIG_NFC is not set
 # CONFIG_LWTUNNEL is not set
 CONFIG_DST_CACHE=y
-# CONFIG_NET_DEVLINK is not set
-CONFIG_MAY_USE_DEVLINK=y
+CONFIG_NET_DEVLINK=m
+CONFIG_MAY_USE_DEVLINK=m
 CONFIG_HAVE_BPF_JIT=y
 
 #
@@ -2096,8 +2096,18 @@
 CONFIG_MLX4_CORE=m
 CONFIG_MLX4_DEBUG=y
 CONFIG_MLX5_CORE=m
-# CONFIG_MLX5_CORE_EN is not set
-# CONFIG_MLXSW_CORE is not set
+CONFIG_MLX5_CORE_EN=y
+CONFIG_MLX5_CORE_EN_DCB=y
+CONFIG_MLXSW_CORE=m
+CONFIG_MLXSW_CORE_HWMON=y
+CONFIG_MLXSW_CORE_THERMAL=y
+CONFIG_MLXSW_PCI=m
+CONFIG_MLXSW_I2C=m
+CONFIG_MLXSW_SWITCHIB=m
+CONFIG_MLXSW_SWITCHX2=m
+CONFIG_MLXSW_SPECTRUM=m
+CONFIG_MLXSW_SPECTRUM_DCB=y
+CONFIG_MLXSW_MINIMAL=m
 # CONFIG_NET_VENDOR_MICREL is not set
 CONFIG_NET_VENDOR_MYRI=y
 CONFIG_MYRI10GE=m
@@ -4818,6 +4828,7 @@
 # CONFIG_RBTREE_TEST is not set
 # CONFIG_INTERVAL_TREE_TEST is not set
 # CONFIG_TEST_RHASHTABLE is not set
+# CONFIG_TEST_PARMAN is not set
 CONFIG_PROVIDE_OHCI1394_DMA_INIT=y
 CONFIG_FIREWIRE_OHCI_REMOTE_DMA=y
 # CONFIG_BUILD_DOCSRC is not set
@@ -5145,5 +5156,6 @@
 CONFIG_SG_POOL=y
 CONFIG_ARCH_HAS_PMEM_API=y
 CONFIG_ARCH_HAS_MMIO_FLUSH=y
+CONFIG_PARMAN=m
 # CONFIG_RH_KABI_SIZE_ALIGN_CHECKS is not set
 CONFIG_RH_MRG_RT=y
------------ >% --------------

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2018:0181
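The config diff above shows `CONFIG_MLX5_CORE_EN` going from "is not set" to `=y`. A rough way to check the same option on an installed kernel is to look it up in the kernel's config file. This is only a sketch: the helper name `kconfig_state` is hypothetical, and it assumes the config is available at `/boot/config-$(uname -r)`, the location the diff itself reads from.

```shell
# kconfig_state OPTION [CONFIG_FILE]
# Hypothetical helper: print the value of a kernel config option
# ("y", "m", ...), "not set" if it is explicitly disabled, or "absent"
# if the file does not mention it at all.
kconfig_state() {
    kc_opt="$1"
    kc_file="${2:-/boot/config-$(uname -r)}"
    if kc_line=$(grep -E "^${kc_opt}=" "$kc_file"); then
        # Strip everything up to and including the first "=".
        printf '%s\n' "${kc_line#*=}"
    elif grep -q "^# ${kc_opt} is not set" "$kc_file"; then
        echo "not set"
    else
        echo "absent"
    fi
}

# Example: kconfig_state CONFIG_MLX5_CORE_EN
```

Going by the diff, an affected MRG-RT kernel would report `not set` for `CONFIG_MLX5_CORE_EN`, while a kernel built with the synced config would report `y`.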