Description of problem:
Installed 6.9_MRG, then failed to create device for nic_driver mlx5_core

Version-Release number of selected component (if applicable):
3.10.0-514.rt56.210.el6rt.x86_64

How reproducible:
3/3

Steps to Reproduce:
1. Install 6.9 MRG
2. lsmod | grep mlx5 --checked the mlx5_core has been installed
3. ip link show $nic
   ethtool -i $nic --found that no device for mlx5_core

Actual results:
failed

Expected results:
succeed to create the device

Additional info:

####details info with MRG 514.rt56.210.el6rt:

[root@cisco-c220m3-01 ~]# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether c0:67:af:98:03:5d brd ff:ff:ff:ff:ff:ff
3: eth5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether c0:67:af:98:03:5e brd ff:ff:ff:ff:ff:ff
4: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether f8:72:ea:a4:01:78 brd ff:ff:ff:ff:ff:ff
5: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether f8:72:ea:a4:01:79 brd ff:ff:ff:ff:ff:ff

[root@cisco-c220m3-01 ~]# uname -a
Linux cisco-c220m3-01.rhts.eng.pek2.redhat.com 3.10.0-514.rt56.210.el6rt.x86_64 #1 SMP PREEMPT RT Tue Dec 13 22:46:02 EST 2016 x86_64 x86_64 x86_64 GNU/Linux

[root@cisco-c220m3-01 ~]# lsmod | grep mlx5
mlx5_ib 159074 0
ib_core 207935 11 mlx4_ib,ib_ipoib,rdma_ucm,ib_ucm,ib_uverbs,ib_umad,rdma_cm,ib_cm,iw_cm,usnic_verbs,mlx5_ib
mlx5_core 175590 1 mlx5_ib

[root@cisco-c220m3-01 ~]# ethtool -i eth4
driver: enic
version: 2.3.0.20
firmware-version: 2.1(2aS3)
bus-info: 0000:0b:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

[root@cisco-c220m3-01 ~]# ethtool -i eth1
driver: igb
version: 5.3.0-k
firmware-version: 1.63, 0x80000aa4, 0.309.17
bus-info: 0000:04:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

[root@cisco-c220m3-01 core]# cat /proc/net/dev
Inter-| Receive | Transmit
 face |bytes packets errs drop fifo frame compressed multicast|bytes packets errs drop fifo colls carrier compressed
  eth0: 8991112 35568 0 0 0 0 0 3065 2798939 7138 0 0 0 0 0 0
  eth1: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  eth4: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  eth5: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    lo: 1740694 10186 0 0 0 0 0 0 1740694 10186 0 0 0 0 0 0

[root@cisco-c220m3-01 core]# modinfo mlx5_core
filename: /lib/modules/3.10.0-514.rt56.210.el6rt.x86_64/kernel/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko
version: 3.0-1
license: Dual BSD/GPL
description: Mellanox Connect-IB, ConnectX-4 core driver
author: Eli Cohen <eli>
rhelversion: 7.3
srcversion: 0D21B16CF9CD92A5142D03B
alias: pci:v000015B3d00001018sv*sd*bc*sc*i*
alias: pci:v000015B3d00001017sv*sd*bc*sc*i*
alias: pci:v000015B3d00001016sv*sd*bc*sc*i*
alias: pci:v000015B3d00001015sv*sd*bc*sc*i*
alias: pci:v000015B3d00001014sv*sd*bc*sc*i*
alias: pci:v000015B3d00001013sv*sd*bc*sc*i*
alias: pci:v000015B3d00001012sv*sd*bc*sc*i*
alias: pci:v000015B3d00001011sv*sd*bc*sc*i*
depends:
intree: Y
vermagic: 3.10.0-514.rt56.210.el6rt.x86_64 SMP preempt mod_unload
parm: debug_mask:debug mask: 1 = dump cmd data, 2 = dump cmd exec time, 3 = both. Default=0 (int)
parm: prof_sel:profile selector. Valid range 0 - 2 (int)

####checked that it works fine with RHEL7-rt, details:

[root@cisco-c220m3-01 ~]# uname -a
Linux cisco-c220m3-01.rhts.eng.pek2.redhat.com 3.10.0-514.rt56.420.el7.x86_64 #1 SMP PREEMPT RT Wed Oct 19 15:51:13 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux

[root@cisco-c220m3-01 ~]# lsmod | grep mlx5
mlx5_ib 157087 0
ib_core 210859 15 rdma_cm,ib_cm,iw_cm,rpcrdma,mlx5_ib,ib_srp,ib_ucm,usnic_verbs,ib_iser,ib_srpt,ib_umad,ib_uverbs,rdma_ucm,ib_ipoib,ib_isert
mlx5_core 279942 1 mlx5_ib
ptp 19267 2 igb,mlx5_core

[root@cisco-c220m3-01 ~]# ethtool -i enp130s0f0
driver: mlx5_core
version: 3.0-1 (January 2015)
firmware-version: 14.17.2020
expansion-rom-version:
bus-info: 0000:82:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no
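The symptom reported above (module loaded, but no interface using it) can also be checked directly in sysfs. The following is only a minimal sketch, not part of the original report: it assumes the usual sysfs layout, in which each PCI function bound to a driver appears under /sys/bus/pci/drivers/<driver>/ and gains a net/ subdirectory once a network interface has been registered for it. The function name pci_without_netdev and the overridable sysfs root argument are hypothetical, introduced here for illustration.

```shell
#!/bin/sh
# Sketch: report, for every PCI function bound to a driver, whether a
# network interface was registered for it.
#   $1: driver name (default: mlx5_core)
#   $2: sysfs root (default: /sys; hypothetical override, handy for dry runs)
pci_without_netdev() {
    driver="${1:-mlx5_core}"
    sysfs="${2:-/sys}"
    # Bound devices are named by PCI address (e.g. 0000:21:00.0), so they
    # are the only entries in the driver directory that contain a colon.
    for dev in "$sysfs/bus/pci/drivers/$driver"/*:*; do
        [ -e "$dev" ] || continue    # glob did not match: nothing is bound
        ifaces=$(ls "$dev/net" 2>/dev/null)
        if [ -n "$ifaces" ]; then
            # $ifaces is left unquoted on purpose so names join with spaces
            echo "$(basename "$dev"): netdev" $ifaces
        else
            echo "$(basename "$dev"): bound to $driver, no netdev"
        fi
    done
}
```

Running `pci_without_netdev mlx5_core` on an affected host would be expected to print a "no netdev" line for each bound function; on a working host each line would name an interface instead.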
From what I can see, the mlx5 modules are being loaded, but device creation is not happening. Do you see any failure messages in the boot log?
Created attachment 1320457 [details] boot log with kernel 3.10.0-693.2.1.rt56.585.el6rt.x86_64
Hi Beth,

My apologies for the late reply; I missed this needinfo earlier. I have tried the new kernel and unfortunately still hit the same issue. I also attached the boot log, please see attachment 1320457 [details]. There do not seem to be any failure messages in it. Please help check, thanks.

[root@hp-dl388g8-19 ~]# uname -a
Linux hp-dl388g8-19.rhts.eng.pek2.redhat.com 3.10.0-693.2.1.rt56.585.el6rt.x86_64 #1 SMP PREEMPT RT Tue Aug 15 14:37:49 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux

[root@hp-dl388g8-19 ~]# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth6: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 00:07:43:14:8d:50 brd ff:ff:ff:ff:ff:ff
3: eth7: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 00:07:43:14:8d:58 brd ff:ff:ff:ff:ff:ff
4: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 8c:7c:ff:2e:14:00 brd ff:ff:ff:ff:ff:ff
5: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 8c:7c:ff:2e:14:01 brd ff:ff:ff:ff:ff:ff
6: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 2c:44:fd:7f:9f:ac brd ff:ff:ff:ff:ff:ff
7: eth3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 2c:44:fd:7f:9f:ad brd ff:ff:ff:ff:ff:ff
8: eth4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 2c:44:fd:7f:9f:ae brd ff:ff:ff:ff:ff:ff
9: eth5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 2c:44:fd:7f:9f:af brd ff:ff:ff:ff:ff:ff

[root@hp-dl388g8-19 ~]# grep mlx5 loginfo_693.log
mlx5_core 0000:21:00.1: Shutdown was called
mlx5_core 0000:21:00.0: Shutdown was called
mlx5_core 0000:21:00.0: firmware version: 14.18.1000
mlx5_core 0000:21:00.0: Port module event: module 0, Cable plugged
mlx5_core 0000:21:00.1: firmware version: 14.18.1000
mlx5_core 0000:21:00.1: Port module event: module 1, Cable plugged
mlx5_ib: Mellanox Connect-IB Infiniband driver v2.2-1 (Feb 2014)

[root@hp-dl388g8-19 ~]# modinfo mlx5_core
filename: /lib/modules/3.10.0-693.2.1.rt56.585.el6rt.x86_64/kernel/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko
version: 3.0-1
license: Dual BSD/GPL
description: Mellanox Connect-IB, ConnectX-4 core driver
author: Eli Cohen <eli>
rhelversion: 7.4
srcversion: 0C8A83E32073E3E0DBB4223
alias: pci:v000015B3d0000101Asv*sd*bc*sc*i*
alias: pci:v000015B3d00001019sv*sd*bc*sc*i*
alias: pci:v000015B3d00001018sv*sd*bc*sc*i*
alias: pci:v000015B3d00001017sv*sd*bc*sc*i*
alias: pci:v000015B3d00001016sv*sd*bc*sc*i*
alias: pci:v000015B3d00001015sv*sd*bc*sc*i*
alias: pci:v000015B3d00001014sv*sd*bc*sc*i*
alias: pci:v000015B3d00001013sv*sd*bc*sc*i*
alias: pci:v000015B3d00001012sv*sd*bc*sc*i*
alias: pci:v000015B3d00001011sv*sd*bc*sc*i*
depends:
intree: Y
vermagic: 3.10.0-693.2.1.rt56.585.el6rt.x86_64 SMP preempt mod_unload
parm: debug_mask:debug mask: 1 = dump cmd data, 2 = dump cmd exec time, 3 = both. Default=0 (uint)
parm: prof_sel:profile selector. Valid range 0 - 2 (uint)

[root@hp-dl388g8-19 ~]# test(){ for i in `seq 1 7`; do ethtool -i eth$i | grep driver & done; }
[root@hp-dl388g8-19 ~]# test
[root@hp-dl388g8-19 ~]# driver: bna
driver: tg3
driver: tg3
driver: tg3
driver: tg3
driver: cxgb4
driver: cxgb4
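For scanning a saved boot log such as the one attached, the grep used in this comment can be narrowed to lines that look like failures. This is a rough sketch only: the function name driver_log_errors is hypothetical, and the keyword list (fail, error, timed out) is an assumption about how kernel drivers typically word problems, not something taken from the mlx5 sources. A clean result therefore does not prove that probing succeeded.

```shell
#!/bin/sh
# Sketch: print log lines that mention a driver AND look like a failure.
#   $1: driver name pattern (default: mlx5)
#   $2: path to a saved boot log (required)
driver_log_errors() {
    driver="${1:-mlx5}"
    log="${2:?log file required}"
    # First filter by driver, then by failure keywords (assumed list).
    grep -i "$driver" "$log" | grep -iE 'fail|error|timed out' \
        || echo "no $driver failure messages in $log"
}
```

For example, `driver_log_errors mlx5 loginfo_693.log` would be the equivalent call for the log file named in this comment.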
Hi Yuying, Thank you for the additional information. We were discussing this yesterday in our engineering call. Can you please tell me what rt-firmware package you have installed? Our latest is rt-firmware-2.4-1.el6rt I believe. Thanks for the help! Beth
(In reply to Beth Uptagrafft from comment #5)
> Hi Yuying,
> Thank you for the additional information. We were discussing this yesterday
> in our engineering call. Can you please tell me what rt-firmware package you
> have installed? Our latest is rt-firmware-2.4-1.el6rt I believe.
>
> Thanks for the help!
> Beth

Hi Beth,

I checked the testing log and found that the installed rt-firmware is rt-firmware-2.4-1.el6rt.x86_64.

Some log info:
Installing : rt-firmware-2.4-1.el6rt.x86_64
Verifying : rt-firmware-2.4-1.el6rt.x86_64

Thanks,
Yuying.
Created attachment 1360021 [details] sosreport: RHEL-RT-7 on hp-dl388g8-19.rhts.eng.pek2.redhat.com SOS report containing all the info of the host with RHEL-7-RT installed. It shows all NICs.
Hello there!

Good news from the Pizza Planet! I got the NIC to work as expected:

-------------- %< --------------------
[root@hp-dl388g8-19 ~]# ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth6: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 00:07:43:14:8d:50 brd ff:ff:ff:ff:ff:ff
3: eth7: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 00:07:43:14:8d:58 brd ff:ff:ff:ff:ff:ff
4: eth8: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether e4:1d:2d:c0:85:a2 brd ff:ff:ff:ff:ff:ff
5: eth9: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether e4:1d:2d:c0:85:a3 brd ff:ff:ff:ff:ff:ff
6: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 8c:7c:ff:2e:14:00 brd ff:ff:ff:ff:ff:ff
7: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 8c:7c:ff:2e:14:01 brd ff:ff:ff:ff:ff:ff
8: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 2c:44:fd:7f:9f:ac brd ff:ff:ff:ff:ff:ff
9: eth3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 2c:44:fd:7f:9f:ad brd ff:ff:ff:ff:ff:ff
10: eth4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 2c:44:fd:7f:9f:ae brd ff:ff:ff:ff:ff:ff
11: eth5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 2c:44:fd:7f:9f:af brd ff:ff:ff:ff:ff:ff

[root@hp-dl388g8-19 ~]# for i in `seq 1 9`; do ethtool -i eth$i | grep driver ; done
driver: bna
driver: tg3
driver: tg3
driver: tg3
driver: tg3
driver: cxgb4
driver: cxgb4
driver: mlx5_core
driver: mlx5_core
-------------- >% --------------

It turns out that the problem was a kernel misconfiguration. I synced the MLX config of the MRG-RT kernel with the RHEL-RT one, and then things started to work.
These are the config changes required to make it work:

--------------- %< --------------
--- /boot/config-3.10.0-693.5.2.rt56.592.el6rt.x86_64 2017-10-13 18:50:07.000000000 -0400
+++ .config 2017-11-28 19:15:51.506616306 -0500
@@ -1,6 +1,6 @@
 #
 # Automatically generated file; DO NOT EDIT.
-# Linux/x86_64 3.10.0-693.5.2.rt56.592.el6rt.x86_64 Kernel Configuration
+# Linux/x86 3.10.0 Kernel Configuration
 #
 CONFIG_64BIT=y
 CONFIG_X86_64=y
@@ -1245,7 +1245,7 @@
 # CONFIG_NETLINK_MMAP is not set
 # CONFIG_NETLINK_DIAG is not set
 CONFIG_NET_MPLS_GSO=m
-# CONFIG_NET_SWITCHDEV is not set
+CONFIG_NET_SWITCHDEV=y
 CONFIG_RPS=y
 CONFIG_RFS_ACCEL=y
 CONFIG_XPS=y
@@ -1344,8 +1344,8 @@
 # CONFIG_NFC is not set
 # CONFIG_LWTUNNEL is not set
 CONFIG_DST_CACHE=y
-# CONFIG_NET_DEVLINK is not set
-CONFIG_MAY_USE_DEVLINK=y
+CONFIG_NET_DEVLINK=m
+CONFIG_MAY_USE_DEVLINK=m
 CONFIG_HAVE_BPF_JIT=y
 
 #
@@ -2096,8 +2096,18 @@
 CONFIG_MLX4_CORE=m
 CONFIG_MLX4_DEBUG=y
 CONFIG_MLX5_CORE=m
-# CONFIG_MLX5_CORE_EN is not set
-# CONFIG_MLXSW_CORE is not set
+CONFIG_MLX5_CORE_EN=y
+CONFIG_MLX5_CORE_EN_DCB=y
+CONFIG_MLXSW_CORE=m
+CONFIG_MLXSW_CORE_HWMON=y
+CONFIG_MLXSW_CORE_THERMAL=y
+CONFIG_MLXSW_PCI=m
+CONFIG_MLXSW_I2C=m
+CONFIG_MLXSW_SWITCHIB=m
+CONFIG_MLXSW_SWITCHX2=m
+CONFIG_MLXSW_SPECTRUM=m
+CONFIG_MLXSW_SPECTRUM_DCB=y
+CONFIG_MLXSW_MINIMAL=m
 # CONFIG_NET_VENDOR_MICREL is not set
 CONFIG_NET_VENDOR_MYRI=y
 CONFIG_MYRI10GE=m
@@ -4818,6 +4828,7 @@
 # CONFIG_RBTREE_TEST is not set
 # CONFIG_INTERVAL_TREE_TEST is not set
 # CONFIG_TEST_RHASHTABLE is not set
+# CONFIG_TEST_PARMAN is not set
 CONFIG_PROVIDE_OHCI1394_DMA_INIT=y
 CONFIG_FIREWIRE_OHCI_REMOTE_DMA=y
 # CONFIG_BUILD_DOCSRC is not set
@@ -5145,5 +5156,6 @@
 CONFIG_SG_POOL=y
 CONFIG_ARCH_HAS_PMEM_API=y
 CONFIG_ARCH_HAS_MMIO_FLUSH=y
+CONFIG_PARMAN=m
 # CONFIG_RH_KABI_SIZE_ALIGN_CHECKS is not set
 CONFIG_RH_MRG_RT=y
------------ >% --------------
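To see whether an installed kernel already carries these settings, the options touched by the diff can be read back from its config file. This is a minimal sketch under stated assumptions: the function name check_mlx_config is hypothetical, the default path /boot/config-$(uname -r) is the location used in the diff header above, and the four options listed are simply those from the diff that relate most directly to mlx5 and devlink. The thread does not establish which of them is strictly required, although CONFIG_MLX5_CORE_EN is the one that enables the Ethernet part of mlx5_core.

```shell
#!/bin/sh
# Sketch: print the value of selected options from a kernel config file.
#   $1: config file (default: /boot/config-<running kernel>)
check_mlx_config() {
    cfg="${1:-/boot/config-$(uname -r)}"
    for opt in CONFIG_MLX5_CORE CONFIG_MLX5_CORE_EN \
               CONFIG_NET_DEVLINK CONFIG_NET_SWITCHDEV; do
        # "# CONFIG_FOO is not set" lines do not match "^CONFIG_FOO=",
        # so a disabled option is reported as "unset".
        val=$(grep -E "^${opt}=" "$cfg" | cut -d= -f2)
        echo "${opt}=${val:-unset}"
    done
}
```

Called as `check_mlx_config /boot/config-3.10.0-693.5.2.rt56.592.el6rt.x86_64`, it would be expected to report CONFIG_MLX5_CORE_EN=unset for the kernel on the "-" side of the diff.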
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:0181