I am not able to run a DPDK testpmd application; this validation was working before.

In summary, this is the error I see when starting the testpmd application:

net_mlx5: Failed to create TIS using DevX
net_mlx5: TIS allocation failure
EAL: Error: Invalid memory
net_mlx5: probe of PCI device 0000:3b:00.6 aborted after encountering an error: Cannot allocate memory
EAL: Requested device 0000:3b:00.6 cannot be used

Full details:

testpmd -l 8,10,48,50 -w 0000:3b:00.6 --iova-mode=va --socket-mem=1024 --socket-limit=1024 -- -i --portmask=0x1 --nb-cores=2 --forward-mode=mac --port-topology=loop --no-mlockall
EAL: Detected 80 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: Probing VFIO support...
EAL: PCI device 0000:3b:00.6 on NUMA socket 0
EAL: probe driver: 15b3:1018 net_mlx5
net_mlx5: Failed to create TIS using DevX
net_mlx5: TIS allocation failure
EAL: Error: Invalid memory
net_mlx5: probe of PCI device 0000:3b:00.6 aborted after encountering an error: Cannot allocate memory
EAL: Requested device 0000:3b:00.6 cannot be used
testpmd: No probed ethernet devices
Interactive-mode selected
Set mac packet forwarding mode
testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=171456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc

sh-4.4# lspci -v -nn -mm -k -s 0000:3b:00.6
Slot: 3b:00.6
Class: Ethernet controller [0200]
Vendor: Mellanox Technologies [15b3]
Device: MT27800 Family [ConnectX-5 Virtual Function] [1018]
SVendor: Mellanox Technologies [15b3]
SDevice: Device [0091]
Driver: mlx5_core
Module: mlx5_core
NUMANode: 0

sh-4.4# lspci -v -nn -mm -k -s 0000:3b:00.0
Slot: 3b:00.0
Class: Ethernet controller [0200]
Vendor: Mellanox Technologies [15b3]
Device: MT27800 Family [ConnectX-5] [1017]
SVendor: Mellanox Technologies [15b3]
SDevice: Device [0091]
Driver: mlx5_core
Module: mlx5_core
NUMANode: 0

ethtool -i ens1f0
driver: mlx5_core
version: 5.0-0
firmware-version: 16.26.6000
(DEL0000000015) expansion-rom-version: bus-info: 0000:3b:00.0 supports-statistics: yes supports-test: yes supports-eeprom-access: no supports-register-dump: no supports-priv-flags: yes dpdk version: dpdk.x86_64 19.11-4.el8 @rhel-8-for-x86_64-appstream-rpms dpdk-devel.x86_64 19.11-4.el8 @rhel-8-for-x86_64-appstream-rpms dpdk-tools.x86_64 19.11-4.el8 @rhel-8-for-x86_64-appstream-rpms kernel 4.18.0-193.24.1.el8_2.dt1.x86_64 with debug flag testpmd -l 8,10,48,50 -w 0000:3b:00.5 --iova-mode=va --log-level="*:debug" -- -i --portmask=0x1 --nb-cores=2 --forward-mode=mac --port-topology=loop --no-mlockall EAL: Support maximum 128 logical core(s) by configuration. EAL: Detected 80 lcore(s) EAL: Detected 2 NUMA nodes EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_bnxt.so.20.0 EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_e1000.so.20.0 EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_enic.so.20.0 EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_failsafe.so.20.0 EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_i40e.so.20.0 EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_ixgbe.so.20.0 EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_mlx4.so.20.0 EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_mlx5.so.20.0 EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_netvsc.so.20.0 EAL: Registered [vmbus] bus. 
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_nfp.so.20.0 EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_qede.so.20.0 EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_ring.so.20.0 EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_tap.so.20.0 EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_vdev_netvsc.so.20.0 EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_vhost.so.20.0 EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_virtio.so.20.0 EAL: Ask a virtual area of 0x5000 bytes EAL: Virtual area found at 0x100000000 (size = 0x5000) EAL: Multi-process socket /var/run/dpdk/rte/mp_socket EAL: Module /sys/module/vfio_pci not found! error 2 (No such file or directory) EAL: VFIO PCI modules not loaded EAL: Selected IOVA mode 'VA' EAL: Probing VFIO support... EAL: Module /sys/module/vfio not found! error 2 (No such file or directory) EAL: VFIO modules not loaded, skipping VFIO support... EAL: Ask a virtual area of 0x2e000 bytes EAL: Virtual area found at 0x100005000 (size = 0x2e000) EAL: Setting up physically contiguous memory... 
EAL: Setting maximum number of open files to 1048576 EAL: Detected memory type: socket_id:0 hugepage_sz:1073741824 EAL: Detected memory type: socket_id:1 hugepage_sz:1073741824 EAL: Creating 4 segment lists: n_segs:32 socket_id:0 hugepage_sz:1073741824 EAL: Ask a virtual area of 0x1000 bytes EAL: Virtual area found at 0x100033000 (size = 0x1000) EAL: Memseg list allocated: 0x100000kB at socket 0 EAL: Ask a virtual area of 0x800000000 bytes EAL: Virtual area found at 0x140000000 (size = 0x800000000) EAL: Ask a virtual area of 0x1000 bytes EAL: Virtual area found at 0x940000000 (size = 0x1000) EAL: Memseg list allocated: 0x100000kB at socket 0 EAL: Ask a virtual area of 0x800000000 bytes EAL: Virtual area found at 0x980000000 (size = 0x800000000) EAL: Ask a virtual area of 0x1000 bytes EAL: Virtual area found at 0x1180000000 (size = 0x1000) EAL: Memseg list allocated: 0x100000kB at socket 0 EAL: Ask a virtual area of 0x800000000 bytes EAL: Virtual area found at 0x11c0000000 (size = 0x800000000) EAL: Ask a virtual area of 0x1000 bytes EAL: Virtual area found at 0x19c0000000 (size = 0x1000) EAL: Memseg list allocated: 0x100000kB at socket 0 EAL: Ask a virtual area of 0x800000000 bytes EAL: Virtual area found at 0x1a00000000 (size = 0x800000000) EAL: Creating 4 segment lists: n_segs:32 socket_id:1 hugepage_sz:1073741824 EAL: Ask a virtual area of 0x1000 bytes EAL: Virtual area found at 0x2200000000 (size = 0x1000) EAL: Memseg list allocated: 0x100000kB at socket 1 EAL: Ask a virtual area of 0x800000000 bytes EAL: Virtual area found at 0x2240000000 (size = 0x800000000) EAL: Ask a virtual area of 0x1000 bytes EAL: Virtual area found at 0x2a40000000 (size = 0x1000) EAL: Memseg list allocated: 0x100000kB at socket 1 EAL: Ask a virtual area of 0x800000000 bytes EAL: Virtual area found at 0x2a80000000 (size = 0x800000000) EAL: Ask a virtual area of 0x1000 bytes EAL: Virtual area found at 0x3280000000 (size = 0x1000) EAL: Memseg list allocated: 0x100000kB at socket 1 EAL: Ask 
a virtual area of 0x800000000 bytes EAL: Virtual area found at 0x32c0000000 (size = 0x800000000) EAL: Ask a virtual area of 0x1000 bytes EAL: Virtual area found at 0x3ac0000000 (size = 0x1000) EAL: Memseg list allocated: 0x100000kB at socket 1 EAL: Ask a virtual area of 0x800000000 bytes EAL: Virtual area found at 0x3b00000000 (size = 0x800000000) EAL: TSC frequency is ~2490000 KHz EAL: Master lcore 8 is ready (tid=7ff303020900;cpuset=[8]) EAL: lcore 48 is ready (tid=7ff2f9e00700;cpuset=[48]) EAL: lcore 50 is ready (tid=7ff2f95ff700;cpuset=[50]) EAL: lcore 10 is ready (tid=7ff2fa601700;cpuset=[10]) EAL: Trying to obtain current memory policy. EAL: Setting policy MPOL_PREFERRED for socket 0 EAL: Restoring previous memory policy: 0 EAL: request: mp_malloc_sync EAL: Heap on socket 0 was expanded by 1024MB EAL: PCI device 0000:3b:00.5 on NUMA socket 0 EAL: probe driver: 15b3:1018 net_mlx5 EAL: Mem event callback 'MLX5_MEM_EVENT_CB:(nil)' registered net_mlx5: Failed to create TIS using DevX net_mlx5: TIS allocation failure Segmentation fault (core dumped)
Update: this is unrelated to the SELinux issue we found in this BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1846560
- If it was working before, what changed in your setup? Do you have new SELinux logs?
- I don't know the reason for the allocation failure yet, but the logs indicate that something is wrong in the mlx5 cleanup routine:

* first try
net_mlx5: Failed to create TIS using DevX
net_mlx5: TIS allocation failure
EAL: Error: Invalid memory
net_mlx5: probe of PCI device 0000:3b:00.6 aborted after encountering an error: Cannot allocate memory

Here, "Error: Invalid memory" means that the mlx5 driver tried to free an invalid pointer.

* second try
net_mlx5: Failed to create TIS using DevX
net_mlx5: TIS allocation failure
Segmentation fault (core dumped)

Here, I guess rte_free() did not have the chance to complain and just crashed. Do you have the coredump to confirm this?
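The two outcomes described above can be told apart mechanically when going through many testpmd runs. The following is only a rough sketch in Python: the function name `classify_probe_failure` and the returned labels are hypothetical, and the matched strings are simply the log lines quoted in this bug, not an exhaustive list of what the mlx5 PMD can print.

```python
import re

# Signatures copied verbatim from the logs quoted in this bug.
TIS_FAILURE = "net_mlx5: TIS allocation failure"
INVALID_FREE = "EAL: Error: Invalid memory"
SEGFAULT = "Segmentation fault"
ABORTED = re.compile(r"net_mlx5: probe of PCI device (\S+) aborted")


def classify_probe_failure(log_text):
    """Return a short (hypothetical) label for how an mlx5 probe attempt ended."""
    if TIS_FAILURE not in log_text:
        return "no TIS failure"
    if SEGFAULT in log_text:
        return "TIS failure, then segfault"
    if INVALID_FREE in log_text:
        return "TIS failure, then invalid free reported"
    if ABORTED.search(log_text):
        return "TIS failure, probe aborted"
    return "TIS failure, outcome unknown"
```

Feeding it the "first try" excerpt should give the invalid-free label and the "second try" excerpt the segfault label; a run that only prints the "probe ... aborted" line falls into the third case.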
This does seem like the other BZ. Did you try running with the workaround (WA) from comment https://bugzilla.redhat.com/show_bug.cgi?id=1846560#c43 ?
Hi David,

Sorry, I just validated with our QE team: this is an OpenShift cluster running on new bare-metal servers.

System Information
Manufacturer: Dell Inc.
Product Name: PowerEdge R740

The coredump is related to the other BZ: after I disabled SELinux there is no core dump any more, but I can see that testpmd has no NICs:

testpmd: No probed ethernet devices

I can give you access to the environment if you want to debug it there.

Full output of the run with the debug flag:

EAL: Support maximum 128 logical core(s) by configuration.
EAL: Detected 80 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_bnxt.so.20.0
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_e1000.so.20.0
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_enic.so.20.0
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_failsafe.so.20.0
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_i40e.so.20.0
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_ixgbe.so.20.0
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_mlx4.so.20.0
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_mlx5.so.20.0
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_netvsc.so.20.0
EAL: Registered [vmbus] bus.
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_nfp.so.20.0
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_qede.so.20.0
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_ring.so.20.0
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_tap.so.20.0
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_vdev_netvsc.so.20.0
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_vhost.so.20.0
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_virtio.so.20.0
EAL: Ask a virtual area of 0x5000 bytes
EAL: Virtual area found at 0x100000000 (size = 0x5000)
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Module /sys/module/vfio_pci not found! error 2 (No such file or directory)
EAL: VFIO PCI modules not loaded
EAL: Selected IOVA mode 'VA'
EAL: Probing VFIO support...
EAL: Module /sys/module/vfio not found! error 2 (No such file or directory) EAL: VFIO modules not loaded, skipping VFIO support... EAL: Ask a virtual area of 0x2e000 bytes EAL: Virtual area found at 0x100005000 (size = 0x2e000) EAL: Setting up physically contiguous memory... EAL: Setting maximum number of open files to 1048576 EAL: Detected memory type: socket_id:0 hugepage_sz:1073741824 EAL: Detected memory type: socket_id:1 hugepage_sz:1073741824 EAL: Creating 4 segment lists: n_segs:32 socket_id:0 hugepage_sz:1073741824 EAL: Ask a virtual area of 0x1000 bytes EAL: Virtual area found at 0x100033000 (size = 0x1000) EAL: Memseg list allocated: 0x100000kB at socket 0 EAL: Ask a virtual area of 0x800000000 bytes EAL: Virtual area found at 0x140000000 (size = 0x800000000) EAL: Ask a virtual area of 0x1000 bytes EAL: Virtual area found at 0x940000000 (size = 0x1000) EAL: Memseg list allocated: 0x100000kB at socket 0 EAL: Ask a virtual area of 0x800000000 bytes EAL: Virtual area found at 0x980000000 (size = 0x800000000) EAL: Ask a virtual area of 0x1000 bytes EAL: Virtual area found at 0x1180000000 (size = 0x1000) EAL: Memseg list allocated: 0x100000kB at socket 0 EAL: Ask a virtual area of 0x800000000 bytes EAL: Virtual area found at 0x11c0000000 (size = 0x800000000) EAL: Ask a virtual area of 0x1000 bytes EAL: Virtual area found at 0x19c0000000 (size = 0x1000) EAL: Memseg list allocated: 0x100000kB at socket 0 EAL: Ask a virtual area of 0x800000000 bytes EAL: Virtual area found at 0x1a00000000 (size = 0x800000000) EAL: Creating 4 segment lists: n_segs:32 socket_id:1 hugepage_sz:1073741824 EAL: Ask a virtual area of 0x1000 bytes EAL: Virtual area found at 0x2200000000 (size = 0x1000) EAL: Memseg list allocated: 0x100000kB at socket 1 EAL: Ask a virtual area of 0x800000000 bytes EAL: Virtual area found at 0x2240000000 (size = 0x800000000) EAL: Ask a virtual area of 0x1000 bytes EAL: Virtual area found at 0x2a40000000 (size = 0x1000) EAL: Memseg list allocated: 
0x100000kB at socket 1 EAL: Ask a virtual area of 0x800000000 bytes EAL: Virtual area found at 0x2a80000000 (size = 0x800000000) EAL: Ask a virtual area of 0x1000 bytes EAL: Virtual area found at 0x3280000000 (size = 0x1000) EAL: Memseg list allocated: 0x100000kB at socket 1 EAL: Ask a virtual area of 0x800000000 bytes EAL: Virtual area found at 0x32c0000000 (size = 0x800000000) EAL: Ask a virtual area of 0x1000 bytes EAL: Virtual area found at 0x3ac0000000 (size = 0x1000) EAL: Memseg list allocated: 0x100000kB at socket 1 EAL: Ask a virtual area of 0x800000000 bytes EAL: Virtual area found at 0x3b00000000 (size = 0x800000000) EAL: TSC frequency is ~2490000 KHz EAL: Master lcore 8 is ready (tid=7fcb35aca900;cpuset=[8]) EAL: lcore 48 is ready (tid=7fcb2c8aa700;cpuset=[48]) EAL: lcore 50 is ready (tid=7fcb2c0a9700;cpuset=[50]) EAL: lcore 10 is ready (tid=7fcb2d0ab700;cpuset=[10]) EAL: Trying to obtain current memory policy. EAL: Setting policy MPOL_PREFERRED for socket 0 EAL: Restoring previous memory policy: 0 EAL: request: mp_malloc_sync EAL: Heap on socket 0 was expanded by 1024MB EAL: PCI device 0000:3b:00.5 on NUMA socket 0 EAL: probe driver: 15b3:1018 net_mlx5 EAL: Mem event callback 'MLX5_MEM_EVENT_CB:(nil)' registered net_mlx5: Failed to create TIS using DevX net_mlx5: TIS allocation failure net_mlx5: probe of PCI device 0000:3b:00.5 aborted after encountering an error: Cannot allocate memory EAL: Requested device 0000:3b:00.5 cannot be used EAL: Module /sys/module/vfio not found! error 2 (No such file or directory) testpmd: No probed ethernet devices Interactive-mode selected Set mac packet forwarding mode testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=171456, size=2176, socket=0 testpmd: preferred mempool ops selected: ring_mp_mc Done testpmd>
- The hugepage files can be removed, and so this issue is different from bz1846560.

Inside the pod:
sh-4.4# uname -a
Linux dpdk-h4h9c 4.18.0-193.24.1.el8_2.dt1.x86_64 #1 SMP Thu Sep 24 14:57:05 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux
sh-4.4# rpm -q dpdk
dpdk-19.11-4.el8.x86_64
sh-4.4# rpm -q rdma-core
rdma-core-26.0-8.el8.x86_64
sh-4.4# cat /etc/redhat-release
Red Hat Enterprise Linux release 8.2 (Ootpa)

- We still have the issue with debug logs not working for dynamic dpdk plugins in this version. I used gdb to force it (safer than my previous black magic that rewrote the default value in the binary).

sh-4.4# gdb testpmd # -l 8,10,48,50 -w 0000:3b:00.5 --iova-mode=va --log-level="*:debug" -- -i --portmask=0x1 --nb-cores=2 --forward-mode=mac --port-topology=loop --no-mlockall
[...]
(gdb) b eal_plugins_init
Function "eal_plugins_init" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (eal_plugins_init) pending.
[...]
(gdb) run -l 8,10,48,50 -w 0000:3b:00.5 --iova-mode=va --log-level="*:debug" -- -i --portmask=0x1 --nb-cores=2 --forward-mode=mac --port-topology=loop --no-mlockall
Starting program: /usr/bin/testpmd -l 8,10,48,50 -w 0000:3b:00.5 --iova-mode=va --log-level="*:debug" -- -i --portmask=0x1 --nb-cores=2 --forward-mode=mac --port-topology=loop --no-mlockall
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
warning: Loadable section ".note.gnu.property" outside of ELF segments
EAL: Detected lcore 0 as core 0 on socket 0
EAL: Detected lcore 1 as core 0 on socket 1
EAL: Detected lcore 2 as core 4 on socket 0
EAL: Detected lcore 3 as core 4 on socket 1
EAL: Detected lcore 4 as core 1 on socket 0
[...]
Breakpoint 1, eal_plugins_init () at /usr/src/debug/dpdk-19.11-4.el8.x86_64/lib/librte_eal/common/eal_common_options.c:280 280 { Missing separate debuginfos, use: yum debuginfo-install numactl-libs-2.0.12-9.el8.x86_64 (gdb) finish Run till exit from #0 eal_plugins_init () at /usr/src/debug/dpdk-19.11-4.el8.x86_64/lib/librte_eal/common/eal_common_options.c:280 EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_bnxt.so.20.0 EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_e1000.so.20.0 EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_enic.so.20.0 EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_failsafe.so.20.0 EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_i40e.so.20.0 EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_ixgbe.so.20.0 EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_mlx4.so.20.0 warning: Loadable section ".note.gnu.property" outside of ELF segments warning: Loadable section ".note.gnu.property" outside of ELF segments EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_mlx5.so.20.0 warning: Loadable section ".note.gnu.property" outside of ELF segments EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_netvsc.so.20.0 EAL: Registered [vmbus] bus. 
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_nfp.so.20.0 EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_qede.so.20.0 EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_ring.so.20.0 EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_tap.so.20.0 EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_vdev_netvsc.so.20.0 EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_vhost.so.20.0 EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_virtio.so.20.0 0x00007ffff47bdf60 in rte_eal_init (argc=argc@entry=14, argv=argv@entry=0x7fffffffe958) at /usr/src/debug/dpdk-19.11-4.el8.x86_64/lib/librte_eal/linux/eal/eal.c:1007 1007 if (eal_plugins_init() < 0) { Value returned is $1 = 0 Missing separate debuginfos, use: yum debuginfo-install libibverbs-26.0-8.el8.x86_64 (gdb) set rte_logs.dynamic_types[mlx5_logtype].loglevel = 8 (gdb) c Continuing. And so, after this, we can see the following mlx5 driver logs: EAL: Heap on socket 0 was expanded by 1024MB EAL: PCI device 0000:3b:00.5 on NUMA socket 0 EAL: probe driver: 15b3:1018 net_mlx5 EAL: Mem event callback 'MLX5_MEM_EVENT_CB:(nil)' registered net_mlx5: checking device "mlx5_6" net_mlx5: checking device "mlx5_5" net_mlx5: PCI information matches for device "mlx5_5" net_mlx5: checking device "mlx5_4" net_mlx5: checking device "mlx5_3" net_mlx5: checking device "mlx5_2" net_mlx5: checking device "mlx5_1" net_mlx5: checking device "mlx5_0" net_mlx5: no E-Switch support detected net_mlx5: naming Ethernet device "0000:3b:00.5" net_mlx5: DevX is supported net_mlx5: Failed to create TIS using DevX net_mlx5: TIS allocation failure net_mlx5: probe of PCI device 0000:3b:00.5 aborted after encountering an error: Cannot allocate memory EAL: Requested device 0000:3b:00.5 cannot be used EAL: Module /sys/module/vfio not found! error 2 (No such file or directory)
# From the pod sh-4.4# dmesg |grep -i mlx.*0000:3b:00.5 [ 184.025855] mlx5_core 0000:3b:00.5: enabling device (0000 -> 0002) [ 184.032240] mlx5_core 0000:3b:00.5: firmware version: 16.26.6000 [ 184.320704] mlx5_core 0000:3b:00.5: Rate limit: 127 rates are supported, range: 0Mbps to 24414Mbps [ 184.338062] mlx5_core 0000:3b:00.5: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0) [ 184.346323] mlx5_core 0000:3b:00.5: Assigned random MAC address b2:a7:bd:88:e3:dc [ 184.482922] mlx5_core 0000:3b:00.5 ens1f0v3: renamed from eth0 [ 189.683721] mlx5_core 0000:3b:00.5: enabling device (0000 -> 0002) [ 189.690085] mlx5_core 0000:3b:00.5: firmware version: 16.26.6000 [ 189.972358] mlx5_core 0000:3b:00.5: Rate limit: 127 rates are supported, range: 0Mbps to 24414Mbps [ 189.989391] mlx5_core 0000:3b:00.5: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0) [ 190.131982] mlx5_core 0000:3b:00.5 ens1f0v3: renamed from eth0 [ 913.855598] mlx5_core 0000:3b:00.5: enabling device (0000 -> 0002) [ 913.861974] mlx5_core 0000:3b:00.5: firmware version: 16.26.6000 [ 914.148914] mlx5_core 0000:3b:00.5: Rate limit: 127 rates are supported, range: 0Mbps to 24414Mbps [ 914.165601] mlx5_core 0000:3b:00.5: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0) [ 914.173871] mlx5_core 0000:3b:00.5: Assigned random MAC address 4a:12:b9:19:07:2d [ 914.309407] mlx5_core 0000:3b:00.5 ens1f0v3: renamed from eth0 [ 919.588496] mlx5_core 0000:3b:00.5: enabling device (0000 -> 0002) [ 919.596135] mlx5_core 0000:3b:00.5: firmware version: 16.26.6000 [ 919.877791] mlx5_core 0000:3b:00.5: Rate limit: 127 rates are supported, range: 0Mbps to 24414Mbps [ 919.895938] mlx5_core 0000:3b:00.5: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0) [ 920.051348] mlx5_core 0000:3b:00.5 ens1f0v3: renamed from eth0 [ 4060.967387] mlx5_core 0000:3b:00.5: enabling device (0000 -> 0002) [ 4060.973769] mlx5_core 0000:3b:00.5: firmware version: 16.26.6000 [ 4061.262376] mlx5_core 0000:3b:00.5: Rate limit: 127 rates are 
supported, range: 0Mbps to 24414Mbps [ 4061.279366] mlx5_core 0000:3b:00.5: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0) [ 4061.287642] mlx5_core 0000:3b:00.5: Assigned random MAC address 86:3a:c6:d5:2a:b9 [ 4061.421220] mlx5_core 0000:3b:00.5 ens1f0v3: renamed from eth0 [ 4066.700811] mlx5_core 0000:3b:00.5: enabling device (0000 -> 0002) [ 4066.708610] mlx5_core 0000:3b:00.5: firmware version: 16.26.6000 [ 4066.993835] mlx5_core 0000:3b:00.5: Rate limit: 127 rates are supported, range: 0Mbps to 24414Mbps [ 4067.011190] mlx5_core 0000:3b:00.5: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0) [ 4067.165708] mlx5_core 0000:3b:00.5 ens1f0v3: renamed from eth0 [ 4117.444330] mlx5_core 0000:3b:00.5 temp_58: renamed from ens1f0v3 [ 4117.465870] mlx5_core 0000:3b:00.5 net1: renamed from temp_58 [ 4117.563422] mlx5_core 0000:3b:00.5 net1: Link up sh-4.4# ethtool -i net1 driver: mlx5_core version: 5.0-0 firmware-version: 16.26.6000 (DEL0000000015) expansion-rom-version: bus-info: 0000:3b:00.5 supports-statistics: yes supports-test: yes supports-eeprom-access: no supports-register-dump: no supports-priv-flags: yes # From the node sh-4.4# modinfo mlx5_core filename: /lib/modules/4.18.0-193.24.1.el8_2.dt1.x86_64/kernel/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko.xz [...] sh-4.4# modinfo mlx5_ib filename: /lib/modules/4.18.0-193.24.1.el8_2.dt1.x86_64/kernel/drivers/infiniband/hw/mlx5/mlx5_ib.ko.xz [...] Alaa, can you confirm those versions and firmwares look ok?
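For comparing the driver and firmware versions reported for the PF and the VF, a small helper can save some eyeballing. This is a sketch in Python under the assumption that `ethtool -i` prints one `key: value` pair per line, as in the output quoted in this bug; `parse_ethtool_info` and `same_driver_and_firmware` are hypothetical names, not part of any existing tool.

```python
def parse_ethtool_info(text):
    """Turn `ethtool -i` output into a dict, splitting on the first colon."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            # bus-info values contain colons themselves, so only the
            # first one is treated as the separator.
            info[key.strip()] = value.strip()
    return info


def same_driver_and_firmware(a, b):
    """True when driver, version and firmware-version are identical."""
    keys = ("driver", "version", "firmware-version")
    return all(a.get(k) == b.get(k) for k in keys)
```

With the `ens1f0` and `net1` outputs pasted in this bug, both sides report `mlx5_core`, `5.0-0` and `16.26.6000 (DEL0000000015)`, so the comparison should come out equal.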
Hi, David. Yes, this FW version worked for us in BZ 1846560. Back then the root cause was the issue with hugepages; after adding the W/A to clear the memory, everything worked fine. Can you test with that W/A just to double-check it's not the same issue? What is the difference between the hosts and/or configurations in this BZ and that one?
I removed the hugepage files and started from scratch. There is no memory issue, so there is no reason to test with this hack.
Hi Alaa, Any update on this issue? I changed it to Urgent as it's blocking us.
(In reply to Sebastian Scheinkman from comment #9)
> Hi Alaa,
>
> Any update on this issue I change it to Urgent as it's blocking us.

Hi, Sebastian. Can you summarize the differences between this setup and the working setup in BZ 1846560 (different setup/card, different configuration, etc.)? Also, can you get a dump of the firmware configuration from both systems? To get it, run:

# yum install -y mstflint
# mstconfig -e -d <ConnectX card pci BDF> q
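Once the dumps from both systems are available, they can be compared with a few lines of script rather than by eye. The following is a sketch in Python, assuming each parameter row of the `mstconfig -e ... q` output has the shape `NAME default current next_boot`, optionally preceded by a `*` marker; `parse_mstconfig` and `diff_current` are hypothetical helper names.

```python
import re

PARAM_NAME = re.compile(r"[A-Z0-9_]+")


def parse_mstconfig(text):
    """Map each parameter name to its (default, current, next_boot) values."""
    params = {}
    for line in text.splitlines():
        fields = line.split()
        # Rows whose next-boot value differs are prefixed with '*'.
        if fields and fields[0] == "*":
            fields = fields[1:]
        # Skip headers, device descriptions and the trailing legend.
        if len(fields) != 4 or not PARAM_NAME.fullmatch(fields[0]):
            continue
        params[fields[0]] = tuple(fields[1:])
    return params


def diff_current(a, b):
    """Return {name: (current_in_a, current_in_b)} where the two dumps differ."""
    diff = {}
    for name in sorted(set(a) | set(b)):
        cur_a = a[name][1] if name in a else None
        cur_b = b[name][1] if name in b else None
        if cur_a != cur_b:
            diff[name] = (cur_a, cur_b)
    return diff
```

Calling `diff_current(parse_mstconfig(dump_a), parse_mstconfig(dump_b))` on the two saved outputs lists only the parameters whose Current column differs, or which exist on one card only.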
Hi Alaa, yes this is a different cluster (I don't have access to the old one anymore) here you go: sh-4.4# mstconfig -e -d 0000:19:00.0 q Device #1: ---------- Device type: ConnectX4LX Name: 0R887V Description: MCX422A-ACAA ConnectX-4 Lx EN Dual Port SFP28; 25GbE for Dell rack NDC Device: 0000:19:00.0 Configurations: Default Current Next Boot MEMIC_BAR_SIZE 0 0 0 MEMIC_SIZE_LIMIT _256KB(1) _256KB(1) _256KB(1) FLEX_PARSER_PROFILE_ENABLE 0 0 0 FLEX_IPV4_OVER_VXLAN_PORT 0 0 0 ROCE_NEXT_PROTOCOL 254 254 254 NON_PREFETCHABLE_PF_BAR False(0) False(0) False(0) VF_VPD_ENABLE False(0) False(0) False(0) STRICT_VF_MSIX_NUM False(0) False(0) False(0) VF_NODNIC_ENABLE False(0) False(0) False(0) * NUM_OF_VFS 8 5 5 * SRIOV_EN False(0) True(1) True(1) PF_LOG_BAR_SIZE 5 5 5 VF_LOG_BAR_SIZE 0 0 0 NUM_PF_MSIX 63 63 63 NUM_VF_MSIX 11 11 11 INT_LOG_MAX_PAYLOAD_SIZE AUTOMATIC(0) AUTOMATIC(0) AUTOMATIC(0) PARTIAL_RESET_EN False(0) False(0) False(0) SW_RECOVERY_ON_ERRORS False(0) False(0) False(0) RESET_WITH_HOST_ON_ERRORS False(0) False(0) False(0) CQE_COMPRESSION BALANCED(0) BALANCED(0) BALANCED(0) IP_OVER_VXLAN_EN False(0) False(0) False(0) UCTX_EN True(1) True(1) True(1) PCI_ATOMIC_MODE PCI_ATOMIC_DISABLED_EXT_ATOMIC_ENABLED(0) PCI_ATOMIC_DISABLED_EXT_ATOMIC_ENABLED(0) PCI_ATOMIC_DISABLED_EXT_ATOMIC_ENABLED(0) LRO_LOG_TIMEOUT0 6 6 6 LRO_LOG_TIMEOUT1 7 7 7 LRO_LOG_TIMEOUT2 8 8 8 LRO_LOG_TIMEOUT3 13 13 13 LOG_DCR_HASH_TABLE_SIZE 14 14 14 DCR_LIFO_SIZE 16384 16384 16384 ROCE_CC_PRIO_MASK_P1 255 255 255 ROCE_CC_ALGORITHM_P1 ECN(0) ECN(0) ECN(0) ROCE_CC_PRIO_MASK_P2 255 255 255 ROCE_CC_ALGORITHM_P2 ECN(0) ECN(0) ECN(0) CLAMP_TGT_RATE_AFTER_TIME_INC_P1 True(1) True(1) True(1) CLAMP_TGT_RATE_P1 False(0) False(0) False(0) RPG_TIME_RESET_P1 300 300 300 RPG_BYTE_RESET_P1 32767 32767 32767 RPG_THRESHOLD_P1 1 1 1 RPG_MAX_RATE_P1 0 0 0 RPG_AI_RATE_P1 5 5 5 RPG_HAI_RATE_P1 50 50 50 RPG_GD_P1 11 11 11 RPG_MIN_DEC_FAC_P1 50 50 50 RPG_MIN_RATE_P1 1 1 1 RATE_TO_SET_ON_FIRST_CNP_P1 0 0 0 DCE_TCP_G_P1 
1019 1019 1019 DCE_TCP_RTT_P1 1 1 1 RATE_REDUCE_MONITOR_PERIOD_P1 4 4 4 INITIAL_ALPHA_VALUE_P1 1023 1023 1023 MIN_TIME_BETWEEN_CNPS_P1 0 0 0 CNP_802P_PRIO_P1 6 6 6 CNP_DSCP_P1 48 48 48 CLAMP_TGT_RATE_AFTER_TIME_INC_P2 True(1) True(1) True(1) CLAMP_TGT_RATE_P2 False(0) False(0) False(0) RPG_TIME_RESET_P2 300 300 300 RPG_BYTE_RESET_P2 32767 32767 32767 RPG_THRESHOLD_P2 1 1 1 RPG_MAX_RATE_P2 0 0 0 RPG_AI_RATE_P2 5 5 5 RPG_HAI_RATE_P2 50 50 50 RPG_GD_P2 11 11 11 RPG_MIN_DEC_FAC_P2 50 50 50 RPG_MIN_RATE_P2 1 1 1 RATE_TO_SET_ON_FIRST_CNP_P2 0 0 0 DCE_TCP_G_P2 1019 1019 1019 DCE_TCP_RTT_P2 1 1 1 RATE_REDUCE_MONITOR_PERIOD_P2 4 4 4 INITIAL_ALPHA_VALUE_P2 1023 1023 1023 MIN_TIME_BETWEEN_CNPS_P2 0 0 0 CNP_802P_PRIO_P2 6 6 6 CNP_DSCP_P2 48 48 48 LLDP_NB_DCBX_P1 False(0) False(0) False(0) LLDP_NB_RX_MODE_P1 ALL(2) ALL(2) ALL(2) LLDP_NB_TX_MODE_P1 ALL(2) ALL(2) ALL(2) LLDP_NB_DCBX_P2 False(0) False(0) False(0) LLDP_NB_RX_MODE_P2 ALL(2) ALL(2) ALL(2) LLDP_NB_TX_MODE_P2 ALL(2) ALL(2) ALL(2) DCBX_IEEE_P1 True(1) True(1) True(1) DCBX_CEE_P1 True(1) True(1) True(1) DCBX_WILLING_P1 True(1) True(1) True(1) DCBX_IEEE_P2 True(1) True(1) True(1) DCBX_CEE_P2 True(1) True(1) True(1) DCBX_WILLING_P2 True(1) True(1) True(1) KEEP_ETH_LINK_UP_P1 True(1) True(1) True(1) KEEP_IB_LINK_UP_P1 False(0) False(0) False(0) KEEP_LINK_UP_ON_BOOT_P1 False(0) False(0) False(0) KEEP_LINK_UP_ON_STANDBY_P1 False(0) False(0) False(0) KEEP_ETH_LINK_UP_P2 True(1) True(1) True(1) KEEP_IB_LINK_UP_P2 False(0) False(0) False(0) KEEP_LINK_UP_ON_BOOT_P2 False(0) False(0) False(0) KEEP_LINK_UP_ON_STANDBY_P2 False(0) False(0) False(0) NUM_OF_VL_P1 _4_VLs(3) _4_VLs(3) _4_VLs(3) NUM_OF_TC_P1 _8_TCs(0) _8_TCs(0) _8_TCs(0) NUM_OF_PFC_P1 8 8 8 NUM_OF_VL_P2 _4_VLs(3) _4_VLs(3) _4_VLs(3) NUM_OF_TC_P2 _8_TCs(0) _8_TCs(0) _8_TCs(0) NUM_OF_PFC_P2 8 8 8 DUP_MAC_ACTION_P1 LAST_CFG(0) LAST_CFG(0) LAST_CFG(0) SRIOV_IB_ROUTING_MODE_P1 LID(1) LID(1) LID(1) IB_ROUTING_MODE_P1 LID(1) LID(1) LID(1) DUP_MAC_ACTION_P2 LAST_CFG(0) 
LAST_CFG(0) LAST_CFG(0) SRIOV_IB_ROUTING_MODE_P2 LID(1) LID(1) LID(1) IB_ROUTING_MODE_P2 LID(1) LID(1) LID(1) WOL_MAGIC_EN False(0) False(0) False(0) PCI_WR_ORDERING per_mkey(0) per_mkey(0) per_mkey(0) MULTI_PORT_VHCA_EN False(0) False(0) False(0) PORT_OWNER True(1) True(1) True(1) ALLOW_RD_COUNTERS True(1) True(1) True(1) RENEG_ON_CHANGE True(1) True(1) True(1) TRACER_ENABLE True(1) True(1) True(1) BOOT_UNDI_NETWORK_WAIT 0 0 0 UEFI_HII_EN True(1) True(1) True(1) BOOT_DBG_LOG False(0) False(0) False(0) UEFI_LOGS DISABLED(0) DISABLED(0) DISABLED(0) BOOT_VLAN 1 1 1 * LEGACY_BOOT_PROTOCOL PXE(1) PXE(1) NONE(0) BOOT_RETRY_CNT NONE(0) NONE(0) NONE(0) BOOT_LACP_DIS True(1) True(1) True(1) BOOT_VLAN_EN False(0) False(0) False(0) BOOT_PKEY 0 0 0 DYNAMIC_VF_MSIX_TABLE False(0) False(0) False(0) ADVANCED_PCI_SETTINGS False(0) False(0) False(0) SAFE_MODE_THRESHOLD 10 10 10 SAFE_MODE_ENABLE True(1) True(1) True(1) The '*' shows parameters with next value different from default/current value. PF: lspci -v -nn -mm -k -s 0000:19:00.0 Slot: 19:00.0 Class: Ethernet controller [0200] Vendor: Mellanox Technologies [15b3] Device: MT27710 Family [ConnectX-4 Lx] [1015] SVendor: Mellanox Technologies [15b3] SDevice: ConnectX-4 Lx 25 GbE Dual Port SFP28 rNDC [0025] Driver: mlx5_core Module: mlx5_core NUMANode: 0 ethtool -i eno1 driver: mlx5_core version: 5.0-0 firmware-version: 14.26.6000 (DEL2810000034) expansion-rom-version: bus-info: 0000:19:00.0 supports-statistics: yes supports-test: yes supports-eeprom-access: no supports-register-dump: no supports-priv-flags: yes VF: lspci -v -nn -mm -k -s 0000:19:00.2 Slot: 19:00.2 Class: Ethernet controller [0200] Vendor: Mellanox Technologies [15b3] Device: MT27710 Family [ConnectX-4 Lx Virtual Function] [1016] SVendor: Mellanox Technologies [15b3] SDevice: Device [0025] Driver: mlx5_core Module: mlx5_core NUMANode: 0 I try to use dpdk-19.11-5.el8_2 this is the output: testpmd -l ${CPU} -w ${PCIDEVICE_OPENSHIFT_IO_DPDKNIC} --log-level="*:debug" 
--iova-mode=va -- -i --portmask=0x1 --nb-cores=2 --forward-mode=mac --port-topology=loop --no-mlockall EAL: Detected lcore 0 as core 0 on socket 0 EAL: Detected lcore 1 as core 0 on socket 1 EAL: Detected lcore 2 as core 6 on socket 0 EAL: Detected lcore 3 as core 6 on socket 1 EAL: Detected lcore 4 as core 1 on socket 0 EAL: Detected lcore 5 as core 1 on socket 1 EAL: Detected lcore 6 as core 5 on socket 0 EAL: Detected lcore 7 as core 5 on socket 1 EAL: Detected lcore 8 as core 2 on socket 0 EAL: Detected lcore 9 as core 2 on socket 1 EAL: Detected lcore 10 as core 4 on socket 0 EAL: Detected lcore 11 as core 4 on socket 1 EAL: Detected lcore 12 as core 3 on socket 0 EAL: Detected lcore 13 as core 3 on socket 1 EAL: Detected lcore 14 as core 13 on socket 0 EAL: Detected lcore 15 as core 13 on socket 1 EAL: Detected lcore 16 as core 8 on socket 0 EAL: Detected lcore 17 as core 8 on socket 1 EAL: Detected lcore 18 as core 12 on socket 0 EAL: Detected lcore 19 as core 12 on socket 1 EAL: Detected lcore 20 as core 9 on socket 0 EAL: Detected lcore 21 as core 9 on socket 1 EAL: Detected lcore 22 as core 11 on socket 0 EAL: Detected lcore 23 as core 11 on socket 1 EAL: Detected lcore 24 as core 10 on socket 0 EAL: Detected lcore 25 as core 10 on socket 1 EAL: Detected lcore 26 as core 22 on socket 0 EAL: Detected lcore 27 as core 22 on socket 1 EAL: Detected lcore 28 as core 16 on socket 0 EAL: Detected lcore 29 as core 16 on socket 1 EAL: Detected lcore 30 as core 21 on socket 0 EAL: Detected lcore 31 as core 21 on socket 1 EAL: Detected lcore 32 as core 17 on socket 0 EAL: Detected lcore 33 as core 17 on socket 1 EAL: Detected lcore 34 as core 20 on socket 0 EAL: Detected lcore 35 as core 20 on socket 1 EAL: Detected lcore 36 as core 18 on socket 0 EAL: Detected lcore 37 as core 18 on socket 1 EAL: Detected lcore 38 as core 19 on socket 0 EAL: Detected lcore 39 as core 19 on socket 1 EAL: Detected lcore 40 as core 24 on socket 0 EAL: Detected lcore 41 as core 24 on 
socket 1 EAL: Detected lcore 42 as core 29 on socket 0 EAL: Detected lcore 43 as core 29 on socket 1 EAL: Detected lcore 44 as core 25 on socket 0 EAL: Detected lcore 45 as core 25 on socket 1 EAL: Detected lcore 46 as core 28 on socket 0 EAL: Detected lcore 47 as core 28 on socket 1 EAL: Detected lcore 48 as core 26 on socket 0 EAL: Detected lcore 49 as core 26 on socket 1 EAL: Detected lcore 50 as core 27 on socket 0 EAL: Detected lcore 51 as core 27 on socket 1 EAL: Detected lcore 52 as core 0 on socket 0 EAL: Detected lcore 53 as core 0 on socket 1 EAL: Detected lcore 54 as core 6 on socket 0 EAL: Detected lcore 55 as core 6 on socket 1 EAL: Detected lcore 56 as core 1 on socket 0 EAL: Detected lcore 57 as core 1 on socket 1 EAL: Detected lcore 58 as core 5 on socket 0 EAL: Detected lcore 59 as core 5 on socket 1 EAL: Detected lcore 60 as core 2 on socket 0 EAL: Detected lcore 61 as core 2 on socket 1 EAL: Detected lcore 62 as core 4 on socket 0 EAL: Detected lcore 63 as core 4 on socket 1 EAL: Detected lcore 64 as core 3 on socket 0 EAL: Detected lcore 65 as core 3 on socket 1 EAL: Detected lcore 66 as core 13 on socket 0 EAL: Detected lcore 67 as core 13 on socket 1 EAL: Detected lcore 68 as core 8 on socket 0 EAL: Detected lcore 69 as core 8 on socket 1 EAL: Detected lcore 70 as core 12 on socket 0 EAL: Detected lcore 71 as core 12 on socket 1 EAL: Detected lcore 72 as core 9 on socket 0 EAL: Detected lcore 73 as core 9 on socket 1 EAL: Detected lcore 74 as core 11 on socket 0 EAL: Detected lcore 75 as core 11 on socket 1 EAL: Detected lcore 76 as core 10 on socket 0 EAL: Detected lcore 77 as core 10 on socket 1 EAL: Detected lcore 78 as core 22 on socket 0 EAL: Detected lcore 79 as core 22 on socket 1 EAL: Detected lcore 80 as core 16 on socket 0 EAL: Detected lcore 81 as core 16 on socket 1 EAL: Detected lcore 82 as core 21 on socket 0 EAL: Detected lcore 83 as core 21 on socket 1 EAL: Detected lcore 84 as core 17 on socket 0 EAL: Detected lcore 85 as core 
17 on socket 1 EAL: Detected lcore 86 as core 20 on socket 0 EAL: Detected lcore 87 as core 20 on socket 1 EAL: Detected lcore 88 as core 18 on socket 0 EAL: Detected lcore 89 as core 18 on socket 1 EAL: Detected lcore 90 as core 19 on socket 0 EAL: Detected lcore 91 as core 19 on socket 1 EAL: Detected lcore 92 as core 24 on socket 0 EAL: Detected lcore 93 as core 24 on socket 1 EAL: Detected lcore 94 as core 29 on socket 0 EAL: Detected lcore 95 as core 29 on socket 1 EAL: Detected lcore 96 as core 25 on socket 0 EAL: Detected lcore 97 as core 25 on socket 1 EAL: Detected lcore 98 as core 28 on socket 0 EAL: Detected lcore 99 as core 28 on socket 1 EAL: Detected lcore 100 as core 26 on socket 0 EAL: Detected lcore 101 as core 26 on socket 1 EAL: Detected lcore 102 as core 27 on socket 0 EAL: Detected lcore 103 as core 27 on socket 1 EAL: Support maximum 128 logical core(s) by configuration. EAL: Detected 104 lcore(s) EAL: Detected 2 NUMA nodes EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_bnxt.so.20.0 EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_e1000.so.20.0 EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_enic.so.20.0 EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_failsafe.so.20.0 EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_i40e.so.20.0 EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_ixgbe.so.20.0 EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_mlx4.so.20.0 EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_mlx5.so.20.0 EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_netvsc.so.20.0 EAL: Registered [vmbus] bus. 
EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_nfp.so.20.0 EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_qede.so.20.0 EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_ring.so.20.0 EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_tap.so.20.0 EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_vdev_netvsc.so.20.0 EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_vhost.so.20.0 EAL: open shared lib /usr/lib64/dpdk-pmds/librte_pmd_virtio.so.20.0 EAL: Ask a virtual area of 0x5000 bytes EAL: Virtual area found at 0x100000000 (size = 0x5000) EAL: Multi-process socket /var/run/dpdk/rte/mp_socket EAL: Module /sys/module/vfio_pci not found! error 2 (No such file or directory) EAL: VFIO PCI modules not loaded EAL: Selected IOVA mode 'VA' EAL: Probing VFIO support... EAL: Module /sys/module/vfio not found! error 2 (No such file or directory) EAL: VFIO modules not loaded, skipping VFIO support... EAL: Ask a virtual area of 0x2e000 bytes EAL: Virtual area found at 0x100005000 (size = 0x2e000) EAL: Setting up physically contiguous memory... 
EAL: Setting maximum number of open files to 1048576 EAL: Detected memory type: socket_id:0 hugepage_sz:1073741824 EAL: Detected memory type: socket_id:1 hugepage_sz:1073741824 EAL: Creating 4 segment lists: n_segs:32 socket_id:0 hugepage_sz:1073741824 EAL: Ask a virtual area of 0x1000 bytes EAL: Virtual area found at 0x100033000 (size = 0x1000) EAL: Memseg list allocated: 0x100000kB at socket 0 EAL: Ask a virtual area of 0x800000000 bytes EAL: Virtual area found at 0x140000000 (size = 0x800000000) EAL: Ask a virtual area of 0x1000 bytes EAL: Virtual area found at 0x940000000 (size = 0x1000) EAL: Memseg list allocated: 0x100000kB at socket 0 EAL: Ask a virtual area of 0x800000000 bytes EAL: Virtual area found at 0x980000000 (size = 0x800000000) EAL: Ask a virtual area of 0x1000 bytes EAL: Virtual area found at 0x1180000000 (size = 0x1000) EAL: Memseg list allocated: 0x100000kB at socket 0 EAL: Ask a virtual area of 0x800000000 bytes EAL: Virtual area found at 0x11c0000000 (size = 0x800000000) EAL: Ask a virtual area of 0x1000 bytes EAL: Virtual area found at 0x19c0000000 (size = 0x1000) EAL: Memseg list allocated: 0x100000kB at socket 0 EAL: Ask a virtual area of 0x800000000 bytes EAL: Virtual area found at 0x1a00000000 (size = 0x800000000) EAL: Creating 4 segment lists: n_segs:32 socket_id:1 hugepage_sz:1073741824 EAL: Ask a virtual area of 0x1000 bytes EAL: Virtual area found at 0x2200000000 (size = 0x1000) EAL: Memseg list allocated: 0x100000kB at socket 1 EAL: Ask a virtual area of 0x800000000 bytes EAL: Virtual area found at 0x2240000000 (size = 0x800000000) EAL: Ask a virtual area of 0x1000 bytes EAL: Virtual area found at 0x2a40000000 (size = 0x1000) EAL: Memseg list allocated: 0x100000kB at socket 1 EAL: Ask a virtual area of 0x800000000 bytes EAL: Virtual area found at 0x2a80000000 (size = 0x800000000) EAL: Ask a virtual area of 0x1000 bytes EAL: Virtual area found at 0x3280000000 (size = 0x1000) EAL: Memseg list allocated: 0x100000kB at socket 1 EAL: Ask 
a virtual area of 0x800000000 bytes EAL: Virtual area found at 0x32c0000000 (size = 0x800000000) EAL: Ask a virtual area of 0x1000 bytes EAL: Virtual area found at 0x3ac0000000 (size = 0x1000) EAL: Memseg list allocated: 0x100000kB at socket 1 EAL: Ask a virtual area of 0x800000000 bytes EAL: Virtual area found at 0x3b00000000 (size = 0x800000000) EAL: TSC frequency is ~2100000 KHz EAL: Master lcore 12 is ready (tid=7f3200db5900;cpuset=[12]) EAL: lcore 64 is ready (tid=7f31f7b90700;cpuset=[64]) EAL: lcore 66 is ready (tid=7f31f738f700;cpuset=[66]) EAL: lcore 14 is ready (tid=7f31f8391700;cpuset=[14]) EAL: Trying to obtain current memory policy. EAL: Setting policy MPOL_PREFERRED for socket 0 EAL: Restoring previous memory policy: 0 EAL: request: mp_malloc_sync EAL: Heap on socket 0 was expanded by 1024MB EAL: PCI device 0000:19:00.6 on NUMA socket 0 EAL: probe driver: 15b3:1016 net_mlx5 EAL: Mem event callback 'MLX5_MEM_EVENT_CB:(nil)' registered net_mlx5: unable to recognize master/representors on the multiple IB devices EAL: Requested device 0000:19:00.6 cannot be used EAL: Module /sys/module/vfio not found! 
error 2 (No such file or directory)
testpmd: No probed ethernet devices
Interactive-mode selected
Set mac packet forwarding mode
testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=171456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
Done

I also tried upstream DPDK 19.11, and there I hit the DevX issue. I will try updating to 20.11 rc.

testpmd -l ${CPU} -w ${PCIDEVICE_OPENSHIFT_IO_DPDKNIC} --iova-mode=va --log-level="*:debug" -- -i --portmask=0x1 --nb-cores=2 --forward-mode=mac --port-topology=loop --no-mlockall
EAL: Detected lcore 0 as core 0 on socket 0
EAL: Detected lcore 1 as core 0 on socket 1
EAL: Detected lcore 2 as core 6 on socket 0
EAL: Detected lcore 3 as core 6 on socket 1
EAL: Detected lcore 4 as core 1 on socket 0
EAL: Detected lcore 5 as core 1 on socket 1
EAL: Detected lcore 6 as core 5 on socket 0
EAL: Detected lcore 7 as core 5 on socket 1
EAL: Detected lcore 8 as core 2 on socket 0
EAL: Detected lcore 9 as core 2 on socket 1
EAL: Detected lcore 10 as core 4 on socket 0
EAL: Detected lcore 11 as core 4 on socket 1
EAL: Detected lcore 12 as core 3 on socket 0
EAL: Detected lcore 13 as core 3 on socket 1
EAL: Detected lcore 14 as core 13 on socket 0
EAL: Detected lcore 15 as core 13 on socket 1
EAL: Detected lcore 16 as core 8 on socket 0
EAL: Detected lcore 17 as core 8 on socket 1
EAL: Detected lcore 18 as core 12 on socket 0
EAL: Detected lcore 19 as core 12 on socket 1
EAL: Detected lcore 20 as core 9 on socket 0
EAL: Detected lcore 21 as core 9 on socket 1
EAL: Detected lcore 22 as core 11 on socket 0
EAL: Detected lcore 23 as core 11 on socket 1
EAL: Detected lcore 24 as core 10 on socket 0
EAL: Detected lcore 25 as core 10 on socket 1
EAL: Detected lcore 26 as core 22 on socket 0
EAL: Detected lcore 27 as core 22 on socket 1
EAL: Detected lcore 28 as core 16 on socket 0
EAL: Detected lcore 29 as core 16 on socket 1
EAL: Detected lcore 30 as core 21 on socket 0
EAL: Detected lcore 31 as core 21 on socket 1
EAL: Detected lcore 32 as core 17 on socket 0 EAL: Detected lcore 33 as core 17 on socket 1 EAL: Detected lcore 34 as core 20 on socket 0 EAL: Detected lcore 35 as core 20 on socket 1 EAL: Detected lcore 36 as core 18 on socket 0 EAL: Detected lcore 37 as core 18 on socket 1 EAL: Detected lcore 38 as core 19 on socket 0 EAL: Detected lcore 39 as core 19 on socket 1 EAL: Detected lcore 40 as core 24 on socket 0 EAL: Detected lcore 41 as core 24 on socket 1 EAL: Detected lcore 42 as core 29 on socket 0 EAL: Detected lcore 43 as core 29 on socket 1 EAL: Detected lcore 44 as core 25 on socket 0 EAL: Detected lcore 45 as core 25 on socket 1 EAL: Detected lcore 46 as core 28 on socket 0 EAL: Detected lcore 47 as core 28 on socket 1 EAL: Detected lcore 48 as core 26 on socket 0 EAL: Detected lcore 49 as core 26 on socket 1 EAL: Detected lcore 50 as core 27 on socket 0 EAL: Detected lcore 51 as core 27 on socket 1 EAL: Detected lcore 52 as core 0 on socket 0 EAL: Detected lcore 53 as core 0 on socket 1 EAL: Detected lcore 54 as core 6 on socket 0 EAL: Detected lcore 55 as core 6 on socket 1 EAL: Detected lcore 56 as core 1 on socket 0 EAL: Detected lcore 57 as core 1 on socket 1 EAL: Detected lcore 58 as core 5 on socket 0 EAL: Detected lcore 59 as core 5 on socket 1 EAL: Detected lcore 60 as core 2 on socket 0 EAL: Detected lcore 61 as core 2 on socket 1 EAL: Detected lcore 62 as core 4 on socket 0 EAL: Detected lcore 63 as core 4 on socket 1 EAL: Detected lcore 64 as core 3 on socket 0 EAL: Detected lcore 65 as core 3 on socket 1 EAL: Detected lcore 66 as core 13 on socket 0 EAL: Detected lcore 67 as core 13 on socket 1 EAL: Detected lcore 68 as core 8 on socket 0 EAL: Detected lcore 69 as core 8 on socket 1 EAL: Detected lcore 70 as core 12 on socket 0 EAL: Detected lcore 71 as core 12 on socket 1 EAL: Detected lcore 72 as core 9 on socket 0 EAL: Detected lcore 73 as core 9 on socket 1 EAL: Detected lcore 74 as core 11 on socket 0 EAL: Detected lcore 75 as core 11 on 
socket 1 EAL: Detected lcore 76 as core 10 on socket 0 EAL: Detected lcore 77 as core 10 on socket 1 EAL: Detected lcore 78 as core 22 on socket 0 EAL: Detected lcore 79 as core 22 on socket 1 EAL: Detected lcore 80 as core 16 on socket 0 EAL: Detected lcore 81 as core 16 on socket 1 EAL: Detected lcore 82 as core 21 on socket 0 EAL: Detected lcore 83 as core 21 on socket 1 EAL: Detected lcore 84 as core 17 on socket 0 EAL: Detected lcore 85 as core 17 on socket 1 EAL: Detected lcore 86 as core 20 on socket 0 EAL: Detected lcore 87 as core 20 on socket 1 EAL: Detected lcore 88 as core 18 on socket 0 EAL: Detected lcore 89 as core 18 on socket 1 EAL: Detected lcore 90 as core 19 on socket 0 EAL: Detected lcore 91 as core 19 on socket 1 EAL: Detected lcore 92 as core 24 on socket 0 EAL: Detected lcore 93 as core 24 on socket 1 EAL: Detected lcore 94 as core 29 on socket 0 EAL: Detected lcore 95 as core 29 on socket 1 EAL: Detected lcore 96 as core 25 on socket 0 EAL: Detected lcore 97 as core 25 on socket 1 EAL: Detected lcore 98 as core 28 on socket 0 EAL: Detected lcore 99 as core 28 on socket 1 EAL: Detected lcore 100 as core 26 on socket 0 EAL: Detected lcore 101 as core 26 on socket 1 EAL: Detected lcore 102 as core 27 on socket 0 EAL: Detected lcore 103 as core 27 on socket 1 EAL: Support maximum 128 logical core(s) by configuration. EAL: Detected 104 lcore(s) EAL: Detected 2 NUMA nodes EAL: Ask a virtual area of 0x5000 bytes EAL: Virtual area found at 0x100000000 (size = 0x5000) EAL: Multi-process socket /var/run/dpdk/rte/mp_socket EAL: Module /sys/module/vfio_pci not found! error 2 (No such file or directory) EAL: VFIO PCI modules not loaded dpaa: rte_dpaa_bus_scan(): >> EAL: DPAA Bus not present. Skipping. fslmc: fslmc_get_container_group(): DPAA2: DPRC not available fslmc: rte_fslmc_scan(): FSLMC Bus Not Available. Skipping (-22) EAL: Selected IOVA mode 'VA' EAL: Probing VFIO support... EAL: Module /sys/module/vfio not found! 
error 2 (No such file or directory) EAL: VFIO modules not loaded, skipping VFIO support... EAL: Ask a virtual area of 0x2e000 bytes EAL: Virtual area found at 0x100005000 (size = 0x2e000) EAL: Setting up physically contiguous memory... EAL: Setting maximum number of open files to 1048576 EAL: Detected memory type: socket_id:0 hugepage_sz:1073741824 EAL: Detected memory type: socket_id:1 hugepage_sz:1073741824 EAL: Creating 4 segment lists: n_segs:32 socket_id:0 hugepage_sz:1073741824 EAL: Ask a virtual area of 0x1000 bytes EAL: Virtual area found at 0x100033000 (size = 0x1000) EAL: Memseg list allocated: 0x100000kB at socket 0 EAL: Ask a virtual area of 0x800000000 bytes EAL: Virtual area found at 0x140000000 (size = 0x800000000) EAL: Ask a virtual area of 0x1000 bytes EAL: Virtual area found at 0x940000000 (size = 0x1000) EAL: Memseg list allocated: 0x100000kB at socket 0 EAL: Ask a virtual area of 0x800000000 bytes EAL: Virtual area found at 0x980000000 (size = 0x800000000) EAL: Ask a virtual area of 0x1000 bytes EAL: Virtual area found at 0x1180000000 (size = 0x1000) EAL: Memseg list allocated: 0x100000kB at socket 0 EAL: Ask a virtual area of 0x800000000 bytes EAL: Virtual area found at 0x11c0000000 (size = 0x800000000) EAL: Ask a virtual area of 0x1000 bytes EAL: Virtual area found at 0x19c0000000 (size = 0x1000) EAL: Memseg list allocated: 0x100000kB at socket 0 EAL: Ask a virtual area of 0x800000000 bytes EAL: Virtual area found at 0x1a00000000 (size = 0x800000000) EAL: Creating 4 segment lists: n_segs:32 socket_id:1 hugepage_sz:1073741824 EAL: Ask a virtual area of 0x1000 bytes EAL: Virtual area found at 0x2200000000 (size = 0x1000) EAL: Memseg list allocated: 0x100000kB at socket 1 EAL: Ask a virtual area of 0x800000000 bytes EAL: Virtual area found at 0x2240000000 (size = 0x800000000) EAL: Ask a virtual area of 0x1000 bytes EAL: Virtual area found at 0x2a40000000 (size = 0x1000) EAL: Memseg list allocated: 0x100000kB at socket 1 EAL: Ask a virtual area of 
0x800000000 bytes EAL: Virtual area found at 0x2a80000000 (size = 0x800000000) EAL: Ask a virtual area of 0x1000 bytes EAL: Virtual area found at 0x3280000000 (size = 0x1000) EAL: Memseg list allocated: 0x100000kB at socket 1 EAL: Ask a virtual area of 0x800000000 bytes EAL: Virtual area found at 0x32c0000000 (size = 0x800000000) EAL: Ask a virtual area of 0x1000 bytes EAL: Virtual area found at 0x3ac0000000 (size = 0x1000) EAL: Memseg list allocated: 0x100000kB at socket 1 EAL: Ask a virtual area of 0x800000000 bytes EAL: Virtual area found at 0x3b00000000 (size = 0x800000000) EAL: TSC frequency is ~2100000 KHz EAL: Master lcore 8 is ready (tid=7f2faa1e8c00;cpuset=[8]) EAL: lcore 60 is ready (tid=7f2fa6f45400;cpuset=[60]) EAL: lcore 62 is ready (tid=7f2fa6744400;cpuset=[62]) EAL: lcore 10 is ready (tid=7f2fa7746400;cpuset=[10]) EAL: Trying to obtain current memory policy. EAL: Setting policy MPOL_PREFERRED for socket 0 EAL: Restoring previous memory policy: 0 EAL: request: mp_malloc_sync EAL: Heap on socket 0 was expanded by 1024MB EAL: PCI device 0000:19:00.2 on NUMA socket 0 EAL: probe driver: 15b3:1016 net_mlx5 EAL: Mem event callback 'MLX5_MEM_EVENT_CB:(nil)' registered net_mlx5: checking device "mlx5_6" net_mlx5: checking device "mlx5_5" net_mlx5: checking device "mlx5_4" net_mlx5: checking device "mlx5_3" net_mlx5: checking device "mlx5_2" net_mlx5: PCI information matches for device "mlx5_2" net_mlx5: checking device "mlx5_1" net_mlx5: checking device "mlx5_0" net_mlx5: no E-Switch support detected net_mlx5: naming Ethernet device "0000:19:00.2" net_mlx5: DevX is supported net_mlx5: Failed to create TIS using DevX net_mlx5: TIS allocation failure net_mlx5: probe of PCI device 0000:19:00.2 aborted after encountering an error: Cannot allocate memory EAL: Requested device 0000:19:00.2 cannot be used EAL: Module /sys/module/vfio not found! 
error 2 (No such file or directory) testpmd: No probed ethernet devices Interactive-mode selected Set mac packet forwarding mode testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=171456, size=2176, socket=0 testpmd: preferred mempool ops selected: ring_mp_mc Done
OK, so you don't know what the difference between the clusters is... Can you give me access to the system, along with reproduction steps?
Sure, I contacted you offline.
> net_mlx5: Failed to create TIS using DevX
> net_mlx5: TIS allocation failure

In the worker node's dmesg, I can see this error right after the testpmd failure:

[98942.666462] mlx5_core 0000:19:00.2: mlx5_cmd_check:760:(pid 1581338): CREATE_TIS(0x912) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x6a6678)

The syndrome means:

0x6A6678 | create_tis: cannot create while log_max_tis = 0

> net_mlx5: probe of PCI device 0000:19:00.2 aborted after encountering an error: Cannot allocate memory
> EAL: Requested device 0000:19:00.2 cannot be used

Will continue checking...
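For anyone scanning kernel logs for the same failure, here is a minimal sketch that pulls the command name, status, and syndrome out of an mlx5_core command-failure line. The regular expression is inferred only from the single dmesg line quoted above, so other kernel versions may format the message differently; the names `MLX5_CMD_FAILURE` and `parse_mlx5_cmd_failure` are hypothetical helpers, not part of any driver or tool.

```python
import re

# Assumption: this pattern is derived from the one dmesg line in this report;
# it is not an official description of the mlx5_core log format.
MLX5_CMD_FAILURE = re.compile(
    r"mlx5_core (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): .*?"
    r"(?P<cmd>[A-Z0-9_]+)\((?P<opcode>0x[0-9a-f]+)\) "
    r"op_mod\((?P<op_mod>0x[0-9a-f]+)\) failed, "
    r"status (?P<status>.+?)\((?P<status_code>0x[0-9a-f]+)\), "
    r"syndrome \((?P<syndrome>0x[0-9a-f]+)\)"
)


def parse_mlx5_cmd_failure(line):
    """Return a dict of fields from a command-failure line, or None if no match."""
    match = MLX5_CMD_FAILURE.search(line)
    return match.groupdict() if match else None


# The dmesg line reported in this bug, used as sample input.
SAMPLE = (
    "[98942.666462] mlx5_core 0000:19:00.2: mlx5_cmd_check:760:(pid 1581338): "
    "CREATE_TIS(0x912) op_mod(0x0) failed, status bad parameter(0x3), "
    "syndrome (0x6a6678)"
)

if __name__ == "__main__":
    print(parse_mlx5_cmd_failure(SAMPLE))
```

The extracted syndrome value still has to be looked up against the vendor's syndrome table; the script only isolates it.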
Hi, Sebastian. Can you send me the yaml file used to build this POD? Thanks, Alaa
Hi Alaa,

sure

ENV BUILDER_VERSION 0.1
ENV DPDK_VER 19.11-4
ENV DPDK_DIR /usr/share/dpdk
ENV RTE_TARGET=x86_64-default-linux-gcc
ENV RTE_EXEC_ENV=linux
ENV RTE_SDK=${DPDK_DIR}

RUN INSTALL_PKGS="bsdtar \
findutils \
groff-base \
glibc-locale-source \
glibc-langpack-en \
gettext \
rsync \
scl-utils \
tar \
unzip \
xz \
yum \
dpdk \
dpdk-devel \
dpdk-tools \
make \
rdma-core \
libibverbs \
git \
gcc \
expect" && \
mkdir -p ${HOME}/.pki/nssdb && \
microdnf install -y --setopt=tsflags=nodocs $INSTALL_PKGS && \
rpm -V $INSTALL_PKGS && \
microdnf -y clean all --enablerepo='*'
(In reply to Sebastian Scheinkman from comment #16)
> Hi Alaa,
>
> sure
>
> ENV BUILDER_VERSION 0.1
> ENV DPDK_VER 19.11-4
> ENV DPDK_DIR /usr/share/dpdk
> ENV RTE_TARGET=x86_64-default-linux-gcc
> ENV RTE_EXEC_ENV=linux
> ENV RTE_SDK=${DPDK_DIR}
>
> RUN INSTALL_PKGS="bsdtar \
> findutils \
> groff-base \
> glibc-locale-source \
> glibc-langpack-en \
> gettext \
> rsync \
> scl-utils \
> tar \
> unzip \
> xz \
> yum \
> dpdk \
> dpdk-devel \
> dpdk-tools \
> make \
> rdma-core \
> libibverbs \
> git \
> gcc \
> expect" && \
> mkdir -p ${HOME}/.pki/nssdb && \
> microdnf install -y --setopt=tsflags=nodocs $INSTALL_PKGS && \
> rpm -V $INSTALL_PKGS && \
> microdnf -y clean all --enablerepo='*'

I don't see which capabilities are enabled. Did you enable any, such as cap_sys_admin, cap_net_admin, cap_net_raw, cap_ipc_lock+ep, etc.?
This is in the pod spec and has been the same for over a year (it was working before and still works on Intel-based NICs).

apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.v1.cni.cncf.io/networks: "dpdk-testing/test-dpdk-network"
    openshift.io/scc: privileged
  labels:
    app: dpdk
  name: dpdk-pod
  namespace: dpdk-testing
spec:
  containers:
  - command:
    - /bin/bash
    - -c
    - sleep INF
    env:
    - name: RUN_TYPE
      value: testpmd
    image: registry-proxy.engineering.redhat.com/rh-osbs/dpdk-base:v4.6.0-8
    imagePullPolicy: Always
    name: dpdk
    resources:
      limits:
        cpu: "4"
        hugepages-1Gi: 2Gi
        memory: 6Gi
      requests:
        cpu: "4"
        hugepages-1Gi: 2Gi
        memory: 6Gi
    securityContext:
      capabilities:
        add:
        - IPC_LOCK
        - SYS_RESOURCE
      runAsUser: 0
    volumeMounts:
    - mountPath: /mnt/huge
      name: hugepages
  restartPolicy: Always
  volumes:
  - emptyDir:
      medium: HugePages
    name: hugepages
Hi Alaa, do you have any update on this issue? Thanks! Sebastian
(In reply to Sebastian Scheinkman from comment #19)
> Hi Alaa,
>
> do you have any update on this issue?
>
> Thanks!
> Sebastian

Hi,

The team has been trying to set up a similar system in our lab, but there hasn't been much progress on that. So I'll need your system again. Please make sure it will be available for debugging next week.
The system seems to be down... Can you check?
I shared the new environment offline. I also upgraded the firmware version of the NICs as requested.
(In reply to Sebastian Scheinkman from comment #18)
> This is in the pod spec and is the same for over 1 year (it was working
> before and still working on intel base nics)
>
> apiVersion: v1
> kind: Pod
> metadata:
>   annotations:
>     k8s.v1.cni.cncf.io/networks: "dpdk-testing/test-dpdk-network"
>     openshift.io/scc: privileged
>   labels:
>     app: dpdk
>   name: dpdk-pod
>   namespace: dpdk-testing
> spec:
>   containers:
>   - command:
>     - /bin/bash
>     - -c
>     - sleep INF
>     env:
>     - name: RUN_TYPE
>       value: testpmd
>     image: registry-proxy.engineering.redhat.com/rh-osbs/dpdk-base:v4.6.0-8
>     imagePullPolicy: Always
>     name: dpdk
>     resources:
>       limits:
>         cpu: "4"
>         hugepages-1Gi: 2Gi
>         memory: 6Gi
>       requests:
>         cpu: "4"
>         hugepages-1Gi: 2Gi
>         memory: 6Gi
>     securityContext:
>       capabilities:
>         add:
>         - IPC_LOCK
>         - SYS_RESOURCE

This is a configuration issue. The "NET_RAW" capability also needs to be added for DevX commands to work properly.

>       runAsUser: 0
>     volumeMounts:
>     - mountPath: /mnt/huge
>       name: hugepages
>   restartPolicy: Always
>   volumes:
>   - emptyDir:
>       medium: HugePages
>     name: hugepages
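A quick way to confirm the diagnosis from inside the pod is to inspect the effective capability mask of the process that will launch testpmd. The sketch below is illustrative only: the helper names (`effective_caps`, `has_cap`, `missing_caps`, `REQUIRED`) are hypothetical, the capability bit numbers are taken from `linux/capability.h`, and it assumes the script runs in the same container and security context as testpmd.

```python
# Capability bit numbers as defined in linux/capability.h.
CAP_NET_RAW = 13
CAP_IPC_LOCK = 14
CAP_SYS_RESOURCE = 24

# Assumption: the set of capabilities discussed in this report.
REQUIRED = {
    "NET_RAW": CAP_NET_RAW,
    "IPC_LOCK": CAP_IPC_LOCK,
    "SYS_RESOURCE": CAP_SYS_RESOURCE,
}


def effective_caps(status_text):
    """Return the CapEff bitmask parsed from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_cap(mask, cap):
    """True if capability bit `cap` is set in `mask`."""
    return bool((mask >> cap) & 1)


def missing_caps(mask, required):
    """Names of the required capabilities that are not set in `mask`."""
    return [name for name, bit in required.items() if not has_cap(mask, bit)]


if __name__ == "__main__":
    try:
        with open("/proc/self/status") as status_file:
            mask = effective_caps(status_file.read())
        print("missing capabilities:", missing_caps(mask, REQUIRED) or "none")
    except (OSError, ValueError) as exc:
        print("could not read capabilities:", exc)
```

If `NET_RAW` shows up as missing, adding it to the `capabilities.add` list in the pod's `securityContext` alongside `IPC_LOCK` and `SYS_RESOURCE` is the change suggested in the comment above.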
Hi Alaa, thanks for the comment! NET_RAW was enabled by default until this CVE: https://access.redhat.com/security/cve/cve-2020-14386 I will update the OpenShift documentation: https://docs.openshift.com/container-platform/4.6/networking/hardware_networks/using-dpdk-and-rdma.html
Should I assign this BZ to you, Sebastian, and/or change the component?
Hi David, I will do it
OCP: 4.7.0-fc.0
CNF_IMAGE_VERSION: openshift4-cnf-tests:v4.7.0-10
DPDK_IMAGE_VERSION: dpdk-base:v4.7.0-4
=========================================================
• [SLOW TEST:54.483 seconds]
dpdk
/remote-source/app/functests/dpdk/dpdk.go:87
  Validate a DPDK workload running inside a pod
  /remote-source/app/functests/dpdk/dpdk.go:168
    Should forward and receive packets
    /remote-source/app/functests/dpdk/dpdk.go:169
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633